Building a Dynamic Persona-Based Mixture-of-Experts RAG System
A comprehensive guide to building a dynamic, graph-based Mixture-of-Experts Retrieval-Augmented Generation system that leverages persona-driven AI agents for contextually rich responses.
Daniel Kliewer
Author, Sovereign AI

The code for this guide is available on my GitHub.
Introduction
Welcome to this comprehensive guide on building a dynamic, graph-based Mixture-of-Experts (MoE) Retrieval-Augmented Generation (RAG) system that leverages persona-driven AI agents. The approach combines multiple AI "personas" that dynamically traverse knowledge graphs, each from its own perspective, to produce contextually rich and diverse responses.
In this post, we'll walk through the complete construction of this system, from initial project setup to the final scaffolded architecture. We'll explore each component, understand the design decisions, and learn how the pieces fit together to create an intelligent, adaptive AI system.
Part 1: Project Foundations and Architecture
1.1 The Vision: Dynamic Persona MoE RAG
At its core, this system implements a Mixture-of-Experts RAG where:
- Personas are specialized AI agents with unique traits, expertise, and behavioral patterns
- Dynamic Graphs represent knowledge in a flexible, query-scoped structure
- Traversal Logic allows personas to navigate graphs based on their individual perspectives
- Ollama Integration provides local LLM inference with synthesized persona context
The key innovation is the dynamic nature: graphs are built on-demand for each query, personas evolve through performance feedback, and the system adapts through pruning and promotion cycles.
1.2 Project Initialization
We begin by creating a robust Python project structure:
```bash
mkdir dynamic_persona_moe_rag
cd dynamic_persona_moe_rag
python3 -m venv venv
```
The .gitignore file follows Python best practices, excluding virtual environments, cache files, and build artifacts:
```gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Environments
.env
.venv
env/
venv/
ENV/
```
1.3 Core Architecture Overview
The system follows a modular architecture with clear separation of concerns:
```text
src/
├── core/        # Main orchestration and interfaces
├── graph/       # Dynamic knowledge graph implementation
├── personas/    # Persona lifecycle and storage
├── agents/      # Specialized AI agents
├── evaluation/  # Scoring and metrics
└── storage/     # Persistence and snapshots

configs/         # YAML configuration files
scripts/         # Pipeline execution
data/            # Input/output data
```
Part 2: Configuration and Data Structures
2.1 Configuration System
The system uses YAML for configuration, which keeps settings human-readable and easy to edit. YAML itself does not enforce types, so values should be validated when they are loaded:
system.yaml - Global parameters:
```yaml
# Global system parameters
max_iterations:   # Maximum number of iterations for the pipeline
batch_size:       # Batch size for processing
log_level:        # Logging level (DEBUG, INFO, etc.)
enable_caching:   # Whether to enable caching
```
thresholds.yaml - Pruning logic:
```yaml
# Pruning and promotion thresholds
pruning_threshold:     # Threshold for pruning personas
promotion_threshold:   # Threshold for promoting personas
demotion_threshold:    # Threshold for demoting personas
activation_threshold:  # Threshold for activating personas
```
ollama.yaml - Model settings:
```yaml
# Local model configuration
model_name:    # Name of the Ollama model to use
temperature:   # Temperature for generation
max_tokens:    # Maximum tokens to generate
api_endpoint:  # Ollama API endpoint (Ollama listens on http://localhost:11434 by default)
```
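The YAML files above leave every value blank, so something has to supply defaults and catch bad values at load time. Here is a minimal sketch for `thresholds.yaml`. It assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The `Thresholds` dataclass, its default numbers, and the rule that thresholds must be ordered `pruning <= demotion <= promotion` are all hypothetical choices, not part of the original project.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Thresholds:
    """Typed view of thresholds.yaml. Default values are hypothetical placeholders."""
    pruning_threshold: float = 0.2
    demotion_threshold: float = 0.4
    activation_threshold: float = 0.5
    promotion_threshold: float = 0.8

    @classmethod
    def from_dict(cls, raw):
        """Build from a parsed YAML mapping; blank (None) entries fall back to defaults."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown threshold keys: {sorted(unknown)}")
        values = {key: float(value) for key, value in raw.items() if value is not None}
        thresholds = cls(**values)
        thresholds.validate()
        return thresholds

    def validate(self):
        """Assumed invariants: scores live in [0, 1] and the ladder thresholds are ordered."""
        ladder = [self.pruning_threshold, self.demotion_threshold, self.promotion_threshold]
        if not all(0.0 <= t <= 1.0 for t in ladder + [self.activation_threshold]):
            raise ValueError("thresholds must lie in [0, 1]")
        if ladder != sorted(ladder):
            raise ValueError("expected pruning <= demotion <= promotion")


# With PyYAML installed this would be: Thresholds.from_dict(yaml.safe_load(f) or {})
parsed = {"pruning_threshold": 0.25, "promotion_threshold": None}
print(Thresholds.from_dict(parsed))
```

Rejecting unknown keys catches typos such as `prunning_threshold`, which would otherwise be ignored without any warning.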
2.2 Persona Schema Definition
Personas are defined by a JSON Schema (draft 2020-12) that enforces a consistent structure:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "persona_id": {
      "type": "string",
      "description": "Unique identifier for the persona"
    },
    "traits": {
      "type": "object",
      "patternProperties": {
        "^.*$": {
          "type": "integer",
          "minimum": 1,
          "maximum": 9,
          "description": "Trait value between 1 and 9"
        }
      },
      "description": "Object containing trait names as keys and numeric values 1-9 as values"
    },
    "expertise": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Array of strings representing areas of expertise"
    },
    "activation_cost": {
      "type": "number",
      "description": "Float representing the cost to activate this persona"
    },
    "historical_performance": {
      "type": "object",
      "description": "Object containing historical performance metrics"
    },
    "metadata": {
      "type": "object",
      "description": "Object containing additional metadata"
    }
  },
  "required": ["persona_id", "traits", "expertise", "activation_cost", "historical_performance", "metadata"]
}
```
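In a real project, the `jsonschema` package's `validate(instance, schema)` would check persona files against this schema directly. To show what the schema actually enforces without adding a dependency, here is a hand-rolled sketch that mirrors its constraints: required keys, value types, and the 1 to 9 trait range. The `validate_persona` function and the sample persona are hypothetical illustrations.

```python
REQUIRED_KEYS = ("persona_id", "traits", "expertise", "activation_cost",
                 "historical_performance", "metadata")


def _is_json_integer(value):
    """JSON Schema 'integer': booleans excluded, floats allowed only with a zero fraction."""
    if isinstance(value, bool):
        return False
    return isinstance(value, int) or (isinstance(value, float) and value.is_integer())


def validate_persona(persona):
    """Return a list of schema violations; an empty list means the persona is valid."""
    errors = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in persona]
    if errors:
        return errors
    if not isinstance(persona["persona_id"], str):
        errors.append("persona_id must be a string")
    traits = persona["traits"]
    if not isinstance(traits, dict):
        errors.append("traits must be an object")
    else:
        # patternProperties "^.*$" means every trait value must satisfy the constraint
        for name, value in traits.items():
            if not _is_json_integer(value) or not 1 <= value <= 9:
                errors.append(f"trait {name!r} must be an integer between 1 and 9")
    expertise = persona["expertise"]
    if not isinstance(expertise, list) or not all(isinstance(e, str) for e in expertise):
        errors.append("expertise must be an array of strings")
    cost = persona["activation_cost"]
    if isinstance(cost, bool) or not isinstance(cost, (int, float)):
        errors.append("activation_cost must be a number")
    for key in ("historical_performance", "metadata"):
        if not isinstance(persona[key], dict):
            errors.append(f"{key} must be an object")
    return errors


persona = {
    "persona_id": "skeptical_analyst",
    "traits": {"skepticism": 8, "verbosity": 3},
    "expertise": ["statistics", "research methods"],
    "activation_cost": 1.5,
    "historical_performance": {},
    "metadata": {"status": "experimental"},
}
print(validate_persona(persona))  # → []
bad = dict(persona, traits={"skepticism": 12})
print(validate_persona(bad))  # → ["trait 'skepticism' must be an integer between 1 and 9"]
```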
Part 3: Core Components Deep Dive
3.1 Dynamic Knowledge Graph
The graph system is designed for query-scoped efficiency:
Graph Class:
```python
class DynamicKnowledgeGraph:
    """
    A dynamic graph that constructs nodes and edges on-demand for a single query.
    """

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, node_data):
        """Lazily construct a node when needed."""
        pass

    def add_edge(self, source_id, target_id, edge_data):
        """Create an edge on-demand between nodes."""
        pass
```
Node and Edge Classes:
```python
class Node:
    """Represents a node in the dynamic knowledge graph."""
    def __init__(self, node_id, data=None):
        self.node_id = node_id
        self.data = data or {}
        self.edges = []


class Edge:
    """Represents an edge in the dynamic knowledge graph."""
    def __init__(self, source_node, target_node, data=None):
        self.source = source_node
        self.target = target_node
        self.data = data or {}
```
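One way to fill in the graph stubs using the `Node` and `Edge` classes is sketched below. It makes three assumptions beyond the original scaffold. First, `add_node` is idempotent: a repeat call merges new data into the existing node. Second, `add_edge` lazily creates any endpoint that does not exist yet. Third, edges are registered on both endpoints so traversal can move in either direction. The `neighbors` helper is a hypothetical addition.

```python
class Node:
    """Represents a node in the dynamic knowledge graph."""
    def __init__(self, node_id, data=None):
        self.node_id = node_id
        self.data = data or {}
        self.edges = []


class Edge:
    """Represents an edge in the dynamic knowledge graph."""
    def __init__(self, source_node, target_node, data=None):
        self.source = source_node
        self.target = target_node
        self.data = data or {}


class DynamicKnowledgeGraph:
    """A query-scoped graph whose nodes and edges are created on demand."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, node_data=None):
        """Create the node on first use; later calls merge new data into it."""
        node = self.nodes.get(node_id)
        if node is None:
            node = self.nodes[node_id] = Node(node_id, dict(node_data or {}))
        elif node_data:
            node.data.update(node_data)
        return node

    def add_edge(self, source_id, target_id, edge_data=None):
        """Create an edge, lazily creating either endpoint if it does not exist yet."""
        source = self.add_node(source_id)
        target = self.add_node(target_id)
        edge = Edge(source, target, edge_data)
        self.edges.append(edge)
        # Register on both endpoints (once for a self-loop) so traversal works both ways
        source.edges.append(edge)
        if target is not source:
            target.edges.append(edge)
        return edge

    def neighbors(self, node_id):
        """Hypothetical helper: nodes directly connected to node_id."""
        node = self.nodes[node_id]
        return [e.target if e.source is node else e.source for e in node.edges]


graph = DynamicKnowledgeGraph()
graph.add_node("ollama", {"type": "tool"})
graph.add_edge("ollama", "local_inference", {"relation": "provides"})
print([n.node_id for n in graph.neighbors("ollama")])  # → ['local_inference']
```

Because nothing exists until a query touches it, the graph stays as small as the query requires, which is the point of the query-scoped design.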
3.2 Persona Traversal Interface
The traversal system uses abstract interfaces for flexibility:
```python
from abc import ABC, abstractmethod


class PersonaTraversalInterface(ABC):
    """
    Abstract base class defining the interface for persona traversal.
    """

    @abstractmethod
    def evaluate_node_relevance(self, persona, node):
        """
        Evaluate how relevant a graph node is to a given persona.
        Returns: float (relevance score between 0 and 1)
        """
        pass

    @abstractmethod
    def decide_traversal(self, current_node, available_nodes, persona):
        """
        Decide which nodes to traverse to next based on persona evaluation.
        Returns: list (nodes to traverse to next)
        """
        pass
```
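To see how a concrete strategy plugs into this interface, here is a hypothetical keyword-overlap implementation. It assumes that personas are dicts following the persona schema and that each node has a `data` dict with a `keywords` list; neither assumption comes from the original. Relevance is the fraction of a node's keywords that fall within the persona's expertise, which keeps scores in [0, 1] as the interface requires. The `min_relevance` and `max_branches` parameters are placeholders; `min_relevance` could plausibly be read from `activation_threshold`.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class PersonaTraversalInterface(ABC):
    """Same interface as above, repeated so this sketch runs on its own."""

    @abstractmethod
    def evaluate_node_relevance(self, persona, node):
        ...

    @abstractmethod
    def decide_traversal(self, current_node, available_nodes, persona):
        ...


class KeywordOverlapTraversal(PersonaTraversalInterface):
    """Hypothetical strategy: follow nodes whose keywords overlap the persona's expertise."""

    def __init__(self, min_relevance=0.5, max_branches=3):
        self.min_relevance = min_relevance  # placeholder cutoff
        self.max_branches = max_branches    # cap on how many nodes to expand per step

    def evaluate_node_relevance(self, persona, node):
        expertise = {area.lower() for area in persona["expertise"]}
        keywords = {kw.lower() for kw in node.data.get("keywords", [])}
        if not expertise or not keywords:
            return 0.0
        # Fraction of the node's keywords covered by the persona's expertise
        return len(expertise & keywords) / len(keywords)

    def decide_traversal(self, current_node, available_nodes, persona):
        scored = [(self.evaluate_node_relevance(persona, n), n)
                  for n in available_nodes if n is not current_node]
        scored = [(s, n) for s, n in scored if s >= self.min_relevance]
        # Sort on the score only, so nodes themselves never need to be comparable
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in scored[: self.max_branches]]


persona = {"persona_id": "ml_engineer", "expertise": ["embeddings", "retrieval"]}
nodes = [
    SimpleNamespace(node_id="vector_db", data={"keywords": ["retrieval", "embeddings"]}),
    SimpleNamespace(node_id="gpu", data={"keywords": ["hardware", "cuda"]}),
    SimpleNamespace(node_id="rerank", data={"keywords": ["retrieval", "ranking"]}),
]
strategy = KeywordOverlapTraversal(min_relevance=0.5)
print([n.node_id for n in strategy.decide_traversal(None, nodes, persona)])  # → ['vector_db', 'rerank']
```

Swapping keyword overlap for embedding similarity would change only `evaluate_node_relevance`. That kind of substitution is what the abstract interface makes cheap.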
3.3 Mixture-of-Experts Orchestrator
The orchestrator manages the entire MoE cycle:
```python
class MoeOrchestrator:
    """
    Orchestrates the mixture-of-experts RAG system.
    """

    def expansion_phase(self):
        """Expansion phase: Generate diverse outputs from active personas."""
        pass

    def evaluation_phase(self):
        """Evaluation phase: Score and rank the generated outputs."""
        pass

    def pruning_phase(self):
        """Pruning phase: Remove underperforming personas and promote high performers."""
        pass
```
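Here is a sketch of one full orchestrator cycle. Generation and scoring are injected as callables, so the three phases can be exercised without an LLM. The constructor fields, the `run_cycle` method, and the placeholder threshold values are all assumptions; the original leaves every phase as a stub. This version also removes pruned personas from the in-memory pool only. Archiving their files is a separate step, covered by the lifecycle functions in Part 4.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MoeOrchestrator:
    """One expansion -> evaluation -> pruning cycle over a pool of persona dicts."""
    personas: list
    generate: Callable[[dict, str], str]   # hypothetical: (persona, query) -> output text
    score: Callable[[str, str], float]     # hypothetical: (output, query) -> score in [0, 1]
    pruning_threshold: float = 0.3         # placeholder values, not from the project
    promotion_threshold: float = 0.8
    outputs: dict = field(default_factory=dict)
    scores: dict = field(default_factory=dict)

    def expansion_phase(self, query):
        """Expansion phase: every persona in the pool produces a candidate output."""
        self.outputs = {p["persona_id"]: self.generate(p, query) for p in self.personas}
        return self.outputs

    def evaluation_phase(self, query):
        """Evaluation phase: score each output and rank personas best-first."""
        self.scores = {pid: self.score(out, query) for pid, out in self.outputs.items()}
        return sorted(self.scores.items(), key=lambda item: item[1], reverse=True)

    def pruning_phase(self):
        """Pruning phase: drop low scorers from the pool and flag high scorers for promotion."""
        kept, pruned, promoted = [], [], []
        for persona in self.personas:
            pid = persona["persona_id"]
            s = self.scores.get(pid)
            if s is not None and s < self.pruning_threshold:
                pruned.append(pid)
                continue
            kept.append(persona)
            if s is not None and s >= self.promotion_threshold:
                promoted.append(pid)
        self.personas = kept
        return pruned, promoted

    def run_cycle(self, query):
        self.expansion_phase(query)
        ranking = self.evaluation_phase(query)
        pruned, promoted = self.pruning_phase()
        return ranking, pruned, promoted


fake_scores = {"historian": 0.9, "engineer": 0.5, "poet": 0.1}
orch = MoeOrchestrator(
    [{"persona_id": "historian"}, {"persona_id": "engineer"}, {"persona_id": "poet"}],
    generate=lambda p, q: f"{p['persona_id']} answers: {q}",
    score=lambda out, q: fake_scores[out.split()[0]],  # stub scorer for the demo
)
ranking, pruned, promoted = orch.run_cycle("Why run models locally?")
print(pruned, promoted)  # → ['poet'] ['historian']
```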
Part 4: Evaluation and Adaptation
4.1 Scoring Framework
Each output is scored against several complementary criteria:
```python
def score_relevance(output, query):
    """Score the relevance of an output to the input query."""
    return 0.0

def score_consistency(output, reference_outputs):
    """Score the consistency of an output with reference outputs."""
    return 0.0

def score_novelty(output, existing_outputs):
    """Score the novelty of an output compared to existing outputs."""
    return 0.0

def score_entity_grounding(output, entities):
    """Score how well the output is grounded in the provided entities."""
    return 0.0
```
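The scoring stubs can be filled in with simple lexical heuristics as a starting point. This sketch swaps in token-set overlap (Jaccard similarity) as a deliberately crude stand-in for embedding-based similarity. It also assumes `entities` is an iterable of entity name strings. Every function returns a value in [0, 1], so the scores can be combined later.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _jaccard(a, b):
    """Overlap of two token sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def score_relevance(output, query):
    """Fraction of query terms that appear in the output."""
    query_terms = _tokens(query)
    return len(query_terms & _tokens(output)) / len(query_terms) if query_terms else 0.0


def score_consistency(output, reference_outputs):
    """Mean token overlap with the reference outputs."""
    if not reference_outputs:
        return 0.0
    out = _tokens(output)
    return sum(_jaccard(out, _tokens(r)) for r in reference_outputs) / len(reference_outputs)


def score_novelty(output, existing_outputs):
    """1 minus the highest overlap with any existing output (1.0 if there are none)."""
    out = _tokens(output)
    return 1.0 - max((_jaccard(out, _tokens(e)) for e in existing_outputs), default=0.0)


def score_entity_grounding(output, entities):
    """Fraction of the provided entities mentioned verbatim (case-insensitive)."""
    entities = list(entities)
    if not entities:
        return 0.0
    text = output.lower()
    return sum(1 for e in entities if e.lower() in text) / len(entities)


query = "How does Ollama run models locally?"
answer = "Ollama runs models locally on your own hardware."
print(score_relevance(answer, query))                     # → 0.5
print(score_entity_grounding(answer, ["Ollama", "GPU"]))  # → 0.5
```

The example also shows the main weakness of lexical matching: "runs" does not match "run", so relevance is understated. A stemmer or embeddings would fix that.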
4.2 Metrics and Aggregation
```python
def calculate_average_score(scores):
    """Calculate the average of a list of scores."""
    return 0.0

def calculate_weighted_score(scores, weights):
    """Calculate a weighted average of scores."""
    return 0.0

def aggregate_persona_performance(persona_scores):
    """Aggregate performance metrics for a persona across multiple evaluations."""
    return {}
```
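The original signatures do not specify data shapes. This sketch assumes `scores` and `weights` are dicts keyed by criterion name for the weighted average, and that `persona_scores` is a chronological list of overall scores. The summary keys (`count`, `mean`, `min`, `max`, `latest`) are hypothetical; a summary like this could be stored in a persona's `historical_performance` field.

```python
def calculate_average_score(scores):
    """Arithmetic mean; an empty list scores 0.0 rather than raising."""
    return sum(scores) / len(scores) if scores else 0.0


def calculate_weighted_score(scores, weights):
    """Weighted mean of per-criterion scores; criteria without a weight count as 0."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(value * weights.get(name, 0.0) for name, value in scores.items()) / total_weight


def aggregate_persona_performance(persona_scores):
    """Summarize a persona's history of overall scores (oldest first)."""
    if not persona_scores:
        return {"count": 0, "mean": 0.0, "min": 0.0, "max": 0.0, "latest": 0.0}
    return {
        "count": len(persona_scores),
        "mean": calculate_average_score(persona_scores),
        "min": min(persona_scores),
        "max": max(persona_scores),
        "latest": persona_scores[-1],
    }


criteria = {"relevance": 0.8, "consistency": 0.6, "novelty": 0.2, "entity_grounding": 1.0}
weights = {"relevance": 0.4, "consistency": 0.2, "novelty": 0.1, "entity_grounding": 0.3}
print(round(calculate_weighted_score(criteria, weights), 2))  # → 0.76
print(aggregate_persona_performance([0.5, 0.7, 0.9])["count"])  # → 3
```

Dividing by the sum of weights, rather than assuming they add up to 1, keeps the result in [0, 1] even when a configuration's weights are not normalized.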
4.3 Persona Lifecycle Management
Personas evolve through performance-based transitions:
- Active: Currently participating in inference
- Stable: Proven performers, quick to activate
- Experimental: Newly created or modified, being tested
- Pruned: Underperforming, archived in tiered folders
```python
def evaluate_pruning_thresholds(persona_performance, thresholds):
    """
    Threshold-based demotion: Personas below certain performance metrics
    are demoted from active to stable, stable to experimental, experimental to pruned.
    """
    return 'keep'

def move_persona_to_folder(persona_id, current_folder, target_folder):
    """
    Folder-based archival: Move personas to appropriate archival folders
    instead of deleting them.
    """
    pass
```
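Below is a sketch of the threshold logic and folder-based archival. It rests on several assumptions:

- `persona_performance` is a single aggregate score in [0, 1], and `thresholds` is a dict with the keys from `thresholds.yaml`.
- A score below `pruning_threshold` archives a persona immediately. A score below `demotion_threshold` moves it one rung down the active → stable → experimental → pruned ladder from the docstring. Promotion walks the same ladder in reverse.
- Each lifecycle state is a folder under a hypothetical `personas/` root.

The `next_state` helper and the `root` parameter are hypothetical additions.

```python
import shutil
import tempfile
from pathlib import Path

# Demotion ladder from the docstring; walking it in reverse for promotion is an assumption
LADDER = ["active", "stable", "experimental", "pruned"]


def evaluate_pruning_thresholds(persona_performance, thresholds):
    """Map an aggregate score in [0, 1] to 'prune', 'demote', 'promote', or 'keep'."""
    if persona_performance < thresholds["pruning_threshold"]:
        return "prune"    # far below par: archive immediately
    if persona_performance < thresholds["demotion_threshold"]:
        return "demote"   # below par: move one rung down the ladder
    if persona_performance >= thresholds["promotion_threshold"]:
        return "promote"  # strong performer: move one rung up
    return "keep"


def next_state(current_state, action):
    """Apply an action to a lifecycle state, clamping at both ends of the ladder."""
    i = LADDER.index(current_state)
    if action == "prune":
        return "pruned"
    if action == "demote":
        return LADDER[min(i + 1, len(LADDER) - 1)]
    if action == "promote":
        return LADDER[max(i - 1, 0)]
    return current_state


def move_persona_to_folder(persona_id, current_folder, target_folder, root="personas"):
    """Archive instead of delete: move <root>/<current>/<id>.json into <root>/<target>/."""
    source = Path(root) / current_folder / f"{persona_id}.json"
    target_dir = Path(root) / target_folder
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / source.name
    shutil.move(str(source), str(destination))
    return destination


thresholds = {"pruning_threshold": 0.2, "demotion_threshold": 0.4, "promotion_threshold": 0.8}
action = evaluate_pruning_thresholds(0.35, thresholds)
print(action, next_state("stable", action))  # → demote experimental

with tempfile.TemporaryDirectory() as root:
    (Path(root) / "stable").mkdir()
    (Path(root) / "stable" / "poet.json").write_text("{}")
    print(move_persona_to_folder("poet", "stable", "experimental", root=root).exists())  # → True
```

Moving files rather than deleting them keeps every demotion reversible and auditable, in line with the archival approach described above.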
Part 5: Integration and Execution
5.1 Ollama Integration
Local LLM inference with persona context:
```python
def synthesize_persona_context(persona_outputs, graph_context):
    """Synthesize context from multiple persona outputs and graph traversal."""
    return ""

def send_prompt_to_ollama(synthesized_context, query, ollama_client):
    """Send the final prompt to the local Ollama model."""
    return ""
```
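This sketch fills in both stubs using only the standard library. It assumes `persona_outputs` maps persona IDs to text and `graph_context` is a list of fact strings. Instead of the original `ollama_client` argument, it calls Ollama's REST endpoint `POST /api/generate` directly with `"stream": false`; the reply is a JSON object whose `response` field holds the generated text. The `model`, `endpoint`, `temperature`, and `max_tokens` parameters mirror `ollama.yaml`, and their defaults are placeholders. The model name `llama3` in the usage example is also just a placeholder. The `build_ollama_request` helper is hypothetical and exists so the request can be inspected without a running server.

```python
import json
import urllib.request


def synthesize_persona_context(persona_outputs, graph_context):
    """Join labelled persona outputs and graph facts into one context block."""
    sections = [f"[{pid}] {text}" for pid, text in persona_outputs.items()]
    facts = [f"- {fact}" for fact in graph_context]
    return ("Persona perspectives:\n" + "\n".join(sections)
            + "\n\nGraph context:\n" + "\n".join(facts))


def build_ollama_request(synthesized_context, query, model,
                         endpoint="http://localhost:11434", temperature=0.7, max_tokens=512):
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": f"{synthesized_context}\n\nQuestion: {query}\nAnswer:",
        "stream": False,
        # Ollama calls the token limit "num_predict"; max_tokens is this guide's config name
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_prompt_to_ollama(synthesized_context, query, model, **options):
    """Send the request and return the generated text (needs a running Ollama server)."""
    request = build_ollama_request(synthesized_context, query, model, **options)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


context = synthesize_persona_context(
    {"historian": "Local models avoid vendor lock-in."},
    ["Ollama serves models over HTTP"],
)
req = build_ollama_request(context, "Why run models locally?", model="llama3")
print(req.full_url)  # → http://localhost:11434/api/generate
```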
5.2 Storage and Persistence
Persistence for personas and graph snapshots:
```python
def load_persona_from_file(filepath):
    """Load a persona JSON file from disk."""
    return {}

def save_persona_to_file(persona_data, filepath):
    """Save persona data to a JSON file."""
    pass

def save_graph_snapshot(graph, query_id, timestamp):
    """Save a snapshot of the current graph state."""
    pass
```
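Here is a sketch of the persistence functions, with one design choice beyond the original stubs: writes are atomic. Data is dumped to a temporary file in the target folder and then swapped in with `os.replace`, so a crash mid-write cannot leave a half-written persona file. `save_graph_snapshot` assumes the `nodes`/`edges` structure of `DynamicKnowledgeGraph` from Part 3 and a filename-safe `timestamp` string (such as `20240101T120000`). The `_write_json_atomic` helper and the `snapshot_dir` default are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def _write_json_atomic(data, filepath):
    """Dump JSON to a temp file in the target folder, then rename it over the target."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2, sort_keys=True)
        # On POSIX, os.replace within one filesystem is atomic
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # never leave a stray temp file behind
        raise
    return path


def load_persona_from_file(filepath):
    """Load a persona JSON file from disk."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


def save_persona_to_file(persona_data, filepath):
    """Save persona data to a JSON file without risking a half-written file."""
    return _write_json_atomic(persona_data, filepath)


def save_graph_snapshot(graph, query_id, timestamp, snapshot_dir="data/snapshots"):
    """Serialize nodes and edges to <snapshot_dir>/<query_id>_<timestamp>.json."""
    snapshot = {
        "query_id": query_id,
        "timestamp": timestamp,
        "nodes": {nid: node.data for nid, node in graph.nodes.items()},
        "edges": [{"source": e.source.node_id, "target": e.target.node_id, "data": e.data}
                  for e in graph.edges],
    }
    return _write_json_atomic(snapshot, Path(snapshot_dir) / f"{query_id}_{timestamp}.json")


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "stable" / "historian.json"
    save_persona_to_file({"persona_id": "historian", "traits": {"curiosity": 7}}, path)
    print(load_persona_from_file(path)["traits"])  # → {'curiosity': 7}
```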
5.3 Pipeline Execution
The main pipeline orchestrates all components:
```python
def main():
    # 1. Input ingestion
    input_query = ""

    # 2. Entity construction
    entities = {}

    # 3. Graph creation
    graph = None

    # 4. Persona traversal loop
    traversal_outputs = []

    # 5. Scoring and pruning
    scores = []

    # 6. Final Ollama inference
    final_response = ""
    return final_response


if __name__ == "__main__":
    main()
```
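Steps 2 and 3 of `main()` (entity construction and graph creation) are the least specified part of the pipeline. Here is a sketch under stated assumptions. The input is the query plus a list of retrieved passages. Entities are approximated by capitalized words, a crude stand-in for a real named-entity recognizer. Edges link entities that co-occur in the same passage, weighted by co-occurrence count. The sketch uses plain dicts rather than `DynamicKnowledgeGraph` so it runs on its own; `extract_entities`, `build_query_graph`, and the stopword list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Capitalized words that usually start sentences or questions rather than name entities
STOPWORDS = {"The", "A", "An", "It", "This", "In", "On", "And",
             "How", "What", "Why", "When", "Where", "Which", "Who", "Does", "Is"}


def extract_entities(text):
    """Crude entity guess: capitalized words that are not common sentence starters."""
    return {w for w in re.findall(r"\b[A-Z][a-zA-Z0-9]+\b", text) if w not in STOPWORDS}


def build_query_graph(query, passages):
    """Nodes = entities from the query and passages; edges = same-passage co-occurrence."""
    query_entities = extract_entities(query)
    nodes = {}
    edge_weights = Counter()
    for passage in passages:
        found = extract_entities(passage)
        for entity in found:
            node = nodes.setdefault(entity, {"mentions": 0, "in_query": entity in query_entities})
            node["mentions"] += 1
        # Sort each pair so (A, B) and (B, A) count as the same undirected edge
        for a, b in combinations(sorted(found), 2):
            edge_weights[(a, b)] += 1
    # Query entities become nodes even if no passage mentions them
    for entity in query_entities:
        nodes.setdefault(entity, {"mentions": 0, "in_query": True})
    return nodes, dict(edge_weights)


nodes, edges = build_query_graph(
    "How does Ollama compare to LlamaCpp?",
    ["Ollama wraps LlamaCpp behind an HTTP API.", "Ollama runs GGUF models locally."],
)
print(nodes["Ollama"])               # → {'mentions': 2, 'in_query': True}
print(edges[("LlamaCpp", "Ollama")])  # → 1
```

Flagging `in_query` on each node gives personas a natural starting point for traversal: they can begin at entities the user actually asked about and expand outward through co-occurrence edges.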
Part 6: Design Philosophy and Future Directions
6.1 Key Design Decisions
- Query-Scoped Graphs: Graphs are built fresh for each query, which keeps them relevant and prevents state from leaking between queries.
- Persona Evolution: Personas accumulate performance metadata over time, which drives performance-based adaptation.
- Threshold-Based Pruning: Explicit numeric thresholds make persona management deterministic and auditable.
- Local Inference: Ollama integration keeps prompts and data on the local machine and removes the dependency on external APIs.
- Modular Architecture: Clear separation of concerns enables independent development and testing.
6.2 Implementation Roadmap
Phase 1: Core Infrastructure
- Complete basic graph operations
- Implement persona loading/saving
- Basic Ollama integration
Phase 2: Intelligence Layer
- Develop relevance evaluation algorithms
- Implement traversal heuristics
- Add sophisticated scoring metrics
Phase 3: Learning and Adaptation
- Performance-based persona evolution
- Dynamic threshold adjustment
- Graph optimization techniques
Phase 4: Production Readiness
- Comprehensive error handling
- Performance optimization
- Monitoring and logging
- API interfaces
6.3 Potential Extensions
- Multi-Modal Personas: Support for different input/output modalities
- Federated Learning: Distributed persona training across multiple systems
- Hierarchical Graphs: Multi-level graph representations for complex domains
- Real-Time Adaptation: Continuous learning during inference cycles
Conclusion
This dynamic persona MoE RAG system represents a sophisticated approach to AI orchestration, combining the strengths of specialized agents, flexible knowledge representation, and adaptive learning. By scaffolding the architecture through systematic, incremental development, we've created a foundation that can evolve into a powerful, context-aware AI system.
The modular design ensures that each component can be developed, tested, and improved independently while maintaining clear interfaces for integration. The emphasis on performance tracking, threshold-based adaptation, and local inference provides a robust framework for building reliable, adaptive AI applications.
As we move forward, the challenge will be balancing the complexity of multiple interacting components with the need for reliable, interpretable behavior. The systematic approach demonstrated here provides a blueprint for tackling these challenges in complex AI system development.
