Architectural Synthesis: Integrating OpenAI's Agents SDK with Ollama

A Convergence of Contemporary AI Paradigms

In the evolving landscape of artificial intelligence systems, integrating OpenAI's Agents SDK with Ollama offers a way to build hybrid agents: a single system that can send each request either to a cloud-hosted OpenAI model or to a model running locally through Ollama. This synthesis combines cloud-based intelligence with local computational resources, producing what this document calls a Modern Computational Paradigm (MCP) system.

Theoretical Framework and Architectural Considerations

The foundational architecture of this integration leverages the strengths of both paradigms. OpenAI's Agents SDK provides a structured framework for creating autonomous agents that can orchestrate complex, multi-step reasoning processes. Ollama runs large language models on local hardware, which keeps data on the machine and avoids network round-trips, although throughput is bounded by the local hardware.

At its core, this architecture addresses the tension between computational capability and data sovereignty. The implementation draws a configurable boundary between local and remote processing, determined by contextual parameters including:

Computational complexity thresholds
Privacy requirements of specific data domains
Latency tolerance for particular interaction modalities
Economic considerations regarding API utilization
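The four parameters above can be combined into an ordered decision rule. The sketch below is illustrative only: the `RoutingContext` fields, the assumed 2-second local latency estimate, and the rule order (privacy, then budget, then complexity, then latency) are hypothetical choices, not the repository's logic. The 0.65 default mirrors `COMPLEXITY_THRESHOLD` from the environment configuration.

```python
from dataclasses import dataclass


@dataclass
class RoutingContext:
    """Hypothetical per-request signals; how each is measured is up to the caller."""
    complexity: float            # 0.0 (trivial) to 1.0 (very demanding)
    privacy_sensitive: bool      # True if the request touches a protected data domain
    max_latency_ms: int          # latency the caller is willing to tolerate
    budget_remaining_usd: float  # API budget left in the current billing period


def choose_provider(
    ctx: RoutingContext,
    complexity_threshold: float = 0.65,
    expected_local_latency_ms: int = 2000,  # assumed estimate for the local model
) -> str:
    """Return "ollama" or "openai" by applying the constraints in priority order."""
    # Privacy is a hard constraint: sensitive data never leaves the machine.
    if ctx.privacy_sensitive:
        return "ollama"
    # With no budget left, the paid cloud route is unavailable.
    if ctx.budget_remaining_usd <= 0:
        return "ollama"
    # Demanding requests go to the more capable cloud model.
    if ctx.complexity > complexity_threshold:
        return "openai"
    # If the local model is expected to be too slow, use the cloud.
    if ctx.max_latency_ms < expected_local_latency_ms:
        return "openai"
    return "ollama"
```

For example, `choose_provider(RoutingContext(0.9, False, 5000, 10.0))` returns `"openai"`, while the same request flagged as privacy-sensitive stays on Ollama.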

Functional Capabilities and Implementation Vectors

This architectural synthesis manifests several advanced capabilities:

Cognitive Load Distribution: The system intelligently routes cognitive tasks between local and remote execution environments based on complexity, resource requirements, and privacy constraints.
Tool Integration Framework: Both OpenAI's agents and Ollama instances can leverage a unified tool ecosystem, allowing for consistent interaction patterns with external systems.
Conversational State Management: A sophisticated state management system maintains coherent interaction context across the distributed computational environment.
Fallback Mechanisms: The architecture implements graceful degradation pathways, ensuring functionality persistence when either component faces constraints.
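A graceful-degradation pathway can be as simple as an ordered list of providers tried in turn. In this sketch each provider is a hypothetical synchronous callable that takes a prompt and returns text; a real implementation would wrap the OpenAI and Ollama clients, would likely be async, and would catch narrower exception types.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (provider name, generate function)


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, output) from the first success."""
    errors: List[str] = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except Exception as exc:  # any failure degrades to the next provider
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")
```

Returning the provider name alongside the output lets the caller report which path actually served the request, as the `provider` field in the API response does.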

Implementation Methodology

The GitHub repository (kliewerdaniel/OpenAIAgentsSDKOllama01) provides the foundational code structure for this integration. The implementation follows a modular approach that encapsulates:

Abstraction layers for model interactions
Contextual routing logic
Unified response formatting
Configurable threshold parameters for decision boundaries
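As an illustration of unified response formatting, the function below maps the raw JSON of a non-streaming OpenAI Chat Completions response and a non-streaming Ollama `/api/chat` response onto one shape. The input fields (`choices[0].message` and `usage` for OpenAI; `message`, `prompt_eval_count`, and `eval_count` for Ollama) follow the respective APIs; the output field names are an assumption made for this sketch.

```python
from typing import Any, Dict


def normalize_response(provider: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a provider-specific chat response into a common dictionary."""
    if provider == "openai":
        message = raw["choices"][0]["message"]
        usage = raw.get("usage") or {}
        prompt_tokens = usage.get("prompt_tokens", 0)
        completion_tokens = usage.get("completion_tokens", 0)
    elif provider == "ollama":
        message = raw["message"]
        # Ollama reports token counts as evaluation counters.
        prompt_tokens = raw.get("prompt_eval_count", 0)
        completion_tokens = raw.get("eval_count", 0)
    else:
        raise ValueError(f"Unknown provider: {provider!r}")

    return {
        "provider": provider,
        "model": raw.get("model"),
        "content": message.get("content") or "",
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```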

Theoretical Implications and Future Directions

This architectural approach shows how cloud and edge AI capabilities can be combined behind a single interface. It offers a template for future systems that may further blur the boundaries between computational environments.

The integration opens avenues for research in several domains:

Optimal decision boundaries for computational routing
Privacy-preserving techniques for sensitive information processing
Economic models for hybrid AI systems
Cognitive load balancing algorithms

Conclusion

The integration of OpenAI's Agents SDK with Ollama is not merely a technical implementation but also a statement about the direction of AI architectures. It suggests a path toward systems that move past binary distinctions between local and remote, private and shared, efficient and powerful, and instead adapt their execution environment to the specific needs of each interaction.

This approach invites further exploration and refinement, as the field continues to evolve toward increasingly sophisticated hybrid AI architectures that balance capability, privacy, efficiency, and cost.

Technical Infrastructure: Establishing the Development Environment for OpenAI-Ollama Integration

Foundational Dependencies and Technological Requisites

The implementation of a sophisticated hybrid AI architecture integrating OpenAI's Agents SDK with Ollama necessitates a carefully curated technological stack. This infrastructure must accommodate both cloud-based intelligence and local inference capabilities within a coherent framework.

Core Dependencies

Python Environment

Python 3.10+ (3.11 recommended for optimal performance characteristics)

Essential Python Packages (requirements.txt)

openai>=1.12.0          # OpenAI API client
openai-agents           # OpenAI Agents SDK (imported as `agents`)
ollama>=0.1.6           # Python client for Ollama interaction
fastapi>=0.109.0        # API framework for service endpoints
uvicorn>=0.27.0         # ASGI server implementation
pydantic>=2.5.0         # Data validation and settings management
python-dotenv>=1.0.0    # Environment variable management
requests>=2.31.0        # HTTP requests for external service interaction
websockets>=12.0        # WebSocket support for real-time communication
tenacity>=8.2.3         # Retry logic for resilient API interactions

External Services

OpenAI API access (API key required)
Ollama (local installation)

Installation and Configuration

Installation Procedure

Python Environment Initialization

Bash
# Create isolated environment
python -m venv venv

# Activate environment
# On Unix/macOS:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

Dependency Installation

Bash
pip install -r requirements.txt
# or, equivalently:
pip install openai openai-agents ollama fastapi uvicorn pydantic python-dotenv requests websockets tenacity

Ollama Installation

Bash
# macOS (using Homebrew)
brew install ollama

# Linux (using curl)
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com/download/windows

Model Initialization for Ollama

Bash
# Pull high-performance local model (e.g., Llama2)
ollama pull llama2

# Optional: Pull additional specialized models
ollama pull mistral
ollama pull codellama

Environment Configuration

Create a .env file in the project root with the following parameters:

# OpenAI Configuration
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-...  # Optional

# Model Configuration
OPENAI_MODEL=gpt-4o
OLLAMA_MODEL=llama2
OLLAMA_HOST=http://localhost:11434

# System Behavior
TEMPERATURE=0.7
MAX_TOKENS=4096
REQUEST_TIMEOUT=120

# Routing Configuration
COMPLEXITY_THRESHOLD=0.65
PRIVACY_SENSITIVE_TOKENS=["password", "secret", "token", "key", "credential"]

# Logging Configuration
LOG_LEVEL=INFO
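Values in `.env` are plain strings, so list-valued entries such as `PRIVACY_SENSITIVE_TOKENS` need explicit parsing. A minimal loader might look like the following, assuming the variables have already been loaded into the process environment (for example with `python-dotenv`'s `load_dotenv()`); the `Settings` class and its defaults are illustrative, not part of the repository.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    openai_model: str
    ollama_model: str
    ollama_host: str
    temperature: float
    max_tokens: int
    request_timeout: int
    complexity_threshold: float
    privacy_sensitive_tokens: List[str]
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build typed settings from environment variables, with illustrative defaults."""
    return Settings(
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        ollama_model=env.get("OLLAMA_MODEL", "llama2"),
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        temperature=float(env.get("TEMPERATURE", "0.7")),
        max_tokens=int(env.get("MAX_TOKENS", "4096")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "120")),
        complexity_threshold=float(env.get("COMPLEXITY_THRESHOLD", "0.65")),
        # Stored as a JSON array in .env, e.g. ["password", "secret"]
        privacy_sensitive_tokens=json.loads(env.get("PRIVACY_SENSITIVE_TOKENS", "[]")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to test in isolation.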

Development Environment Setup

Repository Initialization

Bash
git clone https://github.com/kliewerdaniel/OpenAIAgentsSDKOllama01.git
cd OpenAIAgentsSDKOllama01

Project Structure Implementation

Bash
mkdir -p app/agents app/core app/models app/routers app/services app/utils tests
touch app/__init__.py app/agents/__init__.py app/core/__init__.py app/models/__init__.py app/routers/__init__.py app/services/__init__.py app/utils/__init__.py

Local Development Server

Bash
# Start Ollama service
ollama serve

# In a separate terminal, start the application
uvicorn app.main:app --reload

Containerization (Optional)

For reproducible environments and deployment consistency:

Dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

With Docker Compose integration for Ollama:

YAML
# docker-compose.yml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    volumes:
      - .:/app
      
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:

Verification of Installation

To validate the environment configuration:

Bash
python -c "import importlib.metadata as md; print('OpenAI SDK Version:', md.version('openai')); print('Ollama Client Version:', md.version('ollama'))"

To test Ollama connectivity:

Bash
python -c "import ollama; print(ollama.list())"

To test OpenAI API connectivity:

Bash
python -c "import openai; import os; from dotenv import load_dotenv; load_dotenv(); client = openai.OpenAI(); print(client.models.list())"

This comprehensive environment setup establishes the foundation for a sophisticated hybrid AI system that leverages both cloud-based intelligence and local inference capabilities. The configuration allows for flexible routing of requests based on privacy considerations, computational complexity, and performance requirements.

Integration Architecture: OpenAI Responses API within the MCP Framework

Theoretical Framework for API Integration

The integration of OpenAI's Responses API within our Modern Computational Paradigm (MCP) framework represents a sophisticated exercise in distributed intelligence architecture. This document delineates the structural components, interface definitions, and operational parameters for establishing a cohesive integration that leverages both cloud-based and local inference capabilities.

API Architectural Design

Core Endpoints Structure

The system exposes a carefully designed set of endpoints that abstract the underlying complexity of model routing and response generation:

/api/v1
├── /chat
│   ├── POST /completions       # Primary conversational interface
│   ├── POST /streaming         # Event-stream response generation
│   └── POST /hybrid            # Intelligent routing between OpenAI and Ollama
├── /tools
│   ├── POST /execute           # Tool execution framework
│   └── GET /available          # Tool discovery mechanism
├── /agents
│   ├── POST /run               # Agent execution with Agents SDK
│   ├── GET /status/{run_id}    # Asynchronous execution status
│   └── POST /cancel/{run_id}   # Execution termination
└── /system
    ├── GET /health             # Service health verification
    ├── GET /models             # Available model enumeration
    └── POST /config            # Runtime configuration adjustment

Request/Response Schemata

Primary Chat Interface

JSON
// POST /api/v1/chat/completions
// Request
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."}
  ],
  "model": "auto",  // "auto", "openai:", or "ollama:"
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "routing_preferences": {
    "force_provider": null,  // null, "openai", "ollama"
    "privacy_level": "standard",  // "standard", "high", "max"
    "latency_preference": "balanced"  // "speed", "balanced", "quality"
  },
  "tools": [...]  // Optional tool definitions
}

// Response
{
  "id": "resp_abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "provider": "openai",  // The actual provider used
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 325,
    "total_tokens": 381
  },
  "message": {
    "role": "assistant",
    "content": "Quantum computing is...",
    "tool_calls": []  // Optional tool calls if requested
  },
  "routing_metrics": {
    "complexity_score": 0.78,
    "privacy_impact": "low",
    "decision_factors": ["complexity", "tool_requirements"]
  }
}
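The `model` field combines a provider and a model name. Assuming the prefixed forms are written as `openai:<model>` and `ollama:<model>` (the exact syntax is an assumption), a parser should split on the first colon only, because Ollama model tags such as `llama2:13b` contain colons themselves. `parse_model_field` is a hypothetical helper.

```python
from typing import Optional, Tuple

KNOWN_PROVIDERS = {"openai", "ollama"}


def parse_model_field(model: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a model string into (provider, model_id).

    "auto" returns (None, None), meaning the routing engine decides;
    an empty model id after the prefix means "use the provider default".
    """
    if model == "auto":
        return None, None
    # partition splits on the first ':' only, so "ollama:llama2:13b"
    # keeps the Ollama tag intact as "llama2:13b".
    provider, sep, model_id = model.partition(":")
    if not sep or provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unsupported model specifier: {model!r}")
    return provider, model_id or None
```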

Agent Execution Interface

JSON
// POST /api/v1/agents/run
// Request
{
  "agent_config": {
    "instructions": "You are a research assistant. Help the user find information about recent AI developments.",
    "model": "gpt-4o",
    "tools": [
      // Tool definitions following OpenAI's format
    ]
  },
  "messages": [
    {"role": "user", "content": "Find recent papers on transformer efficiency."}
  ],
  "metadata": {
    "session_id": "user_session_abc123",
    "locale": "en-US"
  }
}

// Response
{
  "run_id": "run_def456",
  "status": "in_progress",
  "created_at": 1677858242,
  "estimated_completion_time": 1677858260,
  "polling_url": "/api/v1/agents/status/run_def456"
}
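Because `/agents/run` returns immediately with a `run_id`, clients poll `polling_url` until the run finishes. The client-side sketch below assumes terminal status names (`completed`, `failed`, `cancelled`) that the schema does not specify; only `in_progress` appears above. `fetch_status` stands in for an HTTP GET against the status endpoint, and `sleep`/`clock` are injectable for testing.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}  # assumed status names


def poll_run(
    fetch_status: Callable[[str], Dict[str, Any]],
    run_id: str,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status(run_id) until a terminal status appears or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(run_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Run {run_id} did not finish within {timeout} seconds")
        sleep(interval)
```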

Authentication & Security Framework

Authentication Mechanisms

The system implements a layered authentication approach:

API Key Authentication
```
Authorization: Bearer {api_key}
```

OpenAI Credential Management

Server-side credential storage with encryption at rest
Optional client-provided credentials per request

JSON
// Optional credential override
{
  "auth_override": {
    "openai_api_key": "sk-...",
    "openai_org_id": "org-..."
  }
}

Session-Based Authentication (Web Interface)
- JWT-based authentication with refresh token rotation
- PKCE flow for authorization code exchanges

Security Considerations

TLS 1.3 required for all communications
Request signing for high-security deployments
Content-Security-Policy headers to prevent XSS
Rate limiting by user/IP; clients that receive HTTP 429 should retry with exponential backoff

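On the client side, the usual response to HTTP 429 is to wait and retry with exponentially growing delays. This sketch uses "full jitter" (a uniform random delay below an exponentially growing cap) and prefers a server-provided `retry_after` hint, such as the field in the error responses below, when one is present. The base and cap values are arbitrary; the `tenacity` package listed in the dependencies offers equivalent wait strategies.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # An explicit server hint wins over the client's own schedule.
    if retry_after is not None:
        return float(retry_after)
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * 2 ** attempt)
```

The jitter spreads retries from many clients over time, so they do not all hit the server again at the same instant.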
Error Handling Architecture

The system implements a comprehensive error handling framework:

JSON
// Error Response Structure
{
  "error": {
    "code": "provider_error",
    "message": "OpenAI API returned an error",
    "details": {
      "provider": "openai",
      "status_code": 429,
      "original_message": "Rate limit exceeded",
      "request_id": "req_ghi789"
    },
    "remediation": {
      "retry_after": 30,
      "alternatives": ["switch_provider", "reduce_complexity"],
      "fallback_available": true
    }
  }
}

Error Categories

Provider Errors (provider_error)
- OpenAI API failures
- Ollama execution failures
- Network connectivity issues
Input Validation Errors (validation_error)
- Schema validation failures
- Content policy violations
- Size limit exceedances
System Errors (system_error)
- Resource exhaustion
- Internal component failures
- Dependency service outages
Authentication Errors (auth_error)
- Invalid credentials
- Expired tokens
- Insufficient permissions
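These categories map naturally onto HTTP status codes. The mapping below is one reasonable choice rather than part of the specification (502 for upstream provider failures, 422 for validation errors to match FastAPI's default, and 429 for the `rate_limit_exceeded` code used in the rate-limit response), and `build_error` is a hypothetical helper that reproduces the `error` envelope shown above.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed mapping from error code to HTTP status code.
STATUS_BY_CODE = {
    "provider_error": 502,       # an upstream model provider failed
    "validation_error": 422,     # the request did not pass validation
    "system_error": 500,         # an internal failure in this service
    "auth_error": 401,           # missing or invalid credentials
    "rate_limit_exceeded": 429,  # the caller exceeded its tier limits
}


def build_error(
    code: str,
    message: str,
    details: Optional[Dict[str, Any]] = None,
    remediation: Optional[Dict[str, Any]] = None,
) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) in the error envelope format."""
    error: Dict[str, Any] = {"code": code, "message": message}
    if details:
        error["details"] = details
    if remediation:
        error["remediation"] = remediation
    # Unknown codes fall back to a generic server error.
    return STATUS_BY_CODE.get(code, 500), {"error": error}
```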

Rate Limiting Architecture

The system implements a sophisticated rate limiting structure:

Tiered Rate Limiting

Standard tier:
  - 10 requests/minute
  - 100 requests/hour
  - 1000 requests/day

Premium tier:
  - 60 requests/minute
  - 1000 requests/hour
  - 10000 requests/day

Dynamic Rate Adjustment

Token bucket algorithm with dynamic refill rates
Separate buckets for different endpoint categories
Priority-based token distribution
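For reference, a token bucket holds up to `capacity` tokens, refills continuously at `refill_rate` tokens per second, and admits a request only if enough tokens are available. The in-memory sketch below is single-process and not thread-safe; a multi-instance deployment would keep bucket state in a shared store such as Redis. Separate buckets per endpoint category are simply separate instances, and the dynamic refill rate is modeled by `set_refill_rate`, a hypothetical hook.

```python
import time
from typing import Callable


class TokenBucket:
    """In-memory token bucket: `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity  # start full
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now

    def set_refill_rate(self, refill_rate: float) -> None:
        # Settle tokens earned at the old rate before switching rates.
        self._refill()
        self.refill_rate = refill_rate

    def try_consume(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return whether the request is allowed."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: standard tier (10 requests/minute), one bucket per endpoint category.
buckets = {category: TokenBucket(capacity=10, refill_rate=10 / 60)
           for category in ("chat", "tools", "agents")}
```

Unlike a fixed-window counter, a token bucket allows short bursts up to `capacity` while enforcing the average rate.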

Rate Limit Response

JSON
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "You have exceeded the rate limit",
    "details": {
      "rate_limit": {
        "tier": "standard",
        "limit": "10 per minute",
        "remaining": 0,
        "reset_at": "2023-03-01T12:35:00Z",
        "retry_after": 25
      },
      "usage_statistics": {
        "current_minute": 11,
        "current_hour": 43,
        "current_day": 178
      }
    },
    "remediation": {
      "upgrade_url": "/account/upgrade",
      "alternatives": ["reduce_frequency", "batch_requests"]
    }
  }
}

Implementation Strategy

Provider Abstraction Layer

Python
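# Provider Abstraction Layer (sketch). OpenAIProvider, OllamaProvider and
# AutoRoutingProvider are concrete subclasses defined elsewhere.
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    @abstractmethod
    async def generate_completion(self, messages, params):
        """Return a single, complete response for the given messages."""

    @abstractmethod
    async def stream_completion(self, messages, params):
        """Yield response chunks as the model produces them."""

    @classmethod
    def get_provider(cls, provider_name, model_id=None):
        # Explicit provider requests map to concrete implementations;
        # "auto" (or no provider) defers the choice to the routing engine.
        if provider_name == "openai":
            return OpenAIProvider(model_id)
        if provider_name == "ollama":
            return OllamaProvider(model_id)
        if provider_name in (None, "auto"):
            return AutoRoutingProvider()
        raise ValueError(f"Unknown provider: {provider_name!r}")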

Intelligent Routing Decision Engine

Python
# Routing logic (sketch). The _assess_privacy_impact, _check_tool_compatibility
# and _analyze_complexity helpers are not shown here.
class RoutingEngine:
    def __init__(self, config, available_providers=("openai", "ollama")):
        self.config = config
        self.available_providers = list(available_providers)

    async def determine_route(self, request):
        prefs = request.routing_preferences

        # An explicit provider choice overrides every heuristic
        if prefs and prefs.force_provider:
            return prefs.force_provider

        # Privacy constraints take precedence over capability
        privacy_impact = self._assess_privacy_impact(request.messages)
        if privacy_impact == "high" and self.config.privacy_first:
            return "ollama"

        # Restrict the choice to providers that support the requested tools
        compatible = self._check_tool_compatibility(
            request.tools, self.available_providers)
        if len(compatible) == 1:
            return compatible[0]

        # Complex requests go to the more capable cloud model
        complexity = self._analyze_complexity(request.messages)
        if complexity > self.config.complexity_threshold:
            return "openai"

        # Default routing logic
        return "ollama" if self.config.prefer_local else "openai"

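The `_assess_privacy_impact` helper is left undefined above. One simple, admittedly coarse, approach is whole-word keyword matching against the `PRIVACY_SENSITIVE_TOKENS` list from the environment configuration. The function name and the binary "high"/"low" result are assumptions, and keyword matching will miss sensitive content that contains none of the listed words.

```python
import re
from typing import Dict, Iterable, List


def assess_privacy_impact(messages: List[Dict[str, str]],
                          sensitive_tokens: Iterable[str]) -> str:
    """Return "high" if any message contains a sensitive keyword, else "low"."""
    tokens = [t for t in sensitive_tokens if t]
    if not tokens:
        return "low"
    # \b anchors match whole words only, so "key" matches "API key" but not "keyboard".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in tokens) + r")\b", re.IGNORECASE)
    for message in messages:
        if pattern.search(message.get("content") or ""):
            return "high"
    return "low"
```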
Authentication Implementation

Python
# Middleware for API key authentication (FastAPI/Starlette).
# Register with app.middleware("http")(api_key_middleware).
# validate_api_key is an application-specific lookup, not shown here.
from fastapi import Request
from fastapi.responses import JSONResponse


def _auth_error(message: str) -> JSONResponse:
    return JSONResponse(
        status_code=401,
        content={"error": {"code": "auth_error", "message": message}},
    )


async def api_key_middleware(request: Request, call_next):
    auth_header = request.headers.get("Authorization", "")

    if not auth_header.startswith("Bearer "):
        return _auth_error("Missing or invalid API key")

    # Strip only the leading scheme, not every occurrence of "Bearer "
    token = auth_header[len("Bearer "):].strip()
    user = await validate_api_key(token)

    if not user:
        return _auth_error("Invalid API key")

    # Attach the authenticated user to the request state for downstream handlers
    request.state.user = user
    return await call_next(request)

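`validate_api_key` is not defined above. A common approach, shown here as an assumption rather than the project's design, is to store only a SHA-256 hash of each issued key and look up the hash of the presented token, so a leaked key table does not expose usable keys. This synchronous sketch uses an in-memory dict in place of a real key store; `issue_api_key` and `lookup_api_key` are hypothetical helpers.

```python
import hashlib
import secrets
from typing import Dict, Optional


def hash_key(raw_key: str) -> str:
    """Hex SHA-256 digest of an API key; only this value is persisted."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def issue_api_key(store: Dict[str, str], user_id: str) -> str:
    """Create a random key, persist only its hash, and return the raw key once."""
    raw_key = secrets.token_urlsafe(32)
    store[hash_key(raw_key)] = user_id
    return raw_key


def lookup_api_key(store: Dict[str, str], token: str) -> Optional[str]:
    """Return the user id for a presented key, or None if it is unknown."""
    return store.get(hash_key(token))
```

An async `validate_api_key` would wrap the same lookup around a database query.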
Rate Limiting Implementation

Python
# Rate limiter using fixed-window counters in Redis (redis.asyncio client).
# _get_user_tier is an application-specific lookup, not shown here.
TIER_LIMITS = {
    "standard": {"minute": 10, "hour": 100, "day": 1000},
    "premium": {"minute": 60, "hour": 1000, "day": 10000},
}
WINDOW_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


class RateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def check_rate_limit(self, user_id, endpoint_category):
        # Get user tier and corresponding limits
        user_tier = await self._get_user_tier(user_id)
        tier_limits = TIER_LIMITS[user_tier]

        # One counter per time window, e.g. rate:user:42:chat:minute
        keys = {
            window: f"rate:user:{user_id}:{endpoint_category}:{window}"
            for window in WINDOW_SECONDS
        }

        # Increment all counters in one round trip. nx=True sets the expiry
        # only when the key has none, so each window resets on schedule
        # instead of being extended by every request (requires Redis >= 7.0).
        pipe = self.redis.pipeline()
        for window, key in keys.items():
            pipe.incr(key)
            pipe.expire(key, WINDOW_SECONDS[window], nx=True)
        results = await pipe.execute()
        counts = dict(zip(keys, results[::2]))  # keep only the INCR results

        # Check if any limit is exceeded
        for window, count in counts.items():
            if count > tier_limits[window]:
                return {
                    "allowed": False,
                    "window": window,
                    "limit": tier_limits[window],
                    "current": count,
                    "retry_after": await self._calculate_retry_after(keys[window]),
                }

        return {"allowed": True}

    async def _calculate_retry_after(self, key):
        # TTL is the number of seconds until the current window resets
        ttl = await self.redis.ttl(key)
        return max(1, ttl)

Operational Considerations

Monitoring and Observability
- Structured logging with correlation IDs
- Prometheus metrics for request routing decisions
- Tracing with OpenTelemetry
Fallback Mechanisms
- Circuit breaker pattern for provider failures
- Graceful degradation to simpler models
- Response caching for common queries
Deployment Strategy
- Containerized deployment with Kubernetes
- Blue/green deployment for zero-downtime updates
- Regional deployment for latency optimization
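The circuit breaker pattern named above stops sending requests to a provider that keeps failing and probes it again after a cool-down. Below is a minimal single-threaded sketch with an injectable clock; the threshold and timeout defaults are arbitrary, and a production version would usually limit the half-open state to a single trial request.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after repeated failures; half-open after reset_timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: once the cool-down has elapsed, let trial requests through.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        # Open (or re-open after a failed trial) once the threshold is reached.
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

In the routing engine, a provider whose breaker rejects a request would be skipped in favor of the fallback provider.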

Conclusion

This integration architecture establishes a robust framework for leveraging both OpenAI's cloud capabilities and Ollama's local inference within a unified system. The design emphasizes flexibility, security, and resilience while providing sophisticated routing logic to optimize for different operational parameters including cost, privacy, and performance.

The implementation allows for progressive enhancement as requirements evolve, with clear extension points for additional providers, tools, and routing strategies.

Autonomous Agent Architecture: Python Implementations for MCP Integration

Theoretical Framework for Agent Design

This collection of Python implementations establishes a comprehensive agent architecture leveraging the Modern Computational Paradigm (MCP) system. The design emphasizes cognitive capabilities including knowledge retrieval, conversation flow management, and contextual awareness through a modular approach to agent construction.

Core Agent Infrastructure

Base Agent Class

Python
# app/agents/base_agent.py
from abc import ABC, abstractmethod
from typing import Dict, List, Any, Optional
import uuid
import logging
from pydantic import BaseModel, Field

from app.services.provider_service import ProviderService
from app.models.message import Message, MessageRole
from app.models.tool import Tool

logger = logging.getLogger(__name__)

class AgentState(BaseModel):
    """Represents the internal state of an agent."""
    conversation_history: List[Message] = Field(default_factory=list)
    memory: Dict[str, Any] = Field(default_factory=dict)
    context: Dict[str, Any] = Field(default_factory=dict)
    metadata: Dict[str, Any] = Field(default_factory=dict)
    session_id: str = Field(default_factory=lambda: str(uuid.uuid4()))

class BaseAgent(ABC):
    """Abstract base class for all agents in the system."""
    
    def __init__(
        self,
        provider_service: ProviderService,
        system_prompt: str,
        tools: Optional[List[Tool]] = None,
        state: Optional[AgentState] = None
    ):
        self.provider_service = provider_service
        self.system_prompt = system_prompt
        self.tools = tools or []
        self.state = state or AgentState()
        
        # Initialize conversation with system prompt
        self._initialize_conversation()
    
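    def _initialize_conversation(self):
        """Seed the conversation history with the system prompt.

        Skipped when the supplied state already contains history (for example,
        a restored session), so the system prompt is not duplicated.
        """
        if not self.state.conversation_history:
            self.state.conversation_history.append(
                Message(role=MessageRole.SYSTEM, content=self.system_prompt)
            )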
    
    async def process_message(self, message: str, user_id: str) -> str:
        """Process a user message and return a response."""
        # Add user message to conversation history
        user_message = Message(role=MessageRole.USER, content=message)
        self.state.conversation_history.append(user_message)
        
        # Process the message and generate a response
        response = await self._generate_response(user_id)
        
        # Add assistant response to conversation history
        assistant_message = Message(role=MessageRole.ASSISTANT, content=response)
        self.state.conversation_history.append(assistant_message)
        
        return response
    
    @abstractmethod
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response based on the conversation history."""
        pass
    
    async def add_context(self, key: str, value: Any):
        """Add contextual information to the agent's state."""
        self.state.context[key] = value
        
    def get_conversation_history(self) -> List[Message]:
        """Return the conversation history."""
        return self.state.conversation_history
    
    def clear_conversation(self, keep_system_prompt: bool = True):
        """Clear the conversation history."""
        if keep_system_prompt and self.state.conversation_history:
            system_messages = [
                msg for msg in self.state.conversation_history 
                if msg.role == MessageRole.SYSTEM
            ]
            self.state.conversation_history = system_messages
        else:
            self.state.conversation_history = []
            self._initialize_conversation()

Specialized Agent Implementations

Research Agent with Knowledge Retrieval

Python
# app/agents/research_agent.py
import json
import logging
from typing import List, Dict, Any, Optional

from app.agents.base_agent import BaseAgent
from app.services.knowledge_service import KnowledgeService
from app.models.message import Message, MessageRole
from app.models.tool import Tool

logger = logging.getLogger(__name__)

class ResearchAgent(BaseAgent):
    """Agent specialized for research tasks with knowledge retrieval capabilities."""
    
    def __init__(self, *args, knowledge_service: KnowledgeService, **kwargs):
        super().__init__(*args, **kwargs)
        self.knowledge_service = knowledge_service
        
        # Register knowledge retrieval tools
        self.tools.extend([
            Tool(
                name="search_knowledge_base",
                description="Search the knowledge base for relevant information",
                parameters={
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The search query"
                        },
                        "max_results": {
                            "type": "integer",
                            "description": "Maximum number of results to return",
                            "default": 3
                        }
                    },
                    "required": ["query"]
                }
            ),
            Tool(
                name="retrieve_document",
                description="Retrieve a specific document by ID",
                parameters={
                    "type": "object",
                    "properties": {
                        "document_id": {
                            "type": "string",
                            "description": "The ID of the document to retrieve"
                        }
                    },
                    "required": ["document_id"]
                }
            )
        ])
    
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with knowledge augmentation."""
        # Extract the last user message
        last_user_message = next(
            (msg for msg in reversed(self.state.conversation_history) 
             if msg.role == MessageRole.USER), 
            None
        )
        
        if not last_user_message:
            return "I don't have any messages to respond to."
        
        # Perform knowledge retrieval to augment the response
        relevant_information = await self._retrieve_relevant_knowledge(last_user_message.content)
        
        # Insert retrieved information just before the latest user message.
        # Working on a copy keeps this transient context out of the stored history.
        if relevant_information:
            context_message = Message(
                role=MessageRole.SYSTEM,
                content=f"Relevant information: {relevant_information}"
            )
            augmented_history = self.state.conversation_history.copy()
            augmented_history.insert(-1, context_message)
        else:
            augmented_history = self.state.conversation_history
        
        # Generate response using the provider service
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in augmented_history],
            tools=self.tools,
            user=user_id
        )
        
        # Tool calls arrive inside the assistant message, as in the chat response schema
        tool_calls = response["message"].get("tool_calls")
        if tool_calls:
            # The assistant turn that requested the tools must precede the tool
            # results in the history (assumes Message accepts a tool_calls field)
            self.state.conversation_history.append(
                Message(
                    role=MessageRole.ASSISTANT,
                    content=response["message"].get("content") or "",
                    tool_calls=tool_calls
                )
            )
            tool_responses = await self._process_tool_calls(tool_calls)
            
            # Add tool responses to conversation history
            for tool_response in tool_responses:
                self.state.conversation_history.append(
                    Message(
                        role=MessageRole.TOOL,
                        content=tool_response["content"],
                        tool_call_id=tool_response["tool_call_id"]
                    )
                )
            
            # Generate a new response with tool results
            final_response = await self.provider_service.generate_completion(
                messages=[msg.model_dump() for msg in self.state.conversation_history],
                tools=self.tools,
                user=user_id
            )
            return final_response["message"]["content"]
        
        return response["message"]["content"]
    
    async def _retrieve_relevant_knowledge(self, query: str) -> Optional[str]:
        """Retrieve relevant information from knowledge base."""
        try:
            results = await self.knowledge_service.search(query, max_results=3)
            
            if not results:
                return None
                
            # Format the results
            formatted_results = "\n\n".join([
                f"Source: {result['title']}\n"
                f"Content: {result['content']}\n"
                f"Relevance: {result['relevance_score']}"
                for result in results
            ])
            
            return formatted_results
        except Exception as e:
            logger.error(f"Error retrieving knowledge: {str(e)}")
            return None
    
    async def _process_tool_calls(self, tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Process tool calls and return exactly one tool response per call."""
        tool_responses = []
        
        for tool_call in tool_calls:
            tool_name = tool_call["function"]["name"]
            tool_call_id = tool_call["id"]
            
            try:
                # OpenAI-style tool calls carry arguments as a JSON string;
                # accept an already-parsed dict as well
                raw_args = tool_call["function"]["arguments"]
                tool_args = json.loads(raw_args) if isinstance(raw_args, str) else raw_args
                
                if tool_name == "search_knowledge_base":
                    results = await self.knowledge_service.search(
                        query=tool_args["query"],
                        max_results=tool_args.get("max_results", 3)
                    )
                    formatted_results = "\n\n".join([
                        f"Document ID: {result['id']}\n"
                        f"Title: {result['title']}\n"
                        f"Summary: {result['summary']}"
                        for result in results
                    ])
                    
                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": formatted_results or "No results found."
                    })
                    
                elif tool_name == "retrieve_document":
                    document = await self.knowledge_service.retrieve_document(
                        document_id=tool_args["document_id"]
                    )
                    
                    if document:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Title: {document['title']}\n\n{document['content']}"
                        })
                    else:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": "Document not found."
                        })
                else:
                    # Every tool call needs a matching tool message, even unknown ones
                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": f"Unknown tool: {tool_name}"
                    })
            except Exception as e:
                logger.error(f"Error processing tool call {tool_name}: {str(e)}")
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": f"Error processing tool call: {str(e)}"
                })
        
        return tool_responses

Conversational Flow Manager Agent

Python
# app/agents/conversation_manager.py
from typing import Dict, List, Any, Optional
import logging
import json

from pydantic import BaseModel, Field

from app.agents.base_agent import BaseAgent
from app.models.message import Message, MessageRole
from app.models.tool import Tool

logger = logging.getLogger(__name__)

class ConversationState(BaseModel):
    """Tracks the state of a conversation."""
    current_topic: Optional[str] = None
    topic_history: List[str] = Field(default_factory=list)
    user_preferences: Dict[str, Any] = Field(default_factory=dict)
    conversation_stage: str = "opening"  # opening, exploring, focusing, concluding
    open_questions: List[str] = Field(default_factory=list)
    satisfaction_score: Optional[float] = None

class ConversationManager(BaseAgent):
    """Agent specialized in managing conversation flow and context."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.conversation_state = ConversationState()
        
        # Register conversation management tools, using the same Tool model
        # as the other agents so that self.tools holds a single type
        self.tools.extend([
            Tool(
                name="update_conversation_state",
                description="Update the state of the conversation based on analysis",
                parameters={
                    "type": "object",
                    "properties": {
                        "current_topic": {
                            "type": "string",
                            "description": "The current topic of conversation"
                        },
                        "conversation_stage": {
                            "type": "string",
                            "description": "The current stage of the conversation",
                            "enum": ["opening", "exploring", "focusing", "concluding"]
                        },
                        "detected_preferences": {
                            "type": "object",
                            "description": "Preferences detected from the user"
                        },
                        "open_questions": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "Questions that remain unanswered"
                        },
                        "satisfaction_estimate": {
                            "type": "number",
                            "description": "Estimated user satisfaction (0-1)"
                        }
                    }
                }
            )
        ])
    
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with conversation flow management."""
        # First, analyze the conversation to update state
        analysis_prompt = self._create_analysis_prompt()
        
        analysis_messages = [
            {"role": "system", "content": analysis_prompt},
            {"role": "user", "content": "Analyze the following conversation and update the conversation state."},
            {"role": "user", "content": self._format_conversation_history()}
        ]
        
        analysis_response = await self.provider_service.generate_completion(
            messages=analysis_messages,
            tools=self.tools,
            tool_choice={"type": "function", "function": {"name": "update_conversation_state"}},
            user=user_id
        )
        
        # Process conversation state update
        if analysis_response.get("tool_calls"):
            tool_call = analysis_response["tool_calls"][0]
            if tool_call["function"]["name"] == "update_conversation_state":
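                try:
                    # OpenAI sends arguments as a JSON string; Ollama's native chat API sends a dict
                    raw_args = tool_call["function"]["arguments"]
                    state_update = json.loads(raw_args) if isinstance(raw_args, str) else raw_args
                    self._update_conversation_state(state_update)
                except Exception as e:
                    logger.error(f"Error updating conversation state: {str(e)}")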
        
        # Now generate the actual response with enhanced context
        enhanced_messages = self.state.conversation_history.copy()
        
        # Add conversation state as context
        context_message = Message(
            role=MessageRole.SYSTEM,
            content=self._format_conversation_context()
        )
        enhanced_messages.insert(-1, context_message)
        
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in enhanced_messages],
            user=user_id
        )
        
        return response["message"]["content"]
    
    def _create_analysis_prompt(self) -> str:
        """Create a prompt for conversation analysis."""
        return """
        You are a conversation analysis expert. Your task is to analyze the conversation 
        and extract key information about the current state of the dialogue. 
        
        Specifically, you should:
        1. Identify the current main topic of conversation
        2. Determine the stage of the conversation (opening, exploring, focusing, or concluding)
        3. Detect user preferences and interests from their messages
        4. Track open questions that haven't been fully addressed
        5. Estimate user satisfaction based on their engagement and responses
        
        Use the update_conversation_state function to provide this analysis.
        """
    
    def _format_conversation_history(self) -> str:
        """Format the conversation history for analysis."""
        formatted = []
        
        for msg in self.state.conversation_history:
            if msg.role == MessageRole.SYSTEM:
                continue
            formatted.append(f"{msg.role.value}: {msg.content}")
        
        return "\n\n".join(formatted)
    
    def _update_conversation_state(self, update: Dict[str, Any]):
        """Update the conversation state with analysis results."""
        if "current_topic" in update and update["current_topic"]:
            if self.conversation_state.current_topic != update["current_topic"]:
                if self.conversation_state.current_topic:
                    self.conversation_state.topic_history.append(
                        self.conversation_state.current_topic
                    )
                self.conversation_state.current_topic = update["current_topic"]
        
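        # Accept only stages from the tool schema's enum
        if update.get("conversation_stage") in ("opening", "exploring", "focusing", "concluding"):
            self.conversation_state.conversation_stage = update["conversation_stage"]
        
        # Model output is untrusted: check types before merging
        if isinstance(update.get("detected_preferences"), dict):
            self.conversation_state.user_preferences.update(update["detected_preferences"])
        
        if isinstance(update.get("open_questions"), list):
            self.conversation_state.open_questions = update["open_questions"]
        
        if isinstance(update.get("satisfaction_estimate"), (int, float)):
            # Clamp to the 0-1 range stated in the tool schema
            self.conversation_state.satisfaction_score = min(
                max(float(update["satisfaction_estimate"]), 0.0), 1.0
            )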
    
    def _format_conversation_context(self) -> str:
        """Format the conversation state as context for response generation."""
        return f"""
        Current conversation context:
        - Topic: {self.conversation_state.current_topic or 'Not yet established'}
        - Conversation stage: {self.conversation_state.conversation_stage}
        - User preferences: {json.dumps(self.conversation_state.user_preferences, indent=2)}
        - Open questions: {', '.join(self.conversation_state.open_questions) if self.conversation_state.open_questions else 'None'}
        
        Previous topics: {', '.join(self.conversation_state.topic_history) if self.conversation_state.topic_history else 'None'}
        
        Adapt your response to this conversation context. If in exploring stage, ask open-ended questions.
        If in focusing stage, provide detailed information on the current topic. If in concluding stage,
        summarize key points and check if the user needs anything else.
        """
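The state-update rules above can be exercised without calling a model. The sketch below reproduces them on a plain dataclass in place of the Pydantic `ConversationState`. `ConversationStateSketch` and its `apply` method are hypothetical names. The type checks, the stage whitelist, and the clamping of the satisfaction estimate to 0-1 are choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

STAGES = {"opening", "exploring", "focusing", "concluding"}


@dataclass
class ConversationStateSketch:
    current_topic: Optional[str] = None
    topic_history: List[str] = field(default_factory=list)
    user_preferences: Dict[str, Any] = field(default_factory=dict)
    conversation_stage: str = "opening"
    open_questions: List[str] = field(default_factory=list)
    satisfaction_score: Optional[float] = None

    def apply(self, update: Dict[str, Any]) -> None:
        """Merge one update_conversation_state payload into the state."""
        topic = update.get("current_topic")
        if topic and topic != self.current_topic:
            # Archive the previous topic before switching
            if self.current_topic:
                self.topic_history.append(self.current_topic)
            self.current_topic = topic
        if update.get("conversation_stage") in STAGES:
            self.conversation_stage = update["conversation_stage"]
        prefs = update.get("detected_preferences")
        if isinstance(prefs, dict):
            self.user_preferences.update(prefs)
        if isinstance(update.get("open_questions"), list):
            self.open_questions = update["open_questions"]
        score = update.get("satisfaction_estimate")
        if isinstance(score, (int, float)):
            self.satisfaction_score = min(max(float(score), 0.0), 1.0)


state = ConversationStateSketch()
state.apply({"current_topic": "pricing", "conversation_stage": "exploring"})
state.apply({"current_topic": "billing", "conversation_stage": "wrap-up", "satisfaction_estimate": 1.4})
print(state.current_topic, state.topic_history, state.conversation_stage, state.satisfaction_score)
# prints: billing ['pricing'] exploring 1.0
```

The second update switches the topic and archives "pricing". Its out-of-schema stage "wrap-up" is ignored, and its estimate of 1.4 is clamped to 1.0.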

Memory-Enhanced Contextual Agent

Python
# app/agents/contextual_agent.py
from typing import List, Dict, Any, Optional, Tuple
import logging
import time
from datetime import datetime

from app.agents.base_agent import BaseAgent
from app.services.memory_service import MemoryService
from app.models.message import Message, MessageRole

logger = logging.getLogger(__name__)

class ContextualAgent(BaseAgent):
    """Agent with enhanced contextual awareness and memory capabilities."""
    
    def __init__(self, *args, memory_service: MemoryService, **kwargs):
        super().__init__(*args, **kwargs)
        self.memory_service = memory_service
        
        # Initialize memory collections
        self.episodic_memory = []  # Stores specific interactions/events
        self.semantic_memory = {}  # Stores facts and knowledge
        self.working_memory = []   # Currently active context
        
        self.max_working_memory = 10  # Max items in working memory
    
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with contextual memory enhancement."""
        # Update memories based on recent conversation
        await self._update_memories(user_id)
        
        # Retrieve relevant memories for current context
        relevant_memories = await self._retrieve_relevant_memories(user_id)
        
        # Create context-enhanced prompt
        context_message = Message(
            role=MessageRole.SYSTEM,
            content=self._create_context_prompt(relevant_memories)
        )
        
        # Insert context before the last user message
        enhanced_history = self.state.conversation_history.copy()
        user_message_index = next(
            (i for i, msg in enumerate(reversed(enhanced_history)) 
             if msg.role == MessageRole.USER),
            None
        )
        if user_message_index is not None:
            user_message_index = len(enhanced_history) - 1 - user_message_index
            enhanced_history.insert(user_message_index, context_message)
        
        # Generate response
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in enhanced_history],
            tools=self.tools,
            user=user_id
        )
        
        # Process memory-related tool calls if any
        if response.get("tool_calls"):
            memory_updates = await self._process_memory_tools(response["tool_calls"])
            if memory_updates:
                # If memory was updated, we might want to regenerate with new context
                return await self._generate_response(user_id)
        
        # Update working memory with the response
        if response["message"]["content"]:
            self.working_memory.append({
                "type": "assistant_response",
                "content": response["message"]["content"],
                "timestamp": time.time()
            })
            self._prune_working_memory()
        
        return response["message"]["content"]
    
    async def _update_memories(self, user_id: str):
        """Update the agent's memories based on recent conversation."""
        # Get last user message
        last_user_message = next(
            (msg for msg in reversed(self.state.conversation_history) 
             if msg.role == MessageRole.USER),
            None
        )
        
        if not last_user_message:
            return
        
        # Add to working memory
        self.working_memory.append({
            "type": "user_message",
            "content": last_user_message.content,
            "timestamp": time.time()
        })
        
        # Extract potential semantic memories (facts, preferences)
        if len(self.state.conversation_history) > 2:
            extraction_messages = [
                {"role": "system", "content": "Extract key facts, preferences, or personal details from this user message that would be useful to remember for future interactions. Return in JSON format with keys: 'facts', 'preferences', 'personal_details', each containing an array of strings."},
                {"role": "user", "content": last_user_message.content}
            ]
            
            try:
                extraction = await self.provider_service.generate_completion(
                    messages=extraction_messages,
                    user=user_id,
                    response_format={"type": "json_object"}
                )
                
                content = extraction["message"]["content"]
                if content:
                    import json
                    memory_data = json.loads(content)
                    
                    # Store in semantic memory
                    timestamp = datetime.now().isoformat()
                    for category, items in memory_data.items():
                        if not isinstance(items, list):
                            continue
                        for item in items:
                            if not item or not isinstance(item, str):
                                continue
                            memory_key = f"{category}:{self._generate_memory_key(item)}"
                            self.semantic_memory[memory_key] = {
                                "content": item,
                                "category": category,
                                "last_accessed": timestamp,
                                "created_at": timestamp,
                                "importance": self._calculate_importance(item)
                            }
                    
                    # Store in memory service for persistence
                    await self.memory_service.store_memories(
                        user_id=user_id,
                        memories=self.semantic_memory
                    )
            except Exception as e:
                logger.error(f"Error extracting memories: {str(e)}")
        
        # Prune working memory if needed
        self._prune_working_memory()
    
    async def _retrieve_relevant_memories(self, user_id: str) -> Dict[str, List[Any]]:
        """Retrieve memories relevant to the current context."""
        # Get conversation summary or last few messages
        if len(self.state.conversation_history) <= 2:
            query = self.state.conversation_history[-1].content
        else:
            recent_messages = self.state.conversation_history[-3:]
            # Skip system prompts and messages without text (e.g. assistant tool-call turns)
            query = " ".join(
                msg.content for msg in recent_messages
                if msg.role != MessageRole.SYSTEM and msg.content
            )
        
        # Retrieve from memory service
        stored_memories = await self.memory_service.retrieve_memories(
            user_id=user_id,
            query=query,
            limit=5
        )
        
        # Combine with local semantic memory
        all_memories = {
            "facts": [],
            "preferences": [],
            "personal_details": [],
            "episodic": self.episodic_memory[-3:] if self.episodic_memory else []
        }
        
        # Add from semantic memory
        for key, memory in self.semantic_memory.items():
            category = memory["category"]
            if category in all_memories and len(all_memories[category]) < 5:
                all_memories[category].append(memory["content"])
        
        # Add from stored memories
        for memory in stored_memories:
            category = memory.get("category", "facts")
            if category in all_memories and len(all_memories[category]) < 5:
                all_memories[category].append(memory["content"])
                
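                # Refresh last_accessed on the matching local entry; local keys are
                # built from a hash of the content, not from the store's ID
                memory_key = f"{category}:{self._generate_memory_key(memory['content'])}"
                if memory_key in self.semantic_memory:
                    self.semantic_memory[memory_key]["last_accessed"] = datetime.now().isoformat()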
        
        return all_memories
    
    def _create_context_prompt(self, memories: Dict[str, List[Any]]) -> str:
        """Create a context prompt with relevant memories."""
        context_parts = ["Additional context to consider:"]
        
        if memories["facts"]:
            facts = "\n".join([f"- {fact}" for fact in memories["facts"]])
            context_parts.append(f"Facts about the user or relevant topics:\n{facts}")
        
        if memories["preferences"]:
            prefs = "\n".join([f"- {pref}" for pref in memories["preferences"]])
            context_parts.append(f"User preferences:\n{prefs}")
        
        if memories["personal_details"]:
            details = "\n".join([f"- {detail}" for detail in memories["personal_details"]])
            context_parts.append(f"Personal details:\n{details}")
        
        if memories["episodic"]:
            episodes = "\n".join([f"- {ep.get('summary', '')}" for ep in memories["episodic"]])
            context_parts.append(f"Recent interactions:\n{episodes}")
        
        # Add working memory summary
        if self.working_memory:
            working_context = "Current context:\n"
            for item in self.working_memory[-5:]:
                item_type = item["type"]
                content_preview = item["content"][:100] + "..." if len(item["content"]) > 100 else item["content"]
                working_context += f"- [{item_type}] {content_preview}\n"
            context_parts.append(working_context)
        
        context_parts.append("Use this information to personalize your response, but don't explicitly mention that you're using saved information unless directly relevant.")
        
        return "\n\n".join(context_parts)
    
    def _prune_working_memory(self):
        """Prune working memory to stay within limits."""
        if len(self.working_memory) > self.max_working_memory:
            # Keep the highest-ranked items by importance, then recency...
            ranked = sorted(
                self.working_memory,
                key=lambda x: (x.get("importance", 0.5), x["timestamp"]),
                reverse=True
            )
            # ...then restore chronological order so working_memory[-5:] stays the most recent
            self.working_memory = sorted(
                ranked[:self.max_working_memory], key=lambda x: x["timestamp"]
            )
    
    def _generate_memory_key(self, content: str) -> str:
        """Generate a unique key for memory storage."""
        import hashlib
        return hashlib.md5(content.encode()).hexdigest()[:10]
    
    def _calculate_importance(self, content: str) -> float:
        """Calculate the importance score of a memory item."""
        # Simple heuristic based on content length and presence of certain keywords
        importance_keywords = ["always", "never", "hate", "love", "favorite", "important", "must", "need"]
        
        base_score = min(len(content) / 100, 0.5)  # Longer items get higher base score, up to 0.5
        
        keyword_score = sum(0.1 for word in importance_keywords if word in content.lower()) 
        keyword_score = min(keyword_score, 0.5)  # Cap at 0.5
        
        return base_score + keyword_score
    
    async def _process_memory_tools(self, tool_calls: List[Dict[str, Any]]) -> bool:
        """Process memory-related tool calls."""
        # Implement if we add memory-specific tools
        return False
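Pruning by importance and recency is easy to get subtly wrong. Sorting the list in place reorders it, which breaks any later code that assumes `working_memory[-5:]` holds the most recent items. The minimal sketch below selects survivors by `(importance, timestamp)` and then restores chronological order. `prune_working_memory` is a hypothetical standalone helper. It mirrors the agent's default importance of 0.5 for items that carry no score.

```python
from typing import Any, Dict, List


def prune_working_memory(items: List[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    """Keep the `limit` highest-ranked items, returned oldest-first."""
    if len(items) <= limit:
        return list(items)
    # Rank by importance (default 0.5) and break ties by recency
    ranked = sorted(
        items,
        key=lambda item: (item.get("importance", 0.5), item["timestamp"]),
        reverse=True,
    )
    survivors = ranked[:limit]
    # Restore chronological order so items[-n:] still means "most recent n"
    return sorted(survivors, key=lambda item: item["timestamp"])


memory = [
    {"content": "old but important", "timestamp": 1.0, "importance": 0.9},
    {"content": "old filler", "timestamp": 2.0},
    {"content": "recent filler", "timestamp": 3.0},
    {"content": "newest", "timestamp": 4.0},
]
print([m["content"] for m in prune_working_memory(memory, limit=3)])
# prints: ['old but important', 'recent filler', 'newest']
```

The high-importance item survives despite being oldest, and only the lowest-ranked filler is dropped.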

Advanced Tool Integration

Collaborative Task Management Agent

Python
# app/agents/task_agent.py
from typing import List, Dict, Any, Optional
import logging
import json
import asyncio

from app.agents.base_agent import BaseAgent
from app.models.message import Message, MessageRole
from app.models.tool import Tool
from app.services.task_service import TaskService

logger = logging.getLogger(__name__)

class TaskManagementAgent(BaseAgent):
    """Agent specialized in collaborative task management."""
    
    def __init__(self, *args, task_service: TaskService, **kwargs):
        super().__init__(*args, **kwargs)
        self.task_service = task_service
        
        # Register task management tools
        self.tools.extend([
            Tool(
                name="list_tasks",
                description="List tasks for the user",
                parameters={
                    "type": "object",
                    "properties": {
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed", "all"],
                            "description": "Filter tasks by status"
                        },
                        "limit": {
                            "type": "integer",
                            "description": "Maximum number of tasks to return",
                            "default": 10
                        }
                    }
                }
            ),
            Tool(
                name="create_task",
                description="Create a new task",
                parameters={
                    "type": "object",
                    "properties": {
                        "title": {
                            "type": "string",
                            "description": "Title of the task"
                        },
                        "description": {
                            "type": "string",
                            "description": "Detailed description of the task"
                        },
                        "due_date": {
                            "type": "string",
                            "description": "Due date in ISO format (YYYY-MM-DD)"
                        },
                        "priority": {
                            "type": "string",
                            "enum": ["low", "medium", "high"],
                            "description": "Priority level of the task"
                        }
                    },
                    "required": ["title"]
                }
            ),
            Tool(
                name="update_task",
                description="Update an existing task",
                parameters={
                    "type": "object",
                    "properties": {
                        "task_id": {
                            "type": "string",
                            "description": "ID of the task to update"
                        },
                        "title": {
                            "type": "string",
                            "description": "New title of the task"
                        },
                        "description": {
                            "type": "string",
                            "description": "New description of the task"
                        },
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                            "description": "New status of the task"
                        },
                        "due_date": {
                            "type": "string",
                            "description": "New due date in ISO format (YYYY-MM-DD)"
                        },
                        "priority": {
                            "type": "string",
                            "enum": ["low", "medium", "high"],
                            "description": "New priority level of the task"
                        }
                    },
                    "required": ["task_id"]
                }
            ),
            Tool(
                name="delete_task",
                description="Delete a task",
                parameters={
                    "type": "object",
                    "properties": {
                        "task_id": {
                            "type": "string",
                            "description": "ID of the task to delete"
                        },
                        "confirm": {
                            "type": "boolean",
                            "description": "Confirmation to delete the task",
                            "default": False
                        }
                    },
                    "required": ["task_id", "confirm"]
                }
            )
        ])
    
    # Tool-call rounds allowed per user turn before a tool-free final answer is requested
    max_tool_rounds = 2
    
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with task management capabilities."""
        for _ in range(self.max_tool_rounds):
            messages = [msg.model_dump() for msg in self.state.conversation_history]
            response = await self.provider_service.generate_completion(
                messages=messages,
                tools=self.tools,
                user=user_id
            )
            
            tool_calls = response.get("tool_calls")
            if not tool_calls:
                return response["message"]["content"]
            
            # The assistant turn that requested the tools must precede the tool results;
            # OpenAI's Chat Completions API rejects tool messages without it.
            # Assumes the Message model accepts a tool_calls field.
            self.state.conversation_history.append(
                Message(
                    role=MessageRole.ASSISTANT,
                    content=response["message"].get("content"),
                    tool_calls=tool_calls
                )
            )
            
            tool_responses = await self._process_tool_calls(tool_calls, user_id)
            for tool_response in tool_responses:
                self.state.conversation_history.append(
                    Message(
                        role=MessageRole.TOOL,
                        content=tool_response["content"],
                        tool_call_id=tool_response["tool_call_id"]
                    )
                )
        
        # Round limit reached: request a plain-text answer with tools disabled
        messages = [msg.model_dump() for msg in self.state.conversation_history]
        final_response = await self.provider_service.generate_completion(
            messages=messages,
            user=user_id
        )
        return final_response["message"]["content"]
    
    async def _process_tool_calls(self, tool_calls: List[Dict[str, Any]], user_id: str) -> List[Dict[str, Any]]:
        """Process tool calls and return tool responses."""
        tool_responses = []
        
        for tool_call in tool_calls:
            tool_name = tool_call["function"]["name"]
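            tool_args_json = tool_call["function"]["arguments"]
            # Ollama's native chat API may omit tool-call IDs; fall back to the tool name
            tool_call_id = tool_call.get("id", tool_name)
            
            try:
                # OpenAI sends arguments as a JSON string; Ollama's native chat API sends a dict
                tool_args = (
                    json.loads(tool_args_json)
                    if isinstance(tool_args_json, str)
                    else tool_args_json
                )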
                
                # Process based on tool name
                if tool_name == "list_tasks":
                    result = await self.task_service.list_tasks(
                        user_id=user_id,
                        status=tool_args.get("status", "all"),
                        limit=tool_args.get("limit", 10)
                    )
                    
                    if result:
                        tasks_formatted = "\n\n".join([
                            f"ID: {task['id']}\n"
                            f"Title: {task['title']}\n"
                            f"Status: {task['status']}\n"
                            f"Priority: {task['priority']}\n"
                            f"Due Date: {task['due_date']}\n"
                            f"Description: {task['description']}"
                            for task in result
                        ])
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Found {len(result)} tasks:\n\n{tasks_formatted}"
                        })
                    else:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": "No tasks found matching your criteria."
                        })
                
                elif tool_name == "create_task":
                    result = await self.task_service.create_task(
                        user_id=user_id,
                        title=tool_args["title"],
                        description=tool_args.get("description", ""),
                        due_date=tool_args.get("due_date"),
                        priority=tool_args.get("priority", "medium")
                    )
                    
                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": f"Task created successfully.\n\nID: {result['id']}\nTitle: {result['title']}"
                    })
                
                elif tool_name == "update_task":
                    update_data = {k: v for k, v in tool_args.items() if k != "task_id"}
                    result = await self.task_service.update_task(
                        user_id=user_id,
                        task_id=tool_args["task_id"],
                        **update_data
                    )
                    
                    if result:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Task updated successfully.\n\nID: {result['id']}\nTitle: {result['title']}\nStatus: {result['status']}"
                        })
                    else:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Task with ID {tool_args['task_id']} not found or you don't have permission to update it."
                        })
                
                elif tool_name == "delete_task":
                    if not tool_args.get("confirm", False):
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": "Task deletion requires confirmation. Please set 'confirm' to true to proceed."
                        })
                    else:
                        result = await self.task_service.delete_task(
                            user_id=user_id,
                            task_id=tool_args["task_id"]
                        )
                        
                        if result:
                            tool_responses.append({
                                "tool_call_id": tool_call_id,
                                "content": f"Task with ID {tool_args['task_id']} has been deleted successfully."
                            })
                        else:
                            tool_responses.append({
                                "tool_call_id": tool_call_id,
                                "content": f"Task with ID {tool_args['task_id']} not found or you don't have permission to delete it."
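                            })
                
                else:
                    # Answer every tool_call_id, even for tools this agent does not handle
                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": f"Error: unknown tool '{tool_name}'."
                    })
            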
            except json.JSONDecodeError:
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": "Error: Invalid JSON in tool arguments."
                })
            except KeyError as e:
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": f"Error: Missing required parameter: {str(e)}"
                })
            except Exception as e:
                logger.error(f"Error processing tool call {tool_name}: {str(e)}")
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": f"Error executing {tool_name}: {str(e)}"
                })
        
        return tool_responses
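The tool loop in `TaskManagementAgent` depends on message ordering. OpenAI's Chat Completions API requires each `tool` message to follow an assistant message whose `tool_calls` contains the matching ID. The sketch below runs a bounded loop against a stub provider so the resulting message sequence can be inspected offline. `StubProvider` and `run_tool_loop` are hypothetical. The response shape (`{"message": {...}, "tool_calls": [...]}`) is an assumption that mirrors how the agents above read `generate_completion` results.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


class StubProvider:
    """Hypothetical stand-in for ProviderService that replays canned responses in order."""

    def __init__(self, responses: List[Dict[str, Any]]):
        self._responses = list(responses)

    async def generate_completion(self, messages: List[Dict[str, Any]], **kwargs: Any) -> Dict[str, Any]:
        return self._responses.pop(0)


async def run_tool_loop(
    provider: StubProvider,
    messages: List[Dict[str, Any]],
    tools: List[Dict[str, Any]],
    execute: Callable[[Dict[str, Any]], Awaitable[str]],
    max_rounds: int = 2,
) -> str:
    """Alternate model calls and tool execution for at most max_rounds tool rounds."""
    for _ in range(max_rounds):
        response = await provider.generate_completion(messages=messages, tools=tools)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["message"]["content"] or ""
        # The assistant turn that requested the tools must precede their results
        messages.append({
            "role": "assistant",
            "content": response["message"].get("content"),
            "tool_calls": tool_calls,
        })
        for call in tool_calls:
            result = await execute(call)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    # Budget spent: ask for a final answer without offering tools
    final = await provider.generate_completion(messages=messages)
    return final["message"]["content"] or ""


async def _demo():
    call = {"id": "call_1", "function": {"name": "list_tasks", "arguments": "{}"}}
    provider = StubProvider([
        {"message": {"content": None}, "tool_calls": [call]},
        {"message": {"content": "You have no pending tasks."}},
    ])
    messages = [{"role": "user", "content": "What is on my list?"}]

    async def execute(tool_call: Dict[str, Any]) -> str:
        return "No tasks found matching your criteria."

    answer = await run_tool_loop(provider, messages, tools=[], execute=execute)
    return answer, [m["role"] for m in messages]


answer, roles = asyncio.run(_demo())
print(answer, roles)
# prints: You have no pending tasks. ['user', 'assistant', 'tool']
```

Capping the rounds and then requesting a final answer without tools ensures the loop always ends with text rather than an unanswered tool request.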

Agent Factory and Orchestration

Python
# app/agents/agent_factory.py
from typing import Dict, Any, Optional, List, Type
import logging

from app.agents.base_agent import BaseAgent
from app.agents.research_agent import ResearchAgent
from app.agents.conversation_manager import ConversationManager
from app.agents.contextual_agent import ContextualAgent
from app.agents.task_agent import TaskManagementAgent

from app.services.provider_service import ProviderService
from app.services.knowledge_service import KnowledgeService
from app.services.memory_service import MemoryService
from app.services.task_service import TaskService

logger = logging.getLogger(__name__)

class AgentFactory:
    """Factory for creating agent instances based on requirements."""
    
    def __init__(self, 
                 provider_service: ProviderService,
                 knowledge_service: Optional[KnowledgeService] = None,
                 memory_service: Optional[MemoryService] = None,
                 task_service: Optional[TaskService] = None):
        self.provider_service = provider_service
        self.knowledge_service = knowledge_service
        self.memory_service = memory_service
        self.task_service = task_service
        
        # Register available agent types
        self.agent_types: Dict[str, Type[BaseAgent]] = {
            "research": ResearchAgent,
            "conversation": ConversationManager,
            "contextual": ContextualAgent,
            "task": TaskManagementAgent
        }
    
    def create_agent(self, 
                    agent_type: str, 
                    system_prompt: str, 
                    tools: Optional[List[Dict[str, Any]]] = None,
                    **kwargs) -> BaseAgent:
        """Create and return an agent instance of the specified type."""
        if agent_type not in self.agent_types:
            raise ValueError(f"Unknown agent type: {agent_type}. Available types: {list(self.agent_types.keys())}")
        
        agent_class = self.agent_types[agent_type]
        
        # Prepare required services based on agent type
        agent_kwargs = {
            "provider_service": self.provider_service,
            "system_prompt": system_prompt,
            "tools": tools
        }
        
        # Add specialized services based on agent type
        if agent_type == "research" and self.knowledge_service:
            agent_kwargs["knowledge_service"] = self.knowledge_service
        
        if agent_type == "contextual" and self.memory_service:
            agent_kwargs["memory_service"] = self.memory_service
            
        if agent_type == "task" and self.task_service:
            agent_kwargs["task_service"] = self.task_service
        
        # Add any additional kwargs
        agent_kwargs.update(kwargs)
        
        # Create and return the agent instance
        return agent_class(**agent_kwargs)
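The chain of `if agent_type == ...` checks in `create_agent` can be replaced with a table that records, for each agent type, its class and the optional services it accepts. Below is a self-contained sketch of that variation. `BaseAgent`, `ResearchAgent`, and `TaskManagementAgent` here are hypothetical stubs, not the real classes from `app.agents`, and `AGENT_SPECS` is an illustrative name.

```python
from typing import Any, Dict, Optional, Tuple, Type


class BaseAgent:
    """Hypothetical stand-in for app.agents.base_agent.BaseAgent."""

    def __init__(self, provider_service: Any, system_prompt: str, tools=None, **services: Any):
        self.provider_service = provider_service
        self.system_prompt = system_prompt
        self.tools = tools or []
        self.services = services


class ResearchAgent(BaseAgent): ...
class TaskManagementAgent(BaseAgent): ...


# agent type -> (class, names of the optional services that type accepts)
AGENT_SPECS: Dict[str, Tuple[Type[BaseAgent], Tuple[str, ...]]] = {
    "research": (ResearchAgent, ("knowledge_service",)),
    "task": (TaskManagementAgent, ("task_service",)),
}


def create_agent(agent_type: str, provider_service: Any, system_prompt: str,
                 services: Dict[str, Optional[Any]], **kwargs: Any) -> BaseAgent:
    if agent_type not in AGENT_SPECS:
        raise ValueError(f"Unknown agent type: {agent_type}. Available types: {list(AGENT_SPECS)}")
    agent_class, wanted = AGENT_SPECS[agent_type]
    # Inject only the services this type declares, and only if they are configured
    injected = {name: services[name] for name in wanted if services.get(name) is not None}
    injected.update(kwargs)
    return agent_class(provider_service=provider_service, system_prompt=system_prompt, **injected)
```

With the table in place, registering a new agent type means adding a single row, and the service-injection rules sit next to the class they apply to.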

Metaframework for Agent Composition

Python
# app/agents/meta_agent.py
from typing import Dict, List, Any, Optional
import logging
import asyncio
import json

from app.agents.base_agent import BaseAgent, AgentState
from app.models.message import Message, MessageRole
from app.services.provider_service import ProviderService

logger = logging.getLogger(__name__)

class AgentSubsystem:
    """Represents a specialized agent within the MetaAgent."""
    
    def __init__(self, name: str, agent: BaseAgent, role: str):
        self.name = name
        self.agent = agent
        self.role = role
        self.active = True

class MetaAgent(BaseAgent):
    """A meta-agent that coordinates multiple specialized agents."""
    
    def __init__(self, 
                 provider_service: ProviderService,
                 system_prompt: str,
                 subsystems: Optional[List[AgentSubsystem]] = None,
                 state: Optional[AgentState] = None):
        super().__init__(provider_service, system_prompt, [], state)
        self.subsystems = subsystems or []
        
        # Tools specific to the meta-agent
        self.tools.extend([
            {
                "type": "function",
                "function": {
                    "name": "route_to_subsystem",
                    "description": "Route a task to a specific subsystem agent",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "subsystem": {
                                "type": "string",
                                "description": "The name of the subsystem to route to"
                            },
                            "task": {
                                "type": "string",
                                "description": "The task to be performed by the subsystem"
                            },
                            "context": {
                                "type": "object",
                                "description": "Additional context for the subsystem"
                            }
                        },
                        "required": ["subsystem", "task"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "parallel_processing",
                    "description": "Process a task in parallel across multiple subsystems",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "task": {
                                "type": "string",
                                "description": "The task to process in parallel"
                            },
                            "subsystems": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                },
                                "description": "List of subsystems to involve"
                            }
                        },
                        "required": ["task", "subsystems"]
                    }
                }
            }
        ])
    
    def add_subsystem(self, subsystem: AgentSubsystem):
        """Add a new subsystem to the meta-agent."""
        # Check for duplicate names
        if any(sys.name == subsystem.name for sys in self.subsystems):
            raise ValueError(f"Subsystem with name '{subsystem.name}' already exists")
        
        self.subsystems.append(subsystem)
    
    def get_subsystem(self, name: str) -> Optional[AgentSubsystem]:
        """Get a subsystem by name."""
        for subsystem in self.subsystems:
            if subsystem.name == name:
                return subsystem
        return None
    
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response using the meta-agent architecture."""
        # Extract the last user message
        last_user_message = next(
            (msg for msg in reversed(self.state.conversation_history) 
             if msg.role == MessageRole.USER),
            None
        )
        
        if not last_user_message:
            return "I don't have any messages to respond to."
        
        # First, determine routing strategy using the coordinator
        coordinator_messages = [
            {"role": "system", "content": f"""
            You are the coordinator of a multi-agent system with the following subsystems:
            
            {self._format_subsystems()}
            
            Your job is to analyze the user's message and determine the optimal processing strategy:
            1. If the query is best handled by a single specialized subsystem, use route_to_subsystem
            2. If the query would benefit from multiple perspectives, use parallel_processing
            
            Choose the most appropriate strategy based on the complexity and nature of the request.
            """},
            {"role": "user", "content": last_user_message.content}
        ]
        
        routing_response = await self.provider_service.generate_completion(
            messages=coordinator_messages,
            tools=self.tools,
            tool_choice="auto",
            user=user_id
        )
        
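        # Process based on the routing decision. Tool calls may sit at the top
        # level or under "message" (the shape the other handlers here read from)
        tool_calls = (
            routing_response.get("tool_calls")
            or (routing_response.get("message") or {}).get("tool_calls")
        )
        if tool_calls:
            tool_call = tool_calls[0]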
            function_name = tool_call["function"]["name"]
            
            try:
                function_args = json.loads(tool_call["function"]["arguments"])
                
                if function_name == "route_to_subsystem":
                    return await self._handle_single_subsystem_route(
                        function_args["subsystem"],
                        function_args["task"],
                        function_args.get("context", {}),
                        user_id
                    )
                
                elif function_name == "parallel_processing":
                    return await self._handle_parallel_processing(
                        function_args["task"],
                        function_args["subsystems"],
                        user_id
                    )
            
            except json.JSONDecodeError:
                logger.error("Error parsing function arguments")
            except KeyError as e:
                logger.error(f"Missing required parameter: {e}")
            except Exception as e:
                logger.error(f"Error in routing: {e}")
        
        # Fallback to direct response
        return await self._handle_direct_response(user_id)
    
    async def _handle_single_subsystem_route(self, 
                                           subsystem_name: str, 
                                           task: str,
                                           context: Dict[str, Any],
                                           user_id: str) -> str:
        """Handle routing to a single subsystem."""
        subsystem = self.get_subsystem(subsystem_name)
        
        if not subsystem or not subsystem.active:
            return f"Error: Subsystem '{subsystem_name}' not found or not active. Please try a different approach."
        
        # Process with the selected subsystem
        response = await subsystem.agent.process_message(task, user_id)
        
        # Format the response to indicate the source
        return f"[{subsystem.name} - {subsystem.role}] {response}"
    
    async def _handle_parallel_processing(self,
                                        task: str,
                                        subsystem_names: List[str],
                                        user_id: str) -> str:
        """Handle parallel processing across multiple subsystems."""
        # Validate subsystems
        valid_subsystems = []
        for name in subsystem_names:
            subsystem = self.get_subsystem(name)
            if subsystem and subsystem.active:
                valid_subsystems.append(subsystem)
        
        if not valid_subsystems:
            return "Error: None of the specified subsystems are available."
        
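        # Process in parallel; return_exceptions=True keeps one failing
        # subsystem from discarding the other subsystems' results
        tasks = [subsystem.agent.process_message(task, user_id) for subsystem in valid_subsystems]
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        
        # Format responses, noting any subsystem that failed
        formatted_responses = []
        for subsystem, response in zip(valid_subsystems, responses):
            if isinstance(response, BaseException):
                logger.error(f"Subsystem {subsystem.name} failed: {response}")
                response = f"(no response: {response})"
            formatted_responses.append(f"## {subsystem.name} ({subsystem.role}):\n{response}")
        
        # Separate sections with blank lines so they do not run together
        combined_responses = "\n\n".join(formatted_responses)
        
        # Synthesize a final response
        synthesis_prompt = f"""
        The user's request was processed by multiple specialized agents:
        
        {combined_responses}
        
        Synthesize a comprehensive response that incorporates these perspectives.
        Highlight areas of agreement and provide a balanced view where there are differences.
        """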
        
        synthesis_messages = [
            {"role": "system", "content": "You are a synthesis agent that combines multiple specialized perspectives into a coherent response."},
            {"role": "user", "content": synthesis_prompt}
        ]
        
        synthesis = await self.provider_service.generate_completion(
            messages=synthesis_messages,
            user=user_id
        )
        
        return synthesis["message"]["content"]
    
    async def _handle_direct_response(self, user_id: str) -> str:
        """Handle direct response when no routing is determined."""
        # Generate a response directly using the provider service
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in self.state.conversation_history],
            user=user_id
        )
        
        return response["message"]["content"]
    
    def _format_subsystems(self) -> str:
        """Format subsystem information for the coordinator prompt."""
        return "\n".join([
            f"- {subsystem.name}: {subsystem.role}" 
            for subsystem in self.subsystems if subsystem.active
        ])
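The fan-out/fan-in step in `_handle_parallel_processing` can be isolated into a small helper and tested without any model calls. The sketch below is minimal and rests on two assumptions: each subsystem is modelled as an async callable taking `(task, user_id)`, and `fan_out`, `research`, and `broken` are hypothetical names. It passes `return_exceptions=True` to `asyncio.gather`, so one failing subsystem does not discard the others' results.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical: each subsystem is an async callable taking (task, user_id)
AgentFn = Callable[[str, str], Awaitable[str]]


async def fan_out(task: str, user_id: str, agents: Dict[str, AgentFn]) -> List[str]:
    """Run every agent concurrently and format one section per agent."""
    names = list(agents)
    results = await asyncio.gather(
        *(agents[name](task, user_id) for name in names),
        return_exceptions=True,  # failures come back as values, in call order
    )
    sections = []
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            sections.append(f"## {name}:\n(unavailable: {result})")
        else:
            sections.append(f"## {name}:\n{result}")
    return sections


async def research(task: str, user_id: str) -> str:
    await asyncio.sleep(0.01)  # simulate model latency
    return f"Findings on '{task}'"


async def broken(task: str, user_id: str) -> str:
    raise RuntimeError("timeout")


if __name__ == "__main__":
    parts = asyncio.run(fan_out("compare A and B", "demo_user",
                                {"research": research, "task": broken}))
    # Blank lines between sections keep them distinct in a synthesis prompt
    print("\n\n".join(parts))
```

Because `gather` preserves call order, zipping the results back onto `names` is safe even when the agents finish out of order.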

Sample Agent Usage Implementation

Python
# app/main.py
import asyncio
import logging
from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel
from typing import List, Optional, Dict, Any

from app.agents.agent_factory import AgentFactory
from app.agents.meta_agent import MetaAgent, AgentSubsystem
from app.services.provider_service import ProviderService
from app.services.knowledge_service import KnowledgeService
from app.services.memory_service import MemoryService
from app.services.task_service import TaskService

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="MCP Agent System")

# Initialize services
provider_service = ProviderService()
knowledge_service = KnowledgeService()
memory_service = MemoryService()
task_service = TaskService()

# Initialize agent factory
agent_factory = AgentFactory(
    provider_service=provider_service,
    knowledge_service=knowledge_service,
    memory_service=memory_service,
    task_service=task_service
)

# Agent session storage
agent_sessions = {}

# Define request/response models
class MessageRequest(BaseModel):
    message: str
    session_id: Optional[str] = None
    agent_type: Optional[str] = None

class MessageResponse(BaseModel):
    response: str
    session_id: str

# Auth dependency
async def verify_api_key(authorization: Optional[str] = Header(None)):
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    
    # Simple validation for demo purposes
    token = authorization.replace("Bearer ", "")
    if token != "demo_api_key":  # In production, validate against secure storage
        raise HTTPException(status_code=401, detail="Invalid API key")
    
    return token

# Routes
@app.post("/api/v1/chat", response_model=MessageResponse)
async def chat(
    request: MessageRequest,
    api_key: str = Depends(verify_api_key)
):
    user_id = "demo_user"  # In production, extract from API key or auth token
    
    # Create or retrieve session
    session_id = request.session_id
    if not session_id or session_id not in agent_sessions:
        # Create a new agent instance if session doesn't exist
        session_id = f"session_{len(agent_sessions) + 1}"
        
        # Determine agent type
        agent_type = request.agent_type or "meta"
        
        if agent_type == "meta":
            # Create a meta-agent with multiple specialized subsystems
            research_agent = agent_factory.create_agent(
                agent_type="research",
                system_prompt="You are a research specialist that provides in-depth, accurate information based on available knowledge."
            )
            
            conversation_agent = agent_factory.create_agent(
                agent_type="conversation",
                system_prompt="You are a conversation expert that helps maintain engaging, relevant, and structured discussions."
            )
            
            task_agent = agent_factory.create_agent(
                agent_type="task",
                system_prompt="You are a task management specialist that helps organize, track, and complete tasks efficiently."
            )
            
            meta_agent = MetaAgent(
                provider_service=provider_service,
                system_prompt="You are an advanced assistant that coordinates multiple specialized systems to provide optimal responses."
            )
            
            # Add subsystems to meta-agent
            meta_agent.add_subsystem(AgentSubsystem(
                name="research",
                agent=research_agent,
                role="Knowledge and information retrieval specialist"
            ))
            
            meta_agent.add_subsystem(AgentSubsystem(
                name="conversation",
                agent=conversation_agent,
                role="Conversation flow and engagement specialist"
            ))
            
            meta_agent.add_subsystem(AgentSubsystem(
                name="task",
                agent=task_agent,
                role="Task management and organization specialist"
            ))
            
            agent = meta_agent
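        else:
            # Create a specialized agent; an unknown agent_type is a client
            # error, so report it as 400 rather than an unhandled 500
            try:
                agent = agent_factory.create_agent(
                    agent_type=agent_type,
                    system_prompt=f"You are a helpful assistant specializing in {agent_type} tasks."
                )
            except ValueError as e:
                raise HTTPException(status_code=400, detail=str(e))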
        
        agent_sessions[session_id] = agent
    else:
        agent = agent_sessions[session_id]
    
    # Process the message
    try:
        response = await agent.process_message(request.message, user_id)
        return MessageResponse(response=response, session_id=session_id)
    except Exception as e:
        logger.exception("Error processing message")
        raise HTTPException(status_code=500, detail=f"Error processing message: {str(e)}")

# Startup event
@app.on_event("startup")
async def startup_event():
    # Initialize services
    await provider_service.initialize()
    await knowledge_service.initialize()
    await memory_service.initialize()
    await task_service.initialize()
    
    logger.info("All services initialized")

# Shutdown event
@app.on_event("shutdown")
async def shutdown_event():
    # Cleanup
    await provider_service.cleanup()
    await knowledge_service.cleanup()
    await memory_service.cleanup()
    await task_service.cleanup()
    
    logger.info("All services shut down")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
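A client calls the endpoint with a bearer token and a JSON body matching `MessageRequest`. The following is a minimal client-side sketch using only the standard library. The helper name `build_chat_request` is hypothetical, and the default URL and `demo_api_key` token simply mirror the demo values above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_chat_request(base_url: str, api_key: str, message: str,
                       session_id: Optional[str] = None,
                       agent_type: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to /api/v1/chat whose body matches MessageRequest."""
    body: Dict[str, Any] = {"message": message}
    # Optional fields are omitted so the server applies its own defaults
    if session_id is not None:
        body["session_id"] = session_id
    if agent_type is not None:
        body["agent_type"] = agent_type
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To use it, start the server and send the request with `urllib.request.urlopen(req)`. Then read `session_id` from the JSON reply and pass it on later calls so that the same agent instance handles the conversation.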

Conclusion

This implementation integrates OpenAI's Responses API into a modular agent architecture. Specialized agents supply knowledge retrieval, conversation management, contextual awareness, and task coordination, and a meta-agent composes them into a single assistant.

Key architectural features include:

Abstraction Layers: The system maintains clean separation between provider services, agent logic, and specialized capabilities.
Contextual Enhancement: Agents utilize memory systems and knowledge retrieval to maintain context and provide more relevant responses.
Tool Integration: The implementation leverages OpenAI's function calling capabilities to integrate with external systems and services.
Meta-Agent Architecture: The meta-agent pattern composes specialized agents into one system in which a coordinator model either routes each query to a single subsystem or fans it out to several and synthesizes their answers.
Stateful Conversations: All agents maintain conversation state, allowing for continuity and context preservation across interactions.

This architecture provides a foundation for applications that combine OpenAI's cloud models with local Ollama models, with the MCP system's routing layer deciding which backend serves each request.

Hybrid Intelligence Architecture: Integrating Ollama with OpenAI's Agents SDK

Theoretical Framework for Hybrid Model Inference

Integrating Ollama with OpenAI's Agents SDK yields a hybrid architecture in which an orchestration layer routes each inference task either to cloud-hosted models or to local models. The routing decision depends on contextual parameters such as privacy requirements, task complexity, latency tolerance, and API cost. This section describes how to implement that orchestration layer.
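A routing policy of this kind can be reduced to a small pure function. Below is a minimal sketch. The `RequestProfile` fields, the `Backend` enum, and the `LOCAL_CONTEXT_LIMIT` threshold are all hypothetical; the 4096 figure only echoes the smallest context window in the Ollama capability table further on. The sketch checks privacy first, then capability and size, and defaults to the local model to avoid API cost.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    OLLAMA = "ollama"
    OPENAI = "openai"


@dataclass
class RequestProfile:
    contains_private_data: bool  # data that must not leave the machine
    needs_tools: bool            # native function calling required
    estimated_tokens: int        # rough prompt + completion size


# Hypothetical threshold; tune it to the local models actually installed
LOCAL_CONTEXT_LIMIT = 4096


def choose_backend(p: RequestProfile) -> Backend:
    """Pick a backend; privacy constraints override capability concerns."""
    if p.contains_private_data:
        # Stay local even if tools are needed (they can be simulated in the prompt)
        return Backend.OLLAMA
    if p.needs_tools or p.estimated_tokens > LOCAL_CONTEXT_LIMIT:
        return Backend.OPENAI
    return Backend.OLLAMA  # cheap default for small, non-sensitive requests
```

Keeping the policy pure, with no I/O, makes it easy to unit-test and to extend with further factors, such as a latency budget, without touching the provider services.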

Ollama Integration Architecture

Core Integration Components
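When `stream` is true, Ollama's `/api/generate` endpoint returns newline-delimited JSON. Each line holds a text fragment in `response`, and the last object has `done` set to true. Below is a minimal, self-contained sketch of parsing that stream from raw byte lines. The function name `iter_ollama_text` is hypothetical. The service that follows performs the same parsing inside `_stream_completion`.

```python
import json
from typing import Iterable, Iterator


def iter_ollama_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        chunk = json.loads(line)
        text = chunk.get("response", "")
        if text:
            yield text
        if chunk.get("done", False):
            break  # the object with done=true ends the stream
```

Stripping each line before parsing matters because line-oriented readers typically return the trailing newline with every line.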

Python
# app/services/ollama_service.py
import os
import time
import json
import logging
from typing import List, Dict, Any, Optional, Union
import aiohttp
import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

from app.models.message import Message, MessageRole
from app.config import settings

logger = logging.getLogger(__name__)

class OllamaService:
    """Service for interacting with Ollama's local inference capabilities."""
    
    def __init__(self):
        self.base_url = settings.OLLAMA_HOST
        self.default_model = settings.OLLAMA_MODEL
        self.timeout = aiohttp.ClientTimeout(total=settings.REQUEST_TIMEOUT)
        self.session = None
        
        # Capability mapping for different models
        self.model_capabilities = {
            "llama2": {
                "supports_tools": False,
                "context_window": 4096,
                "strengths": ["general_knowledge", "reasoning"],
                "max_tokens": 2048
            },
            "codellama": {
                "supports_tools": False,
                "context_window": 8192,
                "strengths": ["code_generation", "technical_explanation"],
                "max_tokens": 2048
            },
            "mistral": {
                "supports_tools": False,
                "context_window": 8192,
                "strengths": ["instruction_following", "reasoning"],
                "max_tokens": 2048
            },
            "dolphin-mistral": {
                "supports_tools": False,
                "context_window": 8192,
                "strengths": ["conversational", "creative_writing"],
                "max_tokens": 2048
            }
        }
    
    async def initialize(self):
        """Initialize the Ollama service."""
        self.session = aiohttp.ClientSession(timeout=self.timeout)
        
        # Verify connectivity
        try:
            await self.list_models()
            logger.info("Ollama service initialized successfully")
        except Exception as e:
            logger.error(f"Failed to initialize Ollama service: {str(e)}")
            raise
    
    async def cleanup(self):
        """Clean up resources."""
        if self.session:
            await self.session.close()
            self.session = None
    
    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
    async def list_models(self) -> List[Dict[str, Any]]:
        """List available models in Ollama."""
        if not self.session:
            self.session = aiohttp.ClientSession(timeout=self.timeout)
            
        async with self.session.get(f"{self.base_url}/api/tags") as response:
            if response.status != 200:
                error_text = await response.text()
                raise Exception(f"Failed to list models: {error_text}")
            
            data = await response.json()
            return data.get("models", [])
    
    async def generate_completion(
        self,
        messages: List[Dict[str, str]],
        model: Optional[str] = None,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        tools: Optional[List[Dict[str, Any]]] = None,
        stream: bool = False,
        **kwargs
    ) -> Dict[str, Any]:
        """Generate a completion using Ollama."""
        model_name = model or self.default_model
        
        # Check if specified model is available
        try:
            available_models = await self.list_models()
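            model_names = {m.get("name") for m in available_models}
            
            # /api/tags reports tagged names (e.g. "llama2:latest"), so accept
            # an untagged name when its ":latest" variant is installed
            if model_name not in model_names and f"{model_name}:latest" in model_names:
                model_name = f"{model_name}:latest"
            
            if model_name not in model_names: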
                fallback_model = self.default_model
                logger.warning(
                    f"Model '{model_name}' not available in Ollama. "
                    f"Using fallback model '{fallback_model}'."
                )
                model_name = fallback_model
        except Exception as e:
            logger.error(f"Error checking model availability: {str(e)}")
            model_name = self.default_model
        
        # Get model capabilities
        model_base_name = model_name.split(':')[0] if ':' in model_name else model_name
        capabilities = self.model_capabilities.get(
            model_base_name, 
            {"supports_tools": False, "context_window": 4096, "max_tokens": 2048}
        )
        
        # Check if tools are requested but not supported
        if tools and not capabilities["supports_tools"]:
            logger.warning(
                f"Model '{model_name}' does not support tools. "
                "Tool functionality will be simulated with prompt engineering."
            )
            # We'll handle this by incorporating tool descriptions into the prompt
        
        # Format messages for Ollama
        prompt = self._format_messages_for_ollama(messages, tools)
        
        # Set max_tokens based on capabilities if not provided
        if max_tokens is None:
            max_tokens = capabilities["max_tokens"]
        else:
            max_tokens = min(max_tokens, capabilities["max_tokens"])
        
        # Prepare request payload
        payload = {
            "model": model_name,
            "prompt": prompt,
            "stream": stream,
            "options": {
                "temperature": temperature,
                "num_predict": max_tokens
            }
        }
        
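        if stream:
            # _stream_completion is an async generator: return it for the caller
            # to consume with `async for` instead of awaiting it
            return self._stream_completion(payload)
        else:
            return await self._generate_completion_sync(payload)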
    
    async def _generate_completion_sync(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Generate a completion synchronously."""
        if not self.session:
            self.session = aiohttp.ClientSession(timeout=self.timeout)
            
        try:
            async with self.session.post(
                f"{self.base_url}/api/generate", 
                json=payload
            ) as response:
                if response.status != 200:
                    error_text = await response.text()
                    raise Exception(f"Ollama generate error: {error_text}")
                
                result = await response.json()
                
                # Format the response to match OpenAI's format for consistency
                formatted_response = self._format_ollama_response(result, payload)
                return formatted_response
                
        except Exception as e:
            logger.error(f"Error generating completion: {str(e)}")
            raise
    
    async def _stream_completion(self, payload: Dict[str, Any]):
        """Stream a completion."""
        if not self.session:
            self.session = aiohttp.ClientSession(timeout=self.timeout)
            
        try:
            async with self.session.post(
                f"{self.base_url}/api/generate", 
                json=payload, 
                timeout=aiohttp.ClientTimeout(total=60)
            ) as response:
                if response.status != 200:
                    error_text = await response.text()
                    raise Exception(f"Ollama generate error: {error_text}")
                
                # Stream the response
                full_text = ""
                async for line in response.content:
                    # Each line ends with a newline; strip it so blank lines are skipped
                    line = line.strip()
                    if not line:
                        continue
                    
                    try:
                        chunk = json.loads(line)
                        text_chunk = chunk.get("response", "")
                        full_text += text_chunk
                        
                        # Yield formatted chunk for streaming
                        yield self._format_ollama_stream_chunk(text_chunk)
                        
                        # Check if done
                        if chunk.get("done", False):
                            break
                    except json.JSONDecodeError:
                        logger.warning(f"Invalid JSON in stream: {line}")
                
                # Send the final done chunk
                yield self._format_ollama_stream_chunk("", done=True, full_text=full_text)
                
        except Exception as e:
            logger.error(f"Error streaming completion: {str(e)}")
            raise
    
    def _format_messages_for_ollama(
        self, 
        messages: List[Dict[str, str]],
        tools: Optional[List[Dict[str, Any]]] = None
    ) -> str:
        """Format messages for Ollama."""
        formatted_messages = []
        
        # Add tools descriptions if provided
        if tools:
            tools_description = self._format_tools_description(tools)
            formatted_messages.append(f"[System]\n{tools_description}\n")
        
        for msg in messages:
            role = msg["role"]
            content = msg.get("content") or ""
            
            if role == "system":
                formatted_messages.append(f"[System]\n{content}")
            elif role == "user":
                formatted_messages.append(f"[User]\n{content}")
            elif role == "assistant":
                formatted_messages.append(f"[Assistant]\n{content}")
            elif role == "tool":
                # Format tool responses
                tool_call_id = msg.get("tool_call_id", "unknown")
                formatted_messages.append(f"[Tool Result: {tool_call_id}]\n{content}")
        
        # Add final prompt for assistant response
        formatted_messages.append("[Assistant]\n")
        
        return "\n\n".join(formatted_messages)
    
    def _format_tools_description(self, tools: List[Dict[str, Any]]) -> str:
        """Format tools description for inclusion in the prompt."""
        tools_text = ["You have access to the following tools:"]
        
        for tool in tools:
            if tool.get("type") == "function":
                function = tool["function"]
                function_name = function["name"]
                function_description = function.get("description", "")
                
                tools_text.append(f"Tool: {function_name}")
                tools_text.append(f"Description: {function_description}")
                
                # Format parameters if available
                if "parameters" in function:
                    parameters = function["parameters"]
                    if "properties" in parameters:
                        tools_text.append("Parameters:")
                        for param_name, param_details in parameters["properties"].items():
                            param_type = param_details.get("type", "unknown")
                            param_desc = param_details.get("description", "")
                            required = "Required" if param_name in parameters.get("required", []) else "Optional"
                            tools_text.append(f"  - {param_name} ({param_type}, {required}): {param_desc}")
                
                tools_text.append("")  # Empty line between tools
        
        tools_text.append("""
When you need to use a tool, write the call as a JSON object wrapped in <tool_call> tags, using exactly this format:

<tool_call>
{
  "name": "tool_name",
  "parameters": {
    "param1": "value1",
    "param2": "value2"
  }
}
</tool_call>

Wait for the tool result before continuing.
""")
        
        return "\n".join(tools_text)
    
    def _format_ollama_response(self, result: Dict[str, Any], request: Dict[str, Any]) -> Dict[str, Any]:
        """Format an Ollama /api/generate response to match OpenAI's chat.completion format."""
        import uuid
        
        response_text = result.get("response", "")
        
        # Check for tool calls in the response
        tool_calls = self._extract_tool_calls(response_text)
        
        # Prefer the token counts Ollama reports; otherwise estimate ~4 characters per token
        prompt_tokens = result.get("prompt_eval_count") or len(request.get("prompt", "")) // 4
        completion_tokens = result.get("eval_count") or len(response_text) // 4
        
        response = {
            # Ollama does not return a completion ID, so generate one
            "id": f"ollama-{uuid.uuid4().hex}",
            "object": "chat.completion",
            # Ollama's created_at is an ISO 8601 string, so use the current Unix time
            "created": int(time.time()),
            "model": request["model"],
            "provider": "ollama",
            "usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens
            },
            # OpenAI nests the assistant message inside a "choices" list
            "choices": [{
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": self._clean_tool_calls_from_text(response_text) if tool_calls else response_text,
                    "tool_calls": tool_calls
                },
                "finish_reason": "tool_calls" if tool_calls else "stop"
            }]
        }
        
        return response
    
    def _format_ollama_stream_chunk(
        self, 
        chunk_text: str, 
        done: bool = False,
        full_text: Optional[str] = None
    ) -> Dict[str, Any]:
        """Format a streaming chunk to match OpenAI's chat.completion.chunk format."""
        if done:
            # The final chunk always signals completion and carries any tool calls
            # parsed from the full text
            tool_calls = self._extract_tool_calls(full_text or "")
            delta: Dict[str, Any] = {"content": ""}
            if tool_calls:
                # OpenAI stream deltas identify each tool call by its index
                delta["tool_calls"] = [{"index": i, **call} for i, call in enumerate(tool_calls)]
            finish_reason = "tool_calls" if tool_calls else "stop"
        else:
            delta = {"content": chunk_text}
            finish_reason = None
        
        return {
            "id": f"ollama-chunk-{id(chunk_text)}",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": self.default_model,
            "choices": [{
                "index": 0,
                "delta": delta,
                "finish_reason": finish_reason
            }]
        }
    
    def _extract_tool_calls(self, text: str) -> Optional[List[Dict[str, Any]]]:
        """Extract tool calls from response text."""
        import re
        import uuid
        
        # Look for tool calls in the format <tool_call>...</tool_call>,
        # as requested in the prompt built by _format_tools_description
        tool_pattern = re.compile(r'<tool_call>(.*?)</tool_call>', re.DOTALL)
        matches = tool_pattern.findall(text)
        
        if not matches:
            return None
        
        tool_calls = []
        for match in matches:
            try:
                # Try to parse as JSON
                tool_data = json.loads(match.strip())
                if not isinstance(tool_data, dict):
                    continue  # Valid JSON, but not a tool call object
                
                tool_calls.append({
                    "id": f"call_{uuid.uuid4().hex[:8]}",
                    "type": "function",
                    "function": {
                        "name": tool_data.get("name", "unknown_tool"),
                        # OpenAI encodes arguments as a JSON string
                        "arguments": json.dumps(tool_data.get("parameters", {}))
                    }
                })
            except json.JSONDecodeError:
                # If not valid JSON, try to extract name and arguments using regex
                name_match = re.search(r'"name"\s*:\s*"([^"]+)"', match)
                args_match = re.search(r'"parameters"\s*:\s*(\{.*\})', match, re.DOTALL)
                
                if name_match:
                    tool_name = name_match.group(1)
                    tool_args = "{}" if not args_match else args_match.group(1)
                    
                    tool_calls.append({
                        "id": f"call_{uuid.uuid4().hex[:8]}",
                        "type": "function",
                        "function": {
                            "name": tool_name,
                            "arguments": tool_args
                        }
                    })
        
        return tool_calls if tool_calls else None
    
    def _clean_tool_calls_from_text(self, text: str) -> str:
        """Remove tool calls from response text."""
        import re
        
        # Remove <tool_call>...</tool_call> blocks
        cleaned_text = re.sub(r'<tool_call>.*?</tool_call>', '', text, flags=re.DOTALL)
        
        # Remove any leftover tool usage announcements
        cleaned_text = re.sub(r'I will use a tool to help with this\.', '', cleaned_text)
        cleaned_text = re.sub(r'Let me use the .*? tool\.', '', cleaned_text)
        
        # Collapse runs of blank lines
        cleaned_text = re.sub(r'\n{3,}', '\n\n', cleaned_text)
        
        return cleaned_text.strip()
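To make the text-based tool-calling protocol concrete, here is a minimal standalone sketch of the round trip: a model reply containing a JSON tool call is turned into an OpenAI-style `tool_calls` entry, and the call markup is stripped from the visible text. It assumes tool calls are wrapped in `<tool_call>...</tool_call>` tags; that delimiter and the function name `parse_tool_calls` are illustrative assumptions, not part of Ollama's API.

```python
import json
import re
import uuid
from typing import Any, Dict, List, Tuple

# Assumed delimiter: it must match whatever format the system prompt asks for
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Split a model reply into visible text and OpenAI-style tool calls."""
    tool_calls = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            data = json.loads(block.strip())
        except json.JSONDecodeError:
            continue  # this sketch simply skips malformed blocks
        if not isinstance(data, dict):
            continue
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": data.get("name", "unknown_tool"),
                # OpenAI encodes arguments as a JSON *string*, not an object
                "arguments": json.dumps(data.get("parameters", {})),
            },
        })
    visible = TOOL_CALL_RE.sub("", text).strip()
    return visible, tool_calls


reply = (
    "Checking the weather now.\n"
    '<tool_call>{"name": "get_weather", "parameters": {"city": "Paris"}}</tool_call>'
)
visible, calls = parse_tool_calls(reply)
print(visible)                            # Checking the weather now.
print(calls[0]["function"]["name"])       # get_weather
print(calls[0]["function"]["arguments"])  # {"city": "Paris"}
```

Because the arguments are re-serialized with `json.dumps`, downstream code can treat Ollama tool calls exactly like OpenAI ones and call `json.loads` on `arguments`.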

Provider Selection Service

Python
# app/services/provider_service.py
import os
import json
import logging
import time
from typing import List, Dict, Any, Optional, Union, AsyncGenerator
import asyncio
from enum import Enum
import hashlib

import openai
from openai import AsyncOpenAI
from app.services.ollama_service import OllamaService
from app.config import settings

logger = logging.getLogger(__name__)

class Provider(str, Enum):
    OPENAI = "openai"
    OLLAMA = "ollama"
    AUTO = "auto"

class ModelSelectionCriteria:
    """Criteria for model selection in auto-routing."""
    def __init__(
        self,
        complexity_threshold: float = 0.65,
        privacy_sensitive_tokens: Optional[List[str]] = None,
        latency_requirement: Optional[float] = None,
        token_budget: Optional[int] = None,
        tool_requirements: Optional[List[str]] = None
    ):
        self.complexity_threshold = complexity_threshold
        self.privacy_sensitive_tokens = privacy_sensitive_tokens or []
        self.latency_requirement = latency_requirement
        self.token_budget = token_budget
        self.tool_requirements = tool_requirements

class ProviderService:
    """Service for routing requests to the appropriate provider."""
    
    def __init__(self):
        self.openai_client = None
        self.ollama_service = OllamaService()
        self.model_selection_criteria = ModelSelectionCriteria(
            complexity_threshold=settings.COMPLEXITY_THRESHOLD,
            # Strip whitespace so "password, secret" yields "secret", not " secret"
            privacy_sensitive_tokens=[
                token.strip()
                for token in getattr(settings, "PRIVACY_SENSITIVE_TOKENS", "").split(",")
                if token.strip()
            ]
        )
        
        # Model mappings
        self.default_openai_model = settings.OPENAI_MODEL
        self.default_ollama_model = settings.OLLAMA_MODEL
        
        # Response cache
        self.cache_enabled = getattr(settings, "ENABLE_RESPONSE_CACHE", False)
        self.cache = {}
        self.cache_ttl = getattr(settings, "RESPONSE_CACHE_TTL", 3600)  # 1 hour default
    
    async def initialize(self):
        """Initialize the provider service."""
        # Initialize OpenAI client
        self.openai_client = AsyncOpenAI(
            api_key=settings.OPENAI_API_KEY,
            organization=getattr(settings, "OPENAI_ORG_ID", None)
        )
        
        # Initialize Ollama service
        await self.ollama_service.initialize()
        
        logger.info("Provider service initialized")
    
    async def cleanup(self):
        """Clean up resources."""
        await self.ollama_service.cleanup()
    
    async def generate_completion(
        self,
        messages: List[Dict[str, str]],
        model: Optional[str] = None,
        provider: Optional[Union[str, Provider]] = None,
        tools: Optional[List[Dict[str, Any]]] = None,
        stream: bool = False,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        user: Optional[str] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """Generate a completion from the selected provider."""
        # Determine the provider and model
        selected_provider, selected_model = await self._select_provider_and_model(
            messages, model, provider, tools, **kwargs
        )
        
        # Check cache if enabled and not streaming
        if self.cache_enabled and not stream:
            cache_key = self._generate_cache_key(
                messages, selected_provider, selected_model, tools, temperature, max_tokens, kwargs
            )
            cached_response = self._get_from_cache(cache_key)
            if cached_response:
                logger.info(f"Cache hit for {selected_provider}:{selected_model}")
                return cached_response
        
        # Generate completion based on selected provider
        try:
            if selected_provider == Provider.OPENAI:
                response = await self._generate_openai_completion(
                    messages, selected_model, tools, stream, temperature, max_tokens, user, **kwargs
                )
            else:  # OLLAMA
                response = await self._generate_ollama_completion(
                    messages, selected_model, tools, stream, temperature, max_tokens, **kwargs
                )
            
            # Add provider info and cache if appropriate
            if not stream and response:
                response["provider"] = selected_provider.value
                if self.cache_enabled:
                    self._add_to_cache(cache_key, response)
            
            return response
        except Exception as e:
            logger.error(f"Error generating completion with {selected_provider}: {str(e)}")
            
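            # Try the other provider only if this one was picked by auto-routing
            # (no explicit model, and provider omitted or set to "auto")
            if not model and provider in (None, Provider.AUTO):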
                fallback_provider = Provider.OLLAMA if selected_provider == Provider.OPENAI else Provider.OPENAI
                logger.info(f"Attempting fallback to {fallback_provider}")
                
                try:
                    if fallback_provider == Provider.OPENAI:
                        fallback_model = self.default_openai_model
                        response = await self._generate_openai_completion(
                            messages, fallback_model, tools, stream, temperature, max_tokens, user, **kwargs
                        )
                    else:  # OLLAMA
                        fallback_model = self.default_ollama_model
                        response = await self._generate_ollama_completion(
                            messages, fallback_model, tools, stream, temperature, max_tokens, **kwargs
                        )
                    
                    if not stream and response:
                        response["provider"] = fallback_provider.value
                        # Don't cache fallback responses
                    
                    return response
                except Exception as fallback_error:
                    logger.error(f"Fallback also failed: {str(fallback_error)}")
            
            # Re-raise the original error if we couldn't fall back
            raise
    
    async def stream_completion(
        self,
        messages: List[Dict[str, str]],
        model: Optional[str] = None,
        provider: Optional[Union[str, Provider]] = None,
        tools: Optional[List[Dict[str, Any]]] = None,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        user: Optional[str] = None,
        **kwargs
    ) -> AsyncGenerator[Dict[str, Any], None]:
        """Stream a completion from the selected provider."""
        # The _stream_* helpers always stream; a "stream" entry in kwargs would be
        # passed twice and raise a TypeError, so drop it
        kwargs.pop("stream", None)
        
        # Determine the provider and model
        selected_provider, selected_model = await self._select_provider_and_model(
            messages, model, provider, tools, **kwargs
        )
        # Fallback only applies when the provider was picked by auto-routing
        auto_routed = not model and provider in (None, Provider.AUTO)
        chunks_sent = 0
        
        try:
            if selected_provider == Provider.OPENAI:
                chunk_stream = self._stream_openai_completion(
                    messages, selected_model, tools, temperature, max_tokens, user, **kwargs
                )
            else:  # OLLAMA
                chunk_stream = self._stream_ollama_completion(
                    messages, selected_model, tools, temperature, max_tokens, **kwargs
                )
            async for chunk in chunk_stream:
                chunk["provider"] = selected_provider.value
                chunks_sent += 1
                yield chunk
        except Exception as e:
            logger.error(f"Error streaming completion with {selected_provider}: {str(e)}")
            
            # Once chunks have been delivered, switching providers would splice two
            # different answers together, so surface the error instead
            if not auto_routed or chunks_sent:
                raise
            
            fallback_provider = Provider.OLLAMA if selected_provider == Provider.OPENAI else Provider.OPENAI
            logger.info(f"Attempting fallback to {fallback_provider}")
            
            if fallback_provider == Provider.OPENAI:
                fallback_stream = self._stream_openai_completion(
                    messages, self.default_openai_model, tools, temperature, max_tokens, user, **kwargs
                )
            else:  # OLLAMA
                fallback_stream = self._stream_ollama_completion(
                    messages, self.default_ollama_model, tools, temperature, max_tokens, **kwargs
                )
            # Errors raised by the fallback propagate to the caller
            async for chunk in fallback_stream:
                chunk["provider"] = fallback_provider.value
                yield chunk
    
    async def _select_provider_and_model(
        self,
        messages: List[Dict[str, str]],
        model: Optional[str] = None,
        provider: Optional[Union[str, Provider]] = None,
        tools: Optional[List[Dict[str, Any]]] = None,
        **kwargs
    ) -> tuple[Provider, str]:
        """Select the provider and model based on input and criteria."""
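        # Handle explicit "provider:model" specification, e.g. "openai:gpt-4" or "ollama:llama2".
        # Ollama tags also contain colons ("llama2:13b"), so the prefix only counts
        # as a provider when it names one of the concrete providers.
        if model and ":" in model:
            provider_str, model_name = model.split(":", 1)
            if provider_str.lower() in (Provider.OPENAI.value, Provider.OLLAMA.value):
                return Provider(provider_str.lower()), model_name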
        
        # Handle explicit provider with default model
        if provider and provider != Provider.AUTO:
            selected_provider = Provider(provider) if isinstance(provider, str) else provider
            selected_model = model or (
                self.default_openai_model if selected_provider == Provider.OPENAI 
                else self.default_ollama_model
            )
            return selected_provider, selected_model
        
        # If model specified without provider, infer provider
        if model:
            # Heuristic: OpenAI models typically start with "gpt-" or "text-"
            if model.startswith(("gpt-", "text-")):
                return Provider.OPENAI, model
            else:
                return Provider.OLLAMA, model
        
        # Auto-routing based on message content and requirements
        if not provider or provider == Provider.AUTO:
            selected_provider = await self._auto_route(messages, tools, **kwargs)
            selected_model = (
                self.default_openai_model if selected_provider == Provider.OPENAI 
                else self.default_ollama_model
            )
            return selected_provider, selected_model
        
        # Default fallback
        return Provider.OPENAI, self.default_openai_model
    
    async def _auto_route(
        self,
        messages: List[Dict[str, str]],
        tools: Optional[List[Dict[str, Any]]] = None,
        **kwargs
    ) -> Provider:
        """Automatically route to the appropriate provider based on content and requirements."""
        # 1. Check for tool requirements
        if tools:
            # If tools are required, prefer OpenAI as Ollama's tool support is limited
            return Provider.OPENAI
        
        # 2. Check for privacy concerns
        if self._contains_sensitive_information(messages):
            logger.info("Privacy sensitive information detected, routing to Ollama")
            return Provider.OLLAMA
        
        # 3. Assess complexity
        complexity_score = await self._assess_complexity(messages)
        logger.info(f"Content complexity score: {complexity_score}")
        
        if complexity_score > self.model_selection_criteria.complexity_threshold:
            logger.info(f"High complexity content ({complexity_score}), routing to OpenAI")
            return Provider.OPENAI
        
        # 4. Consider token budget (if specified)
        token_budget = kwargs.get("token_budget") or self.model_selection_criteria.token_budget
        if token_budget:
            estimated_tokens = self._estimate_token_count(messages)
            if estimated_tokens > token_budget:
                logger.info(f"Token budget ({token_budget}) exceeded ({estimated_tokens}), routing to OpenAI")
                return Provider.OPENAI
        
        # Default to Ollama for standard requests
        logger.info("Standard request, routing to Ollama")
        return Provider.OLLAMA
    
    def _contains_sensitive_information(self, messages: List[Dict[str, str]]) -> bool:
        """Check if messages contain privacy-sensitive information."""
        sensitive_tokens = self.model_selection_criteria.privacy_sensitive_tokens
        if not sensitive_tokens:
            return False
        
        combined_text = " ".join([msg.get("content", "") or "" for msg in messages])
        combined_text = combined_text.lower()
        
        for token in sensitive_tokens:
            if token.lower() in combined_text:
                return True
        
        return False
    
    async def _assess_complexity(self, messages: List[Dict[str, str]]) -> float:
        """Assess the complexity of the messages."""
        # Simple heuristics for complexity:
        # 1. Length of content
        # 2. Presence of complex tokens (technical terms, specialized vocabulary)
        # 3. Sentence complexity
        
        user_messages = [msg.get("content", "") for msg in messages if msg.get("role") == "user"]
        if not user_messages:
            return 0.0
        
        last_message = user_messages[-1] or ""
        
        # 1. Length factor (normalized to 0-1 range)
        length = len(last_message)
        length_factor = min(length / 1000, 1.0) * 0.3  # 30% weight to length
        
        # 2. Complexity indicators
        complex_terms = [
            "analyze", "synthesize", "evaluate", "compare", "contrast",
            "explain", "technical", "detailed", "comprehensive", "algorithm",
            "implementation", "architecture", "design", "optimize", "complex"
        ]
        
        term_count = sum(1 for term in complex_terms if term in last_message.lower())
        term_factor = min(term_count / 10, 1.0) * 0.4  # 40% weight to complex terms
        
        # 3. Sentence complexity (approximated by average sentence length)
        sentences = [s.strip() for s in last_message.split(".") if s.strip()]
        if sentences:
            avg_sentence_length = sum(len(s.split()) for s in sentences) / len(sentences)
            sentence_factor = min(avg_sentence_length / 25, 1.0) * 0.3  # 30% weight to sentence complexity
        else:
            sentence_factor = 0.0
        
        # Combined complexity score
        complexity = length_factor + term_factor + sentence_factor
        
        return complexity
    
    def _estimate_token_count(self, messages: List[Dict[str, str]]) -> int:
        """Estimate the token count for the messages."""
        # Simple approximation: 1 token ≈ 4 characters
        combined_text = " ".join([msg.get("content", "") or "" for msg in messages])
        return len(combined_text) // 4
    
    def _build_openai_kwargs(
        self,
        messages: List[Dict[str, str]],
        model: str,
        tools: Optional[List[Dict[str, Any]]],
        temperature: float,
        max_tokens: Optional[int],
        user: Optional[str],
        options: Dict[str, Any],
        stream: bool
    ) -> Dict[str, Any]:
        """Build the keyword arguments for chat.completions.create."""
        completion_kwargs = {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "stream": stream
        }
        
        if max_tokens:
            completion_kwargs["max_tokens"] = max_tokens
        
        if tools:
            completion_kwargs["tools"] = tools
        
        # Forward only the optional OpenAI parameters this service supports
        for key in ("tool_choice", "response_format"):
            if key in options:
                completion_kwargs[key] = options[key]
        
        if user:
            completion_kwargs["user"] = user
        
        return completion_kwargs
    
    async def _generate_openai_completion(
        self,
        messages: List[Dict[str, str]],
        model: str,
        tools: Optional[List[Dict[str, Any]]] = None,
        stream: bool = False,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        user: Optional[str] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """Generate a complete (non-streaming) response using OpenAI.
        
        `stream` is accepted for call-site compatibility but ignored: an async
        function cannot both `yield` chunks and `return` a value, so streaming
        lives in _stream_openai_completion.
        """
        completion_kwargs = self._build_openai_kwargs(
            messages, model, tools, temperature, max_tokens, user, kwargs, stream=False
        )
        response = await self.openai_client.chat.completions.create(**completion_kwargs)
        return response.model_dump()
    
    async def _stream_openai_completion(
        self,
        messages: List[Dict[str, str]],
        model: str,
        tools: Optional[List[Dict[str, Any]]] = None,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        user: Optional[str] = None,
        **kwargs
    ) -> AsyncGenerator[Dict[str, Any], None]:
        """Stream a completion from OpenAI, yielding each chunk as a dict."""
        completion_kwargs = self._build_openai_kwargs(
            messages, model, tools, temperature, max_tokens, user, kwargs, stream=True
        )
        response_stream = await self.openai_client.chat.completions.create(**completion_kwargs)
        async for chunk in response_stream:
            yield chunk.model_dump()
    
    async def _generate_ollama_completion(
        self,
        messages: List[Dict[str, str]],
        model: str,
        tools: Optional[List[Dict[str, Any]]] = None,
        stream: bool = False,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """Generate a complete (non-streaming) response using Ollama.
        
        `stream` is accepted for call-site compatibility but ignored: returning
        only the first streamed chunk would truncate the reply, so streaming goes
        through _stream_ollama_completion instead.
        """
        return await self.ollama_service.generate_completion(
            messages=messages,
            model=model,
            temperature=temperature,
            max_tokens=max_tokens,
            tools=tools,
            stream=False,
            **kwargs
        )
    
    async def _stream_ollama_completion(
        self,
        messages: List[Dict[str, str]],
        model: str,
        tools: Optional[List[Dict[str, Any]]] = None,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        **kwargs
    ) -> AsyncGenerator[Dict[str, Any], None]:
        """Stream a completion from Ollama."""
        async for chunk in self.ollama_service.generate_completion(
            messages=messages,
            model=model,
            temperature=temperature,
            max_tokens=max_tokens,
            tools=tools,
            stream=True,
            **kwargs
        ):
            yield chunk
    
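    def _generate_cache_key(self, *args) -> str:
        """Generate a cache key based on the input parameters."""
        # Serialize dicts/lists with sorted keys so equivalent inputs hash identically;
        # default=str keeps non-JSON values (e.g. custom objects in kwargs) from raising
        args_str = json.dumps(
            [json.dumps(arg, sort_keys=True, default=str) if isinstance(arg, (dict, list)) else arg for arg in args],
            default=str
        )
        # MD5 serves only as a fast fingerprint here, not for security
        return hashlib.md5(args_str.encode()).hexdigest()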
    
    def _get_from_cache(self, key: str) -> Optional[Dict[str, Any]]:
        """Get a response from cache if available and not expired."""
        if key not in self.cache:
            return None
            
        cached_item = self.cache[key]
        if time.time() - cached_item["timestamp"] > self.cache_ttl:
            # Expired
            del self.cache[key]
            return None
            
        return cached_item["response"]
    
    def _add_to_cache(self, key: str, response: Dict[str, Any]):
        """Add a response to the cache."""
        self.cache[key] = {
            "response": response,
            "timestamp": time.time()
        }
        
        # Simple cache size management - remove oldest if too many items
        max_cache_size = getattr(settings, "RESPONSE_CACHE_MAX_ITEMS", 1000)
        if len(self.cache) > max_cache_size:
            # Remove oldest 10% of items
            items_to_remove = max(1, int(max_cache_size * 0.1))
            oldest_keys = sorted(
                self.cache.keys(), 
                key=lambda k: self.cache[k]["timestamp"]
            )[:items_to_remove]
            
            for old_key in oldest_keys:
                del self.cache[old_key]
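The cache above sorts every key by timestamp whenever it overflows. A possible alternative, sketched here under the assumption that oldest-first eviction is the intended policy, keeps entries in a `collections.OrderedDict` so the oldest entry can be dropped in constant time. The class name `TTLCache` is hypothetical; it also uses `time.monotonic()` rather than `time.time()` so expiry is unaffected by wall-clock adjustments, and it evicts one entry at a time instead of the 10% batch used above.

```python
import time
from collections import OrderedDict
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Response cache with per-entry expiry and oldest-first eviction."""

    def __init__(self, ttl: float = 3600, max_items: int = 1000):
        self.ttl = ttl
        self.max_items = max_items
        # Insertion order is preserved, so the first item is always the oldest
        self._items: "OrderedDict[str, Tuple[float, Dict[str, Any]]]" = OrderedDict()

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        entry = self._items.get(key)
        if entry is None:
            return None
        timestamp, response = entry
        if time.monotonic() - timestamp > self.ttl:
            del self._items[key]  # expired
            return None
        return response

    def put(self, key: str, response: Dict[str, Any]) -> None:
        # Re-inserting moves the key to the end, so it counts as the newest entry
        self._items.pop(key, None)
        self._items[key] = (time.monotonic(), response)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # O(1) removal of the oldest entry

    def __len__(self) -> int:
        return len(self._items)


cache = TTLCache(ttl=60, max_items=2)
cache.put("a", {"text": "first"})
cache.put("b", {"text": "second"})
cache.put("c", {"text": "third"})  # exceeds max_items, so "a" is evicted
print(cache.get("a"))  # None
print(cache.get("c"))  # {'text': 'third'}
```

Like the dictionary-based cache, this is per-process; multiple workers would each hold their own copy.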

Configuration Settings

Python
# app/config.py
import os
from pydantic_settings import BaseSettings
from typing import List, Optional, Dict, Any
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

class Settings(BaseSettings):
    # API Keys and Authentication
    OPENAI_API_KEY: str
    OPENAI_ORG_ID: Optional[str] = None
    
    # Model Configuration
    OPENAI_MODEL: str = "gpt-4o"
    OLLAMA_MODEL: str = "llama2"
    OLLAMA_HOST: str = "http://localhost:11434"
    
    # System Behavior
    TEMPERATURE: float = 0.7
    MAX_TOKENS: int = 4096
    REQUEST_TIMEOUT: int = 120
    
    # Routing Configuration
    COMPLEXITY_THRESHOLD: float = 0.65
    PRIVACY_SENSITIVE_TOKENS: str = "password,secret,token,key,credential"
    
    # Caching Configuration
    ENABLE_RESPONSE_CACHE: bool = True
    RESPONSE_CACHE_TTL: int = 3600  # 1 hour
    RESPONSE_CACHE_MAX_ITEMS: int = 1000
    
    # Logging Configuration
    LOG_LEVEL: str = "INFO"
    
    # Database Configuration
    DATABASE_URL: Optional[str] = None
    
    # Advanced Ollama Configuration
    OLLAMA_MODELS_MAPPING: Dict[str, str] = {
        "gpt-3.5-turbo": "llama2",
        "gpt-4": "llama2",
        "gpt-4o": "mistral",
        "code-llama": "codellama"
    }
    
    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"

settings = Settings()
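`COMPLEXITY_THRESHOLD` is compared with the score from `ProviderService._assess_complexity`, a weighted sum of three capped factors: 0.3 × min(characters / 1000, 1) + 0.4 × min(matched terms / 10, 1) + 0.3 × min(average words per sentence / 25, 1). The standalone sketch below reproduces that formula (the function name `complexity_score` is illustrative) so a threshold can be checked against sample prompts before it is put in `.env`.

```python
COMPLEX_TERMS = [
    "analyze", "synthesize", "evaluate", "compare", "contrast",
    "explain", "technical", "detailed", "comprehensive", "algorithm",
    "implementation", "architecture", "design", "optimize", "complex",
]


def complexity_score(text: str) -> float:
    """Weighted complexity heuristic mirroring ProviderService._assess_complexity."""
    # 30% weight: length, saturating at 1000 characters
    length_factor = min(len(text) / 1000, 1.0) * 0.3
    # 40% weight: number of listed terms present (substring match), saturating at 10
    term_count = sum(1 for term in COMPLEX_TERMS if term in text.lower())
    term_factor = min(term_count / 10, 1.0) * 0.4
    # 30% weight: average words per sentence, saturating at 25
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if sentences:
        avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
        sentence_factor = min(avg_words / 25, 1.0) * 0.3
    else:
        sentence_factor = 0.0
    return length_factor + term_factor + sentence_factor


threshold = 0.65
prompt = "Explain the architecture and compare two design options."
score = complexity_score(prompt)
print(round(score, 3), "->", "openai" if score > threshold else "ollama")  # 0.273 -> ollama
```

Even with four of the listed terms, this short prompt scores about 0.27. Because every factor is capped, exceeding 0.65 effectively requires a long prompt with long sentences and several listed terms; lowering the threshold sends more traffic to OpenAI.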

Model Selection and Configuration

The catalog below records recommended Ollama models, their context windows, and the use cases each suits best:

Python
# app/models/model_catalog.py
from typing import Dict, List, Any, Optional

class ModelCapability:
    """Represents the capabilities of a model."""
    def __init__(
        self,
        context_window: int,
        strengths: List[str],
        supports_tools: bool,
        recommended_temperature: float,
        approximate_speed: str  # "fast", "medium", "slow"
    ):
        self.context_window = context_window
        self.strengths = strengths
        self.supports_tools = supports_tools
        self.recommended_temperature = recommended_temperature
        self.approximate_speed = approximate_speed

# Ollama model catalog
OLLAMA_MODELS = {
    "llama2": ModelCapability(
        context_window=4096,
        strengths=["general_knowledge", "reasoning", "instruction_following"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "llama2:13b": ModelCapability(
        context_window=4096,
        strengths=["general_knowledge", "reasoning", "instruction_following"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "llama2:70b": ModelCapability(
        context_window=4096,
        strengths=["general_knowledge", "reasoning", "instruction_following"],
        supports_tools=False,
        recommended_temperature=0.65,
        approximate_speed="slow"
    ),
    "mistral": ModelCapability(
        context_window=8192,
        strengths=["instruction_following", "reasoning", "versatility"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "mistral:7b-instruct": ModelCapability(
        context_window=8192,
        strengths=["instruction_following", "chat", "versatility"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "codellama": ModelCapability(
        context_window=16384,
        strengths=["code_generation", "code_explanation", "technical_writing"],
        supports_tools=False,
        recommended_temperature=0.5,
        approximate_speed="medium"
    ),
    "codellama:34b": ModelCapability(
        context_window=16384,
        strengths=["code_generation", "code_explanation", "technical_writing"],
        supports_tools=False,
        recommended_temperature=0.5,
        approximate_speed="slow"
    ),
    "dolphin-mistral": ModelCapability(
        context_window=8192,
        strengths=["conversational", "creative", "helpfulness"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "neural-chat": ModelCapability(
        context_window=8192,
        strengths=["conversational", "instruction_following", "helpfulness"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "orca-mini": ModelCapability(
        context_window=4096,
        strengths=["efficiency", "general_knowledge", "basic_reasoning"],
        supports_tools=False,
        recommended_temperature=0.8,
        approximate_speed="fast"
    ),
    "vicuna": ModelCapability(
        context_window=4096,
        strengths=["conversational", "instruction_following"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="medium"
    ),
    "wizard-math": ModelCapability(
        context_window=4096,
        strengths=["mathematics", "problem_solving", "logical_reasoning"],
        supports_tools=False,
        recommended_temperature=0.5,
        approximate_speed="medium"
    ),
    "phi": ModelCapability(
        context_window=2048,
        strengths=["efficiency", "basic_tasks", "lightweight"],
        supports_tools=False,
        recommended_temperature=0.7,
        approximate_speed="fast"
    )
}

# OpenAI -> Ollama model mapping for fallback scenarios
OPENAI_TO_OLLAMA_MAPPING = {
    "gpt-3.5-turbo": "llama2",
    "gpt-3.5-turbo-16k": "mistral:7b-instruct",
    "gpt-4": "llama2:70b",
    "gpt-4o": "mistral",
    "gpt-4-turbo": "mistral",
    "code-llama": "codellama"
}

# Use case to model recommendations
USE_CASE_RECOMMENDATIONS = {
    "code_generation": ["codellama:34b", "codellama"],
    "creative_writing": ["dolphin-mistral", "mistral:7b-instruct"],
    "mathematical_reasoning": ["wizard-math", "llama2:70b"],
    "conversational": ["neural-chat", "dolphin-mistral"],
    "knowledge_intensive": ["llama2:70b", "mistral"],
    "resource_constrained": ["phi", "orca-mini"]
}

def recommend_ollama_model(use_case: str, performance_tier: str = "medium") -> str:
    """Recommend an Ollama model based on use case and performance requirements."""
    if use_case in USE_CASE_RECOMMENDATIONS:
        models = USE_CASE_RECOMMENDATIONS[use_case]
        
        # Filter by performance tier if needed
        if performance_tier == "high":
            for model in models:
                if ":70b" in model or ":34b" in model:
                    return model
            return models[0]  # Return first if no high-tier match
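        elif performance_tier == "low":
            # Keep a code-specialised model for code generation and honour the
            # dedicated lightweight list for resource-constrained workloads;
            # everything else falls back to the small general-purpose orca-mini.
            if use_case == "code_generation":
                return "codellama"
            if use_case == "resource_constrained":
                return models[0]
            return "orca-mini"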
        else:  # medium tier
            return models[0]
    
    # Default recommendations
    if performance_tier == "high":
        return "llama2:70b"
    elif performance_tier == "low":
        return "orca-mini"
    else:
        return "mistral"
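
OPENAI_TO_OLLAMA_MAPPING is declared above, but the code that consumes it is not shown in this section. The following is a minimal sketch of how a fallback path might use it, assuming the caller already knows which models are pulled locally (for example from Ollama's /api/tags listing, which reports fully tagged names such as llama2:latest). The helper names resolve_fallback_model and _with_tag, and the llama2 default, are hypothetical:

```python
from typing import Dict, Iterable

# Subset of the mapping defined above, repeated so the sketch is self-contained
OPENAI_TO_OLLAMA_MAPPING: Dict[str, str] = {
    "gpt-3.5-turbo": "llama2",
    "gpt-4": "llama2:70b",
    "gpt-4o": "mistral",
}


def _with_tag(name: str) -> str:
    """Normalise 'llama2' to 'llama2:latest' to match Ollama's tagged names."""
    return name if ":" in name else f"{name}:latest"


def resolve_fallback_model(
    openai_model: str,
    local_models: Iterable[str],
    default: str = "llama2",
) -> str:
    """Pick the Ollama model to use when an OpenAI model is unavailable.

    Hypothetical helper: prefers the mapped model if it is pulled locally,
    otherwise returns `default` (assumed to be installed).
    """
    installed = {_with_tag(m) for m in local_models}
    mapped = OPENAI_TO_OLLAMA_MAPPING.get(openai_model)
    if mapped and _with_tag(mapped) in installed:
        return mapped
    return default
```

For example, a gpt-4 request on a host that only has llama2:latest and mistral:latest degrades to llama2, because the mapped llama2:70b has not been pulled.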

Agent Adapter for Model Selection

Python
# app/agents/adaptive_agent.py
from typing import List, Dict, Any, Optional
import logging
from app.agents.base_agent import BaseAgent
from app.models.message import Message, MessageRole
from app.services.provider_service import ProviderService, Provider
from app.models.model_catalog import recommend_ollama_model, OLLAMA_MODELS

logger = logging.getLogger(__name__)

class AdaptiveAgent(BaseAgent):
    """Agent that adapts its model selection based on task requirements."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.last_used_model = None
        self.last_used_provider = None
        self.performance_metrics = {}
    
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with dynamic model selection."""
        # Extract the last user message
        last_user_message = next(
            (msg for msg in reversed(self.state.conversation_history) 
             if msg.role == MessageRole.USER), 
            None
        )
        
        if not last_user_message:
            return "I don't have any messages to respond to."
        
        # Analyze the message to determine the best model
        provider, model = await self._select_optimal_model(last_user_message.content)
        
        logger.info(f"Selected model for response: {provider}:{model}")
        
        # Track the selected model for monitoring
        self.last_used_model = model
        self.last_used_provider = provider
        
        # Get model-specific parameters
        params = self._get_model_parameters(provider, model)
        
        # Start timing for performance metrics
        import time
        start_time = time.time()
        
        # Generate the response
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in self.state.conversation_history],
            model=f"{provider}:{model}" if provider != "auto" else None,
            provider=provider,
            tools=self.tools,
            temperature=params.get("temperature", 0.7),
            max_tokens=params.get("max_tokens"),
            user=user_id
        )
        
        # Record performance metrics
        execution_time = time.time() - start_time
        self._update_performance_metrics(provider, model, execution_time, response)
        
        if response.get("tool_calls"):
            # Process tool calls if needed
            # ... (tool call handling code)
            pass
        
        return response["message"]["content"]
    
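    async def _select_optimal_model(self, message: str) -> tuple[str, str]:
        """Select the optimal model based on message analysis."""
        # 0. Honour an explicit override if a caller (e.g. the API controller)
        #    has set one; getattr keeps this safe when no override exists
        forced_provider = getattr(self, "forced_provider", None)
        forced_model = getattr(self, "forced_model", None)
        if forced_provider and forced_model:
            return forced_provider, forced_model
        
        # 1. Analyze for use case
        use_case = await self._determine_use_case(message)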
        
        # 2. Determine performance needs
        performance_tier = self._determine_performance_tier(message)
        
        # 3. Check if tools are required
        tools_required = len(self.tools) > 0
        
        # 4. Check message complexity
        is_complex = await self._is_complex_request(message)
        
        # Decision logic
        if tools_required:
            # OpenAI is better for tool usage
            return "openai", "gpt-4o"
        
        if is_complex:
            # For complex requests, prefer OpenAI or high-tier Ollama models
            if performance_tier == "high":
                return "openai", "gpt-4o"
            else:
                ollama_model = recommend_ollama_model(use_case, "high")
                return "ollama", ollama_model
        
        # For standard requests, use Ollama with appropriate model
        ollama_model = recommend_ollama_model(use_case, performance_tier)
        return "ollama", ollama_model
    
    async def _determine_use_case(self, message: str) -> str:
        """Determine the use case based on message content."""
        message_lower = message.lower()
        
        # Simple heuristic classification
        if any(term in message_lower for term in ["code", "program", "function", "class", "algorithm"]):
            return "code_generation"
        
        if any(term in message_lower for term in ["story", "creative", "imagine", "write", "novel"]):
            return "creative_writing"
        
        if any(term in message_lower for term in ["math", "calculate", "equation", "solve", "formula"]):
            return "mathematical_reasoning"
        
        if any(term in message_lower for term in ["chat", "talk", "discuss", "conversation"]):
            return "conversational"
        
        if len(message.split()) > 50 or any(term in message_lower for term in ["explain", "detail", "analysis"]):
            return "knowledge_intensive"
        
        # Default to conversational
        return "conversational"
    
    def _determine_performance_tier(self, message: str) -> str:
        """Determine the performance tier needed based on message characteristics."""
        # Length-based heuristic
        word_count = len(message.split())
        
        if word_count > 100 or "detailed" in message.lower() or "comprehensive" in message.lower():
            return "high"
        
        if word_count < 20 and not any(term in message.lower() for term in ["complex", "difficult", "advanced"]):
            return "low"
        
        return "medium"
    
    async def _is_complex_request(self, message: str) -> bool:
        """Determine if this is a complex request requiring more powerful models."""
        # Check for indicators of complexity
        complexity_indicators = [
            "complex", "detailed", "thorough", "comprehensive", "in-depth",
            "analyze", "compare", "synthesize", "evaluate", "technical",
            "step by step", "advanced", "sophisticated", "nuanced"
        ]
        
        indicator_count = sum(1 for indicator in complexity_indicators if indicator in message.lower())
        
        # Length is also an indicator of complexity
        is_long = len(message.split()) > 50
        
        # Multiple questions indicate complexity
        question_count = message.count("?")
        has_multiple_questions = question_count > 1
        
        return (indicator_count >= 2) or (is_long and indicator_count >= 1) or has_multiple_questions
    
    def _get_model_parameters(self, provider: str, model: str) -> Dict[str, Any]:
        """Get model-specific parameters."""
        if provider == "ollama":
            if model in OLLAMA_MODELS:
                capabilities = OLLAMA_MODELS[model]
                return {
                    "temperature": capabilities.recommended_temperature,
                    "max_tokens": capabilities.context_window // 2  # Conservative estimate
                }
            else:
                # Default Ollama parameters
                return {"temperature": 0.7, "max_tokens": 2048}
        else:
            # OpenAI models
            if "gpt-4" in model:
                return {"temperature": 0.7, "max_tokens": 4096}
            else:
                return {"temperature": 0.7, "max_tokens": 2048}
    
    def _update_performance_metrics(
        self, 
        provider: str, 
        model: str, 
        execution_time: float,
        response: Dict[str, Any]
    ):
        """Update performance metrics for this model."""
        model_key = f"{provider}:{model}"
        
        if model_key not in self.performance_metrics:
            self.performance_metrics[model_key] = {
                "calls": 0,
                "total_time": 0,
                "avg_time": 0,
                "token_usage": {
                    "prompt": 0,
                    "completion": 0,
                    "total": 0
                }
            }
        
        metrics = self.performance_metrics[model_key]
        metrics["calls"] += 1
        metrics["total_time"] += execution_time
        metrics["avg_time"] = metrics["total_time"] / metrics["calls"]
        
        # Update token usage if available
        if "usage" in response:
            usage = response["usage"]
            metrics["token_usage"]["prompt"] += usage.get("prompt_tokens", 0)
            metrics["token_usage"]["completion"] += usage.get("completion_tokens", 0)
            metrics["token_usage"]["total"] += usage.get("total_tokens", 0)
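
The keyword checks in _determine_use_case use plain substring matching, so "Explain classical music history" is classified as code_generation because "class" is a substring of "classical". The following is a minimal sketch of a word-boundary variant. It reuses the agent's keyword lists, but classify_use_case is a hypothetical standalone function, not the original method:

```python
import re
from typing import Dict, List, Pattern

# Keyword lists copied from AdaptiveAgent._determine_use_case, checked in order
USE_CASE_KEYWORDS: Dict[str, List[str]] = {
    "code_generation": ["code", "program", "function", "class", "algorithm"],
    "creative_writing": ["story", "creative", "imagine", "write", "novel"],
    "mathematical_reasoning": ["math", "calculate", "equation", "solve", "formula"],
    "conversational": ["chat", "talk", "discuss", "conversation"],
}

# One pre-compiled pattern per use case; \b restricts matches to whole words
_PATTERNS: Dict[str, Pattern[str]] = {
    use_case: re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    for use_case, words in USE_CASE_KEYWORDS.items()
}

_KNOWLEDGE = re.compile(r"\b(?:explain|detail|analysis)\b", re.IGNORECASE)


def classify_use_case(message: str) -> str:
    """Word-boundary variant of the agent's substring heuristic."""
    for use_case, pattern in _PATTERNS.items():  # dicts keep insertion order
        if pattern.search(message):
            return use_case
    if len(message.split()) > 50 or _KNOWLEDGE.search(message):
        return "knowledge_intensive"
    return "conversational"
```

Whole-word matching also stops inflected forms such as "functions" from matching, so a fuller version would list those variants explicitly.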

Agent Controller with Model Selection

Python
# app/controllers/agent_controller.py
from fastapi import APIRouter, Depends, HTTPException, Query, BackgroundTasks
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional
import logging

from app.agents.agent_factory import AgentFactory
from app.agents.adaptive_agent import AdaptiveAgent
from app.services.provider_service import Provider
from app.services.auth_service import get_current_user
from app.config import settings

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/v1/agents", tags=["agents"])

class ModelSelectionParams(BaseModel):
    """Parameters for model selection."""
    provider: Optional[str] = Field(None, description="Provider to use (openai, ollama, auto)")
    model: Optional[str] = Field(None, description="Specific model to use")
    auto_select: bool = Field(True, description="Whether to auto-select the optimal model")
    use_case: Optional[str] = Field(None, description="Specific use case for model recommendation")
    performance_tier: Optional[str] = Field("medium", description="Performance tier (low, medium, high)")

class ChatRequest(BaseModel):
    message: str
    session_id: Optional[str] = None
    model_params: Optional[ModelSelectionParams] = None
    stream: bool = False

class ChatResponse(BaseModel):
    response: str
    session_id: str
    model_used: str
    provider_used: str
    execution_metrics: Optional[Dict[str, Any]] = None

# Agent sessions storage (in-memory: lost on restart and not shared across worker processes)
agent_sessions = {}

def get_agent_factory() -> AgentFactory:
    # Initialize and return agent factory
    # In a real implementation, this would be properly initialized
    return AgentFactory()

# Dependency that injects the agent factory into route handlers
agent_factory = Depends(get_agent_factory)

@router.post("/chat", response_model=ChatResponse)
async def chat(
    request: ChatRequest,
    background_tasks: BackgroundTasks,
    current_user: Dict = Depends(get_current_user),
    factory: AgentFactory = agent_factory
):
    """Chat with an agent that intelligently selects the appropriate model."""
    user_id = current_user["id"]
    
    # Create or retrieve session
    session_id = request.session_id
    if not session_id or session_id not in agent_sessions:
        # Create a new adaptive agent
        agent = factory.create_agent(
            agent_type="adaptive",
            agent_class=AdaptiveAgent,
            system_prompt="You are a helpful assistant that provides accurate, relevant information."
        )
        
        session_id = f"session_{user_id}_{len(agent_sessions) + 1}"
        agent_sessions[session_id] = agent
    else:
        agent = agent_sessions[session_id]
    
    # Apply model selection parameters if provided
    if request.model_params:
        if not request.model_params.auto_select:
            # Force specific provider/model
            provider = request.model_params.provider or "auto"
            model = request.model_params.model
            
            if provider != "auto" and model:
                logger.info(f"Forcing model selection: {provider}:{model}")
                # last_used_* is overwritten after every generation, so keep the
                # override in dedicated attributes for the agent's model selection
                agent.forced_provider = provider
                agent.forced_model = model
        else:
            # Return to automatic model selection
            agent.forced_provider = None
            agent.forced_model = None
    
    try:
        # Process the message
        if request.stream:
            # Streaming is not implemented in this example; fail explicitly instead
            # of returning None, which would fail ChatResponse validation
            raise HTTPException(status_code=501, detail="Streaming is not implemented")
        
        response = await agent.process_message(request.message, user_id)
        
        # Get the model and provider that were used
        model_used = agent.last_used_model or "unknown"
        provider_used = agent.last_used_provider or "unknown"
        
        # Get execution metrics
        model_key = f"{provider_used}:{model_used}"
        execution_metrics = agent.performance_metrics.get(model_key)
        
        # Schedule background task to analyze performance and adjust preferences
        background_tasks.add_task(
            analyze_performance, 
            agent, 
            model_key, 
            execution_metrics
        )
        
        return ChatResponse(
            response=response,
            session_id=session_id,
            model_used=model_used,
            provider_used=provider_used,
            execution_metrics=execution_metrics
        )
    except HTTPException:
        # Let deliberate HTTP errors (such as the 501 above) pass through unchanged
        raise
    except Exception as e:
        logger.exception(f"Error processing message: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Error processing message: {str(e)}")

@router.get("/models/recommend")
async def recommend_model(
    use_case: str = Query(..., description="The use case (code_generation, creative_writing, etc.)"),
    performance_tier: str = Query("medium", description="Performance tier (low, medium, high)"),
    current_user: Dict = Depends(get_current_user)
):
    """Get model recommendations for a specific use case."""
    from app.models.model_catalog import recommend_ollama_model, OLLAMA_MODELS
    
    # Get recommended Ollama model
    recommended_model = recommend_ollama_model(use_case, performance_tier)
    
    # Get OpenAI equivalent
    openai_equivalent = "gpt-4o" if performance_tier == "high" else "gpt-3.5-turbo"
    
    # Get model capabilities if available
    capabilities = OLLAMA_MODELS.get(recommended_model, {})
    
    return {
        "ollama_recommendation": recommended_model,
        "openai_recommendation": openai_equivalent,
        "capabilities": capabilities,
        "use_case": use_case,
        "performance_tier": performance_tier
    }

async def analyze_performance(agent, model_key, metrics):
    """Analyze model performance and adjust preferences."""
    if not metrics or metrics["calls"] < 5:
        # Not enough data to analyze
        return
    
    # Analyze average response time
    avg_time = metrics["avg_time"]
    
    # If response time is too slow, consider adjusting default models
    if avg_time > 5.0:  # More than 5 seconds
        logger.info(f"Model {model_key} showing slow performance: {avg_time}s avg")
        
        # In a real implementation, we might adjust preferred models here
        pass
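
The controller keeps sessions in a module-level dict, so any caller who knows a session ID can reuse another user's agent, and idle entries are never evicted. The following is a minimal sketch of a store that binds each session to its owner and expires idle sessions. It assumes a single-process deployment, and SessionStore and its parameters are hypothetical:

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class _Session:
    owner: str
    agent: Any
    last_used: float = field(default_factory=time.monotonic)


class SessionStore:
    """In-memory, per-process session store with owner checks and idle expiry."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self._sessions: Dict[str, _Session] = {}

    def _evict_expired(self) -> None:
        now = time.monotonic()
        expired = [sid for sid, s in self._sessions.items() if now - s.last_used > self.ttl]
        for sid in expired:
            del self._sessions[sid]

    def get_or_create(
        self, user_id: str, session_id: Optional[str], make_agent: Callable[[], Any]
    ) -> Tuple[str, Any]:
        self._evict_expired()
        session = self._sessions.get(session_id) if session_id else None
        # Only reuse a session that belongs to the requesting user
        if session is not None and session.owner == user_id:
            session.last_used = time.monotonic()
            return session_id, session.agent
        # Unguessable ID for new sessions (also used when ownership does not match)
        new_id = f"session_{secrets.token_urlsafe(16)}"
        self._sessions[new_id] = _Session(owner=user_id, agent=make_agent())
        return new_id, self._sessions[new_id].agent
```

In the chat endpoint this would replace the manual dictionary lookup with something like `session_id, agent = store.get_or_create(user_id, request.session_id, lambda: factory.create_agent(...))`.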

Dockerfile for Local Deployment

Dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

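# Set up environment
ENV PYTHONPATH=/app
# OPENAI_API_KEY is deliberately not set here: ENV values are stored in the image
# and visible to anyone who can pull it. Pass the key at runtime instead
# (e.g. docker run -e OPENAI_API_KEY=..., or via docker-compose).
ENV OLLAMA_HOST="http://ollama:11434"
ENV OLLAMA_MODEL="llama2"

EXPOSE 8000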

# Default command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Docker Compose for Development

YAML
# docker-compose.yml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - .:/app
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENAI_MODEL=${OPENAI_MODEL:-gpt-4o}
      - OLLAMA_MODEL=${OLLAMA_MODEL:-llama2}
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama_data:
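
The short form of depends_on only orders container start-up; it does not wait until the Ollama server is accepting requests. One way to handle this is to poll Ollama's /api/tags endpoint when the application starts. The following sketch uses only the standard library; wait_for_ollama and its timing defaults are assumptions, not part of the original code:

```python
import time
import urllib.request


def wait_for_ollama(host: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `{host}/api/tags` until it returns HTTP 200 or `timeout` elapses.

    Returns True once Ollama responds, False if it never became ready.
    """
    deadline = time.monotonic() + timeout
    url = f"{host.rstrip('/')}/api/tags"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused, and socket timeouts all subclass
            # OSError: the server is not ready yet, so retry after a pause
            pass
        time.sleep(interval)
    return False
```

The app could call this from a startup hook with the OLLAMA_HOST value, and log a warning rather than exit when it returns False, since requests can still be routed to OpenAI.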

Model Preload Script

Python
#!/usr/bin/env python
# scripts/preload_models.py
import argparse
import sys
import time
from typing import List

import requests

def main():
    parser = argparse.ArgumentParser(description='Preload Ollama models')
    parser.add_argument('--host', default="http://localhost:11434", help='Ollama host URL')
    parser.add_argument('--models', default="llama2,mistral,codellama", help='Comma-separated list of models to preload')
    parser.add_argument('--timeout', type=int, default=3600, help='Timeout in seconds for each model pull')
    args = parser.parse_args()

    models = [m.strip() for m in args.models.split(',')]
    preload_models(args.host, models, args.timeout)

def preload_models(host: str, models: List[str], timeout: int):
    """Preload models into Ollama."""
    print(f"Preloading {len(models)} models on {host}...")
    
    # Check Ollama availability
    try:
        response = requests.get(f"{host}/api/tags", timeout=10)
        if response.status_code != 200:
            print(f"Error connecting to Ollama: Status {response.status_code}")
            sys.exit(1)
            
        available_models = [m["name"] for m in response.json().get("models", [])]
        print(f"Currently available models: {', '.join(available_models)}")
    except Exception as e:
        print(f"Error connecting to Ollama: {str(e)}")
        sys.exit(1)
    
    # Pull each model
    for model in models:
        # /api/tags reports fully tagged names (e.g. "llama2:latest"), so
        # normalise untagged requests before checking for an existing copy
        tagged = model if ":" in model else f"{model}:latest"
        if tagged in available_models:
            print(f"Model {model} is already available, skipping...")
            continue
            
        print(f"Pulling model: {model}")
        try:
            start_time = time.time()
            # stream=False makes Ollama reply with a single JSON object when done
            response = requests.post(
                f"{host}/api/pull", 
                json={"name": model, "stream": False},
                timeout=timeout
            )
            
            if response.status_code != 200 or "error" in response.json():
                print(f"Error pulling model {model}: Status {response.status_code}")
                print(response.text)
                continue
                
            elapsed = time.time() - start_time
            print(f"Successfully pulled {model} in {elapsed:.1f} seconds")
        except Exception as e:
            print(f"Error pulling model {model}: {str(e)}")
    
    # Verify available models after pulling
    try:
        response = requests.get(f"{host}/api/tags", timeout=10)
        if response.status_code == 200:
            available_models = [m["name"] for m in response.json().get("models", [])]
            print(f"Available models: {', '.join(available_models)}")
    except Exception as e:
        print(f"Error checking available models: {str(e)}")

if __name__ == "__main__":
    main()
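
The script above reports each pull only after it has finished. For progress reporting, Ollama's /api/pull can instead stream newline-delimited JSON objects. Per the Ollama API documentation, each object carries a status field; while layers download it also includes digest, total, and completed byte counts, and the stream ends with a success status. Failures arrive as an object with an error key. The following is a sketch of a parser for those lines. summarize_pull_stream is a hypothetical helper, and the exact status messages may differ between Ollama versions:

```python
import json
from typing import Dict, Iterable, Union


def summarize_pull_stream(lines: Iterable[Union[str, bytes]]) -> Dict[str, int]:
    """Consume NDJSON lines from a streamed /api/pull and report byte totals.

    Raises RuntimeError if Ollama reports an error or the stream ends
    without a final "success" status.
    """
    progress: Dict[str, Dict[str, int]] = {}  # layer digest -> latest byte counts
    succeeded = False
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        event = json.loads(raw)  # json.loads accepts both str and bytes
        if "error" in event:
            raise RuntimeError(f"Pull failed: {event['error']}")
        if "digest" in event and "total" in event:
            progress[event["digest"]] = {
                "total": event["total"],
                "completed": event.get("completed", 0),
            }
        if event.get("status") == "success":
            succeeded = True
    if not succeeded:
        raise RuntimeError("Pull stream ended without a success status")
    return {
        "layers": len(progress),
        "total_bytes": sum(p["total"] for p in progress.values()),
        "completed_bytes": sum(p["completed"] for p in progress.values()),
    }
```

With requests, the lines can come from `requests.post(url, json={"name": model}, stream=True, timeout=timeout).iter_lines()`, which yields bytes that this parser accepts directly.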

Implementation Guide

Setting up Ollama

Installation:

Bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com/download/windows

Pull Base Models:

Bash
ollama pull llama2
ollama pull mistral
ollama pull codellama

Start Ollama Server:
Bash
ollama serve

Application Configuration

Create .env file:

OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-...  # Optional
OPENAI_MODEL=gpt-4o
OLLAMA_MODEL=llama2
OLLAMA_HOST=http://localhost:11434
COMPLEXITY_THRESHOLD=0.65
PRIVACY_SENSITIVE_TOKENS=password,secret,token,key,credential
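
The following is a minimal sketch of how these variables might be parsed into a typed settings object with only the standard library. The application's actual app.config.settings is not shown, so HybridSettings and its defaults (taken from the values above) are assumptions. The sketch also assumes the .env file has already been loaded into the environment, for example by docker-compose or a dotenv loader:

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class HybridSettings:
    openai_api_key: str
    openai_model: str = "gpt-4o"
    openai_org_id: Optional[str] = None
    ollama_model: str = "llama2"
    ollama_host: str = "http://localhost:11434"
    complexity_threshold: float = 0.65
    privacy_sensitive_tokens: List[str] = field(default_factory=list)

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "HybridSettings":
        threshold = float(env.get("COMPLEXITY_THRESHOLD", "0.65"))
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("COMPLEXITY_THRESHOLD must be between 0 and 1")
        # Comma-separated list; trim whitespace, lowercase, drop empty entries
        tokens = [
            t.strip().lower()
            for t in env.get("PRIVACY_SENSITIVE_TOKENS", "").split(",")
            if t.strip()
        ]
        return cls(
            openai_api_key=env["OPENAI_API_KEY"],  # KeyError if missing: fail fast
            openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
            openai_org_id=env.get("OPENAI_ORG_ID") or None,
            ollama_model=env.get("OLLAMA_MODEL", "llama2"),
            ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
            complexity_threshold=threshold,
            privacy_sensitive_tokens=tokens,
        )
```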

Initialize Application:

Bash
# Install dependencies
pip install -r requirements.txt

# Start the application
uvicorn app.main:app --reload

Model Selection Criteria

The system determines which provider (OpenAI or Ollama) to use based on several criteria:

Complexity Analysis:
- Messages are analyzed for complexity based on length, specialized terminology, and sentence structure.
- The COMPLEXITY_THRESHOLD setting (default: 0.65) determines when to route to OpenAI for more complex queries.
Privacy Concerns:
- Messages containing sensitive terms (configured in PRIVACY_SENSITIVE_TOKENS) are preferentially routed to Ollama.
- This ensures sensitive information remains on local infrastructure.
Tool Requirements:
- Requests requiring tools/functions are routed to OpenAI, because tool-calling support in Ollama is limited and varies by model.
- The system simulates tool usage in Ollama using prompt engineering when necessary.
Resource Constraints:
- Token budget constraints can trigger routing to OpenAI for longer conversations.
- Local hardware capabilities are considered when selecting Ollama models.
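
These criteria can conflict, for instance when a privacy-sensitive request also needs tools. The following is a minimal sketch of one possible precedence: privacy first, then tools, then complexity. It assumes the complexity score has already been computed on a 0 to 1 scale. The provider service's real ordering is not shown in this section, so route_request and its precedence are assumptions:

```python
import re
from enum import Enum
from typing import Sequence


class Provider(str, Enum):
    OPENAI = "openai"
    OLLAMA = "ollama"


def route_request(
    text: str,
    complexity: float,
    needs_tools: bool,
    privacy_tokens: Sequence[str] = ("password", "secret", "token", "key", "credential"),
    complexity_threshold: float = 0.65,
) -> Provider:
    """Choose a provider for one request under an assumed precedence order."""
    lowered = text.lower()
    # 1. Privacy: keep sensitive content local even if tools would help.
    #    Whole-word matching avoids flagging e.g. "keyboard" for "key".
    if any(re.search(rf"\b{re.escape(tok)}\b", lowered) for tok in privacy_tokens):
        return Provider.OLLAMA
    # 2. Tools: prefer OpenAI because Ollama's tool support varies by model
    if needs_tools:
        return Provider.OPENAI
    # 3. Complexity: scores at or above the threshold go to the cloud model
    if complexity >= complexity_threshold:
        return Provider.OPENAI
    return Provider.OLLAMA
```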

Ollama Model Selection

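The adaptive agent classifies each query with keyword heuristics and looks up the corresponding recommendation list in USE_CASE_RECOMMENDATIONS. The medium tier uses the first entry, the high tier prefers a 34b or 70b variant from the list if one exists, and the low tier falls back to a lightweight model:

For code generation: codellama:34b, or codellama at the low tier
For creative tasks: dolphin-mistral (alternative: mistral:7b-instruct)
For mathematical reasoning: wizard-math, or llama2:70b at the high tier
For conversational queries: neural-chat (alternative: dolphin-mistral)
For knowledge-intensive queries: llama2:70b (alternative: mistral)
For resource-constrained environments: phi (alternative: orca-mini)

At the low tier, most use cases fall back to orca-mini.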

Performance Optimization

Response Caching:
- Common responses are cached to improve performance.
- Cache TTL and maximum items are configurable.
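Dynamic Temperature Adjustment:
- Each Ollama model in the catalog has a recommended temperature, which the adaptive agent applies when it selects that model.
- Because the model is chosen per task type, the effective temperature also varies by task (for example, 0.5 for wizard-math versus 0.7 for neural-chat).
Adaptive Routing:
- The agent records per-model call counts, average latency, and token usage after every response.
- A background task flags models whose average latency exceeds 5 seconds after at least 5 calls; using these metrics to shift routing away from slow models is left as an extension point.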

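Response caching is described above but not shown in this section. The following sketch implements a cache with both a TTL and a maximum item count, keyed on provider, model, and the full message list. ResponseCache and its defaults are hypothetical. Caching a high-temperature response freezes one sample of what would otherwise vary between calls, so low-temperature requests are the safer candidates:

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Dict, List, Optional, Tuple


class ResponseCache:
    """LRU cache with a per-entry TTL for completion responses."""

    def __init__(self, max_items: int = 256, ttl_seconds: float = 300.0) -> None:
        self.max_items = max_items
        self.ttl = ttl_seconds
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    @staticmethod
    def make_key(provider: str, model: str, messages: List[Dict[str, Any]]) -> str:
        # sort_keys gives a stable serialisation regardless of dict key order
        payload = json.dumps([provider, model, messages], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (time.monotonic(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used entry
```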
Fallback Mechanisms

The system implements robust fallback mechanisms:

Provider Fallback:
- If OpenAI is unavailable, the system falls back to Ollama.
- If Ollama fails, the system falls back to OpenAI.
Model Fallback:
- If a requested model is unavailable, the system selects an appropriate alternative.
- Fallback chains are configured for each model to ensure graceful degradation.
Error Handling:
- Network errors, timeout issues, and model limitations are handled gracefully.
- The system provides informative error messages when fallbacks are exhausted.
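
The following is a minimal sketch of the provider-fallback loop. It assumes each provider is wrapped in an async callable that takes the message list. complete_with_fallback and AllProvidersFailed are hypothetical names. The loop tries providers in order with a per-call timeout, records each failure, and raises a single error summarizing all of them when the chain is exhausted:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Sequence, Tuple

Completion = Dict[str, Any]
ProviderFn = Callable[[List[Dict[str, str]]], Awaitable[Completion]]


class AllProvidersFailed(RuntimeError):
    def __init__(self, errors: List[Tuple[str, BaseException]]) -> None:
        self.errors = errors
        details = "; ".join(f"{name}: {exc!r}" for name, exc in errors)
        super().__init__(f"All providers failed ({details})")


async def complete_with_fallback(
    messages: List[Dict[str, str]],
    chain: Sequence[Tuple[str, ProviderFn]],
    per_call_timeout: float = 30.0,
) -> Completion:
    """Try each (name, provider) in order; return the first successful result."""
    errors: List[Tuple[str, BaseException]] = []
    for name, call in chain:
        try:
            result = await asyncio.wait_for(call(messages), timeout=per_call_timeout)
            result.setdefault("provider", name)  # record which provider answered
            return result
        except Exception as exc:  # includes asyncio.TimeoutError from wait_for
            errors.append((name, exc))
    raise AllProvidersFailed(errors)
```

A chain of `[("openai", ...), ("ollama", ...)]` gives the OpenAI-then-Ollama behaviour described above; reversing the list gives the opposite direction.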

Conclusion

The integration of Ollama with OpenAI's Agents SDK creates a sophisticated hybrid architecture that combines the strengths of both local and cloud-based inference. This implementation provides:

Enhanced privacy by keeping sensitive information local when appropriate
Cost optimization by routing suitable queries to local infrastructure
Robust fallbacks ensuring system resilience against failures
Task-appropriate model selection based on heuristic analysis of each request
Seamless integration with the agent framework and tools ecosystem

This architecture balances the capability of cloud-based models with the privacy and cost benefits of local inference. By routing each request according to its characteristics, the system trades off response quality against constraints around privacy, latency, and resource utilization on a per-request basis.

Comprehensive Testing Strategy for OpenAI-Ollama Hybrid Agent System

Theoretical Framework for Validation Methodology

The integration of cloud-based and local inferencing capabilities within a unified agent architecture necessitates a multifaceted testing approach that encompasses both individual components and their systemic interactions. This document establishes a rigorous testing framework that addresses the unique challenges of validating a hybrid AI system across multiple dimensions of functionality, performance, and reliability.

Strategic Testing Layers

1. Unit Testing Framework

Core Component Isolation Testing

Python
# tests/unit/test_provider_service.py
import pytest
import asyncio
from unittest.mock import AsyncMock, patch, MagicMock
import json

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

class TestProviderService:
    @pytest.fixture
    def provider_service(self):
        """Create a provider service with mocked dependencies for testing."""
        service = ProviderService()
        service.openai_client = AsyncMock()
        service.ollama_service = AsyncMock(spec=OllamaService)
        return service
    
    @pytest.mark.asyncio
    async def test_select_provider_and_model_explicit(self, provider_service):
        """Test explicit provider and model selection."""
        # Test explicit provider:model format
        provider, model = await provider_service._select_provider_and_model(
            messages=[{"role": "user", "content": "Hello"}],
            model="openai:gpt-4"
        )
        assert provider == Provider.OPENAI
        assert model == "gpt-4"
        
        # Test explicit provider with default model
        provider, model = await provider_service._select_provider_and_model(
            messages=[{"role": "user", "content": "Hello"}],
            provider="ollama"
        )
        assert provider == Provider.OLLAMA
        assert model == provider_service.default_ollama_model
    
    @pytest.mark.asyncio
    async def test_auto_routing_complex_content(self, provider_service):
        """Test auto-routing with complex content."""
        # Mock complexity assessment to return high complexity
        provider_service._assess_complexity = AsyncMock(return_value=0.8)
        provider_service.model_selection_criteria.complexity_threshold = 0.7
        
        provider = await provider_service._auto_route(
            messages=[{"role": "user", "content": "Complex technical question"}]
        )
        
        assert provider == Provider.OPENAI
        provider_service._assess_complexity.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_auto_routing_privacy_sensitive(self, provider_service):
        """Test auto-routing with privacy sensitive content."""
        provider_service.model_selection_criteria.privacy_sensitive_tokens = ["password", "secret"]
        
        provider = await provider_service._auto_route(
            messages=[{"role": "user", "content": "What is my password?"}]
        )
        
        assert provider == Provider.OLLAMA
    
    @pytest.mark.asyncio
    async def test_auto_routing_with_tools(self, provider_service):
        """Test auto-routing with tool requirements."""
        provider = await provider_service._auto_route(
            messages=[{"role": "user", "content": "Simple question"}],
            tools=[{"type": "function", "function": {"name": "get_weather"}}]
        )
        
        assert provider == Provider.OPENAI
    
    @pytest.mark.asyncio
    async def test_generate_completion_openai(self, provider_service):
        """Test generating completion with OpenAI."""
        # Setup mock response
        mock_response = MagicMock()
        mock_response.model_dump.return_value = {
            "id": "test-id",
            "object": "chat.completion",
            "model": "gpt-4",
            "usage": {"total_tokens": 10},
            "message": {"content": "Test response"}
        }
        provider_service.openai_client.chat.completions.create = AsyncMock(return_value=mock_response)
        
        response = await provider_service._generate_openai_completion(
            messages=[{"role": "user", "content": "Hello"}],
            model="gpt-4"
        )
        
        assert response["message"]["content"] == "Test response"
        provider_service.openai_client.chat.completions.create.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_generate_completion_ollama(self, provider_service):
        """Test generating completion with Ollama."""
        provider_service.ollama_service.generate_completion.return_value = {
            "id": "ollama-test",
            "model": "llama2",
            "provider": "ollama",
            "message": {"content": "Ollama response"}
        }
        
        response = await provider_service._generate_ollama_completion(
            messages=[{"role": "user", "content": "Hello"}],
            model="llama2"
        )
        
        assert response["message"]["content"] == "Ollama response"
        provider_service.ollama_service.generate_completion.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_fallback_mechanism(self, provider_service):
        """Test fallback mechanism when primary provider fails."""
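        # Force auto-routing to choose OpenAI so the fallback path is exercised
        # deterministically (a short "Hello" might otherwise route to Ollama)
        provider_service._auto_route = AsyncMock(return_value=Provider.OPENAI)
        
        # Mock the primary provider (OpenAI) to fail
        provider_service._generate_openai_completion = AsyncMock(side_effect=Exception("API error"))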
        
        # Mock the fallback provider (Ollama) to succeed
        provider_service._generate_ollama_completion = AsyncMock(return_value={
            "id": "ollama-fallback",
            "provider": "ollama",
            "message": {"content": "Fallback response"}
        })
        
        # Test the generate_completion method with auto provider
        response = await provider_service.generate_completion(
            messages=[{"role": "user", "content": "Hello"}],
            provider="auto"
        )
        
        # Check that fallback was used
        assert response["provider"] == "ollama"
        assert response["message"]["content"] == "Fallback response"
        provider_service._generate_openai_completion.assert_called_once()
        provider_service._generate_ollama_completion.assert_called_once()

Model Selection Logic Testing

Python
# tests/unit/test_model_selection.py
import pytest
from unittest.mock import AsyncMock, patch
import json

from app.models.model_catalog import recommend_ollama_model, OLLAMA_MODELS
from app.agents.adaptive_agent import AdaptiveAgent

class TestModelSelection:
    @pytest.mark.parametrize("use_case,performance_tier,expected_model", [
        ("code_generation", "high", "codellama:34b"),
        ("creative_writing", "medium", "dolphin-mistral"),
        ("mathematical_reasoning", "low", "orca-mini"),
        ("conversational", "high", "neural-chat"),
        ("knowledge_intensive", "high", "llama2:70b"),
        ("resource_constrained", "low", "phi"),
    ])
    def test_model_recommendations(self, use_case, performance_tier, expected_model):
        """Test model recommendation logic for different use cases."""
        model = recommend_ollama_model(use_case, performance_tier)
        assert model == expected_model
    
    @pytest.mark.asyncio
    async def test_adaptive_agent_use_case_detection(self):
        """Test adaptive agent's use case detection logic."""
        provider_service = AsyncMock()
        agent = AdaptiveAgent(
            provider_service=provider_service,
            system_prompt="You are a helpful assistant."
        )
        
        # Test code-related message
        code_use_case = await agent._determine_use_case(
            "Can you help me write a Python function to calculate Fibonacci numbers?"
        )
        assert code_use_case == "code_generation"
        
        # Test creative writing message
        creative_use_case = await agent._determine_use_case(
            "Write a short story about a robot discovering emotions."
        )
        assert creative_use_case == "creative_writing"
        
        # Test mathematical reasoning message
        math_use_case = await agent._determine_use_case(
            "Solve this equation: 3x² + 2x - 5 = 0"
        )
        assert math_use_case == "mathematical_reasoning"
    
    @pytest.mark.asyncio
    async def test_complexity_assessment(self):
        """Test complexity assessment logic."""
        provider_service = AsyncMock()
        agent = AdaptiveAgent(
            provider_service=provider_service,
            system_prompt="You are a helpful assistant."
        )
        
        # Simple message
        simple_message = "What time is it?"
        is_complex_simple = await agent._is_complex_request(simple_message)
        assert not is_complex_simple
        
        # Complex message
        complex_message = "Can you provide a detailed analysis of the socioeconomic factors that contributed to the Industrial Revolution in England, and compare those with the conditions in contemporary developing economies?"
        is_complex_detailed = await agent._is_complex_request(complex_message)
        assert is_complex_detailed
        
        # Multiple questions
        multi_question = "What is quantum computing? How does it differ from classical computing? What are its potential applications?"
        is_complex_multi = await agent._is_complex_request(multi_question)
        assert is_complex_multi
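The async tests above define how `_determine_use_case` and `_is_complex_request` should behave, but they do not show an implementation. Below is a minimal sketch of a keyword-and-length heuristic that satisfies the same cases. The keyword tables, the 20-word threshold and the standalone function names are assumptions made for illustration. This is not the `AdaptiveAgent` source.

```python
import re

# Hypothetical keyword table: the first matching use case wins (dicts keep insertion order).
USE_CASE_KEYWORDS = {
    "code_generation": ("python", "function", "code", "bug", "script"),
    "creative_writing": ("story", "poem", "fiction", "lyrics"),
    "mathematical_reasoning": ("solve", "equation", "integral", "prove"),
}

# Hypothetical markers suggesting that a request needs deeper reasoning.
COMPLEXITY_MARKERS = ("detailed", "analysis", "analyze", "compare", "implications")


def determine_use_case(message: str) -> str:
    """Map a user message to a use case by matching whole lowercase words."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for use_case, keywords in USE_CASE_KEYWORDS.items():
        if words.intersection(keywords):
            return use_case
    return "conversational"  # assumed default when no keyword matches


def is_complex_request(message: str, word_threshold: int = 20) -> bool:
    """Flag long, multi-question, or analysis-style messages as complex."""
    if len(message.split()) > word_threshold:
        return True
    if message.count("?") >= 2:
        return True
    lowered = message.lower()
    return any(marker in lowered for marker in COMPLEXITY_MARKERS)
```

Keyword heuristics are cheap and deterministic, so they are easy to unit-test as shown above. If keywords prove too brittle, a short classification prompt sent to a small local model is one alternative.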

Ollama Service Testing

Python
# tests/unit/test_ollama_service.py
import pytest
from unittest.mock import AsyncMock, MagicMock

from app.services.ollama_service import OllamaService

class TestOllamaService:
    @pytest.fixture
    def ollama_service(self):
        """Create an Ollama service with mocked session for testing."""
        service = OllamaService()
        service.session = AsyncMock()
        return service
    
    @pytest.mark.asyncio
    async def test_list_models(self, ollama_service):
        """Test listing available models."""
        mock_response = AsyncMock()
        mock_response.status = 200
        mock_response.json = AsyncMock(return_value={"models": [
            {"name": "llama2"},
            {"name": "mistral"}
        ]})
        
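        # session.get() returns an async context manager synchronously, so mock it with
        # MagicMock; an AsyncMock would return a coroutine that `async with` cannot enter
        ollama_service.session.get = MagicMock()
        ollama_service.session.get.return_value.__aenter__.return_value = mock_response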
        
        models = await ollama_service.list_models()
        
        assert len(models) == 2
        assert models[0]["name"] == "llama2"
        assert models[1]["name"] == "mistral"
    
    @pytest.mark.asyncio
    async def test_generate_completion(self, ollama_service):
        """Test generating a completion."""
        # Mock the response
        mock_response = AsyncMock()
        mock_response.status = 200
        mock_response.json = AsyncMock(return_value={
            "id": "test-id",
            "response": "This is a test response",
            "created_at": 1677858242
        })
        
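        # session.post() is used as `async with session.post(...)`, so mock it with
        # MagicMock (which supports __aenter__) rather than AsyncMock
        ollama_service.session.post = MagicMock()
        ollama_service.session.post.return_value.__aenter__.return_value = mock_response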
        
        # Test the completion generation
        response = await ollama_service._generate_completion_sync({
            "model": "llama2",
            "prompt": "Hello, world!",
            "stream": False,
            "options": {"temperature": 0.7}
        })
        
        # Check the formatted response
        assert "message" in response
        assert response["message"]["content"] == "This is a test response"
        assert response["provider"] == "ollama"
    
    @pytest.mark.asyncio
    async def test_format_messages_for_ollama(self, ollama_service):
        """Test formatting messages for Ollama."""
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
            {"role": "assistant", "content": "Hi there!"},
            {"role": "user", "content": "How are you?"}
        ]
        
        formatted = ollama_service._format_messages_for_ollama(messages)
        
        assert "[System]" in formatted
        assert "[User]" in formatted
        assert "[Assistant]" in formatted
        assert "You are a helpful assistant." in formatted
        assert "Hello!" in formatted
        assert "How are you?" in formatted
    
    @pytest.mark.asyncio
    async def test_tool_call_extraction(self, ollama_service):
        """Test extracting tool calls from response text."""
        # Response with a tool call
        response_with_tool = """
        I'll help you get the weather information.
        
        
        {
          "name": "get_weather",
          "parameters": {
            "location": "New York",
            "unit": "celsius"
          }
        }
        
        
        Let me check the weather for you.
        """
        
        tool_calls = ollama_service._extract_tool_calls(response_with_tool)
        
        assert tool_calls is not None
        assert len(tool_calls) == 1
        assert tool_calls[0]["function"]["name"] == "get_weather"
        assert "New York" in tool_calls[0]["function"]["arguments"]
        
        # Response without a tool call
        response_without_tool = "The weather in New York is sunny."
        assert ollama_service._extract_tool_calls(response_without_tool) is None
    
    @pytest.mark.asyncio
    async def test_clean_tool_calls_from_text(self, ollama_service):
        """Test cleaning tool calls from response text."""
        response_with_tool = """
        I'll help you get the weather information.
        
        
        {
          "name": "get_weather",
          "parameters": {
            "location": "New York",
            "unit": "celsius"
          }
        }
        
        
        Let me check the weather for you.
        """
        
        cleaned = ollama_service._clean_tool_calls_from_text(response_with_tool)
        
        assert '"parameters"' not in cleaned  # the JSON tool-call payload is removed
        assert "get_weather" not in cleaned
        assert "I'll help you get the weather information." in cleaned
        assert "Let me check the weather for you." in cleaned
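Local models without native function calling emit tool requests as plain text, so `_extract_tool_calls` has to recover a JSON object from free-form output. The delimiters around the JSON in the test fixtures above did not survive. The sketch below therefore scans for any brace-delimited JSON object that has `name` and `parameters` keys, and converts each one to the OpenAI-style `tool_calls` shape that the assertions expect. The function name, the `call_<n>` id scheme and the key names are assumptions drawn from the tests. They are not taken from the `OllamaService` source.

```python
import json
from typing import Any, Dict, List, Optional


def extract_tool_calls(text: str) -> Optional[List[Dict[str, Any]]]:
    """Find JSON objects with 'name' and 'parameters' and map them to tool calls."""
    decoder = json.JSONDecoder()
    calls: List[Dict[str, Any]] = []
    index = text.find("{")
    while index != -1:
        try:
            # raw_decode parses one JSON value starting at `index` and
            # returns (value, position just past the end of that value).
            obj, end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            index = text.find("{", index + 1)  # not valid JSON here; keep scanning
            continue
        if isinstance(obj, dict) and "name" in obj and "parameters" in obj:
            calls.append({
                "id": f"call_{len(calls)}",  # hypothetical id scheme
                "type": "function",
                "function": {
                    "name": obj["name"],
                    # OpenAI-style tool calls carry arguments as a JSON string.
                    "arguments": json.dumps(obj["parameters"]),
                },
            })
        index = text.find("{", end)
    return calls or None
```

Because `raw_decode` consumes the whole outer object, braces nested inside `parameters` are not mistaken for separate tool calls.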

Tool Integration Testing

Python
# tests/unit/test_tool_integration.py
import pytest
from unittest.mock import AsyncMock
import json

from app.agents.task_agent import TaskManagementAgent
from app.models.message import Message, MessageRole

class TestToolIntegration:
    @pytest.fixture
    def task_agent(self):
        """Create a task agent with mocked services."""
        provider_service = AsyncMock()
        task_service = AsyncMock()
        
        agent = TaskManagementAgent(
            provider_service=provider_service,
            task_service=task_service,
            system_prompt="You are a task management agent."
        )
        
        return agent
    
    @pytest.mark.asyncio
    async def test_process_tool_calls_list_tasks(self, task_agent):
        """Test processing the list_tasks tool call."""
        # Mock task service response
        task_agent.task_service.list_tasks.return_value = [
            {
                "id": "task1",
                "title": "Complete report",
                "status": "pending",
                "priority": "high",
                "due_date": "2023-04-15",
                "description": "Finish quarterly report"
            }
        ]
        
        # Create a tool call for list_tasks
        tool_calls = [{
            "id": "call_123",
            "function": {
                "name": "list_tasks",
                "arguments": json.dumps({
                    "status": "pending",
                    "limit": 5
                })
            }
        }]
        
        # Process the tool calls
        tool_responses = await task_agent._process_tool_calls(tool_calls, "user123")
        
        # Verify the response
        assert len(tool_responses) == 1
        assert tool_responses[0]["tool_call_id"] == "call_123"
        assert "Complete report" in tool_responses[0]["content"]
        assert "pending" in tool_responses[0]["content"]
        
        # Verify service was called correctly
        task_agent.task_service.list_tasks.assert_called_once_with(
            user_id="user123",
            status="pending",
            limit=5
        )
    
    @pytest.mark.asyncio
    async def test_process_tool_calls_create_task(self, task_agent):
        """Test processing the create_task tool call."""
        # Mock task service response
        task_agent.task_service.create_task.return_value = {
            "id": "new_task",
            "title": "New test task"
        }
        
        # Create a tool call for create_task
        tool_calls = [{
            "id": "call_456",
            "function": {
                "name": "create_task",
                "arguments": json.dumps({
                    "title": "New test task",
                    "description": "This is a test task",
                    "priority": "medium"
                })
            }
        }]
        
        # Process the tool calls
        tool_responses = await task_agent._process_tool_calls(tool_calls, "user123")
        
        # Verify the response
        assert len(tool_responses) == 1
        assert tool_responses[0]["tool_call_id"] == "call_456"
        assert "Task created successfully" in tool_responses[0]["content"]
        assert "New test task" in tool_responses[0]["content"]
        
        # Verify service was called correctly
        task_agent.task_service.create_task.assert_called_once_with(
            user_id="user123",
            title="New test task",
            description="This is a test task",
            due_date=None,
            priority="medium"
        )
    
    @pytest.mark.asyncio
    async def test_generate_response_with_tools(self, task_agent):
        """Test the full generate_response flow with tool usage."""
        # Set up the conversation history
        task_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a task management agent."),
            Message(role=MessageRole.USER, content="List my pending tasks")
        ]
        
        # Mock provider service to return a response with tool calls first
        mock_response_with_tools = {
            "message": {
                "content": "I'll list your tasks",
                "tool_calls": [{
                    "id": "call_123",
                    "function": {
                        "name": "list_tasks",
                        "arguments": json.dumps({
                            "status": "pending",
                            "limit": 10
                        })
                    }
                }]
            },
            "tool_calls": [{
                "id": "call_123",
                "function": {
                    "name": "list_tasks",
                    "arguments": json.dumps({
                        "status": "pending",
                        "limit": 10
                    })
                }
            }]
        }
        
        # Mock task service
        task_agent.task_service.list_tasks.return_value = [
            {
                "id": "task1",
                "title": "Complete report",
                "status": "pending",
                "priority": "high",
                "due_date": "2023-04-15",
                "description": "Finish quarterly report"
            }
        ]
        
        # Mock final response after tool processing
        mock_final_response = {
            "message": {
                "content": "You have 1 pending task: Complete report (high priority, due Apr 15)"
            }
        }
        
        # Set up the mocked provider service
        task_agent.provider_service.generate_completion = AsyncMock()
        task_agent.provider_service.generate_completion.side_effect = [
            mock_response_with_tools,  # First call returns tool calls
            mock_final_response        # Second call returns final response
        ]
        
        # Generate the response
        response = await task_agent._generate_response("user123")
        
        # Verify the final response
        assert response == "You have 1 pending task: Complete report (high priority, due Apr 15)"
        
        # Verify the provider service was called twice
        assert task_agent.provider_service.generate_completion.call_count == 2
        
        # Verify the task service was called
        task_agent.task_service.list_tasks.assert_called_once()
        
        # Verify tool response was added to conversation history
        tool_messages = [msg for msg in task_agent.state.conversation_history if msg.role == MessageRole.TOOL]
        assert len(tool_messages) == 1
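The tool-integration tests define the contract for `_process_tool_calls`. For each call it should:

- decode the call's JSON `arguments`;
- invoke the matching task-service method with the caller's `user_id`;
- return one message, keyed by `tool_call_id`.

Below is a minimal dispatcher that meets this contract. The if/elif handler table, the in-memory `FakeTaskService` and the exact content strings are illustrative assumptions.

```python
import json
from typing import Any, Dict, List, Optional


class FakeTaskService:
    """Hypothetical in-memory stand-in for the real task service."""

    def __init__(self) -> None:
        self.tasks: List[Dict[str, Any]] = []

    async def list_tasks(self, user_id: str, status: Optional[str] = None,
                         limit: int = 10) -> List[Dict[str, Any]]:
        matches = [t for t in self.tasks
                   if t["user_id"] == user_id and (status is None or t["status"] == status)]
        return matches[:limit]

    async def create_task(self, user_id: str, title: str, description: str = "",
                          due_date: Optional[str] = None,
                          priority: str = "medium") -> Dict[str, Any]:
        task = {"id": f"task{len(self.tasks) + 1}", "user_id": user_id,
                "title": title, "description": description, "due_date": due_date,
                "priority": priority, "status": "pending"}
        self.tasks.append(task)
        return task


async def process_tool_calls(task_service: Any, tool_calls: List[Dict[str, Any]],
                             user_id: str) -> List[Dict[str, str]]:
    """Dispatch each tool call to the task service and collect one tool message per call."""
    responses = []
    for call in tool_calls:
        name = call["function"]["name"]
        # Arguments arrive as a JSON-encoded string, as in the OpenAI tool-call format.
        args = json.loads(call["function"].get("arguments") or "{}")
        if name == "list_tasks":
            tasks = await task_service.list_tasks(
                user_id=user_id, status=args.get("status"), limit=args.get("limit", 10))
            content = json.dumps(tasks)
        elif name == "create_task":
            task = await task_service.create_task(
                user_id=user_id, title=args["title"],
                description=args.get("description", ""),
                due_date=args.get("due_date"),
                priority=args.get("priority", "medium"))
            content = f"Task created successfully: {task['title']} (id {task['id']})"
        else:
            content = f"Unknown tool: {name}"
        responses.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return responses
```

For unknown tools, the sketch returns an error string instead of raising. The model then sees the failure in the conversation and can correct itself on its next turn.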

2. Integration Testing Framework

API Endpoint Testing

Python
# tests/integration/test_api_endpoints.py
import pytest
from fastapi.testclient import TestClient
from unittest.mock import patch, AsyncMock

from app.main import app
from app.services.provider_service import ProviderService

client = TestClient(app)

class TestAPIEndpoints:
    @pytest.fixture(autouse=True)
    def setup_mocks(self):
        """Set up mocks for services."""
        # Patch the provider service
        with patch('app.controllers.agent_controller.get_agent_factory') as mock_factory:
            mock_provider = AsyncMock(spec=ProviderService)
            mock_factory.return_value.provider_service = mock_provider
            yield
    
    def test_health_endpoint(self):
        """Test the health check endpoint."""
        response = client.get("/api/health")
        assert response.status_code == 200
        assert response.json()["status"] == "ok"
    
    def test_chat_endpoint_auth_required(self):
        """Test that chat endpoint requires authentication."""
        response = client.post(
            "/api/v1/chat",
            json={"message": "Hello"}
        )
        assert response.status_code == 401  # Unauthorized
    
    def test_chat_endpoint_with_auth(self):
        """Test the chat endpoint with proper authentication."""
        # Mock the authentication
        with patch('app.services.auth_service.get_current_user') as mock_auth:
            mock_auth.return_value = {"id": "test_user"}
            
            # Mock the agent's process_message
            with patch('app.agents.base_agent.BaseAgent.process_message') as mock_process:
                mock_process.return_value = "Hello, I'm an AI assistant."
                
                response = client.post(
                    "/api/v1/chat",
                    json={"message": "Hi there"},
                    headers={"Authorization": "Bearer test_token"}
                )
                
                assert response.status_code == 200
                assert "response" in response.json()
                assert response.json()["response"] == "Hello, I'm an AI assistant."
    
    def test_model_recommendation_endpoint(self):
        """Test the model recommendation endpoint."""
        # Mock the authentication
        with patch('app.services.auth_service.get_current_user') as mock_auth:
            mock_auth.return_value = {"id": "test_user"}
            
            response = client.get(
                "/api/v1/agents/models/recommend?use_case=code_generation&performance_tier=high",
                headers={"Authorization": "Bearer test_token"}
            )
            
            assert response.status_code == 200
            data = response.json()
            assert "ollama_recommendation" in data
            assert data["use_case"] == "code_generation"
            assert data["performance_tier"] == "high"
    
    def test_streaming_endpoint(self):
        """Test the streaming endpoint."""
        # Mock the authentication
        with patch('app.services.auth_service.get_current_user') as mock_auth:
            mock_auth.return_value = {"id": "test_user"}
            
            # Mock the streaming generator
            async def mock_stream_generator():
                yield {"id": "1", "content": "Hello"}
                yield {"id": "2", "content": " World"}
            
            # Mock the stream method
            with patch('app.services.provider_service.ProviderService.stream_completion') as mock_stream:
                mock_stream.return_value = mock_stream_generator()
                
                response = client.post(
                    "/api/v1/chat/streaming",
                    json={"message": "Hi", "stream": True},
                    headers={"Authorization": "Bearer test_token"}
                )
                
                assert response.status_code == 200
                # Starlette appends "; charset=utf-8" to text/* media types, so check the prefix
                assert response.headers["content-type"].startswith("text/event-stream")
                
                # Parse the streaming response
                content = response.content.decode()
                assert "data:" in content
                assert "Hello" in content
                assert "World" in content
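The streaming test checks only that the body contains `data:` lines carrying each chunk. Server-sent events frame every message as one or more `data:` lines terminated by a blank line. A helper along the following lines would therefore produce a body that passes those checks. The function names are hypothetical. The closing `[DONE]` sentinel is an optional convention borrowed from OpenAI's streaming API.

```python
import json
from typing import Any, AsyncIterator, Dict


def format_sse(payload: Dict[str, Any]) -> str:
    """Frame one JSON payload as a server-sent event: a data line plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


async def sse_stream(chunks: AsyncIterator[Dict[str, Any]]) -> AsyncIterator[str]:
    """Convert provider chunks into SSE frames, ending with a [DONE] sentinel."""
    async for chunk in chunks:
        yield format_sse(chunk)
    yield "data: [DONE]\n\n"
```

In FastAPI, such a generator would typically be wrapped as `StreamingResponse(sse_stream(...), media_type="text/event-stream")`.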

End-to-End Agent Flow Testing

Python
# tests/integration/test_agent_flows.py
import pytest
import pytest_asyncio
import json
from unittest.mock import AsyncMock

from app.agents.meta_agent import MetaAgent, AgentSubsystem
from app.agents.research_agent import ResearchAgent
from app.agents.conversation_manager import ConversationManager
from app.models.message import Message, MessageRole

class TestAgentFlows:
    # Async fixtures need pytest_asyncio.fixture under pytest-asyncio's default strict mode
    @pytest_asyncio.fixture
    async def meta_agent_setup(self):
        """Set up a meta agent with subsystems for testing."""
        # Create mocked services
        provider_service = AsyncMock()
        knowledge_service = AsyncMock()
        memory_service = AsyncMock()
        
        # Create subsystem agents
        research_agent = ResearchAgent(
            provider_service=provider_service,
            knowledge_service=knowledge_service,
            system_prompt="You are a research agent."
        )
        
        conversation_agent = ConversationManager(
            provider_service=provider_service,
            system_prompt="You are a conversation management agent."
        )
        
        # Create meta agent
        meta_agent = MetaAgent(
            provider_service=provider_service,
            system_prompt="You are a meta agent that coordinates specialized agents."
        )
        
        # Add subsystems
        meta_agent.add_subsystem(AgentSubsystem(
            name="research",
            agent=research_agent,
            role="Knowledge retrieval specialist"
        ))
        
        meta_agent.add_subsystem(AgentSubsystem(
            name="conversation",
            agent=conversation_agent,
            role="Conversation flow manager"
        ))
        
        # Return the setup
        return {
            "meta_agent": meta_agent,
            "provider_service": provider_service,
            "knowledge_service": knowledge_service,
            "research_agent": research_agent,
            "conversation_agent": conversation_agent
        }
    
    @pytest.mark.asyncio
    async def test_meta_agent_routing(self, meta_agent_setup):
        """Test the meta agent's routing logic."""
        meta_agent = meta_agent_setup["meta_agent"]
        provider_service = meta_agent_setup["provider_service"]
        
        # Setup conversation history
        meta_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a meta agent."),
            Message(role=MessageRole.USER, content="Tell me about quantum computing")
        ]
        
        # Mock the routing response to use research subsystem
        routing_response = {
            "message": {
                "content": "I'll route this to the research subsystem"
            },
            "tool_calls": [{
                "id": "call_123",
                "function": {
                    "name": "route_to_subsystem",
                    "arguments": json.dumps({
                        "subsystem": "research",
                        "task": "Tell me about quantum computing",
                        "context": {}
                    })
                }
            }]
        }
        
        # Mock the research agent's response
        research_response = "Quantum computing is a type of computing that uses quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data."
        meta_agent_setup["research_agent"].process_message = AsyncMock(return_value=research_response)
        
        # Mock the provider service responses
        provider_service.generate_completion.side_effect = [
            routing_response,  # First call for routing decision
        ]
        
        # Generate response
        response = await meta_agent._generate_response("user123")
        
        # Verify routing happened correctly
        assert "[research" in response
        assert "Quantum computing" in response
        
        # Verify the research agent was called
        meta_agent_setup["research_agent"].process_message.assert_called_once_with(
            "Tell me about quantum computing", "user123"
        )
    
    @pytest.mark.asyncio
    async def test_meta_agent_parallel_processing(self, meta_agent_setup):
        """Test the meta agent's parallel processing logic."""
        meta_agent = meta_agent_setup["meta_agent"]
        provider_service = meta_agent_setup["provider_service"]
        
        # Setup conversation history
        meta_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a meta agent."),
            Message(role=MessageRole.USER, content="Explain the impacts of AI on society")
        ]
        
        # Mock the routing response to use parallel processing
        routing_response = {
            "message": {
                "content": "I'll process this with multiple subsystems"
            },
            "tool_calls": [{
                "id": "call_456",
                "function": {
                    "name": "parallel_processing",
                    "arguments": json.dumps({
                        "task": "Explain the impacts of AI on society",
                        "subsystems": ["research", "conversation"]
                    })
                }
            }]
        }
        
        # Mock each agent's response
        research_response = "From a research perspective, AI impacts society through automation, economic transformation, and ethical considerations."
        conversation_response = "From a conversational perspective, AI is changing how we interact with technology and each other."
        
        meta_agent_setup["research_agent"].process_message = AsyncMock(return_value=research_response)
        meta_agent_setup["conversation_agent"].process_message = AsyncMock(return_value=conversation_response)
        
        # Mock synthesis response
        synthesis_response = {
            "message": {
                "content": "AI has multifaceted impacts on society. From a research perspective, it drives automation and economic transformation. From a conversational perspective, it changes human-technology interaction patterns."
            }
        }
        
        # Mock the provider service responses
        provider_service.generate_completion.side_effect = [
            routing_response,    # First call for routing decision
            synthesis_response   # Second call for synthesis
        ]
        
        # Generate response
        response = await meta_agent._generate_response("user123")
        
        # Verify synthesis happened correctly
        assert "multifaceted impacts" in response
        assert provider_service.generate_completion.call_count == 2
        
        # Verify both agents were called
        meta_agent_setup["research_agent"].process_message.assert_called_once()
        meta_agent_setup["conversation_agent"].process_message.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_research_agent_knowledge_retrieval(self, meta_agent_setup):
        """Test the research agent's knowledge retrieval capabilities."""
        research_agent = meta_agent_setup["research_agent"]
        provider_service = meta_agent_setup["provider_service"]
        knowledge_service = meta_agent_setup["knowledge_service"]
        
        # Setup conversation history
        research_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a research agent."),
            Message(role=MessageRole.USER, content="What are the latest developments in fusion energy?")
        ]
        
        # Mock knowledge retrieval results
        knowledge_service.search.return_value = [
            {
                "id": "doc1",
                "title": "Recent Fusion Breakthrough",
                "content": "Scientists achieved net energy gain in fusion reaction at NIF in December 2022.",
                "relevance_score": 0.95
            },
            {
                "id": "doc2",
                "title": "Commercial Fusion Startups",
                "content": "Several startups including Commonwealth Fusion Systems are working on commercial fusion reactors.",
                "relevance_score": 0.89
            }
        ]
        
        # Mock initial response with tool calls
        tool_call_response = {
            "message": {
                "content": "Let me search for information on fusion energy."
            },
            "tool_calls": [{
                "id": "call_789",
                "function": {
                    "name": "search_knowledge_base",
                    "arguments": json.dumps({
                        "query": "latest developments fusion energy",
                        "max_results": 3
                    })
                }
            }]
        }
        
        # Mock final response with knowledge incorporated
        final_response = {
            "message": {
                "content": "Recent developments in fusion energy include a breakthrough at NIF in December 2022 achieving net energy gain, and advances from startups like Commonwealth Fusion Systems working on commercial reactors."
            }
        }
        
        # Mock the provider service responses
        provider_service.generate_completion.side_effect = [
            tool_call_response,  # First call with tool request
            final_response       # Second call with knowledge incorporated
        ]
        
        # Generate response
        response = await research_agent._generate_response("user123")
        
        # Verify response includes knowledge
        assert "NIF" in response
        assert "Commonwealth Fusion Systems" in response
        
        # Verify knowledge service was called
        knowledge_service.search.assert_called_once_with(
            query="latest developments fusion energy",
            max_results=3
        )
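`test_meta_agent_parallel_processing` expects each listed subsystem to receive the task exactly once before a single synthesis call. The fan-out step can be sketched with `asyncio.gather`. Passing `return_exceptions=True` means one failing subsystem does not discard the other subsystems' answers. `fan_out`, the `EchoAgent` stub and the error-string format are hypothetical.

```python
import asyncio
from typing import Dict, List, Protocol


class Agent(Protocol):
    async def process_message(self, message: str, user_id: str) -> str: ...


async def fan_out(agents: Dict[str, Agent], names: List[str],
                  task: str, user_id: str) -> Dict[str, str]:
    """Send the same task to several subsystems concurrently and collect replies."""
    selected = [name for name in names if name in agents]  # ignore unknown names
    results = await asyncio.gather(
        *(agents[name].process_message(task, user_id) for name in selected),
        return_exceptions=True,  # keep partial results if one subsystem fails
    )
    return {
        name: (f"[error: {result}]" if isinstance(result, Exception) else result)
        for name, result in zip(selected, results)
    }


class EchoAgent:
    """Hypothetical stub agent that prefixes replies with its perspective."""

    def __init__(self, perspective: str, fail: bool = False) -> None:
        self.perspective = perspective
        self.fail = fail

    async def process_message(self, message: str, user_id: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        if self.fail:
            raise RuntimeError(f"{self.perspective} unavailable")
        return f"{self.perspective}: {message}"
```

The resulting dictionary can then be formatted into the single synthesis prompt whose response the test asserts on.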

Cross-Provider Integration Testing

Python
# tests/integration/test_cross_provider.py
import os

import pytest
import pytest_asyncio

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

class TestCrossProviderIntegration:
    # Async (generator) fixtures need pytest_asyncio.fixture in strict mode
    @pytest_asyncio.fixture
    async def real_services(self):
        """Set up real services for integration testing."""
        # Skip tests if API keys aren't available in the environment
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize real services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        # Initialize the services
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
    @pytest.mark.asyncio
    async def test_provider_selection_complex_query(self, real_services):
        """Test that complex queries route to OpenAI."""
        provider_service = real_services["provider_service"]
        
        # Adjust complexity threshold to ensure predictable routing
        provider_service.model_selection_criteria.complexity_threshold = 0.5
        
        # Complex query that should route to OpenAI
        complex_messages = [
            {"role": "user", "content": "Provide a detailed analysis of the philosophical implications of artificial general intelligence, considering perspectives from epistemology, ethics, and metaphysics."}
        ]
        
        # Select provider
        provider, model = await provider_service._select_provider_and_model(
            messages=complex_messages,
            provider="auto"
        )
        
        # Verify routing decision
        assert provider == Provider.OPENAI
    
    @pytest.mark.asyncio
    async def test_provider_selection_simple_query(self, real_services):
        """Test that simple queries route to Ollama."""
        provider_service = real_services["provider_service"]
        
        # Adjust complexity threshold to ensure predictable routing
        provider_service.model_selection_criteria.complexity_threshold = 0.5
        
        # Simple query that should route to Ollama
        simple_messages = [
            {"role": "user", "content": "What's the weather like today?"}
        ]
        
        # Select provider
        provider, model = await provider_service._select_provider_and_model(
            messages=simple_messages,
            provider="auto"
        )
        
        # Verify routing decision
        assert provider == Provider.OLLAMA
    
    @pytest.mark.asyncio
    async def test_fallback_mechanism_real(self, real_services):
        """Test the fallback mechanism with real services."""
        provider_service = real_services["provider_service"]
        
        # Intentionally cause OpenAI to fail by using an invalid model
        messages = [
            {"role": "user", "content": "Simple test message"}
        ]
        
        # This should fail with OpenAI but succeed with the Ollama fallback
        try:
            response = await provider_service.generate_completion(
                messages=messages,
                model="openai:non-existent-model",  # Invalid model
                provider="auto"  # Enable auto-fallback
            )
        except Exception as e:
            pytest.fail(f"Fallback mechanism failed: {e}")
        
        # Assertions stay outside the try block so their own failures are reported directly
        assert response["provider"] == "ollama"
        assert "content" in response["message"]
    
    @pytest.mark.asyncio
    async def test_ollama_response_format(self, real_services):
        """Test that Ollama responses are properly formatted to match OpenAI's structure."""
        ollama_service = real_services["ollama_service"]
        
        # Generate a basic response
        messages = [
            {"role": "user", "content": "What is 2+2?"}
        ]
        
        response = await ollama_service.generate_completion(
            messages=messages,
            model="llama2"  # Specify a model that should exist
        )
        
        # Verify response structure matches expected format
        assert "id" in response
        assert "object" in response
        assert "model" in response
        assert "usage" in response
        assert "message" in response
        assert "content" in response["message"]
        assert response["provider"] == "ollama"
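`test_fallback_mechanism_real` relies on the provider service retrying with Ollama when the OpenAI call raises an exception. Without the provider specifics, this control flow is a try-primary-then-fallback wrapper that records which provider actually answered. The callables, names and `provider` tagging below are stand-ins. They are not the `ProviderService` API.

```python
import logging
from typing import Any, Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)

# A completion function takes chat messages and returns a response dict.
Completion = Callable[[List[Dict[str, str]]], Awaitable[Dict[str, Any]]]


async def complete_with_fallback(messages: List[Dict[str, str]],
                                 primary: Completion, fallback: Completion,
                                 primary_name: str = "openai",
                                 fallback_name: str = "ollama") -> Dict[str, Any]:
    """Try the primary provider; on any error, retry once with the fallback."""
    try:
        response = await primary(messages)
    except Exception as exc:  # broad on purpose: any primary failure triggers fallback
        logger.warning("%s failed (%s); falling back to %s", primary_name, exc, fallback_name)
        response = await fallback(messages)
        response["provider"] = fallback_name
    else:
        response["provider"] = primary_name
    return response


async def failing_openai(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Hypothetical primary that always fails, like a request for a missing model."""
    raise RuntimeError("model 'non-existent-model' not found")


async def local_ollama(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Hypothetical local provider that echoes the last user message."""
    return {"message": {"role": "assistant", "content": f"echo: {messages[-1]['content']}"}}
```

The sketch tags the provider in one place, after the call that actually succeeded. That single tag is what the integration test checks with `response["provider"] == "ollama"`.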

3. Performance Testing Framework

Response Latency Benchmarking
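A single latency sample says little about a provider. The benchmark below is easier to interpret once repeated runs are reduced to summary statistics. Here is a minimal sketch that assumes an async callable to time. `time.perf_counter()` provides a monotonic, high-resolution clock, and `statistics.quantiles` (Python 3.8+) supplies an approximate tail percentile. `time_async` and `summarize` are hypothetical helpers.

```python
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def time_async(fn: Callable[[], Awaitable[object]], runs: int = 5) -> List[float]:
    """Await a callable several times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        await fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce raw samples to mean, median, approximate p95 and sample standard deviation."""
    if len(durations) >= 2:
        # quantiles(n=20) returns 19 cut points; the last one approximates the 95th percentile.
        p95 = statistics.quantiles(durations, n=20)[-1]
        stdev = statistics.stdev(durations)
    else:
        p95, stdev = durations[0], 0.0  # too few samples for spread statistics
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "p95": p95,
        "stdev": stdev,
    }
```

With only a handful of runs per provider, the median is usually a steadier comparison point than the mean. The p95 figure becomes meaningful only as the sample count grows.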

Python
# tests/performance/test_latency.py
import asyncio
import os
import time

import matplotlib.pyplot as plt
import pandas as pd
import pytest
import pytest_asyncio

from app.services.provider_service import ProviderService
from app.services.ollama_service import OllamaService

# Skip these tests in CI environments
SKIP_PERFORMANCE_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_PERFORMANCE_TESTS, reason="Performance tests skipped in CI environment")
class TestResponseLatency:
    # Async fixtures need pytest_asyncio.fixture when pytest-asyncio runs in strict mode
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for latency testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
    async def measure_latency(self, provider_service, provider, model, messages):
        """Measure response latency (in seconds) for a given provider and model."""
        # perf_counter is monotonic and high-resolution, unlike time.time()
        start_time = time.perf_counter()
        
        if provider == "openai":
            await provider_service._generate_openai_completion(
                messages=messages,
                model=model
            )
        else:  # ollama
            await provider_service._generate_ollama_completion(
                messages=messages,
                model=model
            )
            
        return time.perf_counter() - start_time
    
    @pytest.mark.asyncio
    async def test_latency_comparison(self, services):
        """Compare latency between OpenAI and Ollama for different query types."""
        provider_service = services["provider_service"]
        
        # Test messages of different complexity
        test_messages = [
            {
                "name": "simple_factual",
                "messages": [{"role": "user", "content": "What is the capital of France?"}]
            },
            {
                "name": "medium_explanation",
                "messages": [{"role": "user", "content": "Explain how photosynthesis works in plants."}]
            },
            {
                "name": "complex_analysis",
                "messages": [{"role": "user", "content": "Analyze the economic factors that contributed to the 2008 financial crisis and their long-term impacts."}]
            }
        ]
        
        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4"],
            "ollama": ["llama2", "mistral"]
        }
        
        # Number of repetitions for each test
        repetitions = 3
        
        # Collect results
        results = []
        
        for message_type in test_messages:
            for provider in models:
                for model in models[provider]:
                    for i in range(repetitions):
                        try:
                            latency = await self.measure_latency(
                                provider_service, 
                                provider, 
                                model, 
                                message_type["messages"]
                            )
                            
                            results.append({
                                "provider": provider,
                                "model": model,
                                "message_type": message_type["name"],
                                "repetition": i,
                                "latency": latency
                            })
                            
                            # Add a small delay to avoid rate limits
                            await asyncio.sleep(1)
                        except Exception as e:
                            print(f"Error testing {provider}:{model} - {str(e)}")
        
        # Analyze results
        df = pd.DataFrame(results)
        
        # Calculate average latency by provider, model, and message type
        avg_latency = df.groupby(['provider', 'model', 'message_type'])['latency'].mean().reset_index()
        
        # Generate summary statistics
        summary = avg_latency.pivot_table(
            index=['provider', 'model'],
            columns='message_type',
            values='latency'
        ).reset_index()
        
        # Print summary
        print("\nLatency Benchmark Results (seconds):")
        print(summary)
        
        # Create visualization
        plt.figure(figsize=(12, 8))
        
        for message_type in test_messages:
            subset = avg_latency[avg_latency['message_type'] == message_type['name']]
            x = range(len(subset))
            labels = [f"{row['provider']}\n{row['model']}" for _, row in subset.iterrows()]
            
            plt.subplot(1, len(test_messages), test_messages.index(message_type) + 1)
            plt.bar(x, subset['latency'])
            plt.xticks(x, labels, rotation=45)
            plt.title(f"Latency: {message_type['name']}")
            plt.ylabel("Seconds")
        
        plt.tight_layout()
        plt.savefig('latency_benchmark.png')
        
        # Assert something meaningful
        assert len(results) > 0, "No benchmark results collected"
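With only three repetitions per cell, a single slow request (for example, the first Ollama call after a model has to be loaded into memory) can dominate the mean that test_latency_comparison reports. The sketch below summarizes one cell's samples with the median and a nearest-rank 95th percentile instead; summarize_latencies is a hypothetical helper, not part of the project, and could be fed the per-cell lists from df.groupby(['provider', 'model', 'message_type'])['latency'].apply(list).

```python
import math
import statistics
from typing import Dict, List


def summarize_latencies(samples: List[float]) -> Dict[str, float]:
    """Summarize latency samples (seconds) with outlier-resistant statistics."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Nearest-rank 95th percentile: the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # One cold-start outlier inflates the mean but not the median.
    print(summarize_latencies([1.0, 1.2, 5.0])["median"])  # 1.2
```

With three samples the nearest-rank p95 is simply the maximum, which is another reason to raise the repetition count when percentiles matter.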

Memory Usage Monitoring

Python
# tests/performance/test_memory_usage.py
import asyncio
import os
import time

import matplotlib.pyplot as plt
import pandas as pd
import psutil
import pytest
import pytest_asyncio

from app.services.provider_service import ProviderService
from app.services.ollama_service import OllamaService

# Skip these tests in CI environments
SKIP_PERFORMANCE_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_PERFORMANCE_TESTS, reason="Performance tests skipped in CI environment")
class TestMemoryUsage:
    # Async fixtures need pytest_asyncio.fixture when pytest-asyncio runs in strict mode
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for memory testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
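    def get_memory_usage(self):
        """Get the resident memory (RSS) of this test process, in MB.

        This measures only the client process. When OllamaService talks to an
        Ollama server, inference runs in that server's process, so local model
        memory is not included in these numbers.
        """
        process = psutil.Process(os.getpid())
        return process.memory_info().rss / (1024 * 1024)  # Convert bytes to MB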
    
    async def monitor_memory_during_request(self, provider_service, provider, model, messages):
        """Monitor memory usage during a request."""
        memory_samples = []
        
        # Flag read by the sampling task below; set to False to stop it
        monitoring = True
        
        async def memory_monitor():
            start_time = time.time()
            while monitoring:
                memory_samples.append({
                    "time": time.time() - start_time,
                    "memory_mb": self.get_memory_usage()
                })
                await asyncio.sleep(0.1)  # Sample every 100ms
        
        # Start monitoring
        monitor_task = asyncio.create_task(memory_monitor())
        
        # Make the request
        start_time = time.time()
        try:
            if provider == "openai":
                await provider_service._generate_openai_completion(
                    messages=messages,
                    model=model
                )
            else:  # ollama
                await provider_service._generate_ollama_completion(
                    messages=messages,
                    model=model
                )
        finally:
            end_time = time.time()
            
            # Stop monitoring
            monitoring = False
            await monitor_task
        
        return {
            "samples": memory_samples,
            "duration": end_time - start_time,
            "peak_memory": max(sample["memory_mb"] for sample in memory_samples) if memory_samples else 0,
            "mean_memory": sum(sample["memory_mb"] for sample in memory_samples) / len(memory_samples) if memory_samples else 0
        }
    
    @pytest.mark.asyncio
    async def test_memory_usage_comparison(self, services):
        """Compare memory usage between OpenAI and Ollama."""
        provider_service = services["provider_service"]
        
        # Test messages
        test_message = {"role": "user", "content": "Write a detailed essay about climate change and its global impact."}
        
        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo"],
            "ollama": ["llama2"]
        }
        
        # Collect results
        results = []
        memory_data = {}
        
        for provider in models:
            for model in models[provider]:
                # Collect initial memory
                initial_memory = self.get_memory_usage()
                
                # Monitor during request
                memory_result = await self.monitor_memory_during_request(
                    provider_service,
                    provider,
                    model,
                    [test_message]
                )
                
                # Store results
                key = f"{provider}:{model}"
                memory_data[key] = memory_result["samples"]
                
                results.append({
                    "provider": provider,
                    "model": model,
                    "initial_memory_mb": initial_memory,
                    "peak_memory_mb": memory_result["peak_memory"],
                    "mean_memory_mb": memory_result["mean_memory"],
                    "memory_increase_mb": memory_result["peak_memory"] - initial_memory,
                    "duration_seconds": memory_result["duration"]
                })
                
                # Wait a bit to let memory stabilize
                await asyncio.sleep(2)
        
        # Analyze results
        df = pd.DataFrame(results)
        
        # Print summary
        print("\nMemory Usage Results:")
        print(df.to_string(index=False))
        
        # Create visualization
        plt.figure(figsize=(15, 10))
        
        # Plot memory over time
        plt.subplot(2, 1, 1)
        for key, samples in memory_data.items():
            times = [s["time"] for s in samples]
            memory = [s["memory_mb"] for s in samples]
            plt.plot(times, memory, label=key)
        
        plt.xlabel("Time (seconds)")
        plt.ylabel("Memory Usage (MB)")
        plt.title("Memory Usage Over Time During Request")
        plt.legend()
        plt.grid(True)
        
        # Plot peak and increase
        plt.subplot(2, 1, 2)
        providers = df["provider"].tolist()
        models = df["model"].tolist()
        labels = [f"{p}\n{m}" for p, m in zip(providers, models)]
        x = range(len(labels))
        
        plt.bar(x, df["memory_increase_mb"], label="Memory Increase")
        plt.xticks(x, labels)
        plt.ylabel("Memory (MB)")
        plt.title("Memory Increase by Provider/Model")
        plt.legend()
        plt.grid(True)
        
        plt.tight_layout()
        plt.savefig('memory_benchmark.png')
        
        # Assert something meaningful
        assert len(results) > 0, "No memory benchmark results collected"
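The monitor above stops its sampling loop through a closure flag, so awaiting the task can take up to one extra 100 ms sleep after the request finishes. An alternative sketch uses an asyncio.Event so the sampler exits as soon as the workload completes, and takes the metric reader as a parameter so it could point at another process, for instance psutil.Process(pid).memory_info().rss for the Ollama server (locating that PID is left as an assumption). The name sample_during is hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


async def sample_during(
    coro: Awaitable[T],
    read_metric: Callable[[], float],
    interval: float = 0.1,
) -> Tuple[T, List[Dict[str, float]]]:
    """Await `coro` while sampling `read_metric()` roughly every `interval` seconds."""
    samples: List[Dict[str, float]] = []
    stop = asyncio.Event()
    start = time.perf_counter()

    async def sampler() -> None:
        while True:
            samples.append({"time": time.perf_counter() - start, "value": read_metric()})
            try:
                # Returns early as soon as the workload signals completion.
                await asyncio.wait_for(stop.wait(), timeout=interval)
                return
            except asyncio.TimeoutError:
                continue

    task = asyncio.create_task(sampler())
    try:
        result = await coro
    finally:
        stop.set()
        await task
    # Final sample so the state at the end of the workload is always captured.
    samples.append({"time": time.perf_counter() - start, "value": read_metric()})
    return result, samples
```

In the memory test this would look like `await sample_during(provider_service._generate_ollama_completion(messages=..., model=...), self.get_memory_usage)`, with peak and mean computed from the returned samples as before.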

Response Quality Benchmarking

Python
# tests/performance/test_response_quality.py
import asyncio
import json
import os

import matplotlib.pyplot as plt
import pandas as pd
import pytest
import pytest_asyncio

from app.services.provider_service import ProviderService
from app.services.ollama_service import OllamaService

# Skip these tests in CI environments
SKIP_PERFORMANCE_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_PERFORMANCE_TESTS, reason="Performance tests skipped in CI environment")
class TestResponseQuality:
    # Async fixtures need pytest_asyncio.fixture when pytest-asyncio runs in strict mode
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for quality testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
    async def get_response(self, provider_service, provider, model, messages):
        """Get a response from a specific provider and model."""
        if provider == "openai":
            response = await provider_service._generate_openai_completion(
                messages=messages,
                model=model
            )
        else:  # ollama
            response = await provider_service._generate_ollama_completion(
                messages=messages,
                model=model
            )
            
        return response["message"]["content"]
    
    async def evaluate_response(self, provider_service, response, criteria):
        """Evaluate a response using GPT-4 as a judge."""
        evaluation_prompt = [
            {"role": "system", "content": """
            You are an expert evaluator of AI responses. Evaluate the given response based on the specified criteria.
            For each criterion, provide a score from 1-10 and a brief explanation.
            Format your response as valid JSON with the following structure:
            {
                "criteria": {
                    "accuracy": {"score": X, "explanation": "..."},
                    "completeness": {"score": X, "explanation": "..."},
                    "coherence": {"score": X, "explanation": "..."},
                    "relevance": {"score": X, "explanation": "..."}
                },
                "overall_score": X,
                "summary": "..."
            }
            """},
            {"role": "user", "content": f"""
            Evaluate this AI response based on {', '.join(criteria)}:
            
            RESPONSE TO EVALUATE:
            {response}
            """}
        ]
        
        # Use a GPT-4-class judge. JSON mode (response_format json_object) is only
        # accepted by models that support it, such as gpt-4o, not the original gpt-4.
        evaluation = await provider_service._generate_openai_completion(
            messages=evaluation_prompt,
            model="gpt-4o",
            response_format={"type": "json_object"}
        )
        
        try:
            return json.loads(evaluation["message"]["content"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # Fallback if the judge output is missing or not valid JSON
            return {
                "criteria": {c: {"score": 0, "explanation": "Failed to parse"} for c in criteria},
                "overall_score": 0,
                "summary": "Failed to parse evaluation"
            }
    
    @pytest.mark.asyncio
    async def test_response_quality_comparison(self, services):
        """Compare response quality between OpenAI and Ollama models."""
        provider_service = services["provider_service"]
        
        # Test scenarios
        test_scenarios = [
            {
                "name": "factual_knowledge",
                "query": "Explain the process of photosynthesis and its importance to life on Earth."
            },
            {
                "name": "reasoning",
                "query": "A bat and ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
            },
            {
                "name": "creative_writing",
                "query": "Write a short story about a robot discovering emotions."
            },
            {
                "name": "code_generation",
                "query": "Write a Python function to check if a string is a palindrome."
            }
        ]
        
        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo"],
            "ollama": ["llama2", "mistral"]
        }
        
        # Evaluation criteria
        criteria = ["accuracy", "completeness", "coherence", "relevance"]
        
        # Collect results
        results = []
        
        for scenario in test_scenarios:
            for provider in models:
                for model in models[provider]:
                    try:
                        # Get response
                        response = await self.get_response(
                            provider_service,
                            provider,
                            model,
                            [{"role": "user", "content": scenario["query"]}]
                        )
                        
                        # Evaluate response
                        evaluation = await self.evaluate_response(
                            provider_service,
                            response,
                            criteria
                        )
                        
                        # Store results
                        results.append({
                            "scenario": scenario["name"],
                            "provider": provider,
                            "model": model,
                            "overall_score": evaluation["overall_score"],
                            **{f"{criterion}_score": evaluation["criteria"][criterion]["score"] 
                              for criterion in criteria}
                        })
                        
                        # Add raw responses for detailed analysis
                        with open(f"response_{provider}_{model}_{scenario['name']}.txt", "w") as f:
                            f.write(response)
                        
                        # Add a delay to avoid rate limits
                        await asyncio.sleep(2)
                    except Exception as e:
                        print(f"Error evaluating {provider}:{model} on {scenario['name']}: {str(e)}")
        
        # Analyze results
        df = pd.DataFrame(results)
        
        # Save results
        df.to_csv("quality_benchmark_results.csv", index=False)
        
        # Print summary
        print("\nResponse Quality Results:")
        # numeric_only skips the string "scenario" column (required in pandas >= 2.0)
        summary = df.groupby(['provider', 'model']).mean(numeric_only=True).reset_index()
        print(summary.to_string(index=False))
        
        # Create visualization
        plt.figure(figsize=(15, 10))
        
        # Plot overall scores for each scenario in a 2x2 grid
        for i, scenario in enumerate(test_scenarios):
            scenario_df = df[df['scenario'] == scenario['name']]
            providers = scenario_df["provider"].tolist()
            models = scenario_df["model"].tolist()
            labels = [f"{p}\n{m}" for p, m in zip(providers, models)]
            
            plt.subplot(2, 2, i+1)
            plt.bar(labels, scenario_df["overall_score"])
            plt.title(f"Quality Scores: {scenario['name']}")
            plt.ylabel("Score (1-10)")
            plt.ylim(0, 10)
            plt.xticks(rotation=45)
        
        plt.tight_layout()
        plt.savefig('quality_benchmark.png')
        
        # Assert something meaningful
        assert len(results) > 0, "No quality benchmark results collected"
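evaluate_response collapses every parse problem into zeros, while test_response_quality_comparison indexes evaluation["criteria"][criterion] directly, so a judge reply that parses but omits one criterion raises KeyError and the whole row is dropped. A more tolerant parser is sketched below. parse_evaluation is a hypothetical helper, and treating missing or non-numeric scores as 0 while clamping numeric ones to the 1-10 scale is an assumption chosen to match the existing fallback.

```python
import json
import re
from typing import Any, Dict, List


def parse_evaluation(text: str, criteria: List[str]) -> Dict[str, Any]:
    """Parse a judge model's JSON verdict into a fixed, fully populated shape."""
    fallback = {
        "criteria": {c: {"score": 0, "explanation": "Failed to parse"} for c in criteria},
        "overall_score": 0,
        "summary": "Failed to parse evaluation",
    }
    # Tolerate prose or code fences around the JSON by taking the outermost {...} span.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback

    def clamp(value: Any) -> float:
        # Missing or non-numeric scores count as 0; numeric ones are clamped to [1, 10].
        try:
            return min(10.0, max(1.0, float(value)))
        except (TypeError, ValueError):
            return 0.0

    raw_criteria = data.get("criteria")
    if not isinstance(raw_criteria, dict):
        raw_criteria = {}
    parsed = {}
    for name in criteria:
        entry = raw_criteria.get(name)
        if not isinstance(entry, dict):
            entry = {}
        parsed[name] = {
            "score": clamp(entry.get("score")),
            "explanation": str(entry.get("explanation", "")),
        }
    return {
        "criteria": parsed,
        "overall_score": clamp(data.get("overall_score")),
        "summary": str(data.get("summary", "")),
    }
```

Because every requested criterion is always present in the result, the dictionary comprehension that builds each results row can no longer fail on a partial verdict.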

4. Reliability Testing Framework

Error Handling and Fallback Testing

Python
# tests/reliability/test_error_handling.py
import asyncio
from unittest.mock import AsyncMock, MagicMock

import aiohttp
import openai
import pytest

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

class TestErrorHandling:
    @pytest.fixture
    def provider_service(self):
        """Create a provider service with mocked dependencies for testing."""
        service = ProviderService()
        service.openai_client = AsyncMock()
        service.ollama_service = AsyncMock(spec=OllamaService)
        return service
    
    @pytest.mark.asyncio
    async def test_openai_connection_error(self, provider_service):
        """Test handling of OpenAI connection errors."""
        # Mock OpenAI to raise a connection error
        provider_service._generate_openai_completion = AsyncMock(
            side_effect=aiohttp.ClientConnectionError("Connection refused")
        )
        
        # Mock Ollama to succeed
        provider_service._generate_ollama_completion = AsyncMock(return_value={
            "id": "ollama-fallback",
            "provider": "ollama",
            "message": {"content": "Fallback response"}
        })
        
        # Test with auto routing
        response = await provider_service.generate_completion(
            messages=[{"role": "user", "content": "Test message"}],
            provider="auto"
        )
        
        # Verify fallback worked
        assert response["provider"] == "ollama"
        assert response["message"]["content"] == "Fallback response"
        provider_service._generate_openai_completion.assert_called_once()
        provider_service._generate_ollama_completion.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_ollama_connection_error(self, provider_service):
        """Test handling of Ollama connection errors."""
        # Mock the auto routing to select Ollama first
        provider_service._auto_route = AsyncMock(return_value=Provider.OLLAMA)
        
        # Mock Ollama to fail
        provider_service._generate_ollama_completion = AsyncMock(
            side_effect=aiohttp.ClientConnectionError("Connection refused")
        )
        
        # Mock OpenAI to succeed
        provider_service._generate_openai_completion = AsyncMock(return_value={
            "id": "openai-fallback",
            "provider": "openai",
            "message": {"content": "Fallback response"}
        })
        
        # Test with auto routing
        response = await provider_service.generate_completion(
            messages=[{"role": "user", "content": "Test message"}],
            provider="auto"
        )
        
        # Verify fallback worked
        assert response["provider"] == "openai"
        assert response["message"]["content"] == "Fallback response"
        provider_service._generate_ollama_completion.assert_called_once()
        provider_service._generate_openai_completion.assert_called_once()
    
    @pytest.mark.asyncio
    async def test_rate_limit_handling(self, provider_service):
        """Test handling of rate limit errors."""
        # Mock OpenAI to raise a rate limit error (HTTP 429)
        mock_response = MagicMock()
        mock_response.status_code = 429
        mock_response.json.return_value = {"error": {"message": "Rate limit exceeded"}}
        
        # openai>=1.0 exceptions take keyword-only response and body arguments
        provider_service._generate_openai_completion = AsyncMock(
            side_effect=openai.RateLimitError(
                "Rate limit exceeded", response=mock_response, body=None
            )
        )
        
        # Mock Ollama to succeed
        provider_service._generate_ollama_completion = AsyncMock(return_value={
            "id": "ollama-fallback",
            "provider": "ollama",
            "message": {"content": "Fallback response"}
        })
        
        # Test with auto routing
        response = await provider_service.generate_completion(
            messages=[{"role": "user", "content": "Test message"}],
            provider="auto"
        )
        
        # Verify fallback worked
        assert response["provider"] == "ollama"
        assert response["message"]["content"] == "Fallback response"
    
    @pytest.mark.asyncio
    async def test_timeout_handling(self, provider_service):
        """Test handling of timeout errors."""
        # Mock OpenAI to raise a timeout error
        provider_service._generate_openai_completion = AsyncMock(
            side_effect=asyncio.TimeoutError("Request timed out")
        )
        
        # Mock Ollama to succeed
        provider_service._generate_ollama_completion = AsyncMock(return_value={
            "id": "ollama-fallback",
            "provider": "ollama",
            "message": {"content": "Fallback response"}
        })
        
        # Test with auto routing
        response = await provider_service.generate_completion(
            messages=[{"role": "user", "content": "Test message"}],
            provider="auto"
        )
        
        # Verify fallback worked
        assert response["provider"] == "ollama"
        assert response["message"]["content"] == "Fallback response"
    
    @pytest.mark.asyncio
    async def test_all_providers_fail(self, provider_service):
        """Test case when all providers fail."""
        # Mock both providers to fail
        provider_service._generate_openai_completion = AsyncMock(
            side_effect=Exception("OpenAI failed")
        )
        
        provider_service._generate_ollama_completion = AsyncMock(
            side_effect=Exception("Ollama failed")
        )
        
        # Test with auto routing - should raise an exception
        with pytest.raises(Exception) as excinfo:
            await provider_service.generate_completion(
                messages=[{"role": "user", "content": "Test message"}],
                provider="auto"
            )
        
        # Verify the original exception is re-raised
        assert "OpenAI failed" in str(excinfo.value)
        provider_service._generate_openai_completion.assert_called_once()
        provider_service._generate_ollama_completion.assert_called_once()
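The tests above pin down a contract for generate_completion(provider="auto") without showing it: try the routed provider first, fall back to the next one on any exception (connection error, rate limit, timeout), and re-raise the primary provider's error when every provider fails. A minimal sketch of that contract, independent of the project's ProviderService, follows; the name generate_with_fallback and the (name, callable) attempt list are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Sequence, Tuple

Completion = Dict[str, Any]
ProviderCall = Callable[[], Awaitable[Completion]]


async def generate_with_fallback(attempts: Sequence[Tuple[str, ProviderCall]]) -> Completion:
    """Try each (name, call) pair in order and return the first successful result.

    If every provider fails, the first (primary) provider's exception is
    re-raised, which is what test_all_providers_fail checks for.
    """
    if not attempts:
        raise ValueError("no providers configured")
    errors: List[Tuple[str, Exception]] = []
    for name, call in attempts:
        try:
            return await call()
        except Exception as exc:  # any provider failure moves on to the next one
            errors.append((name, exc))
    raise errors[0][1]


if __name__ == "__main__":
    async def openai_call() -> Completion:
        raise TimeoutError("Request timed out")

    async def ollama_call() -> Completion:
        return {"provider": "ollama", "message": {"content": "Fallback response"}}

    result = asyncio.run(generate_with_fallback([("openai", openai_call), ("ollama", ollama_call)]))
    print(result["provider"])  # ollama
```

Catching Exception rather than BaseException leaves asyncio.CancelledError (a BaseException since Python 3.8) free to propagate, so cancelling a request does not silently trigger a fallback.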

Load Testing

Python
# tests/reliability/test_load.py
import asyncio
import os
import time

import matplotlib.pyplot as plt
import pandas as pd
import pytest
import pytest_asyncio

from app.services.provider_service import ProviderService

# Skip these tests in CI environments
SKIP_LOAD_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_LOAD_TESTS, reason="Load tests skipped in CI environment")
class TestLoadHandling:
    # Async fixtures need pytest_asyncio.fixture when pytest-asyncio runs in strict mode
    @pytest_asyncio.fixture
    async def provider_service(self):
        """Set up provider service for load testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize service
        service = ProviderService()
        
        try:
            await service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize service: {str(e)}")
        
        yield service
        
        # Cleanup
        await service.cleanup()
    
    async def send_request(self, provider_service, provider, model, message, request_id):
        """Send a single request and record performance."""
        start_time = time.time()
        success = False
        error = None
        
        try:
            response = await provider_service.generate_completion(
                messages=[{"role": "user", "content": message}],
                provider=provider,
                model=model
            )
            success = True
        except Exception as e:
            error = str(e)
        
        end_time = time.time()
        
        return {
            "request_id": request_id,
            "provider": provider,
            "model": model,
            "success": success,
            "error": error,
            "duration": end_time - start_time
        }
    
    @pytest.mark.asyncio
    async def test_concurrent_requests(self, provider_service):
        """Test handling of multiple concurrent requests."""
        # Test configurations
        providers = ["openai", "ollama", "auto"]
        request_count = 10  # 10 requests per provider
        
        # Test message (simple to avoid rate limits)
        message = "What is 2+2?"
        
        # Create (but do not yet start) a coroutine for every request;
        # nothing is sent until the coroutines are awaited below
        tasks = []
        request_id = 0
        
        for provider in providers:
            for _ in range(request_count):
                # Determine model based on provider
                if provider == "openai":
                    model = "gpt-3.5-turbo"
                elif provider == "ollama":
                    model = "llama2"
                else:
                    model = None  # Auto select
                
                tasks.append(self.send_request(
                    provider_service,
                    provider,
                    model,
                    message,
                    request_id
                ))
                request_id += 1
        
        # Run requests concurrently with a reasonable concurrency limit
        concurrency_limit = 5
        results = []
        
        for i in range(0, len(tasks), concurrency_limit):
            batch = tasks[i:i+concurrency_limit]
            batch_results = await asyncio.gather(*batch)
            results.extend(batch_results)
            
            # Delay between batches to avoid rate limits
            await asyncio.sleep(2)
        
        # Analyze results
        df = pd.DataFrame(results)
        
        # Print summary
        print("\nConcurrent Request Test Results:")
        success_rate = df.groupby('provider')['success'].mean() * 100
        mean_duration = df.groupby('provider')['duration'].mean()
        
        summary = pd.DataFrame({
            'success_rate': success_rate,
            'mean_duration': mean_duration
        }).reset_index()
        
        print(summary.to_string(index=False))
        
        # Create visualization
        plt.figure(figsize=(12, 10))
        
        # Plot success rate
        plt.subplot(2, 1, 1)
        plt.bar(summary['provider'], summary['success_rate'])
        plt.title('Success Rate by Provider')
        plt.ylabel('Success Rate (%)')
        plt.ylim(0, 100)
        
        # Plot response times
        plt.subplot(2, 1, 2)
        for provider in providers:
            provider_df = df[df['provider'] == provider]
            plt.plot(provider_df['request_id'], provider_df['duration'], marker='o', label=provider)
        
        plt.title('Response Time by Request')
        plt.xlabel('Request ID')
        plt.ylabel('Duration (seconds)')
        plt.legend()
        plt.grid(True)
        
        plt.tight_layout()
        plt.savefig('load_test_results.png')
        
        # Assert reasonable success rate
        for provider in providers:
            provider_success = df[df['provider'] == provider]['success'].mean() * 100
            assert provider_success >= 70, f"Success rate for {provider} is below 70%"
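test_concurrent_requests runs requests in fixed batches of five, so each batch waits for its slowest request before the next one starts and the effective concurrency often drops below the limit. A semaphore instead keeps up to `limit` requests in flight at all times. The sketch below is a drop-in alternative to the batching loop; run_bounded is a hypothetical helper name.

```python
import asyncio
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(coros: Iterable[Awaitable[T]], limit: int) -> List[T]:
    """Await all coroutines with at most `limit` of them running at once.

    Results come back in input order, like asyncio.gather.
    """
    if limit < 1:
        raise ValueError("limit must be >= 1")
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro: Awaitable[T]) -> T:
        # Each coroutine only starts once a semaphore slot is free.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))
```

In the test, `results = await run_bounded(tasks, concurrency_limit)` would replace the batch loop; any pacing for provider rate limits would then need to live inside send_request rather than between batches.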

Stability Testing for Extended Sessions

Python
# tests/reliability/test_stability.py
import pytest
import pytest_asyncio
import asyncio
import time
import os
import random
import pandas as pd
import matplotlib.pyplot as plt
from typing import List, Dict, Any

from app.services.provider_service import ProviderService, Provider
from app.agents.base_agent import BaseAgent, AgentState
from app.agents.research_agent import ResearchAgent
from app.models.message import Message, MessageRole

# Skip these tests in CI environments
SKIP_STABILITY_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_STABILITY_TESTS, reason="Stability tests skipped in CI environment")
class TestSystemStability:
    # Async fixtures need pytest_asyncio.fixture when pytest-asyncio runs in strict mode
    @pytest_asyncio.fixture
    async def setup(self):
        """Set up test environment with services and agents."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize service
        provider_service = ProviderService()
        
        try:
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize service: {str(e)}")
        
        # Create a test agent
        agent = ResearchAgent(
            provider_service=provider_service,
            knowledge_service=None,  # Mock would be better but we're testing stability
            system_prompt="You are a helpful research assistant."
        )
        
        yield {
            "provider_service": provider_service,
            "agent": agent
        }
        
        # Cleanup
        await provider_service.cleanup()
    
    async def run_conversation_turn(self, agent, message, turn_number):
        """Run a single conversation turn and record metrics."""
        start_time = time.time()
        success = False
        error = None
        memory_before = self.get_memory_usage()
        
        try:
            # Reuse one user ID so every turn belongs to the same conversation
            response = await agent.process_message(message, "test_user")
            success = True
        except Exception as e:
            error = str(e)
            response = None
        
        end_time = time.time()
        memory_after = self.get_memory_usage()
        
        return {
            "turn": turn_number,
            "success": success,
            "error": error,
            "duration": end_time - start_time,
            "memory_before": memory_before,
            "memory_after": memory_after,
            "memory_increase": memory_after - memory_before,
            "history_length": len(agent.state.conversation_history),
            "response_length": len(response) if response else 0
        }
    
    def get_memory_usage(self):
        """Get current memory usage in MB."""
        import psutil
        process = psutil.Process(os.getpid())
        memory_info = process.memory_info()
        return memory_info.rss / (1024 * 1024)  # Convert to MB
    
    @pytest.mark.asyncio
    async def test_extended_conversation(self, setup):
        """Test system stability over an extended conversation."""
        agent = setup["agent"]
        
        # List of test questions for the conversation
        questions = [
            "What is machine learning?",
            "Can you explain neural networks?",
            "What is the difference between supervised and unsupervised learning?",
            "How does reinforcement learning work?",
            "What are some applications of deep learning?",
            "Explain the concept of overfitting.",
            "What is transfer learning?",
            "How does backpropagation work?",
            "What are convolutional neural networks?",
            "Explain the transformer architecture.",
            "What is BERT and how does it work?",
            "What are GANs used for?",
            "Explain the concept of attention in neural networks.",
            "What is the difference between RNNs and LSTMs?",
            "How do recommendation systems work?"
        ]
        
        # Run an extended conversation (seeded RNG so the follow-up pattern is reproducible)
        rng = random.Random(42)
        results = []
        turn_limit = min(len(questions), 15)  # Cap at 15 turns to bound test duration
        
        for turn in range(turn_limit):
            # In later turns, occasionally refer back to an earlier topic to exercise conversation memory
            if turn > 3 and rng.random() < 0.3:
                earlier_topic = rng.choice(questions[:turn]).lower().rstrip('?')
                message = f"Can you explain more about what you mentioned earlier regarding {earlier_topic}?"
            else:
                message = questions[turn]
                
            result = await self.run_conversation_turn(agent, message, turn)
            results.append(result)
            
            # Print progress
            status = "✓" if result["success"] else "✗"
            print(f"Turn {turn+1}/{turn_limit} {status} - Time: {result['duration']:.2f}s")
            
            # Delay between turns
            await asyncio.sleep(2)
        
        # Analyze results
        df = pd.DataFrame(results)
        
        # Print summary statistics
        print("\nExtended Conversation Test Results:")
        print(f"Success rate: {df['success'].mean()*100:.1f}%")
        print(f"Average response time: {df['duration'].mean():.2f}s")
        print(f"Final conversation history length: {df['history_length'].iloc[-1]}")
        print(f"Memory usage increase: {df['memory_after'].iloc[-1] - df['memory_before'].iloc[0]:.2f} MB")
        
        # Create visualization
        plt.figure(figsize=(15, 12))
        
        # Plot response times
        plt.subplot(3, 1, 1)
        plt.plot(df['turn'], df['duration'], marker='o')
        plt.title('Response Time by Conversation Turn')
        plt.xlabel('Turn')
        plt.ylabel('Duration (seconds)')
        plt.grid(True)
        
        # Plot memory usage
        plt.subplot(3, 1, 2)
        plt.plot(df['turn'], df['memory_after'], marker='o')
        plt.title('Memory Usage Over Conversation')
        plt.xlabel('Turn')
        plt.ylabel('Memory (MB)')
        plt.grid(True)
        
        # Plot history length and response length
        plt.subplot(3, 1, 3)
        plt.plot(df['turn'], df['history_length'], marker='o', label='History Length')
        plt.plot(df['turn'], df['response_length'], marker='x', label='Response Length')
        plt.title('Conversation Metrics')
        plt.xlabel('Turn')
        plt.ylabel('Length (chars/items)')
        plt.legend()
        plt.grid(True)
        
        plt.tight_layout()
        plt.savefig('stability_test_results.png')
        
        # Assert reasonable success rate
        assert df['success'].mean() >= 0.8, "Success rate below 80%"
        
        # Check for memory leaks (large, consistent growth would be concerning)
        memory_growth_rate = (df['memory_after'].iloc[-1] - df['memory_before'].iloc[0]) / turn_limit
        assert memory_growth_rate < 50, f"Excessive memory growth rate: {memory_growth_rate:.2f} MB/turn"

Automation Framework

Test Orchestration Script

Python
#!/usr/bin/env python
# scripts/run_tests.py
import argparse
import os
import sys
import subprocess
import time
from datetime import datetime

def parse_args():
    parser = argparse.ArgumentParser(description='Run test suite for OpenAI-Ollama integration')
    parser.add_argument('--unit', action='store_true', help='Run unit tests')
    parser.add_argument('--integration', action='store_true', help='Run integration tests')
    parser.add_argument('--performance', action='store_true', help='Run performance tests')
    parser.add_argument('--reliability', action='store_true', help='Run reliability tests')
    parser.add_argument('--all', action='store_true', help='Run all tests')
    parser.add_argument('--html', action='store_true', help='Generate HTML report')
    parser.add_argument('--output-dir', default='test_results', help='Directory for test results')
    
    args = parser.parse_args()
    
    # If no specific test type is selected, run all
    if not (args.unit or args.integration or args.performance or args.reliability or args.all):
        args.all = True
        
    return args

def run_test_suite(test_type, output_dir, html=False):
    """Run a specific test suite and return success status."""
    print(f"\n{'='*80}\nRunning {test_type} tests\n{'='*80}")
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    report_file = f"{output_dir}/{test_type}_report_{timestamp}"
    
    # Create command with appropriate flags
    cmd = ["pytest", f"tests/{test_type}", "-v"]
    
    if html:
        cmd.extend(["--html", f"{report_file}.html", "--self-contained-html"])
    
    # Add JUnit XML report for CI integration
    cmd.extend(["--junitxml", f"{report_file}.xml"])
    
    # Run the tests
    start_time = time.time()
    result = subprocess.run(cmd)
    duration = time.time() - start_time
    
    # Print summary
    status = "PASSED" if result.returncode == 0 else "FAILED"
    print(f"\n{test_type} tests {status} in {duration:.2f} seconds")
    
    if html:
        print(f"HTML report saved to {report_file}.html")
    
    print(f"XML report saved to {report_file}.xml")
    
    return result.returncode == 0

def main():
    args = parse_args()
    
    # Create output directory if it doesn't exist
    os.makedirs(args.output_dir, exist_ok=True)
    
    # Track overall success
    all_passed = True
    
    # Run selected test suites
    if args.all or args.unit:
        unit_passed = run_test_suite("unit", args.output_dir, args.html)
        all_passed = all_passed and unit_passed
    
    if args.all or args.integration:
        integration_passed = run_test_suite("integration", args.output_dir, args.html)
        all_passed = all_passed and integration_passed
    
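    if args.all or args.performance:
        performance_passed = run_test_suite("performance", args.output_dir, args.html)
        # Performance tests are informational: report a failure but don't fail the build
        if not performance_passed:
            print("Warning: performance tests failed (not counted toward the overall result)")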
    
    if args.all or args.reliability:
        reliability_passed = run_test_suite("reliability", args.output_dir, args.html)
        all_passed = all_passed and reliability_passed
    
    # Print overall summary
    print(f"\n{'='*80}")
    print(f"Test Suite {'PASSED' if all_passed else 'FAILED'}")
    print(f"{'='*80}")
    
    # Return appropriate exit code
    return 0 if all_passed else 1

if __name__ == "__main__":
    sys.exit(main())
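`main()` folds each suite's result into `all_passed` by hand and special-cases the performance suite with a comment. The same policy can be written as a small pure function, which makes it unit-testable on its own. `overall_passed`, `exit_code`, and `NON_BLOCKING_SUITES` are hypothetical helpers, not part of the original script.

```python
NON_BLOCKING_SUITES = frozenset({"performance"})


def overall_passed(suite_results, non_blocking=NON_BLOCKING_SUITES):
    """True if every suite outside `non_blocking` passed.

    suite_results maps a suite name ("unit", "integration", ...) to True/False.
    Non-blocking suites are still run and reported, but never fail the build.
    """
    return all(passed for name, passed in suite_results.items() if name not in non_blocking)


def exit_code(suite_results, non_blocking=NON_BLOCKING_SUITES):
    """Process exit code: 0 if all blocking suites passed, otherwise 1."""
    return 0 if overall_passed(suite_results, non_blocking) else 1


print(exit_code({"unit": True, "performance": False}))  # 0
print(exit_code({"unit": True, "integration": False}))  # 1
```

With this helper, `main()` would only need to collect `{suite_name: run_test_suite(...)}` for the selected suites and return `exit_code(...)`. Adding another informational suite then becomes a one-word change.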

CI/CD Configuration

YAML
# .github/workflows/test.yml
name: Test Suite

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]
  workflow_dispatch:
    inputs:
      test_type:
        description: 'Test suite to run'
        required: true
        default: 'unit'
        type: choice
        options:
          - unit
          - integration
          - all

jobs:
  test:
    runs-on: ubuntu-latest
    
    services:
      ollama:
        image: ollama/ollama:latest
        ports:
          - 11434:11434
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.11'
    
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install -r requirements-dev.txt
    
    - name: Pull Ollama models
      run: |
        # Wait for Ollama service to be ready
        timeout 60 bash -c 'until curl -s -f http://localhost:11434/api/tags > /dev/null; do sleep 1; done'
        # Pull basic model for testing
        curl -sS --fail -X POST http://localhost:11434/api/pull -d '{"model": "llama2:7b-chat-q4_0", "stream": false}'
      
    - name: Run unit tests
      if: ${{ github.event.inputs.test_type == 'unit' || github.event.inputs.test_type == 'all' || github.event.inputs.test_type == '' }}
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        OLLAMA_HOST: http://localhost:11434
      run: pytest tests/unit -v --junitxml=unit-test-results.xml
    
    - name: Run integration tests
      if: ${{ github.event.inputs.test_type == 'integration' || github.event.inputs.test_type == 'all' }}
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        OLLAMA_HOST: http://localhost:11434
      run: pytest tests/integration -v --junitxml=integration-test-results.xml
    
    - name: Upload test results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: test-results
        path: '*-test-results.xml'
        
    - name: Publish Test Report
      uses: mikepenz/action-junit-report@v4
      if: always()
      with:
        report_paths: '*-test-results.xml'
        fail_on_failure: true
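The workflow waits for the Ollama service with a `timeout 60 bash -c 'until curl ...'` loop against `/api/tags`. The same readiness check may also be needed from Python, for example in a test fixture. Here is a minimal standard-library sketch. `http_probe` and `wait_until_ready` are hypothetical names, and the clock and sleep functions are injectable so the polling logic can be tested without a server.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, request_timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` succeeds with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return 200 <= resp.status < 300
    # URLError, HTTPError and socket timeouts subclass OSError; malformed responses raise HTTPException
    except (OSError, http.client.HTTPException):
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Against the workflow's service, a call would look like `wait_until_ready(lambda: http_probe("http://localhost:11434/api/tags"))`. It returns `False` instead of raising when the timeout expires, so the caller can decide whether to skip or fail.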

Comparative Benchmark Framework

Response Quality Evaluation Matrix

Python
# tests/benchmarks/quality_matrix.py
import asyncio
import json
import os

import matplotlib.pyplot as plt
import pandas as pd
import pytest
import pytest_asyncio  # async fixtures need pytest_asyncio.fixture in pytest-asyncio's default strict mode
import seaborn as sns

from app.services.provider_service import ProviderService
from app.services.ollama_service import OllamaService

# Test questions across multiple domains
BENCHMARK_QUESTIONS = {
    "factual_knowledge": [
        "What are the main causes of climate change?",
        "Explain how vaccines work in the human body.",
        "What were the key causes of World War I?",
        "Describe the process of photosynthesis.",
        "What is the difference between DNA and RNA?"
    ],
    "reasoning": [
        "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?",
        "A bat and ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?",
        "In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?",
        "If three people can paint three fences in three hours, how many people would be needed to paint six fences in six hours?",
        "Imagine a rope that goes around the Earth at the equator, lying flat on the ground. If you add 10 meters to the length of this rope and space it evenly above the ground, how high above the ground would the rope be?"
    ],
    "creative_writing": [
        "Write a short story about a robot discovering emotions.",
        "Create a poem about the changing seasons.",
        "Write a creative dialogue between the ocean and the moon.",
        "Describe a world where humans can photosynthesize like plants.",
        "Create a character sketch of a time-traveling historian."
    ],
    "code_generation": [
        "Write a Python function to check if a string is a palindrome.",
        "Create a JavaScript function that finds the most frequent element in an array.",
        "Write a SQL query to find the top 5 customers by purchase amount.",
        "Implement a binary search algorithm in the language of your choice.",
        "Write a function to detect a cycle in a linked list."
    ],
    "instruction_following": [
        "List 5 fruits, then number them in the reverse order, then highlight the one that starts with 'a' if any.",
        "Explain quantum computing in 3 paragraphs, then summarize each paragraph in one sentence, then create a single slogan based on these summaries.",
        "Create a table comparing 3 car models based on price, fuel efficiency, and safety. Then add a row showing which model is best in each category.",
        "Write a recipe for chocolate cake, then modify it to be vegan, then list only the ingredients that changed.",
        "Translate 'Hello, how are you?' to French, Spanish, and German, then identify which language uses the most words."
    ]
}

class TestQualityMatrix:
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for benchmark testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
    async def generate_response(self, provider_service, provider, model, question):
        """Generate a response from a specific provider and model."""
        try:
            if provider == "openai":
                response = await provider_service._generate_openai_completion(
                    messages=[{"role": "user", "content": question}],
                    model=model,
                    temperature=0.7
                )
            else:  # ollama
                response = await provider_service._generate_ollama_completion(
                    messages=[{"role": "user", "content": question}],
                    model=model,
                    temperature=0.7
                )
                
            return {
                "success": True,
                "content": response["message"]["content"],
                "metadata": {
                    "model": model,
                    "provider": provider
                }
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "metadata": {
                    "model": model,
                    "provider": provider
                }
            }
    
    async def evaluate_response(self, provider_service, question, response, category):
        """Evaluate a response using GPT-4 as a judge."""
        # Skip evaluation if response generation failed
        if not response.get("success", False):
            return {
                "scores": {
                    "correctness": 0,
                    "completeness": 0,
                    "coherence": 0,
                    "conciseness": 0,
                    "overall": 0
                },
                "explanation": f"Failed to generate response: {response.get('error', 'Unknown error')}"
            }
        
        evaluation_criteria = {
            "factual_knowledge": ["correctness", "completeness", "coherence", "citation"],
            "reasoning": ["logical_flow", "correctness", "explanation_quality", "step_by_step"],
            "creative_writing": ["originality", "coherence", "engagement", "language_use"],
            "code_generation": ["correctness", "efficiency", "readability", "explanation"],
            "instruction_following": ["accuracy", "completeness", "precision", "structure"]
        }
        
        # Get the appropriate criteria for this category
        criteria = evaluation_criteria.get(category, ["correctness", "completeness", "coherence", "overall"])
        
        evaluation_prompt = [
            {"role": "system", "content": f"""
            You are an expert evaluator of AI responses. Evaluate the given response to the question based on the following criteria:
            
            {', '.join(criteria)}
            
            For each criterion, provide a score from 1-10 and a brief explanation.
            Also provide an overall score from 1-10.
            
            Format your response as valid JSON with the following structure:
            {{
                "scores": {{
                    "{criteria[0]}": X,
                    "{criteria[1]}": X,
                    "{criteria[2]}": X,
                    "{criteria[3]}": X,
                    "overall": X
                }},
                "explanation": "Your overall assessment and suggestions for improvement"
            }}
            """},
            {"role": "user", "content": f"""
            Question: {question}
            
            Response to evaluate:
            {response["content"]}
            """}
        ]
        
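        # Use GPT-4 Turbo as the judge: JSON mode (response_format) is not supported by the original gpt-4 model
        evaluation = await provider_service._generate_openai_completion(
            messages=evaluation_prompt,
            model="gpt-4-turbo",
            temperature=0,  # low temperature for more consistent scoring
            response_format={"type": "json_object"}
        )
        
        try:
            return json.loads(evaluation["message"]["content"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # Fallback if the judge's output can't be parsed
            return {
                "scores": {criterion: 0 for criterion in criteria + ["overall"]},
                "explanation": "Failed to parse evaluation"
            }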
    
    @pytest.mark.asyncio
    async def test_quality_matrix(self, services):
        """Generate a comprehensive quality comparison matrix."""
        provider_service = services["provider_service"]
        
        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4-turbo"],
            "ollama": ["llama2", "mistral", "codellama"]
        }
        
        # Use the first 2 questions per category to keep test duration reasonable
        test_questions = {
            category: questions[:2] for category, questions in BENCHMARK_QUESTIONS.items()
        }
        
        # Collect results
        all_results = []
        
        for category, questions in test_questions.items():
            for question in questions:
                for provider in models:
                    for model in models[provider]:
                        print(f"Testing {provider}:{model} on {category} question")
                        
                        # Generate response
                        response = await self.generate_response(
                            provider_service,
                            provider,
                            model,
                            question
                        )
                        
                        # Save raw response
                        model_safe_name = model.replace(":", "_")
                        os.makedirs("benchmark_responses", exist_ok=True)
                        with open(f"benchmark_responses/{provider}_{model_safe_name}_{category}.txt", "a") as f:
                            f.write(f"\nQuestion: {question}\n\n")
                            f.write(f"Response: {response.get('content', 'ERROR: ' + response.get('error', 'Unknown error'))}\n")
                            f.write("-" * 80 + "\n")
                        
                        # If successful, evaluate the response
                        if response.get("success", False):
                            evaluation = await self.evaluate_response(
                                provider_service,
                                question,
                                response,
                                category
                            )
                            
                            # Add to results
                            result = {
                                "category": category,
                                "question": question,
                                "provider": provider,
                                "model": model,
                                "success": response["success"]
                            }
                            
                            # Add scores
                            for criterion, score in evaluation["scores"].items():
                                result[f"score_{criterion}"] = score
                                
                            all_results.append(result)
                        else:
                            # Add failed result
                            all_results.append({
                                "category": category,
                                "question": question,
                                "provider": provider,
                                "model": model,
                                "success": False,
                                "score_overall": 0
                            })
                        
                        # Add a delay to avoid rate limits
                        await asyncio.sleep(2)
        
        # Analyze results
        df = pd.DataFrame(all_results)
        
        # Save full results
        df.to_csv("benchmark_quality_matrix.csv", index=False)
        
        # Create summary by model and category
        summary = df.groupby(["provider", "model", "category"])["score_overall"].mean().reset_index()
        pivot_summary = summary.pivot_table(
            index=["provider", "model"],
            columns="category",
            values="score_overall"
        ).round(2)
        
        # Add average across categories
        pivot_summary["average"] = pivot_summary.mean(axis=1)
        
        # Save summary
        pivot_summary.to_csv("benchmark_quality_summary.csv")
        
        # Create visualization
        plt.figure(figsize=(15, 10))
        
        # Heatmap of scores
        plt.subplot(1, 1, 1)
        sns.heatmap(pivot_summary, annot=True, cmap="YlGnBu", vmin=0, vmax=10)
        plt.title("Model Performance by Category (Average Score 0-10; failed responses score 0)")
        
        plt.tight_layout()
        plt.savefig('benchmark_quality_matrix.png')
        
        # Print summary to console
        print("\nQuality Benchmark Results:")
        print(pivot_summary.to_string())
        
        # Assert something meaningful
        assert len(all_results) > 0, "No benchmark results collected"
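`evaluate_response` returns the judge's JSON unchecked. A missing criterion, a score outside 1-10, or a string such as "8/10" therefore flows straight into the `score_*` columns and the averages. The stricter parser below is a hypothetical helper. It returns `None` for anything that doesn't match the requested schema, so the caller can record a failed evaluation instead of averaging in bad values.

```python
import json
from typing import Optional


def parse_judge_scores(raw: str, criteria: list[str]) -> Optional[dict]:
    """Return {criterion: score} (plus "overall") if every score is a number in 1..10, else None."""
    try:
        scores = json.loads(raw)["scores"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if not isinstance(scores, dict):
        return None

    parsed = {}
    for name in [*criteria, "overall"]:
        value = scores.get(name)
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        if not 1 <= value <= 10:  # also rejects NaN
            return None
        parsed[name] = float(value)
    return parsed


reply = '{"scores": {"correctness": 8, "efficiency": 7, "overall": 7.5}, "explanation": "ok"}'
print(parse_judge_scores(reply, ["correctness", "efficiency"]))
```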

Latency and Cost Efficiency Analysis
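The efficiency benchmark below converts token counts into dollars by multiplying each count by a per-token rate. The arithmetic can be checked by hand. At gpt-4's $0.03 / $0.06 per 1K tokens, 1,000 prompt tokens and 500 completion tokens cost $0.03 + $0.03 = $0.06, which is $0.04 per 1,000 total tokens. The sketch below keeps the rates in a table keyed by exact model name. That is an alternative to the benchmark's `"gpt-4" in model` substring check, which would also match differently priced variants. `PRICES_PER_1K`, `estimate_cost`, and `cost_per_1k_tokens` are hypothetical names, and the rates should be checked against current pricing.

```python
# Hypothetical price table in USD per 1,000 tokens: (prompt, completion).
# The figures mirror the per-token rates hard-coded in measure_response_metrics.
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost = (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1000."""
    prompt_rate, completion_rate = PRICES_PER_1K[model]  # exact lookup: unknown models raise KeyError
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1000


def cost_per_1k_tokens(cost: float, total_tokens: int) -> float:
    """Normalize a request's cost to 1,000 total tokens, as the benchmark's summary does."""
    return cost * 1000 / total_tokens if total_tokens > 0 else 0.0


cost = estimate_cost("gpt-4", prompt_tokens=1000, completion_tokens=500)
print(round(cost, 4))  # 0.06
print(round(cost_per_1k_tokens(cost, 1500), 4))  # 0.04
```

Raising on an unknown model name is deliberate. A new model then fails loudly instead of being silently billed at the wrong rate.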

Python
# tests/benchmarks/efficiency_analysis.py
import asyncio
import os
import time

import matplotlib.pyplot as plt
import pandas as pd
import pytest
import pytest_asyncio  # async fixtures need pytest_asyncio.fixture in pytest-asyncio's default strict mode
import seaborn as sns

from app.services.provider_service import ProviderService
from app.services.ollama_service import OllamaService

# Test prompts of different lengths
BENCHMARK_PROMPTS = {
    "short": "What is artificial intelligence?",
    "medium": "Explain the differences between supervised, unsupervised, and reinforcement learning in machine learning.",
    "long": "Write a comprehensive essay on the ethical implications of artificial intelligence in healthcare, considering patient privacy, diagnostic accuracy, and accessibility issues.",
    "very_long": """
    Analyze the historical development of artificial intelligence from its conceptual origins to the present day.
    Include key milestones, technological breakthroughs, paradigm shifts in approaches, and influential researchers.
    Also discuss how AI has been portrayed in popular culture and how that has influenced public perception and research funding.
    Finally, provide a thoughtful discussion on where AI might be headed in the next 20 years and what ethical frameworks
    should be considered as we continue to advance the technology.
    """
}

class TestEfficiencyAnalysis:
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for benchmark testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
    async def measure_response_metrics(self, provider_service, provider, model, prompt, max_tokens=None):
        """Measure response time, token counts, and other metrics."""
        start_time = time.time()
        success = False
        error = None
        token_count = {"prompt": 0, "completion": 0, "total": 0}
        
        try:
            if provider == "openai":
                response = await provider_service._generate_openai_completion(
                    messages=[{"role": "user", "content": prompt}],
                    model=model,
                    max_tokens=max_tokens
                )
            else:  # ollama
                response = await provider_service._generate_ollama_completion(
                    messages=[{"role": "user", "content": prompt}],
                    model=model,
                    max_tokens=max_tokens
                )
                
            success = True
            
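            # Extract token counts: OpenAI-style "usage" if present, otherwise Ollama's
            # native counters (prompt_eval_count / eval_count), assuming the wrapper
            # passes them through; without either, tokens_per_second stays 0
            if "usage" in response:
                token_count = {
                    "prompt": response["usage"].get("prompt_tokens", 0),
                    "completion": response["usage"].get("completion_tokens", 0),
                    "total": response["usage"].get("total_tokens", 0)
                }
            elif "eval_count" in response:
                prompt_tokens = response.get("prompt_eval_count", 0)
                completion_tokens = response["eval_count"]
                token_count = {
                    "prompt": prompt_tokens,
                    "completion": completion_tokens,
                    "total": prompt_tokens + completion_tokens
                }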
            
            response_text = response["message"]["content"]
            
        except Exception as e:
            error = str(e)
            response_text = None
        
        end_time = time.time()
        duration = end_time - start_time
        
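        # Estimate cost (OpenAI only) from approximate per-token rates (price per 1K tokens / 1000).
        # The substring check below also matches variants such as gpt-4-turbo, which are
        # priced differently; check current pricing before relying on these figures.
        cost = 0.0
        if provider == "openai" and success:
            if "gpt-4" in model:
                # GPT-4: $0.03 / 1K prompt tokens, $0.06 / 1K completion tokens
                cost = token_count["prompt"] * 0.00003 + token_count["completion"] * 0.00006
            else:
                # GPT-3.5 Turbo: $0.0015 / 1K prompt tokens, $0.002 / 1K completion tokens
                cost = token_count["prompt"] * 0.0000015 + token_count["completion"] * 0.000002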
        
        return {
            "success": success,
            "error": error,
            "duration": duration,
            "token_count": token_count,
            "response_length": len(response_text) if response_text else 0,
            "cost": cost,
            "tokens_per_second": token_count["completion"] / duration if success and duration > 0 else 0
        }
    
    @pytest.mark.asyncio
    async def test_efficiency_benchmark(self, services):
        """Perform comprehensive efficiency analysis."""
        provider_service = services["provider_service"]
        
        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4"],
            "ollama": ["llama2", "mistral:7b", "llama2:13b"]
        }
        
        # Number of repetitions for each test
        repetitions = 2
        
        # Results
        results = []
        
        for prompt_length, prompt in BENCHMARK_PROMPTS.items():
            for provider in models:
                for model in models[provider]:
                    print(f"Testing {provider}:{model} with {prompt_length} prompt")
                    
                    for rep in range(repetitions):
                        try:
                            metrics = await self.measure_response_metrics(
                                provider_service,
                                provider,
                                model,
                                prompt
                            )
                            
                            results.append({
                                "provider": provider,
                                "model": model,
                                "prompt_length": prompt_length,
                                "repetition": rep + 1,
                                **metrics
                            })
                            
                            # Add a delay to avoid rate limits
                            await asyncio.sleep(2)
                        except Exception as e:
                            print(f"Error in benchmark: {str(e)}")
        
        # Create DataFrame
        df = pd.DataFrame(results)
        
        # Save raw results
        df.to_csv("benchmark_efficiency_raw.csv", index=False)
        
        # Create summary by model and prompt length
        latency_summary = df.groupby(["provider", "model", "prompt_length"])["duration"].mean().reset_index()
        latency_pivot = latency_summary.pivot_table(
            index=["provider", "model"],
            columns="prompt_length",
            values="duration"
        ).reindex(columns=list(BENCHMARK_PROMPTS)).round(2)  # order columns from short to very_long
        
        # Calculate efficiency metrics (tokens per second and cost per 1000 tokens)
        efficiency_df = df[df["success"]].copy()
        efficiency_df["cost_per_1k_tokens"] = efficiency_df.apply(
            lambda row: (row["cost"] * 1000 / row["token_count"]["total"]) 
            if row["provider"] == "openai" and row["token_count"]["total"] > 0 
            else 0, 
            axis=1
        )
        
        efficiency_summary = efficiency_df.groupby(["provider", "model"])[
            ["tokens_per_second", "cost_per_1k_tokens"]
        ].mean().round(3)
        
        # Save summaries
        latency_pivot.to_csv("benchmark_latency_summary.csv")
        efficiency_summary.to_csv("benchmark_efficiency_summary.csv")
        
        # Create visualizations
        plt.figure(figsize=(15, 10))
        
        # Latency by prompt length and model
        plt.subplot(2, 1, 1)
        ax = plt.gca()
        latency_pivot.plot(kind='bar', ax=ax)
        plt.title("Response Time by Prompt Length")
        plt.ylabel("Time (seconds)")
        plt.xticks(rotation=45)
        plt.legend(title="Prompt Length")
        
        # Tokens per second by model
        plt.subplot(2, 2, 3)
        efficiency_summary["tokens_per_second"].plot(kind='bar')
        plt.title("Generation Speed (Tokens/Second)")
        plt.ylabel("Tokens per Second")
        plt.xticks(rotation=45)
        
        # Cost per 1000 tokens (OpenAI only; the panel stays empty if no OpenAI run succeeded)
        plt.subplot(2, 2, 4)
        if "openai" in efficiency_summary.index.get_level_values("provider"):
            openai_efficiency = efficiency_summary.loc["openai"]
            openai_efficiency["cost_per_1k_tokens"].plot(kind='bar')
        plt.title("Cost per 1000 Tokens (OpenAI)")
        plt.ylabel("Cost ($)")
        plt.xticks(rotation=45)
        
        plt.tight_layout()
        plt.savefig('benchmark_efficiency.png')
        
        # Print summary to console
        print("\nLatency by Prompt Length (seconds):")
        print(latency_pivot.to_string())
        
        print("\nEfficiency Metrics:")
        print(efficiency_summary.to_string())
        
        # Comparison analysis
        if "ollama" in df["provider"].values and "openai" in df["provider"].values:
            # Calculate average speedup/slowdown ratio
            openai_avg = df[df["provider"] == "openai"]["duration"].mean()
            ollama_avg = df[df["provider"] == "ollama"]["duration"].mean()
            
            speedup = openai_avg / ollama_avg if ollama_avg > 0 else float('inf')
            
            print(f"\nAverage time ratio (OpenAI/Ollama): {speedup:.2f}")
            if speedup > 1:
                print(f"Ollama is {speedup:.2f}x faster on average")
            else:
                print(f"OpenAI is {1/speedup:.2f}x faster on average")
        
        # Fail the benchmark if no run produced a result
        assert len(results) > 0, "No benchmark results collected"
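
The latency pivot and the OpenAI/Ollama time ratio above both average every recorded run, so a call that failed quickly can make a model look faster than it is. The sketch below is a stricter per-model summary that uses only the standard library. It assumes each entry in `results` carries the `provider`, `model`, `success`, and `duration` keys produced by the benchmark loop. `summarize_successful_runs` is a hypothetical helper, not part of the test suite:

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, List, Tuple


def summarize_successful_runs(results: List[Dict[str, Any]]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Success rate and median latency per (provider, model); failed runs are excluded from latency."""
    durations: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    attempts: Dict[Tuple[str, str], int] = defaultdict(int)

    for row in results:
        key = (row["provider"], row["model"])
        attempts[key] += 1
        if row["success"]:
            durations[key].append(row["duration"])

    summary = {}
    for key, total in attempts.items():
        ok = durations[key]
        summary[key] = {
            "success_rate": len(ok) / total,
            # NaN when every run failed, so a failing model can never look "fast"
            "median_duration": median(ok) if ok else float("nan"),
        }
    return summary


if __name__ == "__main__":
    sample = [
        {"provider": "ollama", "model": "llama2", "success": True, "duration": 3.0},
        {"provider": "ollama", "model": "llama2", "success": False, "duration": 0.1},
        {"provider": "openai", "model": "gpt-4", "success": True, "duration": 5.0},
        {"provider": "openai", "model": "gpt-4", "success": True, "duration": 7.0},
    ]
    for key, stats in summarize_successful_runs(sample).items():
        print(key, stats)
```

Using the median rather than the mean also stops a single slow outlier from dominating a two-repetition sample. A typical example is the first Ollama call, which includes loading the model into memory.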

Tool Usage Comparison

Python
# tests/benchmarks/tool_usage_comparison.py
import pytest
import asyncio
import json
import pandas as pd
import matplotlib.pyplot as plt
import os
import time
from typing import List, Dict, Any

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

# Test tools for benchmarking
BENCHMARK_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_hotels",
            "description": "Search for hotels in a specific location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city to search in"
                    },
                    "check_in": {
                        "type": "string",
                        "description": "Check-in date in YYYY-MM-DD format"
                    },
                    "check_out": {
                        "type": "string",
                        "description": "Check-out date in YYYY-MM-DD format"
                    },
                    "guests": {
                        "type": "integer",
                        "description": "Number of guests"
                    },
                    "price_range": {
                        "type": "string",
                        "description": "Price range, e.g. '$0-$100'"
                    }
                },
                "required": ["location", "check_in", "check_out"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_mortgage",
            "description": "Calculate monthly mortgage payment",
            "parameters": {
                "type": "object",
                "properties": {
                    "loan_amount": {
                        "type": "number",
                        "description": "The loan amount in dollars"
                    },
                    "interest_rate": {
                        "type": "number",
                        "description": "Annual interest rate (percentage)"
                    },
                    "loan_term": {
                        "type": "integer",
                        "description": "Loan term in years"
                    },
                    "down_payment": {
                        "type": "number",
                        "description": "Down payment amount in dollars"
                    }
                },
                "required": ["loan_amount", "interest_rate", "loan_term"]
            }
        }
    }
]

# Tool usage queries
TOOL_QUERIES = [
    "What's the weather like in Miami right now?",
    "Find me hotels in New York for next weekend for 2 people.",
    "Calculate the monthly payment for a $300,000 mortgage with 4.5% interest over 30 years.",
    "What's the weather in Tokyo and Paris this week?",
    "I need to calculate mortgage payments for different interest rates: 3%, 4%, and 5% on a $250,000 loan."
]

class TestToolUsageComparison:
    @pytest.fixture
    async def services(self):
        """Set up services for benchmark testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")
            
        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()
        
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")
        
        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }
        
        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()
    
    async def generate_with_tools(self, provider_service, provider, model, query, tools):
        """Generate a response with tools and measure performance."""
        start_time = time.time()
        success = False
        error = None
        
        try:
            if provider == "openai":
                response = await provider_service._generate_openai_completion(
                    messages=[{"role": "user", "content": query}],
                    model=model,
                    tools=tools
                )
            else:  # ollama
                response = await provider_service._generate_ollama_completion(
                    messages=[{"role": "user", "content": query}],
                    model=model,
                    tools=tools
                )
                
            success = True
            tool_calls = response.get("tool_calls") or []
            message_content = response["message"]["content"]
            
            # Native tool calls returned by the provider count as tool usage
            tools_used = len(tool_calls) > 0
            
            # Ollama models may describe a tool call in plain text instead of returning
            # native tool calls, so fall back to looking for a tool's name in the reply
            if not tools_used and provider == "ollama":
                for tool in tools:
                    if tool["function"]["name"] in (message_content or ""):
                        tools_used = True
                        break
            
        except Exception as e:
            error = str(e)
            message_content = None
            tools_used = False
            tool_calls = []
        
        end_time = time.time()
        
        return {
            "success": success,
            "error": error,
            "duration": end_time - start_time,
            "message": message_content,
            "tools_used": tools_used,
            "tool_call_count": len(tool_calls),
            "tool_calls": tool_calls
        }
    
    @pytest.mark.asyncio
    async def test_tool_usage_benchmark(self, services):
        """Benchmark tool usage across providers and models."""
        provider_service = services["provider_service"]
        
        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4-turbo"],
            "ollama": ["llama2", "mistral"]
        }
        
        # Results
        results = []
        
        for query in TOOL_QUERIES:
            for provider in models:
                for model in models[provider]:
                    print(f"Testing {provider}:{model} with tools query: {query[:30]}...")
                    
                    try:
                        metrics = await self.generate_with_tools(
                            provider_service,
                            provider,
                            model,
                            query,
                            BENCHMARK_TOOLS
                        )
                        
                        results.append({
                            "provider": provider,
                            "model": model,
                            "query": query,
                            **metrics
                        })
                        
                        # Save raw response
                        model_safe_name = model.replace(":", "_")
                        os.makedirs("tool_benchmark_responses", exist_ok=True)
                        with open(f"tool_benchmark_responses/{provider}_{model_safe_name}.txt", "a") as f:
                            f.write(f"\nQuery: {query}\n\n")
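                            # metrics always has "message" and "error" keys; exactly one is set
                            response_text = metrics["message"] if metrics["message"] is not None else f"ERROR: {metrics['error'] or 'Unknown error'}"
                            f.write(f"Response: {response_text}\n")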
                            if metrics.get('tool_calls'):
                                f.write("\nTool Calls:\n")
                                f.write(json.dumps(metrics['tool_calls'], indent=2))
                            f.write("\n" + "-" * 80 + "\n")
                        
                        # Add a delay to avoid rate limits
                        await asyncio.sleep(2)
                    except Exception as e:
                        print(f"Error in benchmark: {str(e)}")
        
        # Create DataFrame
        df = pd.DataFrame(results)
        
        # Save raw results
        df.to_csv("benchmark_tool_usage_raw.csv", index=False)
        
        # Create summary
        tool_usage_summary = df.groupby(["provider", "model"])[
            ["success", "tools_used", "tool_call_count", "duration"]
        ].agg({
            "success": "mean", 
            "tools_used": "mean", 
            "tool_call_count": "mean",
            "duration": "mean"
        }).round(3)
        
        # Rename columns for clarity
        tool_usage_summary.columns = [
            "Success Rate", 
            "Tool Usage Rate", 
            "Avg Tool Calls",
            "Avg Duration (s)"
        ]
        
        # Save summary
        tool_usage_summary.to_csv("benchmark_tool_usage_summary.csv")
        
        # Create visualizations
        plt.figure(figsize=(15, 10))
        
        # Tool usage rate by model
        plt.subplot(2, 2, 1)
        tool_usage_summary["Tool Usage Rate"].plot(kind='bar')
        plt.title("Tool Usage Rate by Model")
        plt.ylabel("Rate (0-1)")
        plt.ylim(0, 1)
        plt.xticks(rotation=45)
        
        # Average tool calls by model
        plt.subplot(2, 2, 2)
        tool_usage_summary["Avg Tool Calls"].plot(kind='bar')
        plt.title("Average Tool Calls per Query")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        
        # Success rate by model
        plt.subplot(2, 2, 3)
        tool_usage_summary["Success Rate"].plot(kind='bar')
        plt.title("Success Rate")
        plt.ylabel("Rate (0-1)")
        plt.ylim(0, 1)
        plt.xticks(rotation=45)
        
        # Average duration by model
        plt.subplot(2, 2, 4)
        tool_usage_summary["Avg Duration (s)"].plot(kind='bar')
        plt.title("Average Response Time")
        plt.ylabel("Seconds")
        plt.xticks(rotation=45)
        
        plt.tight_layout()
        plt.savefig('benchmark_tool_usage.png')
        
        # Print summary to console
        print("\nTool Usage Benchmark Results:")
        print(tool_usage_summary.to_string())
        
        # Provider-level comparison of tool usage
        if len(df[df["tools_used"]]) > 0:
            print("\nProvider Comparison of Tool Usage:")
            
            # Share of queries in which each provider invoked at least one tool
            openai_used = df[(df["provider"] == "openai") & (df["tools_used"])].shape[0]
            openai_total = df[df["provider"] == "openai"].shape[0]
            openai_rate = openai_used / openai_total if openai_total > 0 else 0
            
            ollama_used = df[(df["provider"] == "ollama") & (df["tools_used"])].shape[0]
            ollama_total = df[df["provider"] == "ollama"].shape[0]
            ollama_rate = ollama_used / ollama_total if ollama_total > 0 else 0
            
            print(f"OpenAI tool usage rate: {openai_rate:.2f}")
            print(f"Ollama tool usage rate: {ollama_rate:.2f}")
            
            # This measures whether tools were invoked, not whether the calls were correct
            if openai_rate > 0 and ollama_rate > 0:
                ratio = openai_rate / ollama_rate
                print(f"OpenAI invoked tools {ratio:.2f}x as often as Ollama")
            
            # Additional insights
            if "openai" in df["provider"].values and "ollama" in df["provider"].values:
                openai_time = df[df["provider"] == "openai"]["duration"].mean()
                ollama_time = df[df["provider"] == "ollama"]["duration"].mean()
                
                if openai_time > 0 and ollama_time > 0:
                    time_ratio = openai_time / ollama_time
                    print(f"Time ratio (OpenAI/Ollama): {time_ratio:.2f}")
                    if time_ratio > 1:
                        print(f"Ollama is {time_ratio:.2f}x faster for tool-related queries")
                    else:
                        print(f"OpenAI is {1/time_ratio:.2f}x faster for tool-related queries")
        
        # Fail the benchmark if no run produced a result
        assert len(results) > 0, "No benchmark results collected"
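
The Ollama fallback in `generate_with_tools` counts a response as tool usage whenever a tool's name appears anywhere in the text. A model that merely mentions `get_weather` in prose is therefore counted too. The sketch below applies a stricter check. It assumes, and Ollama does not guarantee, that a model asked to call a tool emits a JSON object with `name` and `arguments` keys somewhere in its reply. `extract_text_tool_calls` is a hypothetical helper:

```python
import json
from typing import Any, Dict, Iterable, List


def extract_text_tool_calls(text: str, tool_names: Iterable[str]) -> List[Dict[str, Any]]:
    """Return JSON objects in `text` shaped like {"name": <known tool>, "arguments": {...}}."""
    known = set(tool_names)
    decoder = json.JSONDecoder()
    calls: List[Dict[str, Any]] = []
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and returns where it ended
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pos = text.find("{", pos + 1)
            continue
        is_call = (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and obj["name"] in known
            and isinstance(obj.get("arguments"), dict)
        )
        if is_call:
            calls.append(obj)
            pos = text.find("{", end)  # skip past the nested "arguments" object
        else:
            pos = text.find("{", pos + 1)  # look inside for a nested tool call
    return calls


if __name__ == "__main__":
    reply = 'Sure. {"name": "get_weather", "arguments": {"location": "Miami, FL"}}'
    print(extract_text_tool_calls(reply, ["get_weather", "search_hotels"]))
```

When stricter scoring is wanted, `extract_text_tool_calls(message_content, [t["function"]["name"] for t in tools])` could replace the name-substring check in `generate_with_tools`.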

Pytest Configuration

INI
# pytest.ini
[pytest]
markers =
    unit: marks tests as unit tests
    integration: marks tests as integration tests
    performance: marks tests as performance tests
    reliability: marks tests as reliability tests
    benchmark: marks tests as benchmarks

testpaths = tests

python_files = test_*.py
python_classes = Test*
python_functions = test_*

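# Don't run performance, reliability, or benchmark tests by default
addopts = -m "not performance and not reliability and not benchmark"

# Configure test outputs
junit_family = xunit2

# pytest-asyncio: run async tests and async fixtures declared with plain @pytest.fixture
asyncio_mode = auto

# Default environment variables for test runs (the env option requires the pytest-env plugin)
env =
    PYTHONPATH=.
    OPENAI_MODEL=gpt-3.5-turbo
    OLLAMA_MODEL=llama2
    OLLAMA_HOST=http://localhost:11434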

Test Documentation

Markdown
# Testing Strategy for OpenAI-Ollama Integration

This document outlines the comprehensive testing approach for the hybrid AI system that integrates OpenAI and Ollama.

## 1. Unit Testing

Unit tests verify the functionality of individual components in isolation:

- **Provider Service**: Tests for provider selection logic, auto-routing, and fallback mechanisms
- **Ollama Service**: Tests for response formatting, tool extraction, and error handling
- **Model Selection**: Tests for use case detection and model recommendation logic
- **Tool Integration**: Tests for proper handling of tool calls and responses

Run unit tests with:
```bash
python -m pytest tests/unit -v
```

## 2. Integration Testing

Integration tests verify the interaction between components:

- **API Endpoints**: Tests for proper request handling, authentication, and response formatting
- **End-to-End Agent Flows**: Tests for agent behavior across different scenarios
- **Cross-Provider Integration**: Tests for integration between OpenAI and Ollama

Run integration tests with:
```bash
python -m pytest tests/integration -v
```

## 3. Performance Testing

Performance tests measure system performance characteristics:

- **Response Latency**: Compares response times across providers and models
- **Memory Usage**: Measures memory consumption during request processing
- **Response Quality**: Evaluates the quality of responses using GPT-4 as a judge

Run performance tests with:
```bash
python -m pytest tests/performance -v
```

## 4. Reliability Testing

Reliability tests verify the system's behavior under various conditions:

- **Error Handling**: Tests for proper error detection and fallback mechanisms
- **Load Testing**: Measures system performance under concurrent requests
- **Stability Testing**: Evaluates system behavior during extended conversations

Run reliability tests with:
```bash
python -m pytest tests/reliability -v
```

## 5. Benchmark Framework

Benchmarks for comparative analysis:

- **Quality Matrix**: Compares response quality across providers and models
- **Efficiency Analysis**: Measures latency, generation speed, and cost
- **Tool Usage Comparison**: Evaluates tool handling capabilities

Run benchmarks with:
```bash
python -m pytest tests/benchmarks -v
```

The `addopts` setting in `pytest.ini` deselects tests marked `performance`, `reliability`, or `benchmark`. To include them, add the matching marker, for example `python -m pytest tests/performance -v -m performance`.

## Running the Complete Test Suite

Use the test orchestration script to run all test suites:
```bash
python scripts/run_tests.py --all
```

## CI/CD Integration

The test suite runs in a GitHub Actions workflow:
```bash
# Triggered on push to main/develop or manually via workflow_dispatch
git push origin main  # Automatically runs tests
```

## Prerequisites

- OpenAI API key in environment variables:
  ```bash
  export OPENAI_API_KEY=sk-...
  ```
- Running Ollama instance:
  ```bash
  ollama serve
  ```
- Required models for Ollama:
  ```bash
  ollama pull llama2
  ollama pull mistral
  # Also used by the efficiency benchmark
  ollama pull mistral:7b
  ollama pull llama2:13b
  ```

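The benchmark fixtures call `pytest.skip` when `OPENAI_API_KEY` is missing or a service fails to initialize. On a misconfigured machine, that can produce a run that is mostly skipped but still looks green. The sketch below is a small preflight check that verifies the prerequisites before the suites run. It assumes Ollama's `GET /api/tags` endpoint, which lists locally pulled models, and the `OLLAMA_HOST` variable set in `pytest.ini`. The script and its `missing_models` helper are hypothetical:

```python
import json
import os
import sys
import urllib.request
from typing import Iterable, List

# Models referenced by the benchmarks in this guide
REQUIRED_MODELS = ["llama2", "mistral", "mistral:7b", "llama2:13b"]


def missing_models(tags_payload: dict, required: Iterable[str]) -> List[str]:
    """Return the required models that are absent from an /api/tags response.

    Ollama lists models with an explicit tag (e.g. "llama2:latest"),
    so a bare name is compared as "<name>:latest".
    """
    available = {m.get("name") for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        full_name = name if ":" in name else f"{name}:latest"
        if full_name not in available:
            missing.append(name)
    return missing


def main() -> int:
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    if "://" not in host:
        host = "http://" + host  # OLLAMA_HOST is sometimes given without a scheme
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:  # URLError subclasses OSError; bad JSON raises ValueError
        problems.append(f"Ollama is not reachable at {host}: {exc}")
    else:
        for name in missing_models(payload, REQUIRED_MODELS):
            problems.append(f"Ollama model not pulled: {name} (run: ollama pull {name})")

    for problem in problems:
        print(f"[preflight] {problem}", file=sys.stderr)
    return 1 if problems else 0
```

Appending an `if __name__ == "__main__": sys.exit(main())` entry point turns this into a script that exits non-zero when anything is missing. A CI job can run it before invoking pytest.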

## Conclusion

This testing strategy provides a framework for validating the hybrid AI architecture that combines OpenAI's cloud capabilities with Ollama's local model inference. Each layer of the suite targets a different property:

1. **Functional Correctness**: Unit and integration tests verify that components behave as expected, both individually and when integrated.

2. **Performance Optimization**: Benchmarks and performance tests provide quantitative data (latency, generation speed, cost per token) to guide resource allocation and routing decisions.

3. **Reliability**: Load and stability tests check that the system stays responsive and consistent under concurrent requests and extended conversations.

4. **Quality Assurance**: Response quality evaluations track whether output quality holds up regardless of which provider handles the inference.

The suite is designed to be extensible, so new test cases can be added as the system evolves. Running it automatically in the CI/CD pipeline turns these checks into ongoing regression testing for the hybrid architecture.

# User Interface Design for Hybrid OpenAI-Ollama MCP System

## Conceptual Framework for Interface Design

The Modern Computational Paradigm (MCP) system—integrating cloud-based intelligence with local inference capabilities—requires a thoughtfully designed interface that balances simplicity with advanced functionality. This document presents a comprehensive design approach for both command-line and web interfaces that expose the system's capabilities while maintaining an intuitive user experience.

## Command Line Interface (CLI) Design

### CLI Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│                           MCP-CLI                           │
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌──────────────────┐ │
│  │ Core Module │    │   Config    │    │ Interactive Mode │ │
│  └─────────────┘    └─────────────┘    └──────────────────┘ │
│         │                  │                    │           │
│         ▼                  ▼                    ▼           │
│  ┌─────────────┐    ┌─────────────┐    ┌──────────────────┐ │
│  │  Agent API  │    │    Model    │    │     Session      │ │
│  │   Client    │    │ Management  │    │    Management    │ │
│  └─────────────┘    └─────────────┘    └──────────────────┘ │
│         │                  │                    │           │
│         └──────────────────┼────────────────────┘           │
│                            │                                │
│                            ▼                                │
│                     ┌─────────────┐                         │
│                     │   Output    │                         │
│                     │ Formatting  │                         │
│                     └─────────────┘                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```


### CLI Wireframes

#### Main Help Screen

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  MCP CLI v1.0.0                                                         │
│                                                                         │
│  USAGE:                                                                 │
│    mcp [OPTIONS] COMMAND [ARGS]...                                      │
│                                                                         │
│  OPTIONS:                                                               │
│    --config PATH      Path to config file                               │
│    --verbose          Enable verbose output                             │
│    --help             Show this message and exit                        │
│                                                                         │
│  COMMANDS:                                                              │
│    chat        Start a chat session                                     │
│    complete    Get a completion for a prompt                            │
│    models      List and manage available models                         │
│    config      Configure MCP settings                                   │
│    agents      Manage agent profiles                                    │
│    session     Manage saved sessions                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

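The help screen maps directly onto `argparse` subcommands. The sketch below builds a parser with that command surface. The options for `chat` and `models` are chosen to produce the attributes that `chat_command` (`args.model`, `args.provider`, `args.agent`, `args.session_id`) and `models_command` (`args.pull`) read in the implementation example. The `--session-id` and `--pull` spellings and the positional `prompt` for `complete` are assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the MCP help screen."""
    parser = argparse.ArgumentParser(prog="mcp", description="MCP CLI")
    parser.add_argument("--config", metavar="PATH", help="Path to config file")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose output")

    sub = parser.add_subparsers(dest="command", metavar="COMMAND", required=True)

    chat = sub.add_parser("chat", help="Start a chat session")
    chat.add_argument("--model")
    chat.add_argument("--provider", choices=["auto", "openai", "ollama"])
    chat.add_argument("--agent")
    chat.add_argument("--session-id", dest="session_id")

    complete = sub.add_parser("complete", help="Get a completion for a prompt")
    complete.add_argument("prompt")

    models = sub.add_parser("models", help="List and manage available models")
    models.add_argument("--pull", metavar="MODEL_NAME", help="Download a model to Ollama")

    sub.add_parser("config", help="Configure MCP settings")
    sub.add_parser("agents", help="Manage agent profiles")
    sub.add_parser("session", help="Manage saved sessions")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["chat", "--provider", "ollama", "--model", "mistral"])
    print(args.command, args.provider, args.model, args.session_id)
```

`argparse` adds `--help` automatically. Setting `required=True` on the subparsers (Python 3.7+) makes a missing command an error rather than a silent no-op.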

#### Interactive Chat Mode

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  MCP Chat Session - ID: chat_78f3d2                                     │
│  Model: auto-select | Provider: auto | Agent: research                  │
│                                                                         │
│  Type 'exit' to quit, 'help' for commands, 'models' to switch models    │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  You: Tell me about quantum computing                                   │
│                                                                         │
│  MCP [OpenAI:gpt-4]: Quantum computing is a type of computation that    │
│  harnesses quantum mechanical phenomena like superposition and          │
│  entanglement to process information in ways that classical computers   │
│  cannot.                                                                │
│                                                                         │
│  Unlike classical bits that exist in a state of either 0 or 1, quantum  │
│  bits or "qubits" can exist in multiple states simultaneously due to    │
│  superposition. This potentially allows quantum computers to explore    │
│  multiple solutions to a problem at once.                               │
│                                                                         │
│  [Response continues for several more paragraphs...]                    │
│                                                                         │
│  You: Can you explain quantum entanglement more simply?                 │
│                                                                         │
│  MCP [Ollama:mistral]: █                                                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

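In the mockup, auto-select answers the opening question with OpenAI's `gpt-4` and the follow-up with a local `mistral` model. The hypothetical sketch below shows how such a per-turn decision could be encoded using the routing criteria from the architecture overview: privacy, complexity, and cost. The keyword list, the 400-character threshold, and the model choices are illustrative assumptions, not the system's actual policy:

```python
from dataclasses import dataclass

# Illustrative markers of data that should stay on the local machine (assumption)
PRIVATE_MARKERS = ("password", "medical", "salary", "confidential")


@dataclass
class Route:
    provider: str
    model: str
    reason: str


def choose_route(prompt: str, needs_tools: bool = False, long_prompt_chars: int = 400) -> Route:
    """Pick a provider for one turn: privacy first, then complexity, else the cheaper local model."""
    text = prompt.lower()
    if any(marker in text for marker in PRIVATE_MARKERS):
        return Route("ollama", "mistral", "privacy: keep sensitive data local")
    if needs_tools or len(prompt) > long_prompt_chars:
        return Route("openai", "gpt-4-turbo", "complexity: tools or long input")
    return Route("ollama", "mistral", "cost/latency: routine request handled locally")


if __name__ == "__main__":
    for p in ["Can you explain quantum entanglement more simply?",
              "Summarize this confidential salary report"]:
        print(choose_route(p))
```

Because privacy is checked first, a sensitive prompt stays local even when it is long or needs tools, which matches the data-sovereignty goal. A production router would more likely use a classifier than substring matching.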

#### Model Management Screen

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  MCP Models                                                             │
│                                                                         │
│  AVAILABLE MODELS:                                                      │
│                                                                         │
│  OpenAI:                                                                │
│    [✓] gpt-4-turbo   - Advanced reasoning, current knowledge            │
│    [✓] gpt-3.5-turbo - Fast, efficient for standard tasks               │
│                                                                         │
│  Ollama:                                                                │
│    [✓] llama2        - General purpose local model                      │
│    [✓] mistral       - Strong reasoning, 8k context window              │
│    [✓] codellama     - Specialized for code generation                  │
│    [ ] wizard-math   - Mathematical problem-solving                     │
│                                                                         │
│  COMMANDS:                                                              │
│                                                                         │
│    pull MODEL_NAME        - Download a model to Ollama                  │
│    info MODEL_NAME        - Show detailed model information             │
│    benchmark MODEL_NAME   - Run performance benchmark                   │
│    set-default MODEL_NAME - Set as default model                        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

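The `pull MODEL_NAME` command could be backed by Ollama's REST API, as could the `--pull` branch of `models_command`, which currently only sleeps. The sketch below uses only the standard library. It assumes a recent Ollama server whose `POST /api/pull` accepts a `model` field (older releases used `name`) and streams one JSON status object per line, with optional `total` and `completed` byte counts during downloads. `pull_model` and `format_progress` are hypothetical names:

```python
import json
import urllib.request
from typing import Callable, Optional


def format_progress(status: dict) -> str:
    """Render one streamed status object, e.g. 'pulling manifest' or 'downloading 42%'."""
    total, completed = status.get("total"), status.get("completed")
    if total and completed is not None:
        return f"{status.get('status', '')} {completed * 100 // total}%"
    return status.get("status", "")


def pull_model(name: str, host: str = "http://localhost:11434",
               on_progress: Optional[Callable[[str], None]] = print) -> bool:
    """Pull a model through Ollama's /api/pull; returns True if the final status is 'success'."""
    body = json.dumps({"model": name}).encode("utf-8")
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/pull", data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )
    last_status = ""
    with urllib.request.urlopen(request) as response:
        # The body is newline-delimited JSON, so iterate over it line by line
        for raw_line in response:
            if not raw_line.strip():
                continue
            status = json.loads(raw_line)
            if "error" in status:
                raise RuntimeError(status["error"])
            last_status = status.get("status", "")
            if on_progress:
                on_progress(format_progress(status))
    return last_status == "success"
```

Calling `pull_model("codellama")` prints each status line as it arrives. The CLI could instead pass a callback that updates its `rich` progress bar.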

#### Agent Configuration Screen

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  MCP Agent Configuration                                                │
│                                                                         │
│  AVAILABLE AGENTS:                                                      │
│                                                                         │
│    [✓] general       - General purpose assistant                        │
│    [✓] research      - Research specialist with knowledge tools         │
│    [✓] coding        - Code assistant with tool integration             │
│    [✓] creative      - Creative writing and content generation          │
│                                                                         │
│  CUSTOM AGENTS:                                                         │
│                                                                         │
│    [✓] my-math-tutor - Mathematics teaching and problem solving         │
│    [✓] data-analyst  - Data analysis with visualization tools           │
│                                                                         │
│  COMMANDS:                                                              │
│                                                                         │
│    create NAME       - Create a new custom agent                        │
│    edit NAME         - Edit an existing agent                           │
│    delete NAME       - Delete a custom agent                            │
│    export NAME FILE  - Export agent configuration                       │
│    import FILE       - Import agent configuration                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

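The `export NAME FILE` and `import FILE` commands imply an agent profile that can be serialized. The sketch below shows one possible shape: a dataclass round-tripped through JSON. The fields (`instructions`, `tools`, `preferred_provider`, `custom`) are illustrative assumptions rather than the system's actual schema:

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class AgentProfile:
    name: str
    description: str
    instructions: str
    tools: List[str] = field(default_factory=list)
    preferred_provider: str = "auto"  # "auto", "openai", or "ollama"
    custom: bool = True


def export_agent(profile: AgentProfile, path: Path) -> None:
    """Write a profile as pretty-printed JSON (backs `agents export NAME FILE`)."""
    path.write_text(json.dumps(asdict(profile), indent=2), encoding="utf-8")


def import_agent(path: Path) -> AgentProfile:
    """Load a profile (backs `agents import FILE`)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return AgentProfile(**data)  # raises TypeError on unexpected or missing keys


if __name__ == "__main__":
    tutor = AgentProfile("my-math-tutor", "Mathematics teaching and problem solving",
                         "Explain each step before giving the answer.", tools=["calculator"])
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "my-math-tutor.json"
        export_agent(tutor, target)
        print(import_agent(target) == tutor)
```

Building the dataclass from the loaded dictionary doubles as light validation: a file with a missing or misspelled field fails at import time instead of producing a half-configured agent.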

### CLI Interaction Flow

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│             │     │             │     │             │     │             │
│  Start CLI  │────▶│ Select Mode │────▶│ Set Config  │────▶│   Session   │
│             │     │             │     │             │     │ Interaction │
└─────────────┘     └─────────────┘     └─────────────┘     └──────┬──────┘
                                                                   │
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌──────▼──────┐
│             │     │             │     │             │     │             │
│   Export    │◀────│   Session   │◀────│  Generate   │◀────│    User     │
│   Results   │     │ Management  │     │  Response   │     │   Prompt    │
│             │     │             │     │             │     │             │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
```

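The flow ends with session management feeding an export step. The sketch below is a session record that keeps each turn together with the provider and model that answered it, as the chat mockup displays, and exports the transcript as Markdown. The `ChatSession` and `Turn` classes and the export format are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    role: str                       # "user" or "assistant"
    content: str
    provider: Optional[str] = None  # set for assistant turns, e.g. "ollama"
    model: Optional[str] = None


@dataclass
class ChatSession:
    session_id: str
    turns: List[Turn] = field(default_factory=list)

    def add_user(self, content: str) -> None:
        self.turns.append(Turn("user", content))

    def add_assistant(self, content: str, provider: str, model: str) -> None:
        self.turns.append(Turn("assistant", content, provider, model))

    def to_markdown(self) -> str:
        """Render the conversation, labelling each answer with the provider and model used."""
        lines = [f"# MCP session {self.session_id}", ""]
        for turn in self.turns:
            if turn.role == "user":
                lines.append(f"**You:** {turn.content}")
            else:
                lines.append(f"**MCP [{turn.provider}:{turn.model}]:** {turn.content}")
            lines.append("")
        return "\n".join(lines)


if __name__ == "__main__":
    session = ChatSession("chat_78f3d2")
    session.add_user("Tell me about quantum computing")
    session.add_assistant("Quantum computing is ...", "openai", "gpt-4")
    print(session.to_markdown())
```

Recording the provider on every turn keeps exported transcripts explicit about which answers were generated locally and which were sent to OpenAI.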

### CLI Implementation Example

```python
# mcp_cli.py
import argparse
import os
import json
import sys
import time
from typing import Dict, Any, List, Optional
import requests
import yaml
import colorama
from colorama import Fore, Style
from prompt_toolkit import PromptSession
from prompt_toolkit.history import FileHistory
from prompt_toolkit.auto_suggest import AutoSuggestFromHistory
from prompt_toolkit.completion import WordCompleter
from rich.console import Console
from rich.markdown import Markdown
from rich.panel import Panel
from rich.progress import Progress

# Initialize colorama for cross-platform color support
colorama.init()
console = Console()

CONFIG_PATH = os.path.expanduser("~/.mcp/config.yaml")
HISTORY_PATH = os.path.expanduser("~/.mcp/history")
API_URL = "http://localhost:8000/api/v1"

def ensure_config_dir():
    """Ensure the config directory exists."""
    config_dir = os.path.dirname(CONFIG_PATH)
    os.makedirs(config_dir, exist_ok=True)
    # HISTORY_PATH is itself a directory: chat_command stores its history file inside it
    os.makedirs(HISTORY_PATH, exist_ok=True)

def load_config():
    """Load configuration from file."""
    ensure_config_dir()
    
    if not os.path.exists(CONFIG_PATH):
        # Create default config
        config = {
            "api": {
                "url": API_URL,
                "key": None
            },
            "defaults": {
                "model": "auto",
                "provider": "auto",
                "agent": "general"
            },
            "output": {
                "format": "markdown",
                "show_model_info": True
            }
        }
        
        with open(CONFIG_PATH, 'w') as f:
            yaml.dump(config, f, default_flow_style=False)
        
        console.print(f"Created default config at {CONFIG_PATH}", style="yellow")
        return config
    
    with open(CONFIG_PATH, 'r') as f:
        return yaml.safe_load(f)

def save_config(config):
    """Save configuration to file."""
    with open(CONFIG_PATH, 'w') as f:
        yaml.dump(config, f, default_flow_style=False)

def get_api_key(config):
    """Get API key from config or environment."""
    if config["api"]["key"]:
        return config["api"]["key"]
    
    env_key = os.environ.get("MCP_API_KEY")
    if env_key:
        return env_key
    
    # If no key is configured, prompt the user
    console.print("No API key found. Please enter your API key:", style="yellow")
    key = input("> ")
    
    if key:
        config["api"]["key"] = key
        save_config(config)
        return key
    
    console.print("No API key provided. Some features may not work.", style="red")
    return None

def make_api_request(endpoint, method="GET", data=None, config=None):
    """Make an API request to the MCP backend."""
    if config is None:
        config = load_config()
    
    api_key = get_api_key(config)
    headers = {
        "Content-Type": "application/json"
    }
    
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    
    url = f"{config['api']['url']}/{endpoint.lstrip('/')}"
    
    try:
        if method == "GET":
            response = requests.get(url, headers=headers)
        elif method == "POST":
            response = requests.post(url, headers=headers, json=data)
        else:
            raise ValueError(f"Unsupported HTTP method: {method}")
        
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        console.print(f"API request failed: {str(e)}", style="red")
        return None

def display_response(response_text, format_type="markdown"):
    """Display a response with appropriate formatting."""
    if format_type == "markdown":
        console.print(Markdown(response_text))
    else:
        console.print(response_text)

def chat_command(args, config):
    """Start an interactive chat session."""
    session_id = args.session_id
    model_name = args.model or config["defaults"]["model"]
    provider = args.provider or config["defaults"]["provider"]
    agent_type = args.agent or config["defaults"]["agent"]
    
    console.print(Panel(f"Starting MCP Chat Session\nModel: {model_name} | Provider: {provider} | Agent: {agent_type}"))
    console.print("Type 'exit' to quit, 'help' for commands", style="dim")
    
    # Set up prompt session with history
    ensure_config_dir()
    history_file = os.path.join(HISTORY_PATH, "chat_history")
    session = PromptSession(
        history=FileHistory(history_file),
        auto_suggest=AutoSuggestFromHistory(),
        completer=WordCompleter(['exit', 'help', 'models', 'clear', 'save', 'switch'])
    )
    
    # Without a session ID, the backend starts a new session and returns its ID
    # in the first response (captured below as response["session_id"])
    
    while True:
        try:
            # Use (style, text) tuples: prompt_toolkit does not render colorama's raw ANSI escape codes
            user_input = session.prompt([("ansigreen", "You: ")])
            
            if user_input.lower() in ('exit', 'quit'):
                break
            
            if not user_input.strip():
                continue
            
            # Handle special commands
            if user_input.lower() == 'help':
                console.print(Panel("""
                Available commands:
                - exit/quit: Exit the chat session
                - clear: Clear the current conversation
                - save FILENAME: Save conversation to file
                - models: List available models
                - switch MODEL: Switch to a different model
                - provider PROVIDER: Switch to a different provider
                """))
                continue
            
            # For normal input, send to API
            with Progress() as progress:
                task = progress.add_task("[cyan]Generating response...", total=None)
                
                data = {
                    "message": user_input,
                    "session_id": session_id,
                    "model_params": {
                        "provider": provider,
                        "model": model_name,
                        "auto_select": provider == "auto"
                    }
                }
                
                response = make_api_request("chat", method="POST", data=data, config=config)
                progress.update(task, completed=100)
            
            if response:
                session_id = response["session_id"]
                model_used = response.get("model_used", model_name)
                provider_used = response.get("provider_used", provider)
                
                # Display provider and model info if configured
                if config["output"]["show_model_info"]:
                    console.print(f"\n{Fore.BLUE}MCP [{provider_used}:{model_used}]:{Style.RESET_ALL}")
                else:
                    console.print(f"\n{Fore.BLUE}MCP:{Style.RESET_ALL}")
                
                display_response(response["response"], config["output"]["format"])
                console.print()  # Empty line for readability
        
        except KeyboardInterrupt:
            break
        except EOFError:
            break
        except Exception as e:
            console.print(f"Error: {str(e)}", style="red")
    
    console.print("Chat session ended")

def models_command(args, config):
    """List and manage available models."""
    if args.pull:
        # Pull a new model for Ollama
        console.print(f"Pulling Ollama model: {args.pull}")
        
        with Progress() as progress:
            task = progress.add_task(f"[cyan]Pulling {args.pull}...", total=None)
            
            # This would actually call Ollama API
            time.sleep(2)  # Simulating download
            
            progress.update(task, completed=100)
        
        console.print(f"Successfully pulled {args.pull}", style="green")
        return
    
    # List available models
    console.print(Panel("Available Models"))
    
    console.print("\n[bold]OpenAI Models:[/bold]")
    openai_models = [
        {"name": "gpt-4-turbo", "description": "Advanced reasoning, 128k context window"},
        {"name": "gpt-3.5-turbo", "description": "Fast, efficient for standard tasks"}
    ]
    
    for model in openai_models:
        console.print(f"  • {model['name']} - {model['description']}")
    
    console.print("\n[bold]Ollama Models:[/bold]")
    
    # In a real implementation, this would fetch from Ollama API
    ollama_models = [
        {"name": "llama2", "description": "General purpose local model", "installed": True},
        {"name": "mistral", "description": "Strong reasoning, 8k context window", "installed": True},
        {"name": "codellama", "description": "Specialized for code generation", "installed": True},
        {"name": "wizard-math", "description": "Mathematical problem-solving", "installed": False}
    ]
    
    for model in ollama_models:
        status = "[green]✓[/green]" if model["installed"] else "[red]✗[/red]"
        console.print(f"  {status} {model['name']} - {model['description']}")
    
    console.print("\nUse 'mcp models --pull MODEL_NAME' to download a model")

def config_command(args, config):
    """View or edit configuration."""
    if args.set:
        # Expect KEY=VALUE where KEY is a dotted path such as defaults.agent
        if '=' not in args.set:
            console.print("Expected KEY=VALUE, e.g. defaults.agent=coding", style="red")
            return
        key, value = args.set.split('=', 1)
        keys = key.split('.')
        
        # Navigate to the nested key, creating intermediate sections as needed
        current = config
        for k in keys[:-1]:
            current = current.setdefault(k, {})
        
        # Convert booleans and numbers (including floats such as 0.65) from their string form
        if value.lower() in ('true', 'false'):
            parsed = value.lower() == 'true'
        else:
            try:
                parsed = int(value)
            except ValueError:
                try:
                    parsed = float(value)
                except ValueError:
                    parsed = value
        current[keys[-1]] = parsed
        
        save_config(config)
        console.print(f"Configuration updated: {key} = {value}", style="green")
        return
    
    # Display current configuration
    console.print(Panel("MCP Configuration"))
    console.print(yaml.dump(config))
    console.print("\nUse 'mcp config --set key.path=value' to change settings")

def agent_command(args, config):
    """Manage agent profiles."""
    if args.create:
        # Create a new agent profile
        console.print(f"Creating agent profile: {args.create}")
        # Implementation would collect agent parameters
        return
    
    if args.edit:
        # Edit an existing agent profile
        console.print(f"Editing agent profile: {args.edit}")
        return
    
    # List available agents
    console.print(Panel("Available Agents"))
    
    console.print("\n[bold]System Agents:[/bold]")
    system_agents = [
        {"name": "general", "description": "General purpose assistant"},
        {"name": "research", "description": "Research specialist with knowledge tools"},
        {"name": "coding", "description": "Code assistant with tool integration"},
        {"name": "creative", "description": "Creative writing and content generation"}
    ]
    
    for agent in system_agents:
        console.print(f"  • {agent['name']} - {agent['description']}")
    
    # In a real implementation, this would load from user config
    custom_agents = [
        {"name": "my-math-tutor", "description": "Mathematics teaching and problem solving"},
        {"name": "data-analyst", "description": "Data analysis with visualization tools"}
    ]
    
    if custom_agents:
        console.print("\n[bold]Custom Agents:[/bold]")
        for agent in custom_agents:
            console.print(f"  • {agent['name']} - {agent['description']}")
    
    console.print("\nUse 'mcp agents --create NAME' to create a new agent")

def main():
    """Main entry point for the CLI."""
    parser = argparse.ArgumentParser(description="MCP Command Line Interface")
    parser.add_argument('--config', help="Path to config file")
    parser.add_argument('--verbose', action='store_true', help="Enable verbose output")
    
    subparsers = parser.add_subparsers(dest='command', help='Command to run')
    
    # Chat command
    chat_parser = subparsers.add_parser('chat', help='Start a chat session')
    chat_parser.add_argument('--model', help='Model to use')
    chat_parser.add_argument('--provider', choices=['openai', 'ollama', 'auto'], help='Provider to use')
    chat_parser.add_argument('--agent', help='Agent type to use')
    chat_parser.add_argument('--session-id', help='Resume an existing session')
    
    # Complete command (one-shot completion)
    complete_parser = subparsers.add_parser('complete', help='Get a completion for a prompt')
    complete_parser.add_argument('prompt', help='Prompt text')
    complete_parser.add_argument('--model', help='Model to use')
    complete_parser.add_argument('--provider', choices=['openai', 'ollama', 'auto'], help='Provider to use')
    
    # Models command
    models_parser = subparsers.add_parser('models', help='List and manage available models')
    models_parser.add_argument('--pull', metavar='MODEL_NAME', help='Download a model to Ollama')
    models_parser.add_argument('--info', metavar='MODEL_NAME', help='Show detailed model information')
    models_parser.add_argument('--benchmark', metavar='MODEL_NAME', help='Run performance benchmark')
    
    # Config command
    config_parser = subparsers.add_parser('config', help='Configure MCP settings')
    config_parser.add_argument('--set', metavar='KEY=VALUE', help='Set a configuration value')
    
    # Agents command
    agents_parser = subparsers.add_parser('agents', help='Manage agent profiles')
    agents_parser.add_argument('--create', metavar='NAME', help='Create a new custom agent')
    agents_parser.add_argument('--edit', metavar='NAME', help='Edit an existing agent')
    agents_parser.add_argument('--delete', metavar='NAME', help='Delete a custom agent')
    
    # Session command
    session_parser = subparsers.add_parser('session', help='Manage saved sessions')
    session_parser.add_argument('--list', action='store_true', help='List saved sessions')
    session_parser.add_argument('--delete', metavar='SESSION_ID', help='Delete a session')
    session_parser.add_argument('--export', metavar='SESSION_ID', help='Export a session')
    
    args = parser.parse_args()
    
    # Load configuration
    config_path = args.config if args.config else CONFIG_PATH
    
    if args.config and not os.path.exists(args.config):
        console.print(f"Config file not found: {args.config}", style="red")
        return 1
    
    config = load_config()
    
    # Execute the appropriate command
    if args.command == 'chat':
        chat_command(args, config)
    elif args.command == 'complete':
        # Implementation for complete command
        pass
    elif args.command == 'models':
        models_command(args, config)
    elif args.command == 'config':
        config_command(args, config)
    elif args.command == 'agents':
        agent_command(args, config)
    elif args.command == 'session':
        # Implementation for session command
        pass
    else:
        # No command specified, show help
        parser.print_help()
    
    return 0

if __name__ == "__main__":
    sys.exit(main())
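In the listing above, the `models --pull` branch only simulates a download and the Ollama model list is hard-coded. The following is a minimal sketch of the real call, assuming a local Ollama server on its default address `http://localhost:11434` and using only the standard library. Ollama's `POST /api/pull` streams newline-delimited JSON status objects, which carry `total` and `completed` byte counts while layers download. The request field is `model` in the current Ollama API reference; older releases documented it as `name`. `describe_pull_event` and `pull_model` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterator

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address (assumed)


def describe_pull_event(event: dict) -> str:
    """Turn one streamed /api/pull status object into a short progress line."""
    status = event.get("status", "")
    total = event.get("total")
    completed = event.get("completed")
    if total and completed is not None:
        return f"{status} {completed * 100 // total}%"
    return status


def pull_model(name: str, base_url: str = OLLAMA_URL) -> Iterator[str]:
    """POST /api/pull and yield one progress line per streamed JSON object."""
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the body is newline-delimited JSON
            if not raw_line.strip():
                continue
            event = json.loads(raw_line)
            if "error" in event:
                raise RuntimeError(event["error"])
            yield describe_pull_event(event)
```

Inside `models_command`, the simulated `time.sleep(2)` could then become `for line in pull_model(args.pull): console.print(line)`. Streaming matters here, because multi-gigabyte pulls otherwise look frozen.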

Web Interface Design

Web Interface Architecture

┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  React Frontend                                                    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│  │ Chat         │ │ Model        │ │ Agent        │ │ Settings  │ │
│  │ Interface    │ │ Management   │ │ Configuration│ │ Manager   │ │
│  └──────────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│          │               │                │               │        │
│          └───────────────┼────────────────┼───────────────┘        │
│                          │                │                        │
│                          ▼                ▼                        │
│                    ┌─────────────┐  ┌────────────┐                │
│                    │ Auth        │  │ API Client │                │
│                    │ Management  │  │            │                │
│                    └─────────────┘  └────────────┘                │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  FastAPI Backend                                                   │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│  │ Chat         │ │ Model        │ │ Agent        │ │ User      │ │
│  │ Controller   │ │ Controller   │ │ Controller   │ │ Controller│ │
│  └──────────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│          │               │                │               │        │
│          └───────────────┼────────────────┼───────────────┘        │
│                          │                │                        │
│                          ▼                ▼                        │
│              ┌───────────────────┐  ┌────────────────────┐        │
│              │ Provider Service  │  │ Agent Factory      │        │
│              └───────────────────┘  └────────────────────┘        │
│                       │                       │                   │
│                       ▼                       ▼                   │
│               ┌─────────────┐         ┌─────────────┐            │
│               │ OpenAI API  │         │ Ollama API  │            │
│               └─────────────┘         └─────────────┘            │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
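Both clients in this document, the CLI's `chat_command` and the web `ChatInterface`, send the same JSON body to the chat endpoint and read the same four fields back. The following is a dependency-free sketch of that contract as the Chat Controller might validate it. In FastAPI itself these would more likely be Pydantic models. The class names and validation rules are assumptions; the field names come from the two clients.

```python
from dataclasses import dataclass, field
from typing import Optional

PROVIDERS = ("openai", "ollama", "auto")  # same choices as the CLI's --provider flag


@dataclass
class ModelParams:
    provider: str = "auto"
    model: Optional[str] = None
    auto_select: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if self.auto_select is None:
            # Both clients treat provider == "auto" as the auto-select signal
            self.auto_select = self.provider == "auto"


@dataclass
class ChatRequest:
    message: str
    session_id: Optional[str] = None  # None asks the backend to start a new session
    model_params: ModelParams = field(default_factory=ModelParams)

    @classmethod
    def from_json(cls, payload: dict) -> "ChatRequest":
        message = payload.get("message")
        if not isinstance(message, str) or not message.strip():
            raise ValueError("message must be a non-empty string")
        return cls(
            message=message,
            session_id=payload.get("session_id"),
            model_params=ModelParams(**(payload.get("model_params") or {})),
        )


@dataclass
class ChatResponse:
    response: str        # assistant text, rendered by display_response / ReactMarkdown
    session_id: str      # clients store this and send it back on the next turn
    model_used: str
    provider_used: str
```

Keeping the contract in one place means the CLI, the web client and the Provider Service agree on what `auto_select` means when a client omits it.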

Web Interface Wireframes

Chat Interface

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant                                           🔄 New Chat  ⚙️  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────┐  ┌───────────────────────────────────────┐ │
│  │ Chat History            │  │                                       │ │
│  │                         │  │ User: Tell me about quantum computing │ │
│  │ Welcome                 │  │                                       │ │
│  │ Quantum Computing       │  │ MCP: Quantum computing is a type of   │ │
│  │ AI Ethics               │  │ computation that harnesses quantum    │ │
│  │ Python Tutorial         │  │ mechanical phenomena like             │ │
│  │                         │  │ superposition and entanglement.       │ │
│  │                         │  │                                       │ │
│  │                         │  │ Unlike classical bits that represent  │ │
│  │                         │  │ either 0 or 1, quantum bits or        │ │
│  │                         │  │ "qubits" can exist in multiple states │ │
│  │                         │  │ simultaneously due to superposition.  │ │
│  │                         │  │                                       │ │
│  │                         │  │ [Response continues...]               │ │
│  │                         │  │                                       │ │
│  │                         │  │ User: How does quantum entanglement   │ │
│  │                         │  │ work?                                 │ │
│  │                         │  │                                       │ │
│  │                         │  │ MCP is typing...                      │ │
│  │                         │  │                                       │ │
│  └─────────────────────────┘  └───────────────────────────────────────┘ │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ Type your message...                                      Send ▶ │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Model: auto (OpenAI:gpt-4) | Mode: Research | Memory: Enabled          │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Model Settings Panel

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant > Settings > Models                                   ✖    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Model Selection                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ● Auto-select model (recommended)                               │    │
│  │ ○ Specify model and provider                                    │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Provider                     Model                                     │
│  ┌────────────┐               ┌────────────────────┐                    │
│  │ OpenAI   ▼ │               │ gpt-4-turbo      ▼ │                    │
│  └────────────┘               └────────────────────┘                    │
│                                                                         │
│  Auto-Selection Preferences                                             │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ Prioritize:  ● Speed   ○ Quality   ○ Privacy   ○ Cost           │    │
│  │                                                                  │    │
│  │ Complexity threshold: ███████████░░░░░░░░░  0.65                 │    │
│  │                                                                  │    │
│  │ [✓] Prefer Ollama for privacy-sensitive content                  │    │
│  │ [✓] Use OpenAI for complex reasoning                            │    │
│  │ [✓] Automatically fall back if a provider fails                  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Available Ollama Models                                                │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ✓ llama2         ✓ mistral        ✓ codellama                   │    │
│  │ ✓ wizard-math    ✓ neural-chat    ○ llama2:70b  [Download]      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  [ Save Changes ]         [ Cancel ]                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
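Together, the threshold and checkboxes in this panel form a small routing policy for the Provider Service. The sketch below is one possible reading of it. It assumes that a complexity score in [0, 1] and a privacy flag are computed elsewhere; how they are scored is not specified here. The function returns providers in the order to try them, so "fall back if a provider fails" becomes "try the next entry". Treating privacy-sensitive content as local-only, even as a fallback, is a design choice of this sketch. The Speed/Quality/Privacy/Cost priority setting is not modelled. `RoutingPreferences` and `choose_providers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoutingPreferences:
    # Defaults mirror the wireframe: threshold 0.65 and all three checkboxes ticked
    complexity_threshold: float = 0.65
    prefer_ollama_for_private: bool = True
    openai_for_complex: bool = True
    fallback_enabled: bool = True


def choose_providers(complexity: float, privacy_sensitive: bool,
                     prefs: RoutingPreferences, ollama_online: bool = True) -> List[str]:
    """Return the providers to try, in order; an empty list means nothing may serve the request."""
    if privacy_sensitive and prefs.prefer_ollama_for_private:
        # Assumption: private content stays local, so OpenAI is never used as a fallback here
        return ["ollama"] if ollama_online else []
    if prefs.openai_for_complex and complexity >= prefs.complexity_threshold:
        order = ["openai", "ollama"]
    else:
        order = ["ollama", "openai"]
    if not ollama_online:
        order.remove("ollama")
    # Without fallback, only the first choice is attempted
    return order if prefs.fallback_enabled else order[:1]
```

With the default preferences, a query scored 0.8 goes to OpenAI first and falls back to Ollama. A query scored 0.3 does the reverse.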

Agent Configuration Panel

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant > Settings > Agents                                   ✖    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Current Agent: Research Assistant                             [Edit ✏] │
│                                                                         │
│  Agent Library                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ● Research Assistant    Knowledge-focused with search capability│    │
│  │ ○ Code Assistant        Specialized for software development    │    │
│  │ ○ Creative Writer       Content creation and storytelling       │    │
│  │ ○ Math Tutor            Step-by-step problem solving            │    │
│  │ ○ General Assistant     Versatile helper for everyday tasks     │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Agent Capabilities                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ [✓] Knowledge retrieval      [ ] Code execution                  │    │
│  │ [✓] Web search              [ ] Data visualization              │    │
│  │ [✓] Memory                  [ ] File operations                 │    │
│  │ [✓] Calendar awareness      [ ] Email integration               │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  System Instructions                                                    │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ You are a research assistant with expertise in finding and       │    │
│  │ synthesizing information. Provide comprehensive, accurate        │    │
│  │ answers with authoritative sources when available.               │    │
│  │                                                                  │    │
│  │                                                                  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  [ Save Agent ]   [ Create New Agent ]   [ Import ]   [ Export ]        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
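An agent profile in this panel reduces to a name, a description, system instructions and a set of enabled capabilities, which is also what Import and Export need to serialize. The following is a minimal sketch using JSON. The capability identifiers are hypothetical snake_case versions of the checkbox labels, and `AgentProfile` is an illustrative name. A profile's `name` and `instructions` correspond directly to the OpenAI Agents SDK's `Agent(name=..., instructions=...)`. Mapping capabilities to concrete tools is left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical identifiers derived from the "Agent Capabilities" checkboxes
KNOWN_CAPABILITIES = {
    "knowledge_retrieval", "web_search", "memory", "calendar_awareness",
    "code_execution", "data_visualization", "file_operations", "email_integration",
}


@dataclass
class AgentProfile:
    name: str
    description: str
    instructions: str  # the "System Instructions" text
    capabilities: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        unknown = set(self.capabilities) - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError(f"unknown capabilities: {sorted(unknown)}")

    def export_json(self) -> str:
        """Serialize the profile for the Export button."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def import_json(cls, text: str) -> "AgentProfile":
        """Rebuild a profile from exported JSON; validation runs again on import."""
        return cls(**json.loads(text))
```

Validating on import as well as on creation keeps a hand-edited or outdated export from enabling a capability the backend does not know about.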

Dashboard View

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant > Dashboard                                        ⚙️      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  System Status                                   Last 24 Hours          │
│  ┌────────────────────────────┐   ┌────────────────────────────────┐    │
│  │ OpenAI: ● Connected        │   │ Requests: 143                  │    │
│  │ Ollama:  ● Connected       │   │ OpenAI: 62% | Ollama: 38%      │    │
│  │ Database: ● Operational    │   │ Avg Response Time: 2.4s        │    │
│  └────────────────────────────┘   └────────────────────────────────┘    │
│                                                                         │
│  Recent Conversations                                                   │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ● Quantum Computing Research       Today, 14:32   [Resume]      │    │
│  │ ● Python Code Debugging           Today, 10:15   [Resume]      │    │
│  │ ● Travel Planning                  Yesterday      [Resume]      │    │
│  │ ● Financial Analysis               2 days ago     [Resume]      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Model Usage                          Agent Usage                       │
│  ┌────────────────────────────┐   ┌────────────────────────────────┐    │
│  │ ███ OpenAI:gpt-4      27%  │   │ ███ Research Assistant    42%  │    │
│  │ ███ OpenAI:gpt-3.5    35%  │   │ ███ Code Assistant       31%  │    │
│  │ ███ Ollama:mistral    20%  │   │ ███ General Assistant    18%  │    │
│  │ ███ Ollama:llama2     18%  │   │ ███ Other                 9%  │    │
│  └────────────────────────────┘   └────────────────────────────────┘    │
│                                                                         │
│  API Credits                                                            │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ OpenAI: $4.32 used this month of $10.00 budget  ████░░░░░ 43%   │    │
│  │ Estimated savings from Ollama usage: $3.87                      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  [ New Chat ]   [ View All Conversations ]   [ System Settings ]        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
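The dashboard figures are simple aggregates over a request log. The following sketch assumes each request is recorded as a `(provider, model)` pair and that monthly spend is tracked separately; `percent_by` and `dashboard_summary` are hypothetical names. Fed the wireframe's own model shares, it reproduces the 62%/38% provider split and the 43% budget bar. Rounded percentages are not guaranteed to sum to exactly 100.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def percent_by(keys: Iterable[str]) -> Dict[str, int]:
    """Whole-number percentage of occurrences per key (rounded, so totals may drift from 100)."""
    counts = Counter(keys)
    total = sum(counts.values())
    if not total:
        return {}
    return {key: round(100 * count / total) for key, count in counts.items()}


def dashboard_summary(requests: Iterable[Tuple[str, str]], spent: float, budget: float) -> dict:
    """Summarize (provider, model) request records the way the dashboard displays them."""
    requests = list(requests)
    budget_used: Optional[int] = round(100 * spent / budget) if budget else None
    return {
        "requests": len(requests),
        "by_provider": percent_by(provider for provider, _ in requests),
        "by_model": percent_by(f"{provider}:{model}" for provider, model in requests),
        "budget_used_pct": budget_used,
    }
```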

Web Interface Interaction Flow

┌──────────────┐     ┌───────────────┐     ┌────────────────┐
│              │     │               │     │                │
│  Login Page  │────▶│  Dashboard    │────▶│  Chat Interface│◀───┐
│              │     │               │     │                │    │
└──────────────┘     └───────┬───────┘     └────────┬───────┘    │
                             │                      │            │
                             ▼                      ▼            │
                     ┌───────────────┐     ┌────────────────┐    │
                     │               │     │                │    │
                     │Settings Panel │     │ User Message   │    │
                     │               │     │                │    │
                     └───┬───────────┘     └────────┬───────┘    │
                         │                          │            │
                         ▼                          ▼            │
                ┌────────────────┐         ┌────────────────┐    │
                │                │         │                │    │
                │Model Settings  │         │API Processing  │    │
                │                │         │                │    │
                └────────┬───────┘         └────────┬───────┘    │
                         │                          │            │
                         ▼                          ▼            │
                ┌────────────────┐         ┌────────────────┐    │
                │                │         │                │    │
                │Agent Settings  │         │System Response │────┘
                │                │         │                │
                └────────────────┘         └────────────────┘

Key Web Components

ProviderSelector Component

JSX
// ProviderSelector.jsx
import React, { useState, useEffect } from 'react';
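import { Select, Switch, Slider, Checkbox, Card, Alert } from 'antd';
import { ApiOutlined, SettingOutlined } from '@ant-design/icons';

const ProviderSelector = ({ 
  onProviderChange, 
  onModelChange,
  initialProvider = 'auto',
  initialModel = null,
  showAdvanced = false
}) => {
  const [provider, setProvider] = useState(initialProvider);
  const [model, setModel] = useState(initialModel);
  const [autoSelect, setAutoSelect] = useState(initialProvider === 'auto');
  const [complexityThreshold, setComplexityThreshold] = useState(0.65);
  const [prioritizePrivacy, setPrioritizePrivacy] = useState(false);
  const [ollamaModels, setOllamaModels] = useState([]);
  const [ollamaStatus, setOllamaStatus] = useState('unknown'); // 'online', 'offline', 'unknown'
  const [openaiModels, setOpenaiModels] = useState([
    { value: 'gpt-4o', label: 'GPT-4o' },
    { value: 'gpt-4-turbo', label: 'GPT-4 Turbo' },
    { value: 'gpt-3.5-turbo', label: 'GPT-3.5 Turbo' }
  ]);
  
  // Fetch available Ollama models on component mount
  useEffect(() => {
    const fetchOllamaModels = async () => {
      try {
        const response = await fetch('/api/v1/models/ollama');
        if (response.ok) {
          const data = await response.json();
          setOllamaModels(data.models.map(m => ({ 
            value: m.name, 
            label: m.name 
          })));
          setOllamaStatus('online');
        } else {
          setOllamaStatus('offline');
        }
      } catch (error) {
        console.error('Error fetching Ollama models:', error);
        setOllamaStatus('offline');
      }
    };
    
    fetchOllamaModels();
  }, []);
  
  const handleProviderChange = (value) => {
    setProvider(value);
    onProviderChange(value);
    
    // Reset model when changing provider
    setModel(null);
    onModelChange(null);
  };
  
  const handleModelChange = (value) => {
    setModel(value);
    onModelChange(value);
  };
  
  const handleAutoSelectChange = (checked) => {
    setAutoSelect(checked);
    if (checked) {
      setProvider('auto');
      onProviderChange('auto');
      setModel(null);
      onModelChange(null);
    } else {
      // Default to OpenAI if disabling auto-select
      setProvider('openai');
      onProviderChange('openai');
      setModel('gpt-3.5-turbo');
      onModelChange('gpt-3.5-turbo');
    }
  };
  
  const providerOptions = [
    { value: 'openai', label: 'OpenAI' },
    { value: 'ollama', label: 'Ollama (Local)' },
    { value: 'auto', label: 'Auto-select' }
  ];
  
  // Offer the model list that belongs to the currently selected provider
  const modelOptions = provider === 'ollama' ? ollamaModels : openaiModels;
  
  return (
    <Card title="Model Selection" size="small" extra={<SettingOutlined />}>
      <div className="provider-selector">
        <div className="auto-select-row">
          <Switch checked={autoSelect} onChange={handleAutoSelectChange} />
          <span style={{ marginLeft: 8 }}>
            {autoSelect ? 'Automatically select the best model for each query' : 'Manually choose provider and model'}
          </span>
        </div>
        
        {!autoSelect && (
          <div className="manual-selection">
            <div className="selection-row">
              <span>Provider:</span>
              <Select
                value={provider}
                onChange={handleProviderChange}
                options={providerOptions.filter((option) => option.value !== 'auto')}
                style={{ width: 180 }}
              />
            </div>
            
            <div className="selection-row">
              <span>Model:</span>
              <Select
                value={model ?? undefined}
                onChange={handleModelChange}
                options={modelOptions}
                placeholder="Select a model"
                style={{ width: 180 }}
              />
            </div>
          </div>
        )}
        
        {provider === 'ollama' && ollamaStatus === 'offline' && (
          <Alert
            type="warning"
            showIcon
            message="Ollama is not reachable. Start the local Ollama server or choose another provider."
          />
        )}
        
        {showAdvanced && (
          <div className="advanced-settings">
            <h4>Advanced Routing Settings</h4>
            
            <div className="threshold-row">
              <span>Complexity threshold:</span>
              <Slider
                min={0}
                max={1}
                step={0.05}
                value={complexityThreshold}
                onChange={setComplexityThreshold}
                disabled={!autoSelect}
              />
              <span>{complexityThreshold}</span>
            </div>
            
            <div>
              <Checkbox
                checked={prioritizePrivacy}
                onChange={(e) => setPrioritizePrivacy(e.target.checked)}
                disabled={!autoSelect}
              >
                Prioritize privacy (prefer Ollama for sensitive content)
              </Checkbox>
            </div>
            
            <div className="provider-status">
              <div>
                <ApiOutlined /> OpenAI: Connected
              </div>
              <div>
                <ApiOutlined /> Ollama:{' '}
                {ollamaStatus === 'online' ? 'Connected' : 'Disconnected'}
              </div>
            </div>
          </div>
        )}
      </div>
    </Card>
  );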
};

export default ProviderSelector;
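`ProviderSelector` expects `GET /api/v1/models/ollama` to return `{ models: [{ name }] }` and treats any non-OK status as Ollama being offline. The following sketches the backend side, assuming Ollama runs on its default port. Ollama's own `GET /api/tags` already returns a `models` array whose entries carry a `name` such as `mistral:latest`, so the handler mostly trims and sorts it. The function names, the 503 status and the 2-second timeout are assumptions.

```python
import json
import urllib.request
from typing import Tuple

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address (assumed)


def to_model_list(tags: dict) -> dict:
    """Reduce Ollama's /api/tags payload to the {"models": [{"name": ...}]} shape the component reads."""
    names = sorted(entry["name"] for entry in tags.get("models", []))
    return {"models": [{"name": name} for name in names]}


def list_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Tuple[int, dict]:
    """Return (status_code, body) for a hypothetical GET /api/v1/models/ollama handler."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return 200, to_model_list(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeouts and malformed JSON all surface as "Ollama: Disconnected"
        return 503, {"models": []}
```

A short timeout keeps the settings panel responsive when the local server is down, since the component waits on this request before it can show a status.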

ChatInterface Component

JSX
// ChatInterface.jsx
import React, { useState, useEffect, useRef } from 'react';
import { Input, Button, Spin, Avatar, Tooltip, Card, Typography, Dropdown, Menu } from 'antd';
import { SendOutlined, UserOutlined, RobotOutlined, SettingOutlined, 
         SaveOutlined, CopyOutlined, DeleteOutlined, InfoCircleOutlined } from '@ant-design/icons';
import ReactMarkdown from 'react-markdown';
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter';
import { tomorrow } from 'react-syntax-highlighter/dist/esm/styles/prism';
import ProviderSelector from './ProviderSelector';

const { TextArea } = Input;
const { Text, Title } = Typography;

const ChatInterface = () => {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);
  const [sessionId, setSessionId] = useState(null);
  const [provider, setProvider] = useState('auto');
  const [model, setModel] = useState(null);
  const [showSettings, setShowSettings] = useState(false);
  const messagesEndRef = useRef(null);
  
  // Scroll to bottom when messages change
  useEffect(() => {
    scrollToBottom();
  }, [messages]);
  
  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };
  
  const handleSend = async () => {
    if (!input.trim()) return;
    
    // Add user message to chat
    const userMessage = { role: 'user', content: input, timestamp: new Date() };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setLoading(true);
    
    try {
      const response = await fetch('/api/v1/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: input,
          session_id: sessionId,
          model_params: {
            provider: provider,
            model: model,
            auto_select: provider === 'auto'
          }
        })
      });
      
      if (!response.ok) {
        throw new Error('Failed to get response');
      }
      
      const data = await response.json();
      
      // Update session ID if new
      if (data.session_id && !sessionId) {
        setSessionId(data.session_id);
      }
      
      // Add assistant message to chat
      const assistantMessage = { 
        role: 'assistant', 
        content: data.response, 
        timestamp: new Date(),
        metadata: {
          model_used: data.model_used,
          provider_used: data.provider_used
        }
      };
      
      setMessages(prev => [...prev, assistantMessage]);
      
    } catch (error) {
      console.error('Error sending message:', error);
      // Add error message
      setMessages(prev => [...prev, { 
        role: 'system', 
        content: 'Error: Unable to get a response. Please try again.',
        error: true,
        timestamp: new Date()
      }]);
    } finally {
      setLoading(false);
    }
  };
  
  const handleKeyDown = (e) => {
    if (e.key === 'Enter' && !e.shiftKey) {
      e.preventDefault();
      handleSend();
    }
  };
  
  const handleCopyMessage = (content) => {
    navigator.clipboard.writeText(content);
    // Could show a toast notification here
  };
  
  const renderMessage = (message, index) => {
    const isUser = message.role === 'user';
    const isError = message.error;
    
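    return (
      <div
        key={index}
        className={`message ${isUser ? 'user-message' : 'assistant-message'}${isError ? ' error-message' : ''}`}
      >
        <div className="message-avatar">
          <Avatar
            icon={isUser ? <UserOutlined /> : <RobotOutlined />}
            style={{ backgroundColor: isUser ? '#1890ff' : '#52c41a' }}
          />
        </div>
        
        <div className="message-content">
          <div className="message-header">
            <Text strong>{isUser ? 'You' : 'MCP Assistant'}</Text>
            {message.metadata && (
              <Tooltip title="Provider and model that produced this response">
                <Text type="secondary" className="message-model">
                  <InfoCircleOutlined /> {message.metadata.provider_used}:{message.metadata.model_used}
                </Text>
              </Tooltip>
            )}
            <Text type="secondary" className="message-time">
              {message.timestamp.toLocaleTimeString()}
            </Text>
          </div>
          
          <div className="message-body">
            <ReactMarkdown
              components={{
                // Highlight fenced code blocks that declare a language; render everything else as plain <code>
                code({ node, className, children, ...props }) {
                  const match = /language-(\w+)/.exec(className || '');
                  return match ? (
                    <SyntaxHighlighter style={tomorrow} language={match[1]} PreTag="div">
                      {String(children).replace(/\n$/, '')}
                    </SyntaxHighlighter>
                  ) : (
                    <code className={className} {...props}>
                      {children}
                    </code>
                  );
                }
              }}
            >
              {message.content}
            </ReactMarkdown>
          </div>
          
          <div className="message-actions">
            <Tooltip title="Copy">
              <Button
                type="text"
                size="small"
                icon={<CopyOutlined />}
                onClick={() => handleCopyMessage(message.content)}
              />
            </Tooltip>
          </div>
        </div>
      </div>
    );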
  };
  
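  // Assumes Card, Select, Input and Title are imported and setProvider/setModel exist
  const settingsMenu = (
    <Card size="small" className="chat-settings">
      <Title level={5}>Chat Settings</Title>
      <Text>Provider</Text>
      <Select
        value={provider}
        onChange={setProvider}
        style={{ width: '100%' }}
        options={[
          { value: 'auto', label: 'Auto-select' },
          { value: 'openai', label: 'OpenAI' },
          { value: 'ollama', label: 'Ollama' }
        ]}
      />
      <Text>Model</Text>
      <Input
        value={model || ''}
        onChange={e => setModel(e.target.value)}
        placeholder="Provider default"
        disabled={provider === 'auto'}
      />
    </Card>
  );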
  
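  // Assumes Title, Avatar, Spin, TextArea and the SettingOutlined and RobotOutlined
  // icons are imported, and that setShowSettings is declared with showSettings
  return (
    <div className="chat-interface">
      <div className="chat-header">
        <Title level={4}>MCP Assistant</Title>
        <Button
          icon={<SettingOutlined />}
          onClick={() => setShowSettings(!showSettings)}
        />
      </div>
      
      {showSettings && settingsMenu}
      
      <div className="chat-messages">
        {messages.length === 0 && (
          <div className="chat-empty">
            <Title level={5}>Start a conversation</Title>
            <Text type="secondary">Ask a question or request information</Text>
          </div>
        )}
        
        {messages.map(renderMessage)}
        
        {loading && (
          <div className="message assistant-message">
            <div className="message-avatar">
              <Avatar icon={<RobotOutlined />} style={{ backgroundColor: '#52c41a' }} />
            </div>
            <div className="message-content">
              <Spin size="small" /> MCP is thinking...
            </div>
          </div>
        )}
      </div>
      
      <div className="chat-input">
        <TextArea
          value={input}
          onChange={e => setInput(e.target.value)}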
          onKeyDown={handleKeyDown}
          placeholder="Type your message..."
          autoSize={{ minRows: 1, maxRows: 4 }}
          disabled={loading}
        />
        <Button 
          type="primary" 
          icon={<SendOutlined />} 
          onClick={handleSend}
          disabled={loading || !input.trim()}
        >
          Send
        </Button>
      </div>
      
      <div className="chat-footer">
        <Text type="secondary">
          Model: {provider === 'auto' ? 'Auto-select' : `${provider}:${model || 'default'}`}
        </Text>
        {sessionId && (
          <Text type="secondary">Session ID: {sessionId}</Text>
        )}
      </div>
    </div>
  );
};

export default ChatInterface;
</code></pre></div></pre>
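<p>The component relies on an implicit JSON contract with <code>/api/v1/chat</code>: it posts <code>message</code>, <code>session_id</code> and <code>model_params</code>, and reads <code>session_id</code>, <code>response</code>, <code>model_used</code> and <code>provider_used</code> from the reply. A minimal sketch of that contract on the Python side, using only standard-library dataclasses, might look like the following. The field names come from the fetch call above; the class names and validation rules (a non-empty message, a provider of <code>auto</code>, <code>openai</code> or <code>ollama</code>) are assumptions.</p>

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

# Providers the UI can send; "auto" asks the backend to choose (assumed set)
ALLOWED_PROVIDERS = {"auto", "openai", "ollama"}


@dataclass
class ModelParams:
    provider: str = "auto"
    model: Optional[str] = None
    auto_select: bool = True


@dataclass
class ChatRequest:
    message: str
    session_id: Optional[str] = None
    model_params: ModelParams = field(default_factory=ModelParams)

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "ChatRequest":
        """Validate the body that ChatInterface posts to /api/v1/chat."""
        message = payload.get("message")
        if not isinstance(message, str) or not message.strip():
            raise ValueError("message must be a non-empty string")
        params = payload.get("model_params") or {}
        provider = params.get("provider", "auto")
        if provider not in ALLOWED_PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        return cls(
            message=message,
            session_id=payload.get("session_id"),
            model_params=ModelParams(
                provider=provider,
                # The UI may send null or "" to mean "use the provider default"
                model=params.get("model") or None,
                # Derived on the server instead of trusting the client's flag
                auto_select=provider == "auto",
            ),
        )


@dataclass
class ChatResponse:
    """Fields the UI reads from the reply."""
    session_id: str
    response: str
    model_used: str
    provider_used: str

    def to_json(self) -> Dict[str, Any]:
        return asdict(self)
```

<p>Recomputing <code>auto_select</code> from <code>provider</code> keeps the two fields from disagreeing if a client other than this UI calls the endpoint.</p>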
<h4 id="agentconfiguration-component">AgentConfiguration Component</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSX</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-jsx">// AgentConfiguration.jsx
import React, { useState, useEffect } from 'react';
import { Form, Input, Button, Select, Checkbox, Card, Typography, Tabs, message } from 'antd';
import { SaveOutlined, PlusOutlined, ImportOutlined, ExportOutlined } from '@ant-design/icons';

const { Title, Text } = Typography;
const { TextArea } = Input;
const { Option } = Select;
const { TabPane } = Tabs;

const AgentConfiguration = () => {
  const [form] = Form.useForm();
  const [agents, setAgents] = useState([]);
  const [currentAgent, setCurrentAgent] = useState(null);
  const [loading, setLoading] = useState(false);
  
  // Fetch available agents on component mount
  useEffect(() => {
    const fetchAgents = async () => {
      setLoading(true);
      try {
        const response = await fetch('/api/v1/agents');
        if (response.ok) {
          const data = await response.json();
          setAgents(data.agents);
          
          // Set current agent to the first one
          if (data.agents.length > 0) {
            setCurrentAgent(data.agents[0]);
            form.setFieldsValue(data.agents[0]);
          }
        }
      } catch (error) {
        console.error('Error fetching agents:', error);
        message.error('Failed to load agents');
      } finally {
        setLoading(false);
      }
    };
    
    fetchAgents();
  }, [form]);
  
  const handleAgentChange = (agentId) => {
    const selected = agents.find(a => a.id === agentId);
    if (selected) {
      setCurrentAgent(selected);
      form.setFieldsValue(selected);
    }
  };
  
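  const handleSaveAgent = async (values) => {
    setLoading(true);
    // A null currentAgent means the form is in "create" mode (see handleCreateAgent)
    const isNew = !currentAgent;
    try {
      const response = await fetch(
        isNew ? '/api/v1/agents' : `/api/v1/agents/${currentAgent.id}`,
        {
          method: isNew ? 'POST' : 'PUT',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(values)
        }
      );
      
      if (response.ok) {
        if (isNew) {
          // Assumes POST /api/v1/agents returns the created agent, including its id
          const created = await response.json();
          setAgents([...agents, created]);
          setCurrentAgent(created);
          message.success('Agent created');
        } else {
          // Update local state
          const updatedAgents = agents.map(a => 
            a.id === currentAgent.id ? { ...a, ...values } : a
          );
          setAgents(updatedAgents);
          setCurrentAgent({ ...currentAgent, ...values });
          message.success('Agent configuration saved');
        }
      } else {
        message.error('Failed to save agent configuration');
      }
    } catch (error) {
      console.error('Error saving agent:', error);
      message.error('Error saving agent configuration');
    } finally {
      setLoading(false);
    }
  };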
  
  const handleCreateAgent = () => {
    form.resetFields();
    form.setFieldsValue({
      name: 'New Agent',
      description: 'Custom assistant',
      capabilities: [],
      system_prompt: 'You are a helpful assistant.'
    });
    
    setCurrentAgent(null); // Indicates we're creating a new agent
  };
  
  const handleExportAgent = () => {
    if (!currentAgent) return;
    
    const agentData = JSON.stringify(currentAgent, null, 2);
    const blob = new Blob([agentData], { type: 'application/json' });
    const url = URL.createObjectURL(blob);
    
    const a = document.createElement('a');
    a.href = url;
    a.download = `${currentAgent.name.replace(/\s+/g, '_').toLowerCase()}_agent.json`;
    document.body.appendChild(a);
    a.click();
    document.body.removeChild(a);
    URL.revokeObjectURL(url);
  };
  
  return (
    <div className="agent-configuration">
      <Card title={<Title level={4}>Agent Configuration</Title>}>
        <div className="agent-actions">
          <Button 
            type="primary" 
            icon={<PlusOutlined />} 
            onClick={handleCreateAgent}
          >
            Create New Agent
          </Button>
          
          <Button 
            icon={<ExportOutlined />} 
            onClick={handleExportAgent}
            disabled={!currentAgent}
          >
            Export
          </Button>
          
          <Button icon={<ImportOutlined />}>
            Import
          </Button>
        </div>
        
        <div className="agent-selector">
          <Text strong>Select Agent:</Text>
          <Select
            style={{ width: 300 }}
            onChange={handleAgentChange}
            value={currentAgent?.id}
            loading={loading}
          >
            {agents.map(agent => (
              <Option key={agent.id} value={agent.id}>
                {agent.name} - {agent.description}
              </Option>
            ))}
          </Select>
        </div>
        
        <Form
          form={form}
          layout="vertical"
          onFinish={handleSaveAgent}
          className="agent-form"
        >
          <Tabs defaultActiveKey="basic">
            <TabPane tab="Basic Information" key="basic">
              <Form.Item
                name="name"
                label="Agent Name"
                rules={[{ required: true, message: 'Please enter an agent name' }]}
              >
                <Input placeholder="Agent name" />
              </Form.Item>
              
              <Form.Item
                name="description"
                label="Description"
                rules={[{ required: true, message: 'Please enter a description' }]}
              >
                <Input placeholder="Brief description of this agent's purpose" />
              </Form.Item>
              
              <Form.Item
                name="system_prompt"
                label="System Instructions"
                rules={[{ required: true, message: 'Please enter system instructions' }]}
              >
                <TextArea
                  placeholder="Instructions that define the agent's behavior"
                  autoSize={{ minRows: 4, maxRows: 8 }}
                />
              </Form.Item>
            </TabPane>
            
            <TabPane tab="Capabilities" key="capabilities">
              <Form.Item name="capabilities" label="Agent Capabilities">
                <Checkbox.Group>
                  <div className="capabilities-grid">
                    <Checkbox value="knowledge_retrieval">Knowledge Retrieval</Checkbox>
                    <Checkbox value="web_search">Web Search</Checkbox>
                    <Checkbox value="memory">Long-term Memory</Checkbox>
                    <Checkbox value="calendar">Calendar Awareness</Checkbox>
                    <Checkbox value="code_execution">Code Execution</Checkbox>
                    <Checkbox value="data_visualization">Data Visualization</Checkbox>
                    <Checkbox value="file_operations">File Operations</Checkbox>
                    <Checkbox value="email">Email Integration</Checkbox>
                  </div>
                </Checkbox.Group>
              </Form.Item>
              
              <Form.Item name="preferred_models" label="Preferred Models">
                <Select mode="multiple" placeholder="Select preferred models">
                  <Option value="openai:gpt-4">OpenAI: GPT-4</Option>
                  <Option value="openai:gpt-3.5-turbo">OpenAI: GPT-3.5 Turbo</Option>
                  <Option value="ollama:llama2">Ollama: Llama2</Option>
                  <Option value="ollama:mistral">Ollama: Mistral</Option>
                  <Option value="ollama:codellama">Ollama: CodeLlama</Option>
                </Select>
              </Form.Item>
            </TabPane>
            
            <TabPane tab="Advanced" key="advanced">
              <Form.Item name="tool_configuration" label="Tool Configuration">
                <TextArea
                  placeholder="JSON configuration for tools (advanced)"
                  autoSize={{ minRows: 4, maxRows: 8 }}
                />
              </Form.Item>
              
              <Form.Item name="temperature" label="Temperature">
                <Select placeholder="Response creativity level">
                  {/* Numeric values so the saved temperature is a number, not a string */}
                  <Option value={0.2}>0.2 - More deterministic/factual</Option>
                  <Option value={0.5}>0.5 - Balanced</Option>
                  <Option value={0.8}>0.8 - More creative/varied</Option>
                </Select>
              </Form.Item>
            </TabPane>
          </Tabs>
          
          <Form.Item>
            <Button 
              type="primary" 
              htmlType="submit" 
              icon={<SaveOutlined />}
              loading={loading}
            >
              {currentAgent ? 'Save Changes' : 'Create Agent'}
            </Button>
          </Form.Item>
        </Form>
      </Card>
    </div>
  );
};

export default AgentConfiguration;
</code></pre></div></pre>
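<p>Because the form submits free-text fields such as <code>tool_configuration</code>, the backend should not trust the payload as-is. A sketch of a server-side check for the values this form sends might look like the following. The capability and model names mirror the options above; the <code>validate_agent_config</code> helper and the 0.0 to 2.0 temperature range (the range OpenAI documents for chat completions) are assumptions.</p>

```python
import json
from typing import Any, Dict, List

# Capability values offered on the Capabilities tab
KNOWN_CAPABILITIES = {
    "knowledge_retrieval", "web_search", "memory", "calendar",
    "code_execution", "data_visualization", "file_operations", "email",
}
KNOWN_PROVIDERS = {"openai", "ollama"}


def validate_agent_config(values: Dict[str, Any]) -> List[str]:
    """Return a list of problems with an agent payload; an empty list means valid."""
    errors: List[str] = []
    for name in ("name", "description", "system_prompt"):
        if not str(values.get(name) or "").strip():
            errors.append(f"{name} is required")

    unknown = set(values.get("capabilities") or []) - KNOWN_CAPABILITIES
    if unknown:
        errors.append(f"unknown capabilities: {sorted(unknown)}")

    # Preferred models use the "provider:model" form, e.g. "ollama:mistral"
    for entry in values.get("preferred_models") or []:
        provider, sep, model = entry.partition(":")
        if not sep or provider not in KNOWN_PROVIDERS or not model:
            errors.append(f"preferred model {entry!r} is not 'provider:model'")

    tool_config = values.get("tool_configuration")
    if tool_config:
        try:
            json.loads(tool_config)
        except json.JSONDecodeError as exc:
            errors.append(f"tool_configuration is not valid JSON: {exc.msg}")

    # Accept numbers or numeric strings (older clients may send "0.5")
    temperature = values.get("temperature")
    if temperature is not None:
        try:
            t = float(temperature)
        except (TypeError, ValueError):
            errors.append("temperature must be a number")
        else:
            if not 0.0 <= t <= 2.0:
                errors.append("temperature must be between 0.0 and 2.0")
    return errors
```

<p>Returning every problem at once, rather than raising on the first, lets the API send back a complete list that the form can display in one pass.</p>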
<h2 id="user-interaction-flows">User Interaction Flows</h2>
<h3 id="new-user-onboarding-flow">New User Onboarding Flow</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│                │     │                │     │                │
│ Welcome Screen │────▶│ Initial Setup  │────▶│ API Key Setup  │
│                │     │                │     │                │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                      │
┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐
│                │     │                │     │                │
│  First Chat    │◀────│  Ollama Setup  │◀────│ Model Download │
│                │     │                │     │                │
└────────────────┘     └────────────────┘     └────────────────┘
</code></pre>
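<p>The API Key Setup and Ollama Setup steps can be driven by a simple prerequisite check. The sketch below assumes the key is supplied through the <code>OPENAI_API_KEY</code> environment variable (the default the OpenAI Python SDK reads) and that Ollama listens on its default address, <code>http://localhost:11434</code>, where <code>GET /api/tags</code> lists locally pulled models. The <code>check_setup</code> helper itself is hypothetical.</p>

```python
import json
import os
import urllib.request
from typing import Any, Dict

# Ollama's default local address; override if the server listens elsewhere
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def check_setup(ollama_url: str = DEFAULT_OLLAMA_URL, timeout: float = 2.0) -> Dict[str, Any]:
    """Report whether the OpenAI key and a local Ollama server are available."""
    status: Dict[str, Any] = {
        "openai_key_present": bool(os.environ.get("OPENAI_API_KEY")),
        "ollama_reachable": False,
        "local_models": [],
    }
    try:
        # GET /api/tags returns {"models": [{"name": ...}, ...]} for pulled models
        with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        status["ollama_reachable"] = True
        status["local_models"] = [m.get("name") for m in data.get("models", [])]
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON raises ValueError
        pass
    return status
```

<p>An onboarding screen could, for example, show the Ollama Setup step only when <code>ollama_reachable</code> is false and skip Model Download when <code>local_models</code> already contains the models it needs.</p>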
<h3 id="task-based-user-flow-example">Task-Based User Flow Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│                │     │                │     │                │
│  Start Chat    │────▶│ Select Research│────▶│ Enter Research │
│                │     │     Agent      │     │    Query       │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                      │
┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐
│                │     │                │     │                │
│  Save Results  │◀────│  Refine Query  │◀────│ View Response  │
│                │     │                │     │ (Using OpenAI) │
└────────────────┘     └────────────────┘     └────────────────┘
</code></pre>
<h3 id="advanced-settings-flow">Advanced Settings Flow</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>┌────────────────┐     ┌────────────────┐     ┌────────────────┐
│                │     │                │     │                │
│  Chat Screen   │────▶│ Settings Menu  │────▶│ Model Settings │
│                │     │                │     │                │
└────────────────┘     └────────────────┘     └───────┬────────┘
                                                      │
┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐
│                │     │                │     │                │
│  Return to     │◀────│ Save Settings  │◀────│ Agent Settings │
│    Chat        │     │                │     │                │
└────────────────┘     └────────────────┘     └────────────────┘
</code></pre>
<h2 id="implementation-recommendations">Implementation Recommendations</h2>
<ol>
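<li><strong>Responsive Design:</strong> Use fluid layouts and breakpoints (for example, Ant Design's <code>Row</code>/<code>Col</code> grid) so the web interface remains usable on mobile screens</li>
<li><strong>Accessibility:</strong> Add ARIA attributes where native semantics fall short (for example, <code>aria-live="polite"</code> on the message list so screen readers announce new responses) and support full keyboard navigation</li>
<li><strong>Progressive Enhancement:</strong> Build with a progressive enhancement approach where core functionality works without JavaScript</li>
<li><strong>State Management:</strong> Use the React Context API or Redux for global state (session ID, provider selection, agent list) in more complex implementations</li>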
<li><strong>Offline Support:</strong> Consider adding service workers for offline functionality in the web interface</li>
<li><strong>CLI Shortcuts:</strong> Implement tab completion and command history in the CLI for improved usability</li>
</ol>
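<p>For the CLI shortcuts recommendation, Python's standard <code>readline</code> module provides both tab completion and persistent history on Unix-like systems. The sketch below is illustrative: the command list and the <code>~/.mcp_history</code> path are hypothetical, and <code>readline</code> is typically unavailable on Windows, where a third-party replacement such as <code>pyreadline3</code> would be needed.</p>

```python
import atexit
import os
from typing import Callable, List, Optional

# Hypothetical slash-command set; substitute the CLI's actual commands
COMMANDS = ["/help", "/model", "/provider", "/agent", "/history", "/clear", "/quit"]


def make_completer(commands: List[str]) -> Callable[[str, int], Optional[str]]:
    """Build a readline completer.

    readline calls it with state = 0, 1, 2, ... and stops at the first None.
    """
    def complete(text: str, state: int) -> Optional[str]:
        matches = [c for c in commands if c.startswith(text)]
        return matches[state] if state < len(matches) else None
    return complete


def enable_shortcuts(commands: List[str], history_path: str = "~/.mcp_history") -> bool:
    """Enable tab completion and persistent history; return False if readline is missing."""
    try:
        import readline  # usually absent on Windows
    except ImportError:
        return False
    path = os.path.expanduser(history_path)
    try:
        readline.read_history_file(path)
    except OSError:
        pass  # first run: no history file yet
    readline.set_history_length(1000)
    # Save the session's commands when the CLI exits
    atexit.register(readline.write_history_file, path)
    readline.set_completer(make_completer(commands))
    # The default delimiters include "/", which would stop "/mo" from completing
    readline.set_completer_delims(" \t\n")
    readline.parse_and_bind("tab: complete")
    return True
```

<p>Calling <code>enable_shortcuts(COMMANDS)</code> once before the CLI's input loop is enough, since <code>input()</code> uses the readline configuration automatically once the module is imported. On macOS builds linked against libedit, the binding string differs (<code>bind ^I rl_complete</code>).</p>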
<h2 id="conclusion-4">Conclusion</h2>
<p>The proposed user interface designs for the MCP system provide a balance between simplicity and power, enabling users to leverage the hybrid OpenAI-Ollama architecture effectively. The CLI offers a lightweight, scriptable interface for technical users and automation scenarios, while the web interface provides a rich, interactive experience for broader adoption.</p>
<p>Both interfaces expose the key capabilities of the system:</p>
<ol>
<li><strong>Intelligent Model Routing:</strong> Users can leverage automatic model selection or manually choose specific models</li>
<li><strong>Agent Specialization:</strong> Configurable agents enable task-specific optimization</li>
<li><strong>Privacy Controls:</strong> Explicit options for privacy-sensitive content</li>
<li><strong>Performance Analytics:</strong> Visibility into system usage, costs, and efficiency</li>
</ol>
<p>These interfaces serve as the critical touchpoint between users and the sophisticated underlying architecture, making complex AI capabilities accessible and manageable.</p>
<h1 id="optimization-and-deployment-strategies-for-openai-ollama-hybrid-ai-system">Optimization and Deployment Strategies for OpenAI-Ollama Hybrid AI System</h1>
<h2 id="strategic-optimization-framework">Strategic Optimization Framework</h2>
<p>The integration of cloud-based and local inference capabilities within a unified architecture presents unique opportunities for optimization across multiple dimensions. This document outlines comprehensive strategies for enhancing performance, reducing operational costs, and improving response accuracy, followed by detailed deployment methodologies for both local and cloud environments.</p>
<h2 id="performance-optimization-strategies">Performance Optimization Strategies</h2>
<h3 id="1-query-routing-optimization">1. Query Routing Optimization</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/routing_optimizer.py
import logging
import numpy as np
from typing import Dict, List, Any, Optional
from app.config import settings

logger = logging.getLogger(__name__)

class RoutingOptimizer:
    """Optimizes routing decisions based on historical performance data."""
    
    def __init__(self, cache_size: int = 1000):
        self.performance_history = {}
        self.cache_size = cache_size
        self.learning_rate = 0.05
        
        # Baseline thresholds
        self.complexity_threshold = settings.COMPLEXITY_THRESHOLD
        self.token_threshold = 800  # Approximate tokens before preferring cloud
        self.latency_requirement = 2.0  # Seconds
        
        # Performance weights
        self.weights = {
            "complexity": 0.4,
            "token_count": 0.2,
            "privacy_score": 0.3,
            "tool_requirement": 0.1
        }
    
    def update_performance_metrics(self, 
                                  provider: str, 
                                  model: str,
                                  query_complexity: float, 
                                  token_count: int,
                                  response_time: float,
                                  success: bool) -> None:
        """Update performance metrics based on actual results."""
        model_key = f"{provider}:{model}"
        
        if model_key not in self.performance_history:
            self.performance_history[model_key] = {
                "queries": 0,
                "avg_response_time": 0,
                "success_rate": 0,
                "complexity_performance": {}  # Maps complexity ranges to success/time
            }
        
        metrics = self.performance_history[model_key]
        
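        # Update metrics with an exponential moving average; seed it with the
        # first observation so early averages are not biased toward zero
        metrics["queries"] += 1
        if metrics["queries"] == 1:
            metrics["avg_response_time"] = response_time
        else:
            metrics["avg_response_time"] = (
                (1 - self.learning_rate) * metrics["avg_response_time"] +
                self.learning_rate * response_time
            )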
        
        # Update success rate
        old_success_rate = metrics["success_rate"]
        queries = metrics["queries"]
        metrics["success_rate"] = (old_success_rate * (queries - 1) + (1 if success else 0)) / queries
        
        # Update complexity-specific performance
        complexity_bin = round(query_complexity * 10) / 10  # Round to nearest 0.1
        
        if complexity_bin not in metrics["complexity_performance"]:
            metrics["complexity_performance"][complexity_bin] = {
                "count": 0,
                "avg_time": 0,
                "success_rate": 0
            }
            
        bin_metrics = metrics["complexity_performance"][complexity_bin]
        bin_metrics["count"] += 1
        bin_metrics["avg_time"] = (
            (bin_metrics["count"] - 1) * bin_metrics["avg_time"] + response_time
        ) / bin_metrics["count"]
        
        bin_metrics["success_rate"] = (
            (bin_metrics["count"] - 1) * bin_metrics["success_rate"] + (1 if success else 0)
        ) / bin_metrics["count"]
        
        # Prune cache if needed
        if len(self.performance_history) > self.cache_size:
            # Remove least used models
            sorted_models = sorted(
                self.performance_history.items(),
                key=lambda x: x[1]["queries"]
            )
            for i in range(len(self.performance_history) - self.cache_size):
                if i < len(sorted_models):
                    del self.performance_history[sorted_models[i][0]]
    
    def optimize_thresholds(self) -> None:
        """Periodically optimize routing thresholds based on collected metrics."""
        if not self.performance_history:
            return
        
        openai_models = [k for k in self.performance_history if k.startswith("openai:")]
        ollama_models = [k for k in self.performance_history if k.startswith("ollama:")]
        
        if not openai_models or not ollama_models:
            return  # Need data from both providers
        
        # Calculate average performance metrics for each provider
        openai_avg_time = np.mean([
            self.performance_history[model]["avg_response_time"] 
            for model in openai_models
        ])
        ollama_avg_time = np.mean([
            self.performance_history[model]["avg_response_time"] 
            for model in ollama_models
        ])
        
        # Find optimal complexity threshold by analyzing where Ollama begins to struggle
        complexity_success_rates = {}
        
        for model in ollama_models:
            for complexity, metrics in self.performance_history[model]["complexity_performance"].items():
                if complexity not in complexity_success_rates:
                    complexity_success_rates[complexity] = []
                complexity_success_rates[complexity].append(metrics["success_rate"])
        
        # Find the complexity level where Ollama success rate drops significantly
        optimal_threshold = self.complexity_threshold  # Start with current
        
        if complexity_success_rates:
            complexities = sorted(complexity_success_rates.keys())
            avg_success_rates = [
                np.mean(complexity_success_rates[c]) for c in complexities
            ]
            
            # Find first major drop in success rate
            found_drop = False
            for i in range(1, len(complexities)):
                # A drop of more than 15 percentage points between adjacent bins
                if (avg_success_rates[i-1] - avg_success_rates[i]) > 0.15:
                    optimal_threshold = complexities[i-1]
                    found_drop = True
                    break
            
            # If no clear drop, use the first bin whose success rate falls below 85%
            # (a flag is used because a drop can coincide with the current threshold)
            if not found_drop:
                for i, c in enumerate(complexities):
                    if avg_success_rates[i] < 0.85:
                        optimal_threshold = c
                        break
        
        # Update thresholds (with dampening to avoid oscillation)
        self.complexity_threshold = (
            0.8 * self.complexity_threshold + 
            0.2 * optimal_threshold
        )
        
        # Update latency requirements based on current performance
        self.latency_requirement = max(1.0, min(ollama_avg_time * 1.2, 5.0))
        
        logger.info(f"Optimized routing thresholds: complexity={self.complexity_threshold:.2f}, latency={self.latency_requirement:.2f}s")
    
    def get_optimal_provider(self, 
                           query_complexity: float,
                           privacy_score: float,
                           estimated_tokens: int,
                           requires_tools: bool) -> str:
        """Get the optimal provider based on current metrics and query characteristics."""
        # Calculate weighted score for routing decision
        openai_score = 0
        ollama_score = 0
        
        # Complexity factor
        if query_complexity > self.complexity_threshold:
            openai_score += self.weights["complexity"]
        else:
            ollama_score += self.weights["complexity"]
        
        # Token count factor
        if estimated_tokens > self.token_threshold:
            openai_score += self.weights["token_count"]
        else:
            ollama_score += self.weights["token_count"]
        
        # Privacy factor (higher privacy score means more sensitive)
        if privacy_score > 0.5:
            ollama_score += self.weights["privacy_score"]
        else:
            # Split proportionally
            ollama_privacy = self.weights["privacy_score"] * privacy_score * 2
            openai_privacy = self.weights["privacy_score"] * (1 - privacy_score * 2)
            ollama_score += ollama_privacy
            openai_score += openai_privacy
            
        # Tool requirements factor
        if requires_tools:
            openai_score += self.weights["tool_requirement"]
        
        # Return the provider with higher score
        return "openai" if openai_score > ollama_score else "ollama"
</code></pre></div></pre>
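<p>To see how the weights interact, the scoring arithmetic of <code>get_optimal_provider</code> can be reproduced outside the class. The sketch below copies the weights and the 800-token threshold from <code>__init__</code>; the complexity threshold of 0.6 is an assumed stand-in for <code>settings.COMPLEXITY_THRESHOLD</code>, and <code>route_with_privacy_gate</code> with its 0.8 limit is a hypothetical variant, not part of the original optimizer. It illustrates that privacy contributes at most 0.3 to the Ollama score, so under these assumptions a complex, long, tool-using query scores 0.7 for OpenAI against 0.3 for Ollama even when <code>privacy_score</code> is 0.9.</p>

```python
from typing import Tuple

# Weights and token threshold copied from RoutingOptimizer.__init__;
# the complexity threshold comes from settings there, so 0.6 is assumed
WEIGHTS = {"complexity": 0.4, "token_count": 0.2, "privacy_score": 0.3, "tool_requirement": 0.1}
COMPLEXITY_THRESHOLD = 0.6
TOKEN_THRESHOLD = 800


def provider_scores(query_complexity: float, privacy_score: float,
                    estimated_tokens: int, requires_tools: bool) -> Tuple[float, float]:
    """Reproduce the (openai_score, ollama_score) arithmetic of get_optimal_provider."""
    openai_score = ollama_score = 0.0
    if query_complexity > COMPLEXITY_THRESHOLD:
        openai_score += WEIGHTS["complexity"]
    else:
        ollama_score += WEIGHTS["complexity"]
    if estimated_tokens > TOKEN_THRESHOLD:
        openai_score += WEIGHTS["token_count"]
    else:
        ollama_score += WEIGHTS["token_count"]
    if privacy_score > 0.5:
        ollama_score += WEIGHTS["privacy_score"]
    else:
        # Split the privacy weight proportionally, as in the original
        ollama_score += WEIGHTS["privacy_score"] * privacy_score * 2
        openai_score += WEIGHTS["privacy_score"] * (1 - privacy_score * 2)
    if requires_tools:
        openai_score += WEIGHTS["tool_requirement"]
    return openai_score, ollama_score


def route_with_privacy_gate(query_complexity: float, privacy_score: float,
                            estimated_tokens: int, requires_tools: bool,
                            hard_privacy_limit: float = 0.8) -> str:
    """Hypothetical variant: sufficiently sensitive queries never leave the machine."""
    if privacy_score >= hard_privacy_limit:
        return "ollama"
    openai_score, ollama_score = provider_scores(
        query_complexity, privacy_score, estimated_tokens, requires_tools)
    return "openai" if openai_score > ollama_score else "ollama"
```

<p>Whether a hard gate like this is appropriate depends on how <code>privacy_score</code> is produced; the point of the sketch is that the weighted sum alone does not guarantee that sensitive content stays local.</p>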
<h3 id="2-response-caching-with-semantic-search">2. Response Caching with Semantic Search</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/cache_service.py
import time
import hashlib
import json
from typing import Dict, List, Any, Optional, Tuple
import numpy as np
from scipy.spatial.distance import cosine
import aioredis  # aioredis 1.x API (create_redis_pool, set(..., expire=)); 2.x differs

from app.config import settings
from app.services.embedding_service import EmbeddingService

class SemanticCache:
    """Intelligent caching system using semantic similarity."""
    
    def __init__(self, embedding_service: EmbeddingService, ttl: int = 3600):
        self.embedding_service = embedding_service
        self.redis = None
        self.ttl = ttl
        self.similarity_threshold = 0.92  # Threshold for semantic similarity
        self.exact_cache_enabled = True
        self.semantic_cache_enabled = True
    
    async def initialize(self):
        """Initialize Redis connection."""
        self.redis = await aioredis.create_redis_pool(settings.REDIS_URL)
    
    async def close(self):
        """Close Redis connection."""
        if self.redis:
            self.redis.close()
            await self.redis.wait_closed()
    
    def _get_exact_cache_key(self, messages: List[Dict], provider: str, model: str) -> str:
        """Generate an exact cache key from request parameters."""
        # Normalize the request to ensure consistent keys
        normalized = {
            "messages": messages,
            "provider": provider,
            "model": model
        }
        serialized = json.dumps(normalized, sort_keys=True)
        return f"exact:{hashlib.md5(serialized.encode()).hexdigest()}"
    
    async def _get_embedding_key(self, text: str) -> str:
        """Get the embedding key for a text string."""
        return f"emb:{hashlib.md5(text.encode()).hexdigest()}"
    
    async def _store_embedding(self, text: str, embedding: List[float]) -> None:
        """Store an embedding in Redis."""
        key = await self._get_embedding_key(text)
        await self.redis.set(key, json.dumps(embedding), expire=self.ttl)
    
    async def _get_embedding(self, text: str) -> Optional[List[float]]:
        """Get an embedding from Redis or compute it if not found."""
        key = await self._get_embedding_key(text)
        cached = await self.redis.get(key)
        
        if cached:
            return json.loads(cached)
        
        # Generate new embedding
        embedding = await self.embedding_service.get_embedding(text)
        if embedding:
            await self._store_embedding(text, embedding)
        
        return embedding
    
    async def _compute_similarity(self, embedding1: List[float], embedding2: List[float]) -> float:
        """Compute cosine similarity between embeddings."""
        return 1 - cosine(embedding1, embedding2)
    
    async def get(self, messages: List[Dict], provider: str, model: str) -> Optional[Dict]:
        """Get a cached response if available."""
        if not self.redis:
            return None
            
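        # Try exact match first
        if self.exact_cache_enabled:
            exact_key = self._get_exact_cache_key(messages, provider, model)
            cached = await self.redis.get(exact_key)
            if cached:
                # Record the exact-match hit so get_stats() can report it
                await self.redis.incr("stats:exact_cache_hits")
                return json.loads(cached)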
        
        # Try semantic search if enabled
        if self.semantic_cache_enabled:
            # Extract query text (last user message)
            query_text = None
            for msg in reversed(messages):
                if msg.get("role") == "user" and msg.get("content"):
                    query_text = msg["content"]
                    break
            
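            # Get embedding for query (None when no user text was found)
            query_embedding = await self._get_embedding(query_text) if query_text else None
            
            # Collect semantic response keys. KEYS is O(N) and blocks Redis, so prefer
            # SCAN or a vector index for large caches. The pattern also matches the
            # ":meta" companion keys, which are skipped here. Falling through with an
            # empty list (instead of returning early) lets the miss be recorded below.
            semantic_keys = []
            if query_embedding:
                raw_keys = await self.redis.keys("semantic:*")
                semantic_keys = [k.decode() if isinstance(k, bytes) else k for k in raw_keys]
                semantic_keys = [k for k in semantic_keys if not k.endswith(":meta")]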
            
            # Find most similar cached query
            best_match = None
            best_similarity = 0
            
            for key in semantic_keys:
                # Get metadata
                meta_key = f"{key}:meta"
                meta_data = await self.redis.get(meta_key)
                if not meta_data:
                    continue
                
                meta = json.loads(meta_data)
                cached_embedding = meta.get("embedding")
                
                if not cached_embedding:
                    continue
                
                # Check provider/model compatibility
                if (provider != "auto" and meta.get("provider") != provider) or \
                   (model and meta.get("model") != model):
                    continue
                
                # Compute similarity
                similarity = await self._compute_similarity(query_embedding, cached_embedding)
                
                if similarity > self.similarity_threshold and similarity > best_similarity:
                    best_match = key
                    best_similarity = similarity
            
            if best_match:
                cached = await self.redis.get(best_match)
                if cached:
                    # Record cache hit analytics
                    await self.redis.incr("stats:semantic_cache_hits")
                    return json.loads(cached)
        
        # Record cache miss
        await self.redis.incr("stats:cache_misses")
        return None
    
    async def set(self, messages: List[Dict], provider: str, model: str, response: Dict) -> None:
        """Set a response in the cache."""
        if not self.redis:
            return
            
        # Set exact match cache
        if self.exact_cache_enabled:
            exact_key = self._get_exact_cache_key(messages, provider, model)
            await self.redis.set(exact_key, json.dumps(response), expire=self.ttl)
        
        # Set semantic cache
        if self.semantic_cache_enabled:
            # Extract query text (last user message)
            query_text = None
            for msg in reversed(messages):
                if msg.get("role") == "user" and msg.get("content"):
                    query_text = msg["content"]
                    break
            
            if not query_text:
                return
            
            # Get embedding for query
            query_embedding = await self._get_embedding(query_text)
            if not query_embedding:
                return
            
            # Generate semantic key
            semantic_key = f"semantic:{time.time()}:{hashlib.md5(query_text.encode()).hexdigest()}"
            
            # Store response
            await self.redis.set(semantic_key, json.dumps(response), expire=self.ttl)
            
            # Store metadata (for similarity search)
            meta_data = {
                "query": query_text,
                "embedding": query_embedding,
                "provider": response.get("provider", provider),
                "model": response.get("model", model),
                "timestamp": time.time()
            }
            
            await self.redis.set(f"{semantic_key}:meta", json.dumps(meta_data), expire=self.ttl)
    
    async def get_stats(self) -> Dict[str, int]:
        """Get cache statistics."""
        if not self.redis:
            # Same shape as the populated result below
            return {"exact_hits": 0, "semantic_hits": 0, "total_hits": 0, "misses": 0, "hit_rate": 0.0}
            
        exact_hits = int(await self.redis.get("stats:exact_cache_hits") or 0)
        semantic_hits = int(await self.redis.get("stats:semantic_cache_hits") or 0)
        misses = int(await self.redis.get("stats:cache_misses") or 0)
        total_hits = exact_hits + semantic_hits
        lookups = total_hits + misses
        
        return {
            "exact_hits": exact_hits,
            "semantic_hits": semantic_hits,
            "total_hits": total_hits,
            "misses": misses,
            "hit_rate": total_hits / lookups if lookups > 0 else 0.0
        }
</code></pre></div></pre>
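<p>The semantic branch of <code>get()</code> is a nearest-neighbour search: embed the latest user message, score it against every stored embedding with cosine similarity, and accept the best candidate only if it clears a threshold. The dependency-free sketch below isolates that logic from Redis. <code>cosine_similarity</code> and <code>find_best_match</code> are hypothetical helper names, and the 0.9 default threshold is an assumption rather than a value taken from this project's settings. Because the scan is linear in the number of cached entries, a large cache would likely need a vector index (RediSearch, for example, supports vector similarity queries) instead of fetching every key.</p>

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity in [-1, 1]; equivalent to 1 - scipy.spatial.distance.cosine(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar instead of dividing by zero
    return dot / (norm_a * norm_b)


def find_best_match(
    query: List[float],
    entries: Dict[str, List[float]],
    threshold: float = 0.9,  # assumed default, not from the original configuration
) -> Optional[Tuple[str, float]]:
    """Return (key, similarity) of the closest entry strictly above `threshold`, or None."""
    best: Optional[Tuple[str, float]] = None
    for key, embedding in entries.items():
        sim = cosine_similarity(query, embedding)
        if sim > threshold and (best is None or sim > best[1]):
            best = (key, sim)
    return best


if __name__ == "__main__":
    cache = {
        "semantic:a": [1.0, 0.0, 0.0],
        "semantic:b": [0.7, 0.7, 0.0],
    }
    print(find_best_match([0.9, 0.1, 0.0], cache))
```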
<h3 id="3-parallel-query-processing">3. Parallel Query Processing</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/parallel_processor.py
import asyncio
from typing import List, Dict, Any, Optional, Tuple
import logging
import time

from app.services.provider_service import ProviderService
from app.config import settings

logger = logging.getLogger(__name__)

class ParallelProcessor:
    """Processes complex queries by decomposing and running in parallel."""
    
    def __init__(self, provider_service: ProviderService):
        self.provider_service = provider_service
        # Threshold for when to use parallel processing
        self.complexity_threshold = 0.8
        self.parallel_enabled = settings.ENABLE_PARALLEL_PROCESSING
    
    async def should_process_in_parallel(self, messages: List[Dict]) -> bool:
        """Determine if a query should be processed in parallel."""
        if not self.parallel_enabled:
            return False
            
        # Get the last user message
        user_message = None
        for msg in reversed(messages):
            if msg.get("role") == "user":
                user_message = msg.get("content", "")
                break
        
        if not user_message:
            return False
            
        # Check message length
        if len(user_message.split()) < 50:
            return False
            
        # Check for complexity indicators
        complexity_markers = [
            "compare", "analyze", "different perspectives", "pros and cons",
            "multiple aspects", "detail", "comprehensive", "multifaceted"
        ]
        
        marker_count = sum(1 for marker in complexity_markers if marker in user_message.lower())
        
        # Check for multiple questions
        question_count = user_message.count("?")
        
        # Calculate complexity score
        complexity = (marker_count * 0.15) + (question_count * 0.2) + (len(user_message.split()) / 500)
        
        return complexity > self.complexity_threshold
    
    async def decompose_query(self, query: str) -> List[str]:
        """Decompose a complex query into simpler sub-queries."""
        # Use the provider service to generate the decomposition
        decompose_messages = [
            # JSON mode (response_format={"type": "json_object"}) only produces JSON objects,
            # so the prompt asks for an object wrapping the array rather than a bare array
            {"role": "system", "content": """
            You are a query decomposition specialist. Your job is to break down complex questions into 
            simpler, independent sub-questions that can be answered separately and then combined.
            
            Return a JSON object with a single key "sub_questions" whose value is an array of strings.
            For example: {"sub_questions": ["What are the basics of quantum computing?", "How does quantum computing differ from classical computing?"]}
            
            Keep the total number of sub-questions between 2 and 5.
            """},
            {"role": "user", "content": f"Decompose this complex query into simpler sub-questions: {query}"}
        ]
        
        try:
            response = await self.provider_service.generate_completion(
                messages=decompose_messages,
                provider="openai",  # Use OpenAI for decomposition
                model="gpt-3.5-turbo", # Use a faster model for this task
                response_format={"type": "json_object"}
            )
            
            if response and response.get("message", {}).get("content"):
                import json
                result = json.loads(response["message"]["content"])
                # Accept the requested {"sub_questions": [...]} object or a bare list
                if isinstance(result, dict):
                    result = result.get("sub_questions")
                if isinstance(result, list) and result and all(isinstance(item, str) for item in result):
                    return result[:5]  # enforce the prompt's upper bound of 5
            
            # Fallback to simple decomposition
            return [query]
            
        except Exception as e:
            logger.error(f"Error decomposing query: {str(e)}")
            # Fallback to simple decomposition
            return [query]
    
    async def process_sub_query(self, sub_query: str, provider: str, model: str) -> Dict[str, Any]:
        """Process a single sub-query."""
        messages = [{"role": "user", "content": sub_query}]
        
        start_time = time.time()
        response = await self.provider_service.generate_completion(
            messages=messages,
            provider=provider,
            model=model
        )
        duration = time.time() - start_time
        response = response or {}  # guard against a provider returning None
        
        return {
            "query": sub_query,
            "response": response,
            "content": response.get("message", {}).get("content", ""),
            "duration": duration
        }
    
    async def synthesize_responses(self, 
                                 original_query: str, 
                                 sub_results: List[Dict]) -> str:
        """Synthesize the responses from sub-queries into a cohesive answer."""
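        # Build the sub-question section first: before Python 3.12, the expression part
        # of an f-string cannot contain a backslash, so the "\n" escapes must live outside it
        sections = "".join(
            f"Sub-question: {r['query']}\nAnswer: {r['content']}\n\n" for r in sub_results
        )
        synthesize_prompt = f"""
        Original question: {original_query}
        
        I've broken this question down into parts and found the following information:
        
        {sections}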
        
        Please synthesize this information into a cohesive, comprehensive answer to the original question.
        Ensure the response is well-structured and flows naturally as if it were answering the original
        question directly. Maintain a consistent tone throughout.
        """
        
        messages = [
            {"role": "system", "content": "You are an expert at synthesizing information from multiple sources into cohesive, comprehensive answers."},
            {"role": "user", "content": synthesize_prompt}
        ]
        
        try:
            response = await self.provider_service.generate_completion(
                messages=messages,
                provider="openai",  # Use OpenAI for synthesis
                model="gpt-4"  # Use a more capable model for synthesis
            )
            
            if response and response.get("message", {}).get("content"):
                return response["message"]["content"]
            
            # Fallback
            return "\n\n".join([r['content'] for r in sub_results])
        
        except Exception as e:
            logger.error(f"Error synthesizing responses: {str(e)}")
            # Fallback to simple concatenation
            return "\n\n".join([f"Regarding '{r['query']}':\n{r['content']}" for r in sub_results])
    
    async def process_in_parallel(self, 
                                messages: List[Dict], 
                                provider: str = "auto", 
                                model: str = None) -> Dict[str, Any]:
        """Process a complex query by breaking it down and processing in parallel."""
        # Get the last user message
        user_message = None
        for msg in reversed(messages):
            if msg.get("role") == "user":
                user_message = msg.get("content", "")
                break
        
        if not user_message:
            # Fallback to regular processing
            return await self.provider_service.generate_completion(
                messages=messages,
                provider=provider,
                model=model
            )
        
        # Decompose the query
        sub_queries = await self.decompose_query(user_message)
        
        if len(sub_queries) <= 1:
            # Not complex enough to benefit from parallel processing
            return await self.provider_service.generate_completion(
                messages=messages,
                provider=provider,
                model=model
            )
        
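        # Process sub-queries in parallel. return_exceptions=True keeps one failed
        # sub-query from discarding the results of the others.
        outcomes = await asyncio.gather(
            *(self.process_sub_query(query, provider, model) for query in sub_queries),
            return_exceptions=True
        )
        
        sub_results = []
        for query, outcome in zip(sub_queries, outcomes):
            if isinstance(outcome, BaseException):
                logger.error(f"Sub-query failed ({query!r}): {outcome}")
            else:
                sub_results.append(outcome)
        
        if not sub_results:
            # Every sub-query failed; fall back to regular processing
            return await self.provider_service.generate_completion(
                messages=messages,
                provider=provider,
                model=model
            )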
        
        # Synthesize the results
        final_content = await self.synthesize_responses(user_message, sub_results)
        
        # Calculate aggregated metrics
        total_duration = sum(result["duration"] for result in sub_results)
        providers_used = [result["response"].get("provider") for result in sub_results 
                         if result["response"].get("provider")]
        models_used = [result["response"].get("model") for result in sub_results 
                      if result["response"].get("model")]
        
        # Construct a response in the same format as provider_service.generate_completion
        return {
            "id": f"parallel_{int(time.time())}",
            "object": "chat.completion",
            "created": int(time.time()),
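            "model": ", ".join(sorted(set(models_used))) if models_used else model,
            "provider": ", ".join(sorted(set(providers_used))) if providers_used else provider,
            # Token counts cover the sub-queries only; the decomposition and synthesis calls are not included
            "usage": {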
                "prompt_tokens": sum(result["response"].get("usage", {}).get("prompt_tokens", 0) 
                                  for result in sub_results),
                "completion_tokens": sum(result["response"].get("usage", {}).get("completion_tokens", 0) 
                                      for result in sub_results),
                "total_tokens": sum(result["response"].get("usage", {}).get("total_tokens", 0) 
                                 for result in sub_results)
            },
            "message": {
                "role": "assistant",
                "content": final_content
            },
            "parallel_processing": {
                "sub_queries": len(sub_queries),
                "total_duration": total_duration,
                "max_duration": max(result["duration"] for result in sub_results),
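                # Estimated share of sequential time saved: wall-clock time is roughly the
                # slowest sub-query, while a sequential run would take the sum of all of them
                "processing_efficiency": 1 - (max(result["duration"] for result in sub_results) / total_duration) 
                                        if total_duration > 0 else 0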
            }
        }
</code></pre></div></pre>
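<p>Summing per-sub-query durations only approximates what a sequential run would cost; timing the <code>asyncio.gather</code> call itself shows how much concurrency actually saved. The sketch below assumes nothing about the real <code>ProviderService</code>: it fans sub-queries out with <code>return_exceptions=True</code> so that one failing call is counted rather than fatal, and it compares wall-clock time against the summed durations. <code>fan_out</code> and <code>fake_model</code> are hypothetical names; <code>fake_model</code> simulates a provider with a fixed 50 ms latency.</p>

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List


async def fan_out(
    sub_queries: List[str],
    answer: Callable[[str], Awaitable[str]],
) -> Dict[str, Any]:
    """Answer all sub-queries concurrently and compare wall-clock vs. summed time."""

    async def timed(query: str):
        start = time.perf_counter()
        result = await answer(query)
        return query, result, time.perf_counter() - start

    wall_start = time.perf_counter()
    # return_exceptions=True: a failing sub-query is reported, not fatal to the rest
    outcomes = await asyncio.gather(*(timed(q) for q in sub_queries), return_exceptions=True)
    wall_clock = time.perf_counter() - wall_start

    succeeded = [o for o in outcomes if not isinstance(o, BaseException)]
    summed = sum(duration for _, _, duration in succeeded)
    return {
        "answers": {query: result for query, result, _ in succeeded},
        "failed": len(outcomes) - len(succeeded),
        "wall_clock": wall_clock,
        "sequential_estimate": summed,  # rough cost of running them one by one
        "speedup": summed / wall_clock if wall_clock > 0 else 0.0,
    }


async def fake_model(query: str) -> str:
    """Hypothetical stand-in for a provider call: fixed latency, fails on request."""
    await asyncio.sleep(0.05)
    if "fail" in query:
        raise RuntimeError("provider error")
    return f"answer to {query!r}"


if __name__ == "__main__":
    report = asyncio.run(fan_out(["q1", "q2", "q3", "please fail"], fake_model))
    print(report["failed"], round(report["speedup"], 1))
```

<p>With real providers the speed-up is bounded by the slowest sub-query and by any rate limits on the remote or local endpoint, so it is likely to be lower than this simulated figure.</p>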
<h3 id="4-dynamic-batching-for-high-load-scenarios">4. Dynamic Batching for High-Load Scenarios</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/batch_processor.py
import asyncio
from typing import List, Dict, Any, Optional, Callable, Awaitable
import time
import logging
from collections import deque

logger = logging.getLogger(__name__)

class RequestBatcher:
    """
    Dynamically batches requests to optimize throughput under high load.
    """
    
    def __init__(self,
                 max_batch_size: int = 4,
                 max_wait_time: float = 0.1,
                 processor_fn: Optional[Callable[[List[Any]], Awaitable[List[Any]]]] = None):
        self.max_batch_size = max_batch_size
        self.max_wait_time = max_wait_time
        # processor_fn receives a list of requests and must return one result per request, in order
        self.processor_fn = processor_fn
        self.queue = deque()
        self.batch_task = None
        # Strong references to in-flight batch tasks: the event loop keeps only weak
        # references, so an unreferenced task could be garbage-collected mid-flight
        self._batch_tasks = set()
        self.active = False
        self.stats = {
            "total_requests": 0,
            "total_batches": 0,
            "avg_batch_size": 0,
            "max_queue_length": 0
        }
    
    async def start(self):
        """Start the batch processor."""
        if self.active:
            return
            
        self.active = True
        self.batch_task = asyncio.create_task(self._batch_processor())
        logger.info("Batch processor started")
    
    async def stop(self):
        """Stop the batch processor and fail any requests still waiting in the queue."""
        if not self.active:
            return
            
        self.active = False
        if self.batch_task:
            try:
                self.batch_task.cancel()
                await self.batch_task
            except asyncio.CancelledError:
                pass
        
        # Otherwise callers awaiting queued requests would wait forever
        while self.queue:
            _, future, _ = self.queue.popleft()
            if not future.done():
                future.set_exception(RuntimeError("Batch processor stopped"))
        
        logger.info("Batch processor stopped")
    
    async def _batch_processor(self):
        """Background task to process batches."""
        while self.active:
            try:
                # Process any batches in the queue
                await self._process_next_batch()
                
                # Wait a small amount of time before checking again
                await asyncio.sleep(0.01)
            except Exception as e:
                logger.error(f"Error in batch processor: {str(e)}")
                await asyncio.sleep(1)  # Wait longer on error
    
    async def _process_next_batch(self):
        """Process the next batch from the queue."""
        if not self.queue:
            return
            
        # Start timing from oldest request
        oldest_request_time = self.queue[0][2]
        current_time = time.time()
        
        # Process if we have max batch size or max wait time elapsed
        if len(self.queue) >= self.max_batch_size or \
           (current_time - oldest_request_time) >= self.max_wait_time:
            
            # Extract batch (up to max_batch_size)
            batch_size = min(len(self.queue), self.max_batch_size)
            batch = []
            
            for _ in range(batch_size):
                request, future, _ = self.queue.popleft()
                batch.append((request, future))
            
            # Update stats
            self.stats["total_batches"] += 1
            self.stats["avg_batch_size"] = ((self.stats["avg_batch_size"] * (self.stats["total_batches"] - 1)) + batch_size) / self.stats["total_batches"]
            
            # Process batch in the background, holding a reference until it completes
            task = asyncio.create_task(self._process_batch(batch))
            self._batch_tasks.add(task)
            task.add_done_callback(self._batch_tasks.discard)
    
    async def _process_batch(self, batch: List[tuple]):
        """Process a batch of requests."""
        if not self.processor_fn:
            for _, future in batch:
                if not future.done():
                    future.set_exception(ValueError("No processor function set"))
            return
        
        # Extract just the requests for processing
        requests = [req for req, _ in batch]
        
        try:
            # Process the batch
            results = await self.processor_fn(requests)
            
            # Match results to futures
            if results is not None and len(results) == len(batch):
                for i, (_, future) in enumerate(batch):
                    if not future.done():
                        future.set_result(results[i])
            else:
                # Handle mismatch in results (including a processor that returned None)
                result_count = len(results) if results is not None else 0
                logger.error(f"Batch result count mismatch: {result_count} results for {len(batch)} requests")
                for _, future in batch:
                    if not future.done():
                        future.set_exception(ValueError("Batch processing error: result count mismatch"))
                        
        except Exception as e:
            logger.error(f"Error processing batch: {str(e)}")
            # Set exception for all futures in batch
            for _, future in batch:
                if not future.done():
                    future.set_exception(e)
    
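    async def submit(self, request: Any) -> Any:
        """Submit a request for batched processing and wait for its result."""
        if not self.active:
            # Nothing would ever drain the queue, so the caller would wait forever
            raise RuntimeError("Batch processor is not running; call start() first")
        
        self.stats["total_requests"] += 1
        
        # Create a future bound to the running event loop for this request
        future = asyncio.get_running_loop().create_future()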
        
        # Add to queue with timestamp
        self.queue.append((request, future, time.time()))
        
        # Update max queue length stat
        queue_length = len(self.queue)
        if queue_length > self.stats["max_queue_length"]:
            self.stats["max_queue_length"] = queue_length
        
        # Return future
        return await future
</code></pre></div></pre>
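<p><code>RequestBatcher</code> polls its queue every 10 ms, even when idle. An alternative, sketched below under the assumption that the batch function returns one result per request in the same order, lets the worker sleep on an <code>asyncio.Queue</code> until a request arrives and then holds the batch open for at most <code>max_wait</code> seconds using <code>asyncio.wait_for</code>. <code>MicroBatcher</code> and <code>double_all</code> are hypothetical names, and unlike <code>RequestBatcher</code> this version keeps only one batch in flight at a time.</p>

```python
import asyncio
import contextlib
from typing import Any, Awaitable, Callable, List, Tuple

# A batch function takes a list of requests and returns results in the same order.
BatchFn = Callable[[List[Any]], Awaitable[List[Any]]]


class MicroBatcher:
    """Flush a batch when `max_batch_size` items are queued or `max_wait` seconds pass."""

    def __init__(self, fn: BatchFn, max_batch_size: int = 4, max_wait: float = 0.05):
        self.fn = fn
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait
        # Construct inside a running coroutine: older Pythons bind queues to a loop.
        self.queue: "asyncio.Queue[Tuple[Any, asyncio.Future]]" = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded for inspection only

    async def submit(self, item: Any) -> Any:
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Sleep until the first request arrives, then open the wait window.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            try:
                results = await self.fn([item for item, _ in batch])
                if len(results) != len(batch):
                    raise ValueError("batch function returned the wrong number of results")
                for (_, fut), result in zip(batch, results):
                    if not fut.done():
                        fut.set_result(result)
            except Exception as exc:
                # Fail every caller in the batch rather than leaving them waiting.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)


async def double_all(xs: List[int]) -> List[int]:
    await asyncio.sleep(0.01)  # stand-in for one batched model call
    return [x * 2 for x in xs]


async def main():
    batcher = MicroBatcher(double_all, max_batch_size=4, max_wait=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(main()))
```

<p>Because <code>run()</code> awaits each batch before collecting the next, throughput is capped at one batch per model call; launching the batch call as a tracked task would lift that cap at the cost of more bookkeeping.</p>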
<h3 id="5-model-specific-prompt-optimization">5. Model-Specific Prompt Optimization</h3>
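<p>The <code>PromptOptimizer</code> classifies requests with regular expressions before choosing a template, and keyword-based detection of this kind is sensitive to word boundaries: an unanchored pattern for a short language name such as <code>go</code> or <code>rust</code> also matches inside ordinary words like "good" or "trust", which would route unrelated prompts to the code template. The sketch below, with a hypothetical and deliberately trimmed pattern table, anchors each keyword with <code>\b</code>.</p>

```python
import re
from typing import Optional

# Hypothetical, trimmed-down pattern table (first match wins). The \b anchors keep
# short keywords from firing inside longer words, e.g. "rust" inside "trust".
TASK_PATTERNS = [
    ("code_generation", re.compile(r"\b(?:python|javascript|typescript|java|golang|rust)\b|\bc\+\+")),
    ("math", re.compile(r"\b(?:calculate|compute|solve)\b")),
    ("reasoning", re.compile(r"\b(?:analy[sz]e|evaluate|compare)\b")),
]


def detect_task_type(message: str) -> Optional[str]:
    """Return the first task label whose pattern matches, or None."""
    text = message.lower()
    for task, pattern in TASK_PATTERNS:
        if pattern.search(text):
            return task
    return None


if __name__ == "__main__":
    print(detect_task_type("Write a Python function to sort a list"))
    print(detect_task_type("I trust this is a good plan"))
```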
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/prompt_optimizer.py
import logging
from typing import List, Dict, Any, Optional
import re

logger = logging.getLogger(__name__)

class PromptOptimizer:
    """Adapts prompts to specific models and task types using prefix, suffix, and instruction-format templates."""
    
    def __init__(self):
        self.model_specific_templates = {
            # OpenAI models
            "gpt-4": {
                "prefix": "",  # GPT-4 doesn't need special prefixing
                "suffix": "",
                "instruction_format": "{instruction}"
            },
            "gpt-3.5-turbo": {
                "prefix": "",
                "suffix": "",
                "instruction_format": "{instruction}"
            },
            
            # Ollama models: these templates assume smaller local models respond better to more explicit instructions
            "llama2": {
                "prefix": "",
                "suffix": "Think step-by-step and be thorough in your response.",
                "instruction_format": "{instruction}"
            },
            "llama2:70b": {
                "prefix": "",
                "suffix": "",
                "instruction_format": "{instruction}"
            },
            "mistral": {
                "prefix": "",
                "suffix": "Take a deep breath and work on this step-by-step.",
                "instruction_format": "{instruction}"
            },
            "codellama": {
                "prefix": "You are an expert programmer with years of experience. ",
                "suffix": "Make sure your code is correct and efficient.",
                "instruction_format": "Task: {instruction}"
            },
            "wizard-math": {
                "prefix": "You are a mathematics expert. ",
                "suffix": "Show your work step-by-step and explain your reasoning clearly.",
                "instruction_format": "Problem: {instruction}"
            }
        }
        
        # Default template to use when model not specifically defined
        self.default_template = {
            "prefix": "",
            "suffix": "",
            "instruction_format": "{instruction}"
        }
        
        # Task-specific optimizations
        self.task_templates = {
            "code_generation": {
                "prefix": "You are an expert programmer. ",
                "suffix": "Ensure your code is correct, efficient, and well-commented.",
                "instruction_format": "Programming Task: {instruction}"
            },
            "creative_writing": {
                "prefix": "You are a creative writer with excellent storytelling abilities. ",
                "suffix": "",
                "instruction_format": "Creative Writing Prompt: {instruction}"
            },
            "reasoning": {
                "prefix": "You are a logical thinker with strong reasoning skills. ",
                "suffix": "Think step-by-step and be precise in your analysis.",
                "instruction_format": "Reasoning Task: {instruction}"
            },
            "math": {
                "prefix": "You are a mathematics expert. ",
                "suffix": "Show your work step-by-step with explanations.",
                "instruction_format": "Math Problem: {instruction}"
            }
        }
    
    def detect_task_type(self, message: str) -> Optional[str]:
        """Detect the type of task from the message content."""
        message_lower = message.lower()
        
        # Code detection patterns
        code_patterns = [
            r"write (a|an|the)?\s?(code|function|program|script|class|method)",
            r"implement (a|an|the)?\s?(algorithm|function|class|method)",
            r"debug (this|the)?\s?(code|function|program)",
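            # \b anchors keep short names from matching inside words ("go" in "good",
            # "rust" in "trust"); "golang" replaces bare "go", which is also a common verb
            r"\b(?:js|javascript|python|java|golang|rust|typescript)\b|\bc\+\+"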
        ]
        
        # Creative writing patterns
        creative_patterns = [
            r"write (a|an|the)?\s?(story|poem|essay|narrative|scene)",
            r"create (a|an|the)?\s?(story|character|dialogue|setting)",
            r"describe (a|an|the)?\s?(scene|character|setting|world)"
        ]
        
        # Math patterns
        math_patterns = [
            r"calculate",
            r"solve (this|the)?\s?(equation|problem|expression)",
            r"compute",
            r"what is (the)?\s?(value|result|answer)",
            r"find (the)?\s?(derivative|integral|product|sum|limit)"
        ]
        
        # Reasoning patterns
        reasoning_patterns = [
            r"analyze",
            r"compare (and|&) contrast",
            r"explain (why|how)",
            r"what are (the)?\s?(pros|cons|advantages|disadvantages)",
            r"evaluate"
        ]
        
        # Check each pattern set
        for pattern in code_patterns:
            if re.search(pattern, message_lower):
                return "code_generation"
                
        for pattern in creative_patterns:
            if re.search(pattern, message_lower):
                return "creative_writing"
                
        for pattern in math_patterns:
            if re.search(pattern, message_lower):
                return "math"
                
        for pattern in reasoning_patterns:
            if re.search(pattern, message_lower):
                return "reasoning"
        
        return None
    
    def optimize_system_prompt(self, original_prompt: str, model: str, task_type: Optional[str] = None) -> str:
        """Optimize the system prompt for the specific model and task."""
        # If no original prompt, return an appropriate default
        if not original_prompt:
            return "You are a helpful assistant. Provide accurate, detailed, and clear responses."
        
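        # Get model-specific template; tagged Ollama names such as "mistral:latest"
        # fall back to their base name ("mistral") when no exact entry exists
        template = (self.model_specific_templates.get(model)
                    or self.model_specific_templates.get((model or "").split(":")[0], self.default_template))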
        
        # If task type is provided, incorporate task-specific optimizations
        if task_type and task_type in self.task_templates:
            task_template = self.task_templates[task_type]
            
            # Merge templates, with task template taking precedence for non-empty values
            merged_template = {
                "prefix": task_template["prefix"] if task_template["prefix"] else template["prefix"],
                "suffix": task_template["suffix"] if task_template["suffix"] else template["suffix"],
                "instruction_format": task_template["instruction_format"]
            }
            
            template = merged_template
        
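        # Apply template (skip the prefix if the prompt already starts with it)
        prefix = template["prefix"] if not original_prompt.startswith(template["prefix"]) else ""
        optimized_prompt = f"{prefix}{original_prompt}"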
        
        # Add suffix if it doesn't appear to already be present
        if template["suffix"] and template["suffix"] not in optimized_prompt:
            optimized_prompt += f" {template['suffix']}"
        
        return optimized_prompt
    
    def optimize_user_prompt(self, original_prompt: str, model: str, task_type: Optional[str] = None) -> str:
        """Optimize the user prompt for the specific model and task."""
        if not original_prompt:
            return original_prompt
            
        # Auto-detect task type if not provided
        if not task_type:
            task_type = self.detect_task_type(original_prompt)
        
        # Get model-specific template
        template = self.model_specific_templates.get(model, self.default_template)
        
        # If task type is provided, incorporate task-specific optimizations
        if task_type and task_type in self.task_templates:
            task_template = self.task_templates[task_type]
            # Use task instruction format if available
            instruction_format = task_template["instruction_format"]
        else:
            instruction_format = template["instruction_format"]
        
        # Apply instruction format if the prompt doesn't already look formatted
        if "{instruction}" in instruction_format and not re.match(r"^(task|problem|prompt|question):", original_prompt.lower()):
            formatted_prompt = instruction_format.replace("{instruction}", original_prompt)
            return formatted_prompt
        
        return original_prompt
    
    def optimize_messages(self, messages: List[Dict[str, str]], model: str) -> List[Dict[str, str]]:
        """Optimize all messages in a conversation for the specific model."""
        if not messages:
            return messages
            
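        # Detect the task type from the most recent user message that matches a pattern;
        # the latest turn best reflects what the model is being asked to do now
        task_type = None
        for msg in reversed(messages):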
            if msg.get("role") == "user" and msg.get("content"):
                detected_task = self.detect_task_type(msg["content"])
                if detected_task:
                    task_type = detected_task
                    break
        
        optimized = []
        
        for msg in messages:
            role = msg.get("role", "")
            content = msg.get("content", "")
            
            if role == "system" and content:
                optimized_content = self.optimize_system_prompt(content, model, task_type)
                # Copy the message so extra fields (e.g. "name") are preserved
                optimized.append({**msg, "content": optimized_content})
            elif role == "user" and content:
                optimized_content = self.optimize_user_prompt(content, model, task_type)
                optimized.append({**msg, "content": optimized_content})
            else:
                # Keep other messages unchanged
                optimized.append(msg)
        
        return optimized
</code></pre></div></pre>
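<p>The precedence rule in <code>optimize_system_prompt</code> (a task template's prefix and suffix win when non-empty; otherwise the model template's values apply) can be checked in isolation. The sketch below re-implements only the merge and apply steps as free functions. The template strings are hypothetical placeholders rather than the optimizer's real templates, and the prefix guard is an assumption added so that applying a template twice is harmless.</p>
<pre><code class="language-python">from typing import Dict, Optional

Template = Dict[str, str]


def merge_templates(model_template: Template, task_template: Optional[Template]) -> Template:
    """Task prefix/suffix win when non-empty; the task's instruction format always applies."""
    if not task_template:
        return model_template
    return {
        "prefix": task_template["prefix"] or model_template["prefix"],
        "suffix": task_template["suffix"] or model_template["suffix"],
        "instruction_format": task_template["instruction_format"],
    }


def apply_system_template(prompt: str, template: Template) -> str:
    """Add the prefix and suffix, skipping either one if it is already present."""
    prefix, suffix = template["prefix"], template["suffix"]
    result = prompt if prompt.startswith(prefix) else prefix + prompt
    if suffix and suffix not in result:
        result += " " + suffix
    return result


# Hypothetical templates, for illustration only
model_template = {"prefix": "You are a concise assistant. ", "suffix": "", "instruction_format": "{instruction}"}
code_template = {"prefix": "", "suffix": "Return only code.", "instruction_format": "Task: {instruction}"}

merged = merge_templates(model_template, code_template)
once = apply_system_template("Help with Python.", merged)
twice = apply_system_template(once, merged)
print(once)
print(once == twice)
</code></pre>
<p>Because <code>once == twice</code> holds, running an already-optimized conversation back through the same step leaves it unchanged, which matters if optimized messages are stored and re-sent on later turns.</p>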
<h2 id="cost-reduction-strategies">Cost Reduction Strategies</h2>
<h3 id="1-token-usage-optimization">1. Token Usage Optimization</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/token_optimizer.py
import logging
import re
from typing import Dict, List, Tuple

try:
    import tiktoken
except ImportError:  # tiktoken is optional; counting falls back to an approximation
    tiktoken = None

logger = logging.getLogger(__name__)

class TokenOptimizer:
    """Optimizes token usage to reduce costs."""
    
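    def __init__(self):
        # Load tokenizers once (both models use tiktoken's cl100k_base encoding)
        self.gpt3_tokenizer = None
        self.gpt4_tokenizer = None
        if tiktoken is None:
            logger.warning("tiktoken is not installed. Falling back to approximate counting.")
            return
        try:
            self.gpt3_tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")
            self.gpt4_tokenizer = tiktoken.encoding_for_model("gpt-4")
        except Exception as e:
            logger.warning(f"Could not load tokenizers: {str(e)}. Falling back to approximate counting.")
            self.gpt3_tokenizer = None
            self.gpt4_tokenizer = None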
    
    def count_tokens(self, text: str, model: str = "gpt-3.5-turbo") -> int:
        """Count the number of tokens in a text string for a specific model."""
        if not text:
            return 0
            
        # Use appropriate tokenizer if available
        if model.startswith("gpt-4") and self.gpt4_tokenizer:
            return len(self.gpt4_tokenizer.encode(text))
        elif model.startswith("gpt-3") and self.gpt3_tokenizer:
            return len(self.gpt3_tokenizer.encode(text))
        
        # Fallback to approximation (~ 4 chars per token for English)
        return len(text) // 4 + 1
    
    def count_message_tokens(self, messages: List[Dict[str, str]], model: str = "gpt-3.5-turbo") -> int:
        """Count tokens in a full message array, including chat-format overhead."""
        if not messages:
            return 0
            
        total = 0
        
        if model.startswith(("gpt-3.5-turbo", "gpt-4")):
            # Per OpenAI's recipe for counting chat tokens (gpt-3.5-turbo-0613 / gpt-4-0613)
            # See: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
            # The recipe is an estimate; the overhead can differ between model snapshots.
            tokens_per_message = 3  # Framing tokens around every message
            tokens_per_name = 1     # Extra token when a "name" field is present
            
            for message in messages:
                total += tokens_per_message
                for key, value in message.items():
                    if value:
                        # Every field value is tokenized, including the role string
                        total += self.count_tokens(str(value), model)
                    if key == "name":
                        total += tokens_per_name
            
            total += 3  # Every reply is primed with an assistant header
            
        else:
            # Other models (e.g. those served by Ollama) use different chat templates,
            # so count message content only as an approximation
            for message in messages:
                content = message.get("content", "")
                if content:
                    total += self.count_tokens(content, model)
        
        return total
    
    def truncate_messages(self, 
                         messages: List[Dict[str, str]], 
                         max_tokens: int, 
                         model: str = "gpt-3.5-turbo",
                         preserve_system: bool = True,
                         preserve_last_n_exchanges: int = 2) -> List[Dict[str, str]]:
        """Truncate conversation history to fit within a token limit.
        
        Budgeting below counts content tokens only, so the result can still exceed
        the limit by the per-message framing overhead; a warning is logged if it does.
        """
        if not messages:
            return messages
            
        # Clone messages to avoid modifying the original
        messages = [m.copy() for m in messages]
        
        # If already under the limit, return as is
        if self.count_message_tokens(messages, model) <= max_tokens:
            return messages
        
        # Track each message's original position so chronological order can be restored
        # exactly, even when two messages have identical content
        Entry = Tuple[int, Dict[str, str]]
        
        def content_tokens(entries: List[Entry]) -> int:
            return sum(self.count_tokens(m.get("content", ""), model) for _, m in entries)
        
        system_entries = [(i, m) for i, m in enumerate(messages) if m.get("role") == "system"]
        
        # Group non-system messages into exchanges: each user message starts a new exchange
        # and the assistant (or tool) messages that follow attach to it
        exchanges: List[List[Entry]] = []
        current_exchange: List[Entry] = []
        for i, m in enumerate(messages):
            if m.get("role") == "system":
                continue
            if m.get("role") == "user" and current_exchange:
                exchanges.append(current_exchange)
                current_exchange = []
            current_exchange.append((i, m))
        if current_exchange:
            exchanges.append(current_exchange)
        
        # Split into the preserved tail and the droppable middle (0 preserves nothing)
        n_keep = max(preserve_last_n_exchanges, 0)
        last_n_exchanges = exchanges[-n_keep:] if n_keep else []
        middle_exchanges = exchanges[:-n_keep] if n_keep else exchanges
        
        result: List[Entry] = list(system_entries) if preserve_system else []
        essential_tokens = content_tokens(result) + sum(content_tokens(ex) for ex in last_n_exchanges)
        
        if essential_tokens > max_tokens:
            # Essential parts alone exceed the limit, so truncate more aggressively
            logger.warning(f"Essential conversation parts exceed token limit: {essential_tokens} > {max_tokens}")
            remaining_tokens = max_tokens - content_tokens(result)
            
            # Keep as many of the newest preserved exchanges as fit
            for exchange in reversed(last_n_exchanges):
                exchange_tokens = content_tokens(exchange)
                if exchange_tokens <= remaining_tokens:
                    result.extend(exchange)
                    remaining_tokens -= exchange_tokens
                    continue
                
                # Otherwise keep the user turn and cut the assistant reply to fit
                if len(exchange) == 2 and exchange[0][1].get("role") == "user":
                    (user_idx, user_msg), (assistant_idx, assistant_msg) = exchange
                    user_tokens = self.count_tokens(user_msg.get("content", ""), model)
                    if user_tokens < remaining_tokens:
                        result.append((user_idx, user_msg))
                        remaining_tokens -= user_tokens
                        assistant_content = assistant_msg.get("content", "")
                        if assistant_content:
                            # Naive cut at ~4 characters per token; a real system might summarize instead
                            chars_to_keep = remaining_tokens * 4
                            truncated = {**assistant_msg, "content": assistant_content[:chars_to_keep] + "... [truncated]"}
                            result.append((assistant_idx, truncated))
                break
        else:
            # All essential parts fit: fill the remaining budget from the middle, newest first
            remaining_tokens = max_tokens - essential_tokens
            for exchange in reversed(middle_exchanges):
                exchange_tokens = content_tokens(exchange)
                if exchange_tokens > remaining_tokens:
                    break
                result.extend(exchange)
                remaining_tokens -= exchange_tokens
            
            # Add the preserved last exchanges
            for exchange in last_n_exchanges:
                result.extend(exchange)
        
        # Restore the original chronological order
        result.sort(key=lambda entry: entry[0])
        final_messages = [m for _, m in result]
        
        # Verify the result is within the token limit
        final_tokens = self.count_message_tokens(final_messages, model)
        if final_tokens > max_tokens:
            logger.warning(f"Truncation failed to meet target: {final_tokens} > {max_tokens}")
        
        return final_messages
    
    def compress_system_prompt(self, system_prompt: str, max_tokens: int, model: str = "gpt-3.5-turbo") -> str:
        """Compress a system prompt to use fewer tokens while preserving key information."""
        current_tokens = self.count_tokens(system_prompt, model)
        
        if current_tokens <= max_tokens:
            return system_prompt
        
        # A production system might ask a language model to rewrite the prompt more
        # compactly; the fallback below uses simple rule-based edits instead.
        
        # 1. Remove filler phrases. Words such as "Always", "Never", and "You should" are
        # deliberately kept: stripping them can weaken or invert an instruction
        # ("Never reveal the key" would become "reveal the key").
        redundant_phrases = [
            "Please note that", "It's important to remember that", "Keep in mind that",
            "I want you to", "I'd like you to", "Make sure to", "Remember to"
        ]
        
        compressed = system_prompt
        for phrase in redundant_phrases:
            compressed = compressed.replace(phrase, "")
        
        # 2. Replace verbose constructions with shorter ones
        replacements = {
            "in order to": "to",
            "for the purpose of": "for",
            "due to the fact that": "because",
            "in the event that": "if",
            "on the condition that": "if",
            "with regard to": "about",
            "in relation to": "about"
        }
        
        for verbose, concise in replacements.items():
            compressed = compressed.replace(verbose, concise)
        
        # 3. Remove unnecessary whitespace
        compressed = re.sub(r'\s+', ' ', compressed).strip()
        
        # 4. If still over the limit, truncate with an ellipsis
        compressed_tokens = self.count_tokens(compressed, model)
        if compressed_tokens > max_tokens:
            # Approximation: 4 characters per token
            char_limit = max_tokens * 4
            compressed = compressed[:char_limit] + "..."
        
        return compressed
    
    def optimize_messages_for_cost(self, 
                                 messages: List[Dict[str, str]], 
                                 model: str, 
                                 max_tokens: int = 4096) -> List[Dict[str, str]]:
        """Fully optimize messages for cost efficiency."""
        if not messages:
            return messages
            
        # 1. First, identify system messages for compression
        system_messages = []
        other_messages = []
        
        for msg in messages:
            if msg.get("role") == "system":
                system_messages.append(msg)
            else:
                other_messages.append(msg)
        
        # 2. Compress system messages if there are multiple
        if len(system_messages) > 1:
            # Combine multiple system messages
            combined_content = " ".join(msg.get("content", "") for msg in system_messages)
            compressed_content = self.compress_system_prompt(combined_content, 1024, model)
            
            # Replace with a single compressed message
            system_messages = [{"role": "system", "content": compressed_content}]
        elif len(system_messages) == 1 and self.count_tokens(system_messages[0].get("content", ""), model) > 1024:
            # Compress a single long system message (copied so the caller's dict is not mutated)
            system_messages = [{
                **system_messages[0],
                "content": self.compress_system_prompt(system_messages[0].get("content", ""), 1024, model),
            }]
        
        # 3. Recombine and truncate the full conversation.
        # max_tokens is the model's context window: reserve 25% (at least 1024 tokens) for the
        # completion, but never more than half the window, so small windows keep a prompt budget
        optimized = system_messages + other_messages
        reserved_completion_tokens = min(max(max_tokens // 4, 1024), max_tokens // 2)
        max_prompt_tokens = max_tokens - reserved_completion_tokens
        
        return self.truncate_messages(
            optimized, 
            max_prompt_tokens, 
            model,
            preserve_system=True,
            preserve_last_n_exchanges=2
        )
</code></pre></div></pre>
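<p>Stripped of logging and the partial-truncation fallback, the core policy of <code>truncate_messages</code> is: keep system messages, then add whole exchanges newest-first while they fit. The simplified sketch below illustrates that policy only. <code>keep_recent</code>, <code>group_exchanges</code>, and the word-count "tokenizer" are hypothetical stand-ins chosen so the example runs without tiktoken; real budgets should use a proper token count.</p>
<pre><code class="language-python">from typing import Callable, Dict, List

Message = Dict[str, str]


def group_exchanges(messages: List[Message]) -> List[List[Message]]:
    """Start a new exchange at each user message; later messages attach to it."""
    exchanges: List[List[Message]] = []
    for msg in messages:
        if msg["role"] == "user" or not exchanges:
            exchanges.append([msg])
        else:
            exchanges[-1].append(msg)
    return exchanges


def keep_recent(messages: List[Message], budget: int, count: Callable[[str], int]) -> List[Message]:
    """Keep system messages, then whole exchanges newest-first while they fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    remaining = budget - sum(count(m["content"]) for m in system)
    kept: List[List[Message]] = []
    for exchange in reversed(group_exchanges(rest)):
        cost = sum(count(m["content"]) for m in exchange)
        if cost > remaining:
            break  # stop at the first exchange that does not fit
        kept.insert(0, exchange)  # prepend so the output stays chronological
        remaining -= cost
    return system + [m for exchange in kept for m in exchange]


def word_count(text: str) -> int:
    """Stand-in tokenizer for this example: one token per whitespace-separated word."""
    return len(text.split())


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "first question here"},
    {"role": "assistant", "content": "first answer here"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
]
trimmed = keep_recent(history, budget=8, count=word_count)
print([m["content"] for m in trimmed])
</code></pre>
<p>With a budget of 8 "tokens", the system message (2) and the newest exchange (4) fit, while the older exchange (6) would overflow the remaining 2 and is dropped, leaving <code>["Be brief.", "second question", "second answer"]</code>.</p>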
<h3 id="2-model-tier-selection">2. Model Tier Selection</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/model_tier_service.py
import logging
import re
from typing import List, Optional, Tuple

from app.config import settings

logger = logging.getLogger(__name__)

class ModelTierService:
    """Selects the appropriate model tier based on task requirements and budget constraints."""
    
    def __init__(self):
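        # Approximate USD prices per 1,000 tokens at the time of writing; OpenAI revises
        # pricing over time, so check the current pricing page before relying on these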
        self.model_costs = {
            # OpenAI models input/output costs
            "gpt-4": {"input": 0.03, "output": 0.06},
            "gpt-4-32k": {"input": 0.06, "output": 0.12},
            "gpt-4-turbo": {"input": 0.01, "output": 0.03},
            "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
            "gpt-3.5-turbo-16k": {"input": 0.003, "output": 0.004},
            
            # Ollama models (local, so effectively zero API cost)
            "llama2": {"input": 0, "output": 0},
            "mistral": {"input": 0, "output": 0},
            "codellama": {"input": 0, "output": 0}
        }
        
        # Model capabilities and appropriate use cases
        self.model_capabilities = {
            "gpt-4": ["complex_reasoning", "creative", "code", "math", "general"],
            "gpt-4-turbo": ["complex_reasoning", "creative", "code", "math", "general"],
            "gpt-3.5-turbo": ["simple_reasoning", "general", "summarization"],
            "llama2": ["simple_reasoning", "general", "summarization"],
            "mistral": ["simple_reasoning", "general", "creative"],
            "codellama": ["code"]
        }
        
        # Default model selections for different task types
        self.task_model_mapping = {
            "complex_reasoning": {
                "high": "gpt-4-turbo",
                "medium": "gpt-4-turbo",
                "low": "gpt-3.5-turbo"
            },
            "simple_reasoning": {
                "high": "gpt-3.5-turbo",
                "medium": "gpt-3.5-turbo",
                "low": "mistral"
            },
            "creative": {
                "high": "gpt-4-turbo",
                "medium": "mistral",
                "low": "mistral"
            },
            "code": {
                "high": "gpt-4-turbo",
                "medium": "codellama",
                "low": "codellama"
            },
            "math": {
                "high": "gpt-4-turbo",
                "medium": "gpt-3.5-turbo",
                "low": "mistral"
            },
            "general": {
                "high": "gpt-3.5-turbo",
                "medium": "mistral",
                "low": "llama2"
            },
            "summarization": {
                "high": "gpt-3.5-turbo",
                "medium": "mistral",
                "low": "llama2"
            }
        }
        
        # Budget tier thresholds - what percentage of budget is remaining?
        self.budget_tiers = {
            "high": 0.6,    # >60% of budget remaining
            "medium": 0.3,  # 30-60% of budget remaining
            "low": 0.0      # <30% of budget remaining
        }
        
        # Initialize usage tracking
        self.monthly_budget = settings.MONTHLY_BUDGET
        self.usage_this_month = 0
        self.month_start_timestamp = self._get_month_start_timestamp()
    
    def _get_month_start_timestamp(self) -> int:
        """Get timestamp for the start of the current month."""
        import datetime
        now = datetime.datetime.now()
        month_start = datetime.datetime(now.year, now.month, 1)
        return int(month_start.timestamp())
    
    @staticmethod
    def _contains_any(text: str, indicators: List[str]) -> bool:
        """Match indicators as whole words or phrases (plus an optional plural "s"),
        so "story" no longer fires on "history" or "how" on "show"."""
        return any(
            re.search(rf"(?:^|\W){re.escape(indicator)}s?(?:$|\W)", text)
            for indicator in indicators
        )
    
    def detect_task_type(self, query: str) -> str:
        """Detect the type of task from the query.
        
        Categories are checked in order and the first match wins, so a query like
        "write a python function" is classified as "code" rather than "creative".
        """
        query_lower = query.lower()
        
        # Check for code-related tasks
        code_indicators = [
            "code", "function", "program", "algorithm", "javascript", 
            "python", "java", "c++", "typescript", "html", "css"
        ]
        if self._contains_any(query_lower, code_indicators):
            return "code"
        
        # Check for math problems
        math_indicators = [
            "calculate", "solve", "equation", "math problem", "compute",
            "derivative", "integral", "algebra", "calculus", "arithmetic"
        ]
        if self._contains_any(query_lower, math_indicators):
            return "math"
        
        # Check for creative tasks
        creative_indicators = [
            "story", "poem", "creative", "imagine", "fiction", "fantasy",
            "character", "novel", "script", "narrative", "write a"
        ]
        if self._contains_any(query_lower, creative_indicators):
            return "creative"
        
        # Check for complex reasoning
        complex_indicators = [
            "analyze", "critique", "evaluate", "compare and contrast",
            "implications", "consequences", "recommend", "strategy",
            "detailed explanation", "comprehensive", "thorough"
        ]
        if self._contains_any(query_lower, complex_indicators):
            return "complex_reasoning"
        
        # Check for summarization
        summary_indicators = [
            "summarize", "summary", "tldr", "briefly explain", "short version",
            "key points", "main ideas"
        ]
        if self._contains_any(query_lower, summary_indicators):
            return "summarization"
        
        # Plain questions fall back to simple reasoning
        simple_indicators = [
            "explain", "how", "why", "what", "when", "who", "where",
            "help me understand", "tell me about"
        ]
        if self._contains_any(query_lower, simple_indicators):
            return "simple_reasoning"
        
        # Fallback to general
        return "general"
    
    def get_current_budget_tier(self) -> str:
        """Get the current budget tier based on monthly usage."""
        # Check if we're in a new month
        current_month_start = self._get_month_start_timestamp()
        if current_month_start > self.month_start_timestamp:
            # Reset for new month
            self.month_start_timestamp = current_month_start
            self.usage_this_month = 0
        
        if self.monthly_budget <= 0:
            # No budget constraints
            return "high"
        
        # Calculate remaining budget percentage
        remaining_percentage = 1 - (self.usage_this_month / self.monthly_budget)
        
        # Determine tier
        if remaining_percentage > self.budget_tiers["high"]:
            return "high"
        elif remaining_percentage > self.budget_tiers["medium"]:
            return "medium"
        else:
            return "low"
    
    def record_usage(self, model: str, input_tokens: int, output_tokens: int) -> None:
        """Record token usage for budget tracking."""
        if model not in self.model_costs:
            return
        
        costs = self.model_costs[model]
        input_cost = (input_tokens / 1000) * costs["input"]
        output_cost = (output_tokens / 1000) * costs["output"]
        total_cost = input_cost + output_cost
        
        self.usage_this_month += total_cost
        
        # Log for monitoring
        logger.info(f"Usage recorded: {model}, {input_tokens} input tokens, {output_tokens} output tokens, ${total_cost:.4f}")
    
    def select_optimal_model(self, 
                           query: str, 
                           preferred_provider: Optional[str] = None,
                           force_tier: Optional[str] = None) -> Tuple[str, str]:
        """
        Select the optimal model based on the query and budget constraints.
        Returns a tuple of (provider, model)
        """
        # Detect task type
        task_type = self.detect_task_type(query)
        
        # Get budget tier (unless forced)
        budget_tier = force_tier if force_tier else self.get_current_budget_tier()
        
        # Get the recommended model for this task and budget tier
        recommended_model = self.task_model_mapping[task_type][budget_tier]
        
        # Determine provider based on model
        local_models = ["llama2", "mistral", "codellama"]
        provider = "ollama" if recommended_model in local_models else "openai"
        
        # Override provider if specified and compatible. Among capable alternatives, pick
        # the cheapest so a provider preference never escalates to a pricier model
        # (dict order alone would pick gpt-4 over the cheaper gpt-4-turbo for code tasks).
        if preferred_provider in ("ollama", "openai") and preferred_provider != provider:
            want_local = preferred_provider == "ollama"
            candidates = [
                model for model, capabilities in self.model_capabilities.items()
                if task_type in capabilities and (model in local_models) == want_local
            ]
            if candidates:
                recommended_model = min(candidates, key=lambda m: self.estimate_cost(m, 1000, 1000))
                provider = preferred_provider
            else:
                logger.info(f"No {preferred_provider} model listed for task '{task_type}'; keeping {provider}")
        
        logger.info(f"Selected model for task '{task_type}' (tier: {budget_tier}): {provider}:{recommended_model}")
        return provider, recommended_model
    
    def estimate_cost(self, model: str, input_tokens: int, expected_output_tokens: int) -> float:
        """Estimate the cost of a request."""
        if model not in self.model_costs:
            return 0.0
        
        costs = self.model_costs[model]
        input_cost = (input_tokens / 1000) * costs["input"]
        output_cost = (expected_output_tokens / 1000) * costs["output"]
        
        return input_cost + output_cost
</code></pre></div></pre>
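<p>The budget logic in <code>ModelTierService</code> is plain arithmetic on per-1,000-token prices. This standalone sketch reuses three entries from the <code>model_costs</code> table above (approximate prices that may be outdated) and the same 60% / 30% remaining-budget thresholds, to show what a 1,500-token prompt with a 500-token reply costs on each tier and which tier a 45%-spent budget falls into. The function names are hypothetical simplifications of the service's methods.</p>
<pre><code class="language-python">from typing import Dict

# Per-1K-token USD prices copied from the model_costs table above (approximate, may be outdated)
MODEL_COSTS: Dict[str, Dict[str, float]] = {
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "mistral": {"input": 0.0, "output": 0.0},  # served locally by Ollama: no API fee
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price a request; unknown models are treated as free, as in the service."""
    costs = MODEL_COSTS.get(model)
    if costs is None:
        return 0.0
    return (input_tokens / 1000) * costs["input"] + (output_tokens / 1000) * costs["output"]


def budget_tier(used: float, monthly_budget: float) -> str:
    """Over 60% of the budget remaining is "high", over 30% is "medium", otherwise "low"."""
    if monthly_budget <= 0:
        return "high"  # no budget configured means no constraint
    remaining = 1 - used / monthly_budget
    if remaining > 0.6:
        return "high"
    if remaining > 0.3:
        return "medium"
    return "low"


for name in MODEL_COSTS:
    print(f"{name}: ${estimate_cost(name, 1500, 500):.5f}")
print(budget_tier(used=45.0, monthly_budget=100.0))
</code></pre>
<p>At these rates the request costs about $0.03 on gpt-4-turbo, about $0.00325 on gpt-3.5-turbo, and nothing in API fees on a local Mistral model. With 45 of 100 dollars spent, 55% of the budget remains, which lands in the "medium" tier.</p>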
<h3 id="3-local-model-prioritization-for-development">3. Local Model Prioritization for Development</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/dev_mode_service.py
import logging
import os
from typing import Dict, List, Any, Optional
import re

logger = logging.getLogger(__name__)

class DevModeService:
    """
    Service that prioritizes local models during development to reduce costs.
    """
    
    def __init__(self):
        # Read environment to determine if we're in development mode
        self.is_dev_mode = os.environ.get("APP_ENV", "development").lower() == "development"
        self.dev_mode_forced = os.environ.get("FORCE_DEV_MODE", "false").lower() == "true"
        
        # Set up developer-focused settings
        self.allow_openai_for_patterns = [
            r"(complex|sophisticated|advanced)\s+(reasoning|analysis)",
            r"(gpt-4|gpt-3\.5|openai)"  # Explicit requests for OpenAI models
        ]
        
        self.use_ollama_for_patterns = [
            r"^test\b",  # Queries starting with "test"
            r"^debug\b",  # Debugging queries
            r"^hello\b",  # Simple greetings; \b also matches a bare "hello" or "hello!"
            r"^hi\b",
            r"^try\b"
        ]
        
        # Track usage for reporting
        self.openai_requests = 0
        self.ollama_requests = 0
        self.redirected_requests = 0
    
    def is_development_environment(self) -> bool:
        """Check if we're running in a development environment."""
        return self.is_dev_mode or self.dev_mode_forced
    
    def should_use_local_model(self, query: str) -> bool:
        """
        Determine if a query should use local models in development mode.
        In development, we default to local models unless specific patterns are matched.
        """
        if not self.is_development_environment():
            return False
        
        # Always use local models for specific patterns
        for pattern in self.use_ollama_for_patterns:
            if re.search(pattern, query, re.IGNORECASE):
                return True
        
        # Allow OpenAI for specific advanced patterns
        for pattern in self.allow_openai_for_patterns:
            if re.search(pattern, query, re.IGNORECASE):
                return False
        
        # In development, default to local models to save costs
        return True
    
    def get_dev_routing_decision(self, query: str, default_provider: str) -> str:
        """
        Make a routing decision based on development mode settings.
        Returns: "openai" or "ollama"
        """
        if not self.is_development_environment():
            return default_provider
        
        should_use_local = self.should_use_local_model(query)
        
        # Track for reporting
        if should_use_local:
            self.ollama_requests += 1
            if default_provider == "openai":
                self.redirected_requests += 1
        else:
            self.openai_requests += 1
        
        return "ollama" if should_use_local else "openai"
    
    def get_usage_report(self) -> Dict[str, Any]:
        """Get a report of usage patterns for monitoring costs."""
        total_requests = self.openai_requests + self.ollama_requests
        
        if total_requests == 0:
            ollama_percentage = 0
            redirected_percentage = 0
        else:
            ollama_percentage = (self.ollama_requests / total_requests) * 100
            redirected_percentage = (self.redirected_requests / total_requests) * 100
        
        return {
            "dev_mode_active": self.is_development_environment(),
            "total_requests": total_requests,
            "openai_requests": self.openai_requests,
            "ollama_requests": self.ollama_requests,
            "redirected_to_ollama": self.redirected_requests,
            "ollama_usage_percentage": ollama_percentage,
            # Share of requests redirected from OpenAI to Ollama: a proxy for savings, not a dollar figure
            "cost_savings_percentage": redirected_percentage
        }
</code></pre></div></pre>
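The `cost_savings_percentage` field above counts redirected requests, not money. Turning the counters into an approximate dollar figure also requires a token count and a price. The following is a minimal sketch built on stated assumptions: `UsageSnapshot` and `estimate_savings` are hypothetical helpers, the average tokens per request is something you would have to measure, and the per-1K-token price is a placeholder you supply yourself (no real price list is implied).

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    """Hypothetical subset of the data needed to estimate savings."""
    redirected_to_ollama: int        # e.g. taken from get_usage_report()
    avg_tokens_per_request: float    # assumed to be measured separately


def estimate_savings(snapshot: UsageSnapshot, price_per_1k_tokens: float) -> float:
    """
    Rough dollar estimate of OpenAI spend avoided by redirecting requests to Ollama.

    Assumes every redirected request would have used avg_tokens_per_request tokens
    at a single blended price; real pricing usually differs for input and output.
    """
    if price_per_1k_tokens < 0 or snapshot.avg_tokens_per_request < 0:
        raise ValueError("price and token counts must be non-negative")
    tokens_avoided = snapshot.redirected_to_ollama * snapshot.avg_tokens_per_request
    return tokens_avoided / 1000 * price_per_1k_tokens


# Made-up numbers: 250 redirected requests of ~800 tokens at a placeholder price
snapshot = UsageSnapshot(redirected_to_ollama=250, avg_tokens_per_request=800)
print(f"~${estimate_savings(snapshot, 0.002):.2f} saved")
```

Keeping the estimate outside `DevModeService` avoids hard-coding prices into the routing logic, because prices change more often than routing rules.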
<h3 id="4-request-batching-and-rate-limiting">4. Request Batching and Rate Limiting</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/rate_limiter.py
import time
import asyncio
import logging
from typing import Dict, List, Any, Optional, Callable, Awaitable
from collections import defaultdict
import redis.asyncio as redis

from app.config import settings

logger = logging.getLogger(__name__)

class RateLimiter:
    """
    Rate limiter to control API usage and costs.
    Implements tiered rate limiting based on user roles.
    """
    
    def __init__(self):
        self.redis = None
        
        # Rate limit tiers (requests per time window)
        self.rate_limit_tiers = {
            "free": {
                "minute": 5,
                "hour": 20,
                "day": 100
            },
            "basic": {
                "minute": 20,
                "hour": 100,
                "day": 1000
            },
            "premium": {
                "minute": 60,
                "hour": 1000,
                "day": 10000
            },
            "enterprise": {
                "minute": 120,
                "hour": 5000,
                "day": 50000
            }
        }
        
        # Provider-specific rate limits (global)
        self.provider_rate_limits = {
            "openai": {
                "minute": 60,  # Shared across all users
                "tokens_per_minute": 90000  # Token budget per minute
            },
            "ollama": {
                "minute": 100,  # Higher for local models
                "tokens_per_minute": 250000
            }
        }
        
        # Tracking for available token budgets
        self.token_budgets = {
            "openai": self.provider_rate_limits["openai"]["tokens_per_minute"],
            "ollama": self.provider_rate_limits["ollama"]["tokens_per_minute"]
        }
        self.last_budget_reset = time.time()
    
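    async def initialize(self):
        """Initialize the Redis connection and start budget replenishment."""
        # redis.asyncio.from_url() returns a client directly; connections are opened lazily
        self.redis = redis.from_url(settings.REDIS_URL)
        
        # Start the token budget replenishment task and keep a reference to it,
        # since asyncio only holds weak references to running tasks
        self._replenishment_task = asyncio.create_task(self._token_budget_replenishment())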
    
    async def _token_budget_replenishment(self):
        """
        Periodically refill token budgets (token-bucket style): each provider regains
        tokens_per_minute / 60 tokens per elapsed second, capped at one minute's budget.
        """
        while True:
            try:
                now = time.time()
                elapsed = now - self.last_budget_reset
                
                # Refill proportionally to elapsed time; after a minute or more this
                # is equivalent to a full reset because of the cap
                for provider, limits in self.provider_rate_limits.items():
                    capacity = limits["tokens_per_minute"]
                    replenishment = int((elapsed / 60) * capacity)
                    self.token_budgets[provider] = min(
                        self.token_budgets[provider] + replenishment,
                        capacity
                    )
                
                # Records the time of the last refill tick, not a full reset
                self.last_budget_reset = now
            except Exception as e:
                logger.error(f"Error in token budget replenishment: {str(e)}")
            
            # Update every 5 seconds
            await asyncio.sleep(5)
    
    async def check_rate_limit(self, 
                             user_id: str, 
                             tier: str = "free",
                             provider: str = "openai") -> Dict[str, Any]:
        """
        Check if a request is within rate limits.
        Returns: {"allowed": bool, "retry_after": Optional[int], "reason": Optional[str]}
        """
        if not self.redis:
            # If Redis is not available, allow the request but log a warning
            logger.warning("Redis not available for rate limiting")
            return {"allowed": True}
        
        # Get rate limits for this user's tier
        tier_limits = self.rate_limit_tiers.get(tier, self.rate_limit_tiers["free"])
        
        # Check user-specific rate limits
        for window, limit in tier_limits.items():
            key = f"rate:user:{user_id}:{window}"
            
            # Get current count
            count = await self.redis.get(key)
            count = int(count) if count else 0
            
            if count >= limit:
                ttl = await self.redis.ttl(key)
                return {
                    "allowed": False,
                    "retry_after": max(1, ttl),
                    "reason": f"Rate limit exceeded for {window}"
                }
        
        # Check provider-specific rate limits
        provider_limits = self.provider_rate_limits.get(provider, {})
        if "minute" in provider_limits:
            provider_key = f"rate:provider:{provider}:minute"
            provider_count = await self.redis.get(provider_key)
            provider_count = int(provider_count) if provider_count else 0
            
            if provider_count >= provider_limits["minute"]:
                ttl = await self.redis.ttl(provider_key)
                return {
                    "allowed": False,
                    "retry_after": max(1, ttl),
                    "reason": f"Global {provider} rate limit exceeded"
                }
        
        # Check token budget
        if provider in self.token_budgets and self.token_budgets[provider] <= 0:
            # Budgets refill continuously: the replenishment loop adds tokens every
            # 5 seconds and updates last_budget_reset on each tick, so suggest
            # retrying after the next tick rather than a full minute later
            time_since_refill = time.time() - self.last_budget_reset
            retry_after = max(1, int(5 - time_since_refill))
            
            return {
                "allowed": False,
                "retry_after": retry_after,
                "reason": f"{provider} token budget exhausted"
            }
        
        # All checks passed
        return {"allowed": True}
    
    async def increment_counters(self, 
                               user_id: str, 
                               provider: str, 
                               token_count: int = 0) -> None:
        """Increment rate limit counters after a successful request."""
        if not self.redis:
            return
        
        # Fixed windows keyed by TTL. SET ... NX EX creates each counter (with its
        # expiry) only if it does not exist yet, and INCR preserves an existing TTL,
        # so later requests do not push the window's reset time back.
        window_keys = {
            f"rate:user:{user_id}:minute": 60,
            f"rate:user:{user_id}:hour": 3600,
            f"rate:user:{user_id}:day": 86400,
            f"rate:provider:{provider}:minute": 60,
        }
        
        pipeline = self.redis.pipeline()
        for key, window_seconds in window_keys.items():
            pipeline.set(key, 0, ex=window_seconds, nx=True)
            pipeline.incr(key)
        
        # Execute all commands in one round trip (wrapped in MULTI/EXEC by default)
        await pipeline.execute()
        
        # Decrement token budget
        if provider in self.token_budgets and token_count > 0:
            self.token_budgets[provider] = max(0, self.token_budgets[provider] - token_count)
    
    async def get_user_usage(self, user_id: str) -> Dict[str, Any]:
        """Get current usage statistics for a user."""
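        if not self.redis:
            # Return the same shape as the Redis-backed response below
            return {
                "minute": {"usage": 0, "reset_in": 60},
                "hour": {"usage": 0, "reset_in": 3600},
                "day": {"usage": 0, "reset_in": 86400}
            }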
        
        pipeline = self.redis.pipeline()
        
        # Get counts for all windows
        pipeline.get(f"rate:user:{user_id}:minute")
        pipeline.get(f"rate:user:{user_id}:hour")
        pipeline.get(f"rate:user:{user_id}:day")
        
        # Get TTLs (time remaining)
        pipeline.ttl(f"rate:user:{user_id}:minute")
        pipeline.ttl(f"rate:user:{user_id}:hour")
        pipeline.ttl(f"rate:user:{user_id}:day")
        
        results = await pipeline.execute()
        
        return {
            "minute": {
                "usage": int(results[0]) if results[0] else 0,
                "reset_in": results[3] if results[3] and results[3] > 0 else 60
            },
            "hour": {
                "usage": int(results[1]) if results[1] else 0,
                "reset_in": results[4] if results[4] and results[4] > 0 else 3600
            },
            "day": {
                "usage": int(results[2]) if results[2] else 0,
                "reset_in": results[5] if results[5] and results[5] > 0 else 86400
            }
        }
</code></pre></div></pre>
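The rate limiter above covers the rate-limiting half of this section's title, but not request batching. A common batching approach is a micro-batcher that collects individual requests for a short window (or until a size limit is reached) and dispatches them together, so a single rate-limited call can serve several prompts. The sketch below rests on stated assumptions: `MicroBatcher`, `process_batch` and `fake_batch` are hypothetical names, and whether a given provider endpoint accepts several prompts in one call depends on that provider.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class MicroBatcher:
    """
    Hypothetical micro-batcher: queues single prompts and passes them to a batch
    callback when max_batch_size is reached or max_wait seconds have elapsed.
    """

    def __init__(self,
                 process_batch: Callable[[List[str]], Awaitable[List[str]]],
                 max_batch_size: int = 8,
                 max_wait: float = 0.05):
        self.process_batch = process_batch
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait
        # Create the batcher from async code so the queue uses the running loop
        self._queue: "asyncio.Queue[Tuple[str, asyncio.Future]]" = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Keep a reference so the background task is not garbage-collected
        self._worker = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass

    async def submit(self, prompt: str) -> str:
        """Enqueue one prompt and wait for its own result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, future))
        return await future

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Wait for the first request, then gather more until the batch is
            # full or the wait window closes
            batch = [await self._queue.get()]
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                results = await self.process_batch([prompt for prompt, _ in batch])
                if len(results) != len(batch):
                    raise ValueError("process_batch must return one result per prompt")
                for (_, future), result in zip(batch, results):
                    if not future.done():
                        future.set_result(result)
            except Exception as exc:
                # Report a failed batch call to every caller still waiting
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)


async def fake_batch(prompts: List[str]) -> List[str]:
    # Stand-in for a provider call that accepts several prompts at once
    return [prompt.upper() for prompt in prompts]


async def main() -> List[str]:
    batcher = MicroBatcher(fake_batch, max_batch_size=4, max_wait=0.01)
    batcher.start()
    results = await asyncio.gather(*(batcher.submit(p) for p in ["a", "b", "c"]))
    await batcher.stop()
    return list(results)


print(asyncio.run(main()))
```

In such a design, `check_rate_limit` would run once per batch rather than once per prompt. The trade-off is up to `max_wait` seconds of added latency for the first request in each batch.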
<h3 id="5-memory-and-context-compression">5. Memory and Context Compression</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/context_compression.py
import logging
from typing import List, Dict, Any, Optional
import re
import json

logger = logging.getLogger(__name__)

class ContextCompressor:
    """
    Compresses conversation history to reduce token usage while preserving context.
    """
    
    def __init__(self):
        self.max_summary_tokens = 300  # Target size for summaries
    
    async def compress_history(self, 
                             messages: List[Dict[str, str]],
                             provider_service: Any) -> List[Dict[str, str]]:
        """
        Compress conversation history by summarizing older exchanges.
        Returns a new message list with compressed history.
        """
        # With fewer than 4 messages (e.g. a system prompt plus one exchange), no compression is needed
        if len(messages) < 4:
            return messages.copy()
        
        # Keep system messages verbatim and compress only the dialogue, so a system
        # message is never duplicated in the preserved tail of the conversation
        system_messages = [m for m in messages if m.get("role") == "system"]
        dialogue = [m for m in messages if m.get("role") != "system"]
        
        # Find the cut point - we'll preserve the most recent exchanges
        if len(messages) <= 10:
            # For shorter conversations, keep the most recent 3 messages (1-2 exchanges)
            preserve_count = 3
        else:
            # For longer conversations, preserve the most recent 4-6 messages (2-3 exchanges)
            preserve_count = min(6, max(4, len(messages) // 5))
        
        compress_messages = dialogue[:-preserve_count]
        preserve_messages = dialogue[-preserve_count:]
        
        # If nothing to compress, return original
        if not compress_messages:
            return messages.copy()
        
        # Generate summary of the earlier conversation
        summary = await self._generate_conversation_summary(compress_messages, provider_service)
        
        # Create a new message list with the summary + preserved messages
        result = system_messages.copy()  # Start with system message(s)
        
        # Add summary as a system message
        if summary:
            result.append({
                "role": "system",
                "content": f"Previous conversation summary: {summary}"
            })
        
        # Add preserved recent messages
        result.extend(preserve_messages)
        
        return result
    
    async def _generate_conversation_summary(self, 
                                          messages: List[Dict[str, str]], 
                                          provider_service: Any) -> str:
        """Generate a summary of the conversation history."""
        if not messages:
            return ""
        
        # Format the conversation for summarization
        conversation_text = "\n".join([
            f"{m.get('role', 'unknown')}: {m.get('content', '')}" 
            for m in messages if m.get('content')
        ])
        
        # Prepare the summarization prompt
        summary_prompt = [
            {"role": "system", "content": 
                "You are a conversation summarizer. Create a concise summary of the key points "
                "from the conversation that would help maintain context for future responses. "
                "Focus on important information, user preferences, and outstanding questions. "
                "Keep the summary under 200 words."
            },
            {"role": "user", "content": f"Summarize this conversation:\n\n{conversation_text}"}
        ]
        
        # Get a summary using a smaller/faster model
        try:
            summary_response = await provider_service.generate_completion(
                messages=summary_prompt,
                provider="openai",  # Use OpenAI for reliability
                model="gpt-3.5-turbo",  # Use a smaller model for efficiency
                max_tokens=self.max_summary_tokens
            )
            
            if summary_response and summary_response.get("message", {}).get("content"):
                return summary_response["message"]["content"]
            
        except Exception as e:
            logger.error(f"Error generating conversation summary: {str(e)}")
            
            # Simple fallback summary generation
            topics = self._extract_topics(conversation_text)
            if topics:
                return f"Previous conversation covered: {', '.join(topics)}."
        
        return "The conversation covered various topics which have been summarized to save space."
    
    def _extract_topics(self, conversation_text: str) -> List[str]:
        """Simple topic extraction as a fallback mechanism."""
        # Extract potential topic indicators
        topic_phrases = [
            "discussed", "talked about", "mentioned", "referred to",
            "asked about", "inquired about", "wanted to know"
        ]
        
        topics = []
        
        for phrase in topic_phrases:
            pattern = rf"{phrase} ([^\.,:;]+)"
            matches = re.findall(pattern, conversation_text, re.IGNORECASE)
            topics.extend(matches)
        
        # Deduplicate (keeping first-seen order, so results are deterministic) and limit
        unique_topics = list(dict.fromkeys(topic.strip() for topic in topics))
        return unique_topics[:5]  # Return at most 5 topics
    
    async def compress_user_query(self,
                               original_query: str,
                               provider_service: Any) -> str:
        """
        Compress a long user query to reduce token usage while preserving intent.
        Used for very long inputs.
        """
        # If query is already reasonably sized, return as is
        if len(original_query.split()) < 100:
            return original_query
            
        # Prepare compression prompt
        compression_prompt = [
            {"role": "system", "content": 
                "You are a query optimizer. Your job is to reformulate user queries to be more "
                "concise while preserving the core intent and all critical details. "
                "Remove redundant information and excessive elaboration, but maintain all "
                "specific requirements, constraints, and examples provided."
            },
            {"role": "user", "content": f"Optimize this query to be more concise while preserving all important details:\n\n{original_query}"}
        ]
        
        # Get a compressed query
        try:
            compression_response = await provider_service.generate_completion(
                messages=compression_prompt,
                provider="openai",
                model="gpt-3.5-turbo",
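                # Cap the rewrite at about half the original's word count. Words are only a
                # proxy for tokens (English averages roughly 0.75 words per token), so a tight
                # cap can truncate the rewrite, and a truncated result still passes the length check below.
                max_tokens=len(original_query.split()) // 2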
            )
            
            if (compression_response and 
                compression_response.get("message", {}).get("content") and
                len(compression_response["message"]["content"]) < len(original_query)):
                return compression_response["message"]["content"]
                
        except Exception as e:
            logger.error(f"Error compressing user query: {str(e)}")
        
        # If compression fails or doesn't reduce size, return original
        return original_query
</code></pre></div></pre>
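`compress_history` decides when to compress by message count, but a few very long messages can overflow a context window while many short ones would not. An alternative trigger is an estimated token count. The sketch below relies on the common rule of thumb of about four characters per token for English text plus an assumed fixed overhead per message; `estimate_tokens`, `needs_compression` and both constants are hypothetical, and a real tokenizer (for example `tiktoken` for OpenAI models) would give exact counts.

```python
import math
from typing import Dict, List

# Assumed heuristics, not exact tokenizer behaviour
CHARS_PER_TOKEN = 4          # rough average for English text
PER_MESSAGE_OVERHEAD = 4     # hypothetical allowance for role/formatting tokens


def estimate_tokens(messages: List[Dict[str, str]]) -> int:
    """Approximate the prompt size of a chat message list."""
    total = 0
    for message in messages:
        content = message.get("content") or ""
        total += PER_MESSAGE_OVERHEAD + math.ceil(len(content) / CHARS_PER_TOKEN)
    return total


def needs_compression(messages: List[Dict[str, str]],
                      context_limit: int,
                      reserve_for_reply: int = 500,
                      threshold: float = 0.75) -> bool:
    """
    Return True when the estimated prompt would use more than `threshold` of the
    context window that remains after reserving space for the model's reply.
    """
    usable = context_limit - reserve_for_reply
    if usable <= 0:
        raise ValueError("context_limit must exceed reserve_for_reply")
    return estimate_tokens(messages) > threshold * usable


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 6000},
]
print(estimate_tokens(history), needs_compression(history, context_limit=2000))
```

A check like this could gate the call to `compress_history`, so that short but dense conversations are compressed and long chatty ones are left alone.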
<h2 id="response-accuracy-optimization-strategies">Response Accuracy Optimization Strategies</h2>
<h3 id="1-prompt-engineering-templates">1. Prompt Engineering Templates</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/prompt_templates.py
from typing import Dict, List, Any, Optional
import re

class PromptTemplates:
    """
    Provides optimized prompt templates for different use cases to improve response accuracy.
    """
    
    def __init__(self):
        # Core system prompt templates
        self.system_templates = {
            "general": """
                You are a helpful assistant with diverse knowledge and capabilities.
                Provide accurate, relevant, and concise responses to user queries.
                When you don't know something, admit it rather than making up information.
                Format your responses clearly using markdown when helpful.
            """,
            
            "coding": """
                You are a coding assistant with expertise in programming languages and software development.
                Provide correct, efficient, and well-documented code examples.
                Explain your code clearly and highlight important concepts.
                Format code blocks using markdown with appropriate syntax highlighting.
                Suggest best practices and consider edge cases in your solutions.
            """,
            
            "research": """
                You are a research assistant with access to broad knowledge.
                Provide comprehensive, accurate, and nuanced information.
                Consider different perspectives and cite limitations of your knowledge.
                Structure complex information clearly and logically.
                Indicate uncertainty when appropriate rather than speculating.
            """,
            
            "math": """
                You are a mathematics tutor with expertise in various mathematical domains.
                Provide step-by-step explanations for mathematical problems.
                Use clear notation and formatting for equations using markdown.
                Verify your solutions and check for errors or edge cases.
                When solving problems, explain the underlying concepts and techniques.
            """,
            
            "creative": """
                You are a creative assistant skilled in writing, storytelling, and idea generation.
                Provide original, engaging, and imaginative content based on user requests.
                Consider tone, style, and audience in your creative work.
                When generating stories or content, maintain internal consistency.
                Respect copyright and avoid plagiarizing existing creative works.
            """
        }
        
        # Task-specific prompt templates that can be inserted into system prompts
        self.task_templates = {
            "step_by_step": """
                Break down your explanation into clear, logical steps.
                Begin with foundational concepts before advancing to more complex ideas.
                Use numbered or bulleted lists for sequential instructions or key points.
                Provide examples to illustrate abstract concepts.
            """,
            
            "comparison": """
                Present a balanced and objective comparison.
                Identify clear categories for comparison (features, performance, use cases, etc.).
                Highlight both similarities and differences.
                Consider context and specific use cases in your evaluation.
                Avoid unjustified bias and present evidence for evaluative statements.
            """,
            
            "factual_accuracy": """
                Prioritize accuracy over comprehensiveness.
                Clearly distinguish between well-established facts, expert consensus, and speculation.
                Acknowledge limitations in your knowledge, especially for time-sensitive information.
                Avoid overgeneralizations and recognize exceptions where relevant.
            """,
            
            "technical_explanation": """
                Begin with a high-level overview before diving into technical details.
                Define specialized terminology when introduced.
                Use analogies to explain complex concepts when appropriate.
                Balance technical precision with accessibility based on the apparent expertise level of the user.
            """
        }
        
        # Output format templates
        self.format_templates = {
            "pros_cons": """
                Structure your response with clearly labeled sections for advantages and disadvantages.
                Use bullet points or numbered lists for each point.
                Consider different perspectives or use cases.
                If applicable, provide a balanced conclusion or recommendation.
            """,
            
            "academic": """
                Structure your response similar to an academic paper with introduction, body, and conclusion.
                Use formal language and precise terminology.
                Acknowledge limitations and alternative viewpoints.
                Refer to theoretical frameworks or methodologies where relevant.
            """,
            
            "tutorial": """
                Structure your response as a tutorial with clear sections:
                - Introduction explaining what will be covered and prerequisites
                - Step-by-step instructions with examples
                - Common pitfalls or troubleshooting tips
                - Summary of key takeaways
                Use headings and code blocks with appropriate formatting.
            """,
            
            "eli5": """
                Explain the concept as if to a 10-year-old with no specialized knowledge.
                Use simple language and concrete analogies.
                Break complex ideas into simple components.
                Avoid jargon, or define terms very clearly when they must be used.
            """
        }
    
    @staticmethod
    def _clean(template: str) -> str:
        """
        Remove the source-code indentation from a triple-quoted template.
        str.strip() alone only trims the ends and leaves every inner line indented.
        """
        return "\n".join(line.strip() for line in template.strip().splitlines())
    
    def get_system_prompt(self, category: str, include_tasks: Optional[List[str]] = None) -> str:
        """Get a system prompt template with optional task-specific additions."""
        base_template = self._clean(self.system_templates.get(
            category, 
            self.system_templates["general"]
        ))
        
        if not include_tasks:
            return base_template
        
        # Add selected task templates (unknown task names are ignored)
        task_additions = [
            self._clean(self.task_templates[task])
            for task in include_tasks
            if task in self.task_templates
        ]
        
        if task_additions:
            return base_template + "\n\n" + "\n\n".join(task_additions)
        
        return base_template
    
    def enhance_user_prompt(self, original_prompt: str, format_type: Optional[str] = None) -> str:
        """Enhance a user prompt with formatting instructions."""
        if not format_type or format_type not in self.format_templates:
            return original_prompt
        
        format_instructions = self._clean(self.format_templates[format_type])
        enhanced_prompt = f"{original_prompt}\n\nPlease format your response as follows:\n{format_instructions}"
        
        return enhanced_prompt
    
    def detect_format_type(self, prompt: str) -> Optional[str]:
        """Detect what format type might be appropriate based on prompt content."""
        prompt_lower = prompt.lower()
        
        # Check for format indicators
        if any(phrase in prompt_lower for phrase in ["pros and cons", "advantages and disadvantages", "benefits and drawbacks"]):
            return "pros_cons"
        
        if any(phrase in prompt_lower for phrase in ["academic", "paper", "research", "literature", "theoretical"]):
            return "academic"
        
        if any(phrase in prompt_lower for phrase in ["tutorial", "how to", "guide", "step by step", "walkthrough"]):
            return "tutorial"
        
        if any(phrase in prompt_lower for phrase in ["explain like", "eli5", "simple terms", "layman's terms", "simply explain"]):
            return "eli5"
        
        return None
</code></pre></div></pre>
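<p>Because <code>detect_format_type</code> returns on the first matching phrase, the order of its checks acts as a priority: a prompt such as "Write a step by step guide to academic citation styles" is classified as <code>academic</code>, not <code>tutorial</code>. A minimal sketch of the same idea expressed as an ordered rule table, so the priority can be reviewed and reordered without touching control flow. <code>FORMAT_RULES</code>, <code>detect_format</code>, and <code>TUTORIAL_FIRST</code> are hypothetical names, not part of the service above.</p>

```python
from typing import Optional, Sequence, Tuple

# Hypothetical rule table: (format_type, trigger phrases), checked top to bottom.
# Order is priority: the first rule with a matching phrase wins.
FORMAT_RULES: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("pros_cons", ("pros and cons", "advantages and disadvantages", "benefits and drawbacks")),
    ("academic", ("academic", "paper", "research", "literature", "theoretical")),
    ("tutorial", ("tutorial", "how to", "guide", "step by step", "walkthrough")),
    ("eli5", ("explain like", "eli5", "simple terms", "layman's terms", "simply explain")),
)


def detect_format(
    prompt: str, rules: Sequence[Tuple[str, Tuple[str, ...]]] = FORMAT_RULES
) -> Optional[str]:
    """Return the first format type whose phrases appear in the prompt, else None."""
    prompt_lower = prompt.lower()
    for format_type, phrases in rules:
        if any(phrase in prompt_lower for phrase in phrases):
            return format_type
    return None


# Promoting "tutorial" above "academic" is just a reordering of the table.
TUTORIAL_FIRST = (FORMAT_RULES[0], FORMAT_RULES[2], FORMAT_RULES[1], FORMAT_RULES[3])

prompt = "Write a step by step guide to academic citation styles"
print(detect_format(prompt))                  # prints "academic"
print(detect_format(prompt, TUTORIAL_FIRST))  # prints "tutorial"
```

Keeping the rules as data also makes them easy to load from configuration or to unit-test one row at a time.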
<h3 id="2-context-aware-chain-of-thought">2. Context-Aware Chain of Thought</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/chain_of_thought.py
from typing import Dict, List, Any, Optional
import logging
import json
import re

logger = logging.getLogger(__name__)

class ChainOfThoughtService:
    """
    Enhances response accuracy by enabling step-by-step reasoning.
    """
    
    def __init__(self):
        # Configure when to use chain-of-thought prompting
        self.cot_triggers = [
            # Keywords indicating complex reasoning is needed. The leading \b keeps
            # "how" from matching inside words such as "show" or "somehow".
            r"\b(why|how|explain|analyze|reason|think|consider)",
            # Question patterns that benefit from step-by-step thinking
            r"\b(what (would|will|could|might) happen if)",
            r"\b(what (is|are) the (cause|reason|impact|effect|implication))",
            # Complexity indicators
            r"\b(complex|complicated|difficult|challenging|nuanced)",
            # Multi-step problems
            r"\b(steps|process|procedure|method|approach)"
        ]
        
        # Task-specific CoT templates
        self.cot_templates = {
            "general": "Let's think through this step-by-step.",
            
            "math": """
                Let's solve this step-by-step:
                1. First, understand what we're looking for
                2. Identify the relevant information and equations
                3. Work through the solution methodically
                4. Verify the answer makes sense
            """,
            
            "reasoning": """
                Let's approach this systematically:
                1. Identify the key elements of the problem
                2. Consider relevant principles and constraints
                3. Analyze potential approaches
                4. Evaluate and compare alternatives
                5. Draw a well-reasoned conclusion
            """,
            
            "decision": """
                Let's analyze this decision carefully:
                1. Clarify the decision to be made
                2. Identify the key criteria and constraints
                3. Consider the available options
                4. Evaluate each option against the criteria
                5. Assess potential risks and trade-offs
                6. Recommend the best course of action with justification
            """,
            
            "causal": """
                Let's analyze the causal relationships:
                1. Identify the events or phenomena to be explained
                2. Consider potential causes and mechanisms
                3. Evaluate the evidence for each causal link
                4. Consider alternative explanations
                5. Draw conclusions about the most likely causal relationships
            """
        }
        
        # Internal vs. external CoT modes
        self.cot_modes = {
            "internal": {
                "prefix": "Think through this problem step-by-step before providing your final answer.",
                "format": "standard"  # No special formatting needed
            },
            "external": {
                "prefix": "Show your step-by-step reasoning process explicitly in your response.",
                "format": "markdown"  # Format as markdown
            }
        }
    
    def should_use_cot(self, query: str) -> bool:
        """Determine if chain-of-thought prompting should be used for this query."""
        query_lower = query.lower()
        
        # Check for CoT triggers
        for pattern in self.cot_triggers:
            if re.search(pattern, query_lower):
                return True
        
        # Check for task complexity indicators
        if len(query.split()) > 30:  # Longer queries often benefit from CoT
            return True
            
        # Check for explicit reasoning requests
        explicit_requests = [
            "step by step", "explain your reasoning", "think through", 
            "show your work", "explain how you", "walk me through"
        ]
        
        if any(request in query_lower for request in explicit_requests):
            return True
        
        return False
    
    def detect_task_type(self, query: str) -> str:
        """Detect the type of reasoning task from the query."""
        query_lower = query.lower()
        
        # Check for mathematical content
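        math_indicators = [
            "calculate", "compute", "solve", "equation", "formula",
            "find the value", "what is the result",
            # An arithmetic expression such as "12 * 4" or "3.5/2"; a bare number
            # (e.g. a year) is not enough to classify the query as math
            r"\d+(\.\d+)?\s*[-+*/^=]\s*\d"
        ]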
        
        if any(re.search(indicator, query_lower) for indicator in math_indicators):
            return "math"
        
        # Check for decision-making queries
        decision_indicators = [
            "should i", "which is better", "what's the best", "recommend", 
            "decide between", "choose", "options"
        ]
        
        if any(indicator in query_lower for indicator in decision_indicators):
            return "decision"
        
        # Check for causal analysis
        causal_indicators = [
            "why did", "what caused", "reason for", "explain why",
            "how does", "what leads to", "effect of", "impact of"
        ]
        
        if any(indicator in query_lower for indicator in causal_indicators):
            return "causal"
        
        # Check for general analytical reasoning
        reasoning_indicators = [
            "explain", "analyze", "evaluate", "critique", "assess",
            "compare", "contrast", "discuss", "review"
        ]
        
        if any(indicator in query_lower for indicator in reasoning_indicators):
            return "reasoning"
        
        return "general"
    
    def enhance_prompt_with_cot(self, 
                              query: str, 
                              mode: str = "internal",
                              explicit_template: bool = False) -> str:
        """
        Enhance a prompt with chain-of-thought instructions.
        
        Args:
            query: The original user query
            mode: "internal" (for model thinking) or "external" (for visible reasoning)
            explicit_template: Whether to include the full template or just the instruction
        """
        if not self.should_use_cot(query):
            return query
        
        # Get CoT mode configuration
        cot_mode = self.cot_modes.get(mode, self.cot_modes["internal"])
        
        # Detect the task type
        task_type = self.detect_task_type(query)
        
        # Get the appropriate template
        template = self.cot_templates.get(task_type, self.cot_templates["general"])
        
        if explicit_template:
            # Add the full template
            enhanced = f"{query}\n\n{cot_mode['prefix']}\n\n{template.strip()}"
        else:
            # Just add the basic instruction
            enhanced = f"{query}\n\n{cot_mode['prefix']}"
        
        return enhanced
    
    def format_cot_for_response(self, reasoning: str, final_answer: str, mode: str = "external") -> str:
        """
        Format chain-of-thought reasoning and final answer for response.
        
        Args:
            reasoning: The step-by-step reasoning process
            final_answer: The final answer or conclusion
            mode: "internal" (hidden) or "external" (visible)
        """
        if mode == "internal":
            # For internal mode, just return the final answer
            return final_answer
        
        # For external mode, format the reasoning and answer
        formatted = f"""
## Reasoning Process

{reasoning}

## Conclusion

{final_answer}
"""
        return formatted.strip()
</code></pre></div></pre>
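<p><code>format_cot_for_response</code> expects the reasoning and the final answer as separate strings, but the service does not show how to obtain them from a single model reply. One option, sketched here under the assumption that the prompt instructs the model to end with a line beginning <code>Final answer:</code> (a convention introduced for this example, not one the service above enforces), is to split on the last such marker and treat the whole reply as the answer when no marker is present. <code>split_reasoning</code> and <code>FINAL_ANSWER_MARKER</code> are hypothetical.</p>

```python
import re
from typing import Tuple

# Assumed convention: the prompt asks the model to finish with "Final answer: ...".
# MULTILINE lets ^ match at the start of any line, not only the first.
FINAL_ANSWER_MARKER = re.compile(r"^\s*final answer\s*:\s*", re.IGNORECASE | re.MULTILINE)


def split_reasoning(reply: str) -> Tuple[str, str]:
    """Split a model reply into (reasoning, final_answer) at the last marker line."""
    matches = list(FINAL_ANSWER_MARKER.finditer(reply))
    if not matches:
        # No marker: treat the whole reply as the answer, with no visible reasoning
        return "", reply.strip()
    last = matches[-1]
    reasoning = reply[: last.start()].strip()
    final_answer = reply[last.end():].strip()
    return reasoning, final_answer


reply = """1. The train covers 120 km in 2 hours.
2. Speed is distance divided by time.
Final answer: 60 km/h"""

reasoning, answer = split_reasoning(reply)
print(answer)  # prints "60 km/h"
```

The resulting pair can then be passed to <code>format_cot_for_response(reasoning, answer, mode="external")</code>. Splitting on the last marker tolerates models that mention "final answer" earlier while reasoning.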
<h3 id="3-self-verification-and-error-correction">3. Self-Verification and Error Correction</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/verification_service.py
import logging
from typing import Dict, List, Any, Optional, Tuple
import re
import json

logger = logging.getLogger(__name__)

class VerificationService:
    """
    Improves response accuracy through self-verification and error correction.
    """
    
    def __init__(self):
        # Define verification categories
        self.verification_categories = [
            "factual_accuracy",
            "logical_consistency",
            "completeness",
            "code_correctness",
            "calculation_accuracy",
            "bias_detection"
        ]
        
        # High-risk categories that should always be verified
        self.high_risk_categories = [
            "medical",
            "legal",
            "financial",
            "security"
        ]
        
        # Verification prompt templates
        self.verification_templates = {
            "general": """
                Please verify your response for:
                1. Factual accuracy - Are all stated facts correct?
                2. Logical consistency - Is the reasoning sound and free of contradictions?
                3. Completeness - Does the answer address all aspects of the question?
                4. Clarity - Is the response clear and easy to understand?
                
                If you find any errors or omissions, please correct them in your response.
            """,
            
            "factual": """
                Critically verify the factual claims in your response:
                - Are dates, names, and definitions accurate?
                - Are statistics and measurements correct?
                - Are attributions to people, organizations, or sources accurate?
                - Have you distinguished between facts and opinions/interpretations?
                
                If you identify any factual errors, please correct them.
            """,
            
            "code": """
                Verify your code for:
                1. Syntax errors and typos
                2. Logical correctness - does it perform the intended function?
                3. Edge cases and error handling
                4. Efficiency and best practices
                5. Security vulnerabilities
                
                If you find any issues, please provide corrected code.
            """,
            
            "math": """
                Verify your mathematical work by:
                1. Re-checking each calculation step
                2. Verifying that formulas are applied correctly
                3. Confirming unit conversions if applicable
                4. Testing the solution with sample values if possible
                5. Checking for arithmetic errors
                
                If you find any errors, please recalculate and provide the correct answer.
            """,
            
            "bias": """
                Check your response for potential biases:
                1. Is the framing balanced and objective?
                2. Have you considered diverse perspectives?
                3. Are there cultural, geographic, or demographic assumptions?
                4. Does the language contain implicit value judgments?
                
                If you detect bias, please revise for greater objectivity.
            """
        }
    
    def detect_verification_needs(self, query: str) -> List[str]:
        """Detect which verification categories are needed based on the query."""
        query_lower = query.lower()
        needed_categories = []
        
        # Check for high-risk topics
        high_risk_detected = any(
            category in query_lower for category in self.high_risk_categories
        )
        
        # For high-risk topics, perform comprehensive verification
        if high_risk_detected:
            return ["factual_accuracy", "logical_consistency", "completeness", "bias_detection"]
        
        # Check for code-related content
        code_indicators = ["code", "function", "program", "algorithm", "syntax"]
        if any(indicator in query_lower for indicator in code_indicators):
            needed_categories.append("code_correctness")
        
        # Check for mathematical content
        math_indicators = ["calculate", "compute", "solve", "equation", "math problem"]
        if any(indicator in query_lower for indicator in math_indicators):
            needed_categories.append("calculation_accuracy")
        
        # Check for factual questions
        factual_indicators = ["fact", "information about", "when did", "who is", "history of"]
        if any(indicator in query_lower for indicator in factual_indicators):
            needed_categories.append("factual_accuracy")
        
        # Check for logical reasoning requirements
        logic_indicators = ["why", "explain", "reason", "because", "therefore", "hence"]
        if any(indicator in query_lower for indicator in logic_indicators):
            needed_categories.append("logical_consistency")
        
        # For comprehensive questions
        if len(query.split()) > 30 or "comprehensive" in query_lower or "detailed" in query_lower:
            needed_categories.append("completeness")
        
        # For sensitive or controversial topics
        sensitive_indicators = ["controversy", "debate", "opinion", "perspective", "ethical"]
        if any(indicator in query_lower for indicator in sensitive_indicators):
            needed_categories.append("bias_detection")
        
        # Default to basic verification if nothing specific detected
        if not needed_categories:
            needed_categories = ["factual_accuracy", "logical_consistency"]
        
        return needed_categories
    
    def get_verification_prompt(self, categories: List[str]) -> str:
        """Get the appropriate verification prompt based on needed categories."""
        if "code_correctness" in categories and len(categories) == 1:
            return self.verification_templates["code"]
            
        if "calculation_accuracy" in categories and len(categories) == 1:
            return self.verification_templates["math"]
            
        if "factual_accuracy" in categories and "bias_detection" not in categories:
            return self.verification_templates["factual"]
            
        if "bias_detection" in categories and len(categories) == 1:
            return self.verification_templates["bias"]
            
        # Default to general verification
        return self.verification_templates["general"]
    
    async def verify_response(self, 
                            query: str, 
                            initial_response: str,
                            provider_service: Any) -> Tuple[str, bool]:
        """
        Verify and potentially correct a response.
        
        Returns:
            Tuple of (verified_response, was_corrected)
        """
        # Detect verification needs
        verification_categories = self.detect_verification_needs(query)
        
        # If no verification needed, return original
        if not verification_categories:
            return initial_response, False
            
        # Get verification prompt
        verification_prompt = self.get_verification_prompt(verification_categories)
        
        # Create verification messages
        verification_messages = [
            {"role": "system", "content": 
                "You are a verification assistant. Your job is to verify the accuracy, "
                "consistency, and completeness of responses. Identify any errors or "
                "issues, and provide corrections when necessary."
            },
            {"role": "user", "content": query},
            {"role": "assistant", "content": initial_response},
            {"role": "user", "content": verification_prompt}
        ]
        
        try:
            verification_response = await provider_service.generate_completion(
                messages=verification_messages,
                provider="openai",  # Use OpenAI for verification
                model="gpt-4"  # Use a more capable model for verification
            )
            
            if verification_response and verification_response.get("message", {}).get("content"):
                verification_text = verification_response["message"]["content"]
                verification_lower = verification_text.lower()
                
                # Check minor clarifications first: "small correction" also contains
                # "correction", so the major-issue check below would otherwise always win
                minor_indicators = ["minor clarification", "additional note", "small correction"]
                if any(indicator in verification_lower for indicator in minor_indicators):
                    # Include the clarification in the response
                    combined = f"{initial_response}\n\n**Note:** {verification_text}"
                    return combined, True
                
                # Look for indicators of substantive corrections. This is a keyword
                # heuristic: a reply such as "no errors found" also matches "error"
                correction_indicators = [
                    "correction", "error", "mistake", "incorrect", 
                    "needs clarification", "inaccurate", "not quite right"
                ]
                
                if any(indicator in verification_lower for indicator in correction_indicators):
                    # Attempt to correct the response
                    corrected_response = await self._generate_corrected_response(
                        query, initial_response, verification_text, provider_service
                    )
                    return corrected_response, True
            
            # Verification returned no usable content or found no issues
            return initial_response, False
                
        except Exception as e:
            logger.error(f"Error in response verification: {str(e)}")
            return initial_response, False
    
    async def _generate_corrected_response(self,
                                        query: str,
                                        initial_response: str,
                                        verification_text: str,
                                        provider_service: Any) -> str:
        """Generate a corrected response based on verification feedback."""
        correction_prompt = [
            {"role": "system", "content": 
                "You are a correction assistant. Your job is to provide a revised response "
                "that addresses the issues identified in the verification feedback. "
                "Create a complete, standalone corrected response."
            },
            {"role": "user", "content": f"Original question:\n{query}"},
            {"role": "assistant", "content": f"Initial response:\n{initial_response}"},
            {"role": "user", "content": f"Verification feedback:\n{verification_text}\n\nPlease provide a corrected response."}
        ]
        
        try:
            correction_response = await provider_service.generate_completion(
                messages=correction_prompt,
                provider="openai",
                model="gpt-4"
            )
            
            if correction_response and correction_response.get("message", {}).get("content"):
                return correction_response["message"]["content"]
                
        except Exception as e:
            logger.error(f"Error generating corrected response: {str(e)}")
        
        # Fallback - append verification notes to original
        return f"{initial_response}\n\n**Correction Note:** {verification_text}"
</code></pre></div></pre>
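<p>The keyword checks in <code>verify_response</code> are fragile: a verifier reply such as "No errors were found" contains "error" and triggers an unnecessary correction pass. A more robust alternative, sketched here as an assumption rather than as part of the service above, is to append an instruction to the verification prompt asking the model to open with a fixed verdict line (<code>VERDICT: OK</code>, <code>VERDICT: MINOR</code>, or <code>VERDICT: ERRORS</code>) and to parse only that line. The verdict labels, <code>VERDICT_INSTRUCTION</code>, and <code>parse_verdict</code> are hypothetical.</p>

```python
import re
from typing import Optional, Tuple

# Hypothetical instruction appended to the verification prompt.
VERDICT_INSTRUCTION = (
    "Begin your reply with exactly one line: 'VERDICT: OK' if the response is correct, "
    "'VERDICT: MINOR' for small clarifications, or 'VERDICT: ERRORS' if it must be rewritten. "
    "Then explain your findings."
)

# \b rejects look-alikes such as "VERDICT: OKAY"
VERDICT_LINE = re.compile(r"^\s*VERDICT:\s*(OK|MINOR|ERRORS)\b", re.IGNORECASE)


def parse_verdict(verification_text: str) -> Tuple[Optional[str], str]:
    """Return (verdict, explanation); verdict is None if the model ignored the format."""
    match = VERDICT_LINE.match(verification_text)
    if not match:
        return None, verification_text.strip()
    verdict = match.group(1).upper()
    explanation = verification_text[match.end():].strip()
    return verdict, explanation


print(parse_verdict("VERDICT: OK\nNo errors were found."))
# prints ('OK', 'No errors were found.')
```

A <code>None</code> verdict could fall back to the existing keyword heuristic, so models that ignore the format still get verified.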
<h3 id="4-domain-specific-knowledge-integration">4. Domain-Specific Knowledge Integration</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/domain_knowledge.py
import hashlib
import logging
from typing import Dict, List, Any, Optional
import json
import re
import os
import yaml

logger = logging.getLogger(__name__)

class DomainKnowledgeService:
    """
    Enhances response accuracy by integrating domain-specific knowledge.
    """
    
    def __init__(self, knowledge_dir: str = "knowledge"):
        self.knowledge_dir = knowledge_dir
        
        # Domain definitions
        self.domains = {
            "programming": {
                "keywords": ["coding", "programming", "software", "development", "algorithm", "function"],
                "languages": ["python", "javascript", "java", "c++", "ruby", "go", "rust", "php"]
            },
            "medicine": {
                "keywords": ["medical", "health", "disease", "treatment", "diagnosis", "symptom", "patient"],
                "specialties": ["cardiology", "neurology", "pediatrics", "oncology", "psychiatry"]
            },
            "finance": {
                "keywords": ["finance", "investment", "stock", "market", "trading", "portfolio", "asset"],
                "topics": ["stocks", "bonds", "cryptocurrency", "retirement", "taxes", "budgeting"]
            },
            "law": {
                "keywords": ["legal", "law", "regulation", "compliance", "contract", "liability"],
                "areas": ["corporate", "criminal", "civil", "constitutional", "intellectual property"]
            },
            "science": {
                "keywords": ["science", "research", "experiment", "theory", "hypothesis", "evidence"],
                "fields": ["physics", "chemistry", "biology", "astronomy", "geology", "ecology"]
            }
        }
        
        # Load domain knowledge
        self.domain_knowledge = self._load_domain_knowledge()
        
        # Track query->domain mappings to optimize repeated queries
        self.domain_cache = {}
    
    def _load_domain_knowledge(self) -> Dict[str, Any]:
        """Load domain knowledge from files."""
        knowledge = {}
        
        try:
            # Create knowledge dir if it doesn't exist
            os.makedirs(self.knowledge_dir, exist_ok=True)
            
            # Load (or create) one YAML file per configured domain
            for domain in self.domains.keys():
                domain_path = os.path.join(self.knowledge_dir, f"{domain}.yaml")
                
                # Create empty file if it doesn't exist
                if not os.path.exists(domain_path):
                    with open(domain_path, 'w') as f:
                        yaml.dump({
                            "domain": domain,
                            "concepts": {},
                            "facts": [],
                            "common_misconceptions": [],
                            "best_practices": []
                        }, f)
                
                # Load domain knowledge
                try:
                    with open(domain_path, 'r') as f:
                        domain_data = yaml.safe_load(f) or {}  # an empty file loads as None
                        knowledge[domain] = domain_data
                except Exception as e:
                    logger.error(f"Error loading domain knowledge for {domain}: {str(e)}")
                    knowledge[domain] = {
                        "domain": domain,
                        "concepts": {},
                        "facts": [],
                        "common_misconceptions": [],
                        "best_practices": []
                    }
        except Exception as e:
            logger.error(f"Error loading domain knowledge: {str(e)}")
        
        return knowledge
    
    def detect_domains(self, query: str) -> List[str]:
        """Detect relevant domains for a query."""
        # Check cache first
        cache_key = hashlib.md5(query.encode()).hexdigest()
        if cache_key in self.domain_cache:
            return self.domain_cache[cache_key]
        
        # Pad with spaces and collapse punctuation to single spaces so that terms
        # match whole words only: "go" no longer matches "good", "java" no longer
        # matches "javascript", and "law" no longer matches "flaw"
        normalized = " " + re.sub(r"[^a-z0-9+#]+", " ", query.lower()) + " "
        
        def mentions(term: str) -> bool:
            # Accept the term itself or a simple plural ("stock" / "stocks")
            return f" {term} " in normalized or f" {term}s " in normalized
        
        relevant_domains = []
        
        # Check each domain for relevance
        for domain, definition in self.domains.items():
            # Check domain keywords
            keyword_match = any(mentions(keyword) for keyword in definition["keywords"])
            
            # Check specific domain topics (languages, specialties, topics, areas, fields)
            topic_match = any(
                mentions(topic)
                for topic_category, topics in definition.items()
                if topic_category != "keywords"
                for topic in topics
            )
            
            if keyword_match or topic_match:
                relevant_domains.append(domain)
        
        # Cache result
        self.domain_cache[cache_key] = relevant_domains
        return relevant_domains
    
    def get_domain_knowledge(self, domains: List[str]) -> Dict[str, Any]:
        """Get knowledge for the specified domains."""
        combined_knowledge = {
            "concepts": {},
            "facts": [],
            "common_misconceptions": [],
            "best_practices": []
        }
        
        for domain in domains:
            if domain in self.domain_knowledge:
                domain_data = self.domain_knowledge[domain]
                
                # Merge concepts (dictionary)
                combined_knowledge["concepts"].update(domain_data.get("concepts", {}))
                
                # Extend lists
                for key in ["facts", "common_misconceptions", "best_practices"]:
                    combined_knowledge[key].extend(domain_data.get(key, []))
        
        return combined_knowledge
    
    def format_domain_knowledge(self, knowledge: Dict[str, Any]) -> str:
        """Format domain knowledge as a context string."""
        if not knowledge or all(not v for v in knowledge.values()):
            return ""
        
        formatted_parts = []
        
        # Format concepts
        if knowledge["concepts"]:
            concepts_list = []
            for concept, definition in knowledge["concepts"].items():
                concepts_list.append(f"- {concept}: {definition}")
            
            formatted_parts.append("Key concepts:\n" + "\n".join(concepts_list))
        
        # Format facts
        if knowledge["facts"]:
            formatted_parts.append("Important facts:\n- " + "\n- ".join(knowledge["facts"]))
        
        # Format misconceptions
        if knowledge["common_misconceptions"]:
            formatted_parts.append("Common misconceptions to avoid:\n- " + "\n- ".join(knowledge["common_misconceptions"]))
        
        # Format best practices
        if knowledge["best_practices"]:
            formatted_parts.append("Best practices:\n- " + "\n- ".join(knowledge["best_practices"]))
        
        return "\n\n".join(formatted_parts)
    
    def enhance_prompt_with_domain_knowledge(self, query: str, system_prompt: str) -> str:
        """Enhance a system prompt with relevant domain knowledge."""
        # Detect relevant domains
        domains = self.detect_domains(query)
        
        if not domains:
            return system_prompt
        
        # Get domain knowledge
        knowledge = self.get_domain_knowledge(domains)
        
        # Format knowledge as context
        knowledge_text = self.format_domain_knowledge(knowledge)
        
        if not knowledge_text:
            return system_prompt
        
        # Add to system prompt
        enhanced_prompt = f"{system_prompt}\n\nRelevant domain knowledge:\n{knowledge_text}"
        
        return enhanced_prompt
</code></pre></div></pre>
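<p>Because <code>get_domain_knowledge</code> merges concepts with <code>dict.update</code> and the other keys with <code>list.extend</code>, a concept defined in two domains keeps the <em>last</em> domain's definition, and a fact listed in two domains appears twice in the prompt. The sketch below shows one possible alternative, not the service's own method. The hypothetical <code>merge_domain_knowledge</code> helper assumes the same dictionary layout and keeps the first definition of each concept. That choice assumes <code>detect_domains</code> (not shown here) returns domains in priority order. The helper also drops duplicate list entries while preserving order.</p>

```python
from typing import Any, Dict, Iterable


def merge_domain_knowledge(
    domain_knowledge: Dict[str, Dict[str, Any]], domains: Iterable[str]
) -> Dict[str, Any]:
    """Hypothetical helper: merge per-domain knowledge, keeping the first
    definition of each concept and dropping repeated list entries."""
    list_keys = ["facts", "common_misconceptions", "best_practices"]
    merged: Dict[str, Any] = {"concepts": {}, **{key: [] for key in list_keys}}
    seen: Dict[str, set] = {key: set() for key in list_keys}

    for domain in domains:
        data = domain_knowledge.get(domain)
        if not data:
            continue  # unknown or empty domain
        for concept, definition in data.get("concepts", {}).items():
            # setdefault keeps the earliest domain's definition on a name clash
            merged["concepts"].setdefault(concept, definition)
        for key in list_keys:
            for item in data.get(key, []):
                if item not in seen[key]:  # order-preserving de-duplication
                    seen[key].add(item)
                    merged[key].append(item)
    return merged


knowledge = {
    "python": {"concepts": {"GIL": "Global Interpreter Lock"},
               "facts": ["CPython uses reference counting."]},
    "performance": {"concepts": {"GIL": "Limits CPU-bound threading"},
                    "facts": ["CPython uses reference counting.", "Profile before optimizing."]},
}
merged = merge_domain_knowledge(knowledge, ["python", "performance"])
print(merged["concepts"]["GIL"])  # Global Interpreter Lock
print(merged["facts"])  # ['CPython uses reference counting.', 'Profile before optimizing.']
```

<p>First-wins versus last-wins is a judgment call. If domain order carries no meaning, logging the collision may be more useful than silently picking either definition.</p>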
<h3 id="5-dynamic-few-shot-learning">5. Dynamic Few-Shot Learning</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/few_shot_examples.py
import logging
from typing import Dict, List, Optional
import os
import json
import random
import re
import hashlib

logger = logging.getLogger(__name__)

class FewShotExampleService:
    """
    Enhances response accuracy using dynamic few-shot learning with examples.
    """
    
    def __init__(self, examples_dir: str = "examples"):
        self.examples_dir = examples_dir
        
        # Ensure examples directory exists
        os.makedirs(examples_dir, exist_ok=True)
        
        # Task categories for examples
        self.task_categories = {
            "code_generation": {
                "keywords": ["write code", "function", "implement", "program", "algorithm"],
                "patterns": [r"write a .* function", r"implement .* in (python|javascript|java|c\+\+)"]
            },
            "explanation": {
                "keywords": ["explain", "describe", "how does", "what is", "why is"],
                "patterns": [r"explain .* to me", r"what is the .* of", r"how does .* work"]
            },
            "classification": {
                "keywords": ["classify", "categorize", "identify", "is this", "determine"],
                "patterns": [r"is this .* or .*", r"which category", r"identify the .*"]
            },
            "comparison": {
                "keywords": ["compare", "contrast", "difference", "similarities", "versus"],
                "patterns": [r"compare .* and .*", r"what is the difference between", r".* vs .*"]
            },
            "summarization": {
                "keywords": ["summarize", "summary", "brief overview", "key points"],
                "patterns": [r"summarize .*", r"provide a summary", r"key points of"]
            }
        }
        
        # Load examples
        self.examples = self._load_examples()
    
    def _load_examples(self) -> Dict[str, List[Dict[str, str]]]:
        """Load examples from files."""
        examples = {category: [] for category in self.task_categories.keys()}
        
        # Load examples for each category
        for category in self.task_categories.keys():
            category_file = os.path.join(self.examples_dir, f"{category}.json")
            
            if os.path.exists(category_file):
                try:
                    with open(category_file, 'r') as f:
                        category_examples = json.load(f)
                        examples[category] = category_examples
                except Exception as e:
                    logger.error(f"Error loading examples for {category}: {str(e)}")
        
        return examples
    
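    def detect_task_category(self, query: str) -> Optional[str]:
        """Detect the task category for a query.

        Regex patterns are more specific than bare keywords, so they are checked
        across all categories first and keyword matching is only a fallback.
        Otherwise a generic keyword such as "what is" would assign
        "what is the difference between X and Y" to "explanation" before the
        "comparison" category is ever considered.
        """
        query_lower = query.lower()
        
        # Pass 1: specific regex patterns, in category order
        for category, definition in self.task_categories.items():
            if any(re.search(pattern, query_lower) for pattern in definition["patterns"]):
                return category
        
        # Pass 2: broader keyword matches
        for category, definition in self.task_categories.items():
            if any(keyword in query_lower for keyword in definition["keywords"]):
                return category
        
        return None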
    
    def select_examples(self, 
                      query: str, 
                      category: Optional[str] = None, 
                      num_examples: int = 3) -> List[Dict[str, str]]:
        """Select the most relevant examples for a query."""
        # Detect category if not provided
        if not category:
            category = self.detect_task_category(query)
            
        if not category or category not in self.examples or not self.examples[category]:
            return []
        
        category_examples = self.examples[category]
        
        # If we have few examples, just return all of them (up to num_examples)
        if len(category_examples) <= num_examples:
            return category_examples
        
        # For simplicity, we're using random selection here
        # In a production system, this would use semantic similarity or other relevance metrics
        selected = random.sample(category_examples, min(num_examples, len(category_examples)))
        
        return selected
    
    def format_examples_for_prompt(self, examples: List[Dict[str, str]]) -> str:
        """Format examples for inclusion in a prompt."""
        if not examples:
            return ""
        
        formatted_examples = []
        
        for i, example in enumerate(examples, 1):
            query = example.get("query", "")
            response = example.get("response", "")
            
            formatted = f"Example {i}:\n\nUser: {query}\n\nAssistant: {response}\n"
            formatted_examples.append(formatted)
        
        return "\n".join(formatted_examples)
    
    def enhance_prompt_with_examples(self, 
                                   query: str, 
                                   system_prompt: str,
                                   num_examples: int = 2) -> str:
        """Enhance a system prompt with few-shot examples."""
        # Select relevant examples
        examples = self.select_examples(query, num_examples=num_examples)
        
        if not examples:
            return system_prompt
        
        # Format examples
        examples_text = self.format_examples_for_prompt(examples)
        
        # Add to system prompt
        enhanced_prompt = f"{system_prompt}\n\nHere are some examples of how to respond to similar queries:\n\n{examples_text}"
        
        return enhanced_prompt
    
    def add_example(self, category: str, query: str, response: str) -> bool:
        """Add a new example to the examples collection."""
        if category not in self.task_categories:
            logger.error(f"Invalid category: {category}")
            return False
        
        example = {
            "query": query,
            "response": response,
            "id": hashlib.md5(f"{category}:{query}".encode()).hexdigest()
        }
        
        # Add to in-memory collection
        if category not in self.examples:
            self.examples[category] = []
        
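        # Check if this example already exists. Hand-written example files may
        # lack an "id", so derive one from the query the same way as above.
        existing_ids = {
            e.get("id") or hashlib.md5(f"{category}:{e.get('query', '')}".encode()).hexdigest()
            for e in self.examples[category]
        }
        if example["id"] in existing_ids:
            return False  # Example already exists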
        
        self.examples[category].append(example)
        
        # Save to file
        try:
            category_file = os.path.join(self.examples_dir, f"{category}.json")
            with open(category_file, 'w') as f:
                json.dump(self.examples[category], f, indent=2)
            return True
        except Exception as e:
            logger.error(f"Error saving example: {str(e)}")
            return False
</code></pre></div></pre>
<h2 id="deployment-strategies">Deployment Strategies</h2>
<h3 id="local-development-environment">Local Development Environment</h3>
<h4 id="setup-script-for-local-deployment">Setup Script for Local Deployment</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">#!/bin/bash
# local_setup.sh - Set up local development environment

set -e  # Exit on error

# Check for required tools
echo "Checking prerequisites..."
command -v python3 >/dev/null 2>&1 || { echo "Python 3 is required but not installed. Aborting."; exit 1; }
command -v pip3 >/dev/null 2>&1 || { echo "pip3 is required but not installed. Aborting."; exit 1; }
command -v docker >/dev/null 2>&1 || { echo "Docker is required but not installed. Aborting."; exit 1; }
command -v docker-compose >/dev/null 2>&1 || { echo "Docker Compose is required but not installed. Aborting."; exit 1; }

# Create virtual environment
echo "Creating Python virtual environment..."
python3 -m venv venv
source venv/bin/activate

# Install dependencies
echo "Installing Python dependencies..."
pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Set up environment file
if [ ! -f .env ]; then
    echo "Creating .env file..."
    cp .env.example .env
    
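    # Prompt for OpenAI API key (input is not echoed)
    read -r -s -p "Enter your OpenAI API key (leave blank to skip): " openai_key
    echo
    if [ -n "$openai_key" ]; then
        # -i.bak works with both GNU sed (Linux) and BSD sed (macOS); the backup is removed afterwards
        sed -i.bak "s|^OPENAI_API_KEY=.*|OPENAI_API_KEY=$openai_key|" .env && rm -f .env.bak
    fi
    
    # Set environment to development
    sed -i.bak "s|^APP_ENV=.*|APP_ENV=development|" .env && rm -f .env.bak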
    
    echo ".env file created. Please review and update as needed."
else
    echo ".env file already exists. Skipping creation."
fi

# Check if Ollama is installed
if ! command -v ollama >/dev/null 2>&1; then
    echo "Ollama not found. Would you like to install it? (y/n)"
    read install_ollama
    
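    if [ "$install_ollama" = "y" ]; then
        echo "Installing Ollama..."
        if [[ "$OSTYPE" == "darwin"* ]]; then
            # macOS: install.sh targets Linux only; use Homebrew or the desktop app
            if command -v brew >/dev/null 2>&1; then
                brew install ollama
            else
                echo "Install Ollama for macOS from https://ollama.com/download, then re-run this script."
            fi
        else
            # Linux
            curl -fsSL https://ollama.com/install.sh | sh
        fi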
    else
        echo "Skipping Ollama installation. You will need to install it manually."
    fi
else
    echo "Ollama already installed."
fi

# Pull required Ollama models
if command -v ollama >/dev/null 2>&1; then
    echo "Would you like to pull the recommended Ollama models? (y/n)"
    read pull_models
    
    if [ "$pull_models" = "y" ]; then
        echo "Pulling Ollama models..."
        ollama pull llama2
        ollama pull mistral
        ollama pull codellama
    fi
fi

# Start Redis for development
echo "Starting Redis with Docker..."
docker-compose up -d redis

# Initialize database
echo "Initializing database..."
python scripts/init_db.py

# Run tests to verify setup
echo "Running tests to verify setup..."
pytest tests/unit

echo "Setup complete! You can now start the development server with:"
echo "uvicorn app.main:app --reload"
</code></pre></div></pre>
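<p>The setup script pulls <code>llama2</code>, <code>mistral</code>, and <code>codellama</code>, but it never confirms that the pulls succeeded. A small check against Ollama's <code>GET /api/tags</code> endpoint, which lists locally available models, could close that gap. The helper names below are hypothetical. The sketch assumes the default server address <code>http://localhost:11434</code>. It also assumes that an untagged pull such as <code>llama2</code> is listed as <code>llama2:latest</code>.</p>

```python
import json
import urllib.request
from typing import Iterable, List


def installed_models(host: str = "http://localhost:11434", timeout: float = 5.0) -> List[str]:
    """Hypothetical helper: model names reported by Ollama's GET /api/tags."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def missing_models(required: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Hypothetical helper: required models that are not installed locally.

    A requirement without an explicit tag is compared against ":latest",
    the tag Ollama assigns to an untagged pull.
    """
    installed_set = set(installed)
    missing = []
    for name in required:
        full_name = name if ":" in name else f"{name}:latest"
        if full_name not in installed_set:
            missing.append(name)
    return missing
```

<p>For example, <code>missing_models(["llama2", "mistral", "codellama"], installed_models())</code> returns the names of any models whose pull failed, or an empty list if all three are present. <code>installed_models</code> raises <code>urllib.error.URLError</code> if the Ollama server is not running.</p>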
<h4 id="docker-compose-for-local-services">Docker Compose for Local Services</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># docker-compose.yml
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile.dev
    ports:
      - "8000:8000"
    volumes:
      - .:/app
    environment:
      - PYTHONPATH=/app
      - REDIS_URL=redis://redis:6379/0
      - OLLAMA_HOST=http://ollama:11434
      - APP_ENV=development
      - FORCE_DEV_MODE=true
    depends_on:
      - redis
      - ollama
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
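    # GPU reservation: requires an NVIDIA GPU and the NVIDIA Container Toolkit.
    # Remove this deploy block on other hosts (e.g. macOS), or the ollama container will fail to start.
    deploy: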
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  ui:
    build:
      context: ./ui
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    volumes:
      - ./ui:/app
      - /app/node_modules
    environment:
      - API_URL=http://app:8000
    depends_on:
      - app
    command: npm start

volumes:
  redis_data:
  ollama_data:
</code></pre></div></pre>
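<p>In this file, <code>depends_on</code> only controls container start order. It does not wait for Redis or Ollama to accept connections, so the app can start before its dependencies are ready. Compose can gate startup with a <code>healthcheck</code> plus <code>condition: service_healthy</code>. Alternatively, the application can retry on its own. The sketch below takes the second approach, using hypothetical helper names and exponential backoff over a plain TCP probe.</p>

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


def wait_for(
    probe: Callable[[], bool],
    attempts: int = 10,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: call probe() until it succeeds, doubling the delay
    between attempts up to max_delay. Returns False if every attempt fails."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:  # no pointless sleep after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

<p>Inside the <code>app</code> container, <code>wait_for(lambda: tcp_probe("redis", 6379))</code> and <code>wait_for(lambda: tcp_probe("ollama", 11434))</code> would use the Compose service names as hostnames. The <code>sleep</code> parameter exists so the backoff schedule can be tested without real delays.</p>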
<h4 id="development-dockerfile">Development Dockerfile</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-dockerfile"># Dockerfile.dev
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    gcc \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements.txt -r requirements-dev.txt

# Copy application code
COPY . .

# Set development environment
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV APP_ENV=development

# Make scripts executable
RUN chmod +x scripts/*.sh

# Default command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
</code></pre></div></pre>
<h4 id="configuration-for-local-environment">Configuration for Local Environment</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/config/local.py
"""Configuration for local development environment."""

import os

# API configuration
API_HOST = "0.0.0.0"
API_PORT = 8000
API_RELOAD = True
API_DEBUG = True

# OpenAI configuration
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
OPENAI_ORG_ID = os.environ.get("OPENAI_ORG_ID", "")
OPENAI_MODEL = "gpt-3.5-turbo"  # Default to cheaper model for development

# Ollama configuration
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_MODEL = "llama2"  # Default local model
ENABLE_GPU = True

# App configuration
LOG_LEVEL = "DEBUG"
ENABLE_CORS = True
CORS_ORIGINS = ["http://localhost:3000", "http://127.0.0.1:3000"]

# Feature flags
ENABLE_CACHING = True
ENABLE_RATE_LIMITING = False  # Disable rate limiting in local development
ENABLE_PARALLEL_PROCESSING = True
ENABLE_RESPONSE_VERIFICATION = True

# Development-specific settings
FORCE_DEV_MODE = os.environ.get("FORCE_DEV_MODE", "false").lower() == "true"
DEV_OPENAI_QUOTA = 100  # Maximum OpenAI API calls per day in development

# Redis configuration
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
</code></pre></div></pre>
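<p><code>DEV_OPENAI_QUOTA</code> declares a daily cap on OpenAI calls, but the setting by itself enforces nothing. One way to enforce it is a counter that resets when the calendar date changes, sketched here as a hypothetical <code>DailyQuota</code> class. The sketch keeps state in process memory, so each worker process has its own count, and the count resets on restart. Enforcing a single limit across workers would need a shared store such as the configured Redis.</p>

```python
import datetime as dt
import threading
from typing import Callable, Optional


class DailyQuota:
    """Hypothetical in-process counter allowing at most `limit` calls per calendar day."""

    def __init__(self, limit: int, today: Callable[[], dt.date] = dt.date.today):
        self.limit = limit
        self._today = today          # injectable clock, useful for tests
        self._day: Optional[dt.date] = None
        self._used = 0
        self._lock = threading.Lock()  # request handlers may run concurrently

    def try_consume(self) -> bool:
        """Record one call and return True, or return False if today's quota is spent."""
        with self._lock:
            current = self._today()
            if current != self._day:  # new day: reset the counter
                self._day, self._used = current, 0
            if self._used >= self.limit:
                return False
            self._used += 1
            return True

    @property
    def remaining(self) -> int:
        """Calls still allowed today."""
        with self._lock:
            if self._today() != self._day:
                return self.limit
            return self.limit - self._used
```

<p>A caller might check <code>quota.try_consume()</code> before each OpenAI request and fall back to the local Ollama model when it returns <code>False</code>. That routing policy is an assumption; the configuration does not specify it.</p>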
<h3 id="production-deployment">Production Deployment</h3>
<h4 id="kubernetes-manifests-for-production">Kubernetes Manifests for Production</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-api
  labels:
    app: mcp-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-api
  template:
    metadata:
      labels:
        app: mcp-api
    spec:
      containers:
      - name: api
        image: ${DOCKER_REGISTRY}/mcp-api:${IMAGE_TAG}
        imagePullPolicy: Always
        ports:
        - containerPort: 8000
        env:
        - name: APP_ENV
          value: "production"
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: redis_url
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: openai_api_key
        - name: OLLAMA_HOST
          value: "http://ollama-service:11434"
        - name: MONTHLY_BUDGET
          value: "${MONTHLY_BUDGET}"
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        readinessProbe:
          httpGet:
            path: /api/health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /api/health
            port: 8000
          initialDelaySeconds: 20
          periodSeconds: 15
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1  # Start with a single replica for Ollama
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - mountPath: /root/.ollama
          name: ollama-data
        resources:
          requests:
            cpu: 1000m
            memory: 4Gi
          limits:
            cpu: 4000m
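            memory: 16Gi
            # If using GPU, request one so the scheduler places the pod on a GPU node
            # (needs the NVIDIA device plugin); the env vars below do not allocate a GPU.
            # nvidia.com/gpu: 1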
        # If using GPU
        env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility"
      volumes:
      - name: ollama-data
        persistentVolumeClaim:
          claimName: ollama-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-api-service
spec:
  selector:
    app: mcp-api
  ports:
  - port: 80
    targetPort: 8000
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.mcpservice.com
    secretName: mcp-tls
  rules:
  - host: api.mcpservice.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mcp-api-service
            port:
              number: 80
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi  # Adjust based on your models
</code></pre></div></pre>
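<p>The scheduler places pods according to their resource <em>requests</em>, so the requests in these manifests set the minimum free capacity the cluster needs. The sketch below uses hypothetical helper names. It converts Kubernetes quantity strings into exact numbers, supporting only a subset of the suffixes: <code>m</code> (milli), binary <code>Ki</code>/<code>Mi</code>/<code>Gi</code>/<code>Ti</code>, and decimal <code>k</code>/<code>M</code>/<code>G</code>/<code>T</code>. It then totals the requests for the three API replicas plus the one Ollama pod. System daemons and per-node overhead are not included.</p>

```python
from fractions import Fraction
from typing import Dict, Tuple

# Suffix multipliers from the Kubernetes resource-quantity format (subset only).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Fraction(1, 1000),
}


def parse_quantity(value: str) -> Fraction:
    """Convert a quantity such as '500m', '512Mi' or '2' to an exact number."""
    # Try two-character suffixes before one-character ones
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return Fraction(value[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(value)


def total_requests(pods: Dict[str, Tuple[int, str, str]]) -> Tuple[Fraction, Fraction]:
    """Sum CPU cores and memory bytes over {name: (replicas, cpu_request, memory_request)}."""
    cpu = sum((n * parse_quantity(c) for n, c, _ in pods.values()), Fraction(0))
    mem = sum((n * parse_quantity(mem_q) for n, _, mem_q in pods.values()), Fraction(0))
    return cpu, mem


# Requests from the manifests above, at the Deployment's 3 API replicas.
pods = {"mcp-api": (3, "500m", "512Mi"), "ollama": (1, "1000m", "4Gi")}
cpu, mem = total_requests(pods)
print(f"{float(cpu)} cores, {float(mem) / 2**30} GiB")  # 2.5 cores, 5.5 GiB
```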
<h4 id="horizontal-pod-autoscaling-hpa">Horizontal Pod Autoscaling (HPA)</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># kubernetes/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
</code></pre></div></pre>
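<p>The HPA computes one proposal per metric as <code>desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)</code>. It skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). When several metrics are configured, it acts on the largest proposal and then clamps the result to <code>minReplicas</code>/<code>maxReplicas</code>. The simplified model below, a hypothetical helper, omits the controller's handling of unready pods and its scale-down stabilization window. Otherwise, it reproduces that arithmetic for the targets above.</p>

```python
import math
from typing import Dict


def desired_replicas(
    current: int,
    utilization: Dict[str, float],
    targets: Dict[str, float],
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Hypothetical model of the HPA decision for Utilization targets (percent of requests)."""
    proposals = []
    for metric, target in targets.items():
        ratio = utilization[metric] / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # within tolerance: no change for this metric
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics the largest proposal wins, then min/max bounds apply.
    return max(min_replicas, min(max_replicas, max(proposals)))


targets = {"cpu": 70, "memory": 80}  # averageUtilization values from hpa.yaml
print(desired_replicas(6, {"cpu": 35, "memory": 90}, targets, 3, 10))  # 7
```

<p>In the example, CPU alone would scale the deployment down to 3, but memory above its 80% target proposes 7, and the larger proposal wins. A memory target on a service whose memory does not shrink after load can therefore hold replicas above the minimum.</p>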
<h4 id="deployment-script">Deployment Script</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">#!/bin/bash
# deploy.sh - Production deployment script

set -e  # Exit on error

# Check required environment variables
if [ -z "$DOCKER_REGISTRY" ] || [ -z "$IMAGE_TAG" ] || [ -z "$K8S_NAMESPACE" ] || [ -z "$REDIS_PASSWORD" ] || [ -z "$MONTHLY_BUDGET" ]; then
    echo "Error: Required environment variables not set."
    echo "Please set DOCKER_REGISTRY, IMAGE_TAG, K8S_NAMESPACE, REDIS_PASSWORD, and MONTHLY_BUDGET."
    exit 1
fi

# Build and push Docker image
echo "Building and pushing Docker image..."
docker build -t ${DOCKER_REGISTRY}/mcp-api:${IMAGE_TAG} -f Dockerfile.prod .
docker push ${DOCKER_REGISTRY}/mcp-api:${IMAGE_TAG}

# Apply Kubernetes configuration
echo "Applying Kubernetes configuration..."

# Create namespace if it doesn't exist
kubectl get namespace ${K8S_NAMESPACE} || kubectl create namespace ${K8S_NAMESPACE}

# Apply secrets
echo "Applying secrets..."
kubectl apply -f kubernetes/secrets.yaml -n ${K8S_NAMESPACE}

# Deploy Redis if needed
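echo "Deploying Redis..."
# The bitnami chart repository must be registered before it can be installed from
helm repo add bitnami https://charts.bitnami.com/bitnami --force-update
helm repo update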
helm upgrade --install redis bitnami/redis \
  --namespace ${K8S_NAMESPACE} \
  --set auth.password=${REDIS_PASSWORD} \
  --set master.persistence.size=8Gi

# Deploy application
echo "Deploying application..."
# Replace variables in deployment file
envsubst < kubernetes/deployment.yaml | kubectl apply -f - -n ${K8S_NAMESPACE}

# Apply HPA
kubectl apply -f kubernetes/hpa.yaml -n ${K8S_NAMESPACE}

# Verify deployment
echo "Verifying deployment..."
kubectl rollout status deployment/mcp-api -n ${K8S_NAMESPACE}
kubectl rollout status deployment/ollama -n ${K8S_NAMESPACE}

# Initialize Ollama models if needed
echo "Would you like to initialize Ollama models? (y/n)"
read init_models

if [ "$init_models" = "y" ]; then
    echo "Initializing Ollama models..."
    # Get pod name
    OLLAMA_POD=$(kubectl get pods -l app=ollama -n ${K8S_NAMESPACE} -o jsonpath="{.items[0].metadata.name}")
    
    # Pull models
    kubectl exec ${OLLAMA_POD} -n ${K8S_NAMESPACE} -- ollama pull llama2
    kubectl exec ${OLLAMA_POD} -n ${K8S_NAMESPACE} -- ollama pull mistral
    kubectl exec ${OLLAMA_POD} -n ${K8S_NAMESPACE} -- ollama pull codellama
fi

echo "Deployment complete!"
echo "API available at: https://api.mcpservice.com"
</code></pre></div></pre>
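<p>After the rollout, it can be worth confirming that the three model pulls actually completed before routing traffic to Ollama. The sketch below is a hypothetical helper, not part of the project: it reads the JSON returned by Ollama's <code>GET /api/tags</code> endpoint (a list under <code>"models"</code>, each entry with a <code>"name"</code>) and reports which of the models pulled by <code>deploy.sh</code> are missing. The function names and the default <code>http://localhost:11434</code> host are assumptions.</p>

```python
import json
import urllib.request

# Models pulled by deploy.sh; adjust if you pull a different set.
REQUIRED_MODELS = ("llama2", "mistral", "codellama")


def normalize(name: str) -> str:
    """Ollama lists untagged pulls with an implicit ':latest' tag, so compare on that form."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list:
    """Return the required models absent from a GET /api/tags response body."""
    present = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [m for m in required if normalize(m) not in present]


def fetch_tags(host: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch /api/tags from an Ollama server, e.g. one exposed via kubectl port-forward."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

<p>For example, with <code>kubectl port-forward deploy/ollama 11434:11434 -n "$K8S_NAMESPACE"</code> running in another terminal, <code>missing_models(fetch_tags())</code> should return an empty list once all pulls have finished.</p>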
<h4 id="production-dockerfile">Production Dockerfile</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Dockerfile</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-dockerfile"># Dockerfile.prod
FROM python:3.11-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Build wheels for every dependency, including transitive ones, so the final stage needs no compiler
COPY requirements.txt ./
RUN pip wheel --no-cache-dir --wheel-dir /app/wheels -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app

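# Install curl (used by the ECS container health check), then the prebuilt wheels without contacting PyPI
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/*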

# Copy application code
COPY app /app/app
COPY scripts /app/scripts
COPY alembic.ini /app/

# Create non-root user
RUN useradd -m appuser && \
    chown -R appuser:appuser /app
USER appuser

# Set production environment
ENV PYTHONPATH=/app
ENV APP_ENV=production
ENV PYTHONUNBUFFERED=1

# Expose port
EXPOSE 8000

# Run using Gunicorn in production
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "-c", "app/config/gunicorn.py", "app.main:app"]
</code></pre></div></pre>
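<p>The image starts <code>app.main:app</code> with Uvicorn workers, so the application is an ASGI callable, and the load balancer and container health checks in the AWS template probe <code>/api/health</code>. The application code is not shown here; as a minimal sketch of the contract those checks assume, the framework-free ASGI app below answers <code>GET /api/health</code> with a 200 JSON body. The <code>health_app</code> and <code>call</code> names are hypothetical, and a real service would normally declare this route in its web framework instead.</p>

```python
import asyncio
import json


async def health_app(scope, receive, send):
    """Minimal ASGI app: 200 JSON on GET /api/health, 404 for every other request."""
    if scope["type"] != "http":
        return  # lifespan and websocket scopes are ignored in this sketch
    if scope["method"] == "GET" and scope["path"] == "/api/health":
        status, body = 200, json.dumps({"status": "ok"}).encode()
    else:
        status, body = 404, json.dumps({"detail": "Not Found"}).encode()
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call(app, method: str, path: str) -> list:
    """Drive an ASGI app in-process and collect the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": method, "path": path}, receive, send)
    return sent


if __name__ == "__main__":
    messages = asyncio.run(call(health_app, "GET", "/api/health"))
    print(messages[0]["status"], messages[1]["body"])  # 200 b'{"status": "ok"}'
```

<p>Driving the app in-process like this keeps the check testable without a server; under Gunicorn the same callable would be served by the <code>uvicorn.workers.UvicornWorker</code> worker class configured above.</p>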
<h4 id="gunicorn-configuration-for-production">Gunicorn Configuration for Production</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/config/gunicorn.py
"""Gunicorn configuration for production deployment."""

import multiprocessing
import os

# Bind to 0.0.0.0:8000
bind = "0.0.0.0:8000"

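# Worker configuration: WEB_CONCURRENCY overrides the CPU-based default, because
# cpu_count() reports the host's CPUs rather than the container's CPU limit
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))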
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
timeout = 60
keepalive = 65  # Longer than the ALB idle timeout (60 s) to avoid 502s on reused connections

# Logging
accesslog = "-"
errorlog = "-"
loglevel = os.environ.get("LOG_LEVEL", "info").lower()

# Security
limit_request_line = 4094
limit_request_fields = 100
limit_request_field_size = 8190

# Process naming
proc_name = "mcp-api"
</code></pre></div></pre>
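<p>One caveat with the <code>multiprocessing.cpu_count() * 2 + 1</code> default: <code>cpu_count()</code> reports the CPUs of the host, not the CPU limit of the container, so on a large node the formula can start far more workers than the pod or task is allowed to use. A sketch that caps the count by the cgroup v2 quota in <code>/sys/fs/cgroup/cpu.max</code> follows; it assumes cgroup v2, falls back to the host count elsewhere, and the helper names are illustrative.</p>

```python
import multiprocessing
import os

CPU_MAX_PATH = "/sys/fs/cgroup/cpu.max"  # cgroup v2 CPU quota file inside the container


def cpus_from_cpu_max(text: str, fallback: int) -> int:
    """Convert cgroup v2 'cpu.max' content ('QUOTA PERIOD' or 'max PERIOD') to whole CPUs."""
    quota, _, period = text.strip().partition(" ")
    if quota == "max" or not period:
        return fallback  # no CPU limit configured
    return max(1, int(quota) // int(period))  # round down, but never below one CPU


def effective_cpus() -> int:
    """CPUs this process may actually use: the host count capped by the cgroup quota."""
    host = multiprocessing.cpu_count()
    try:
        with open(CPU_MAX_PATH) as f:
            return min(host, cpus_from_cpu_max(f.read(), host))
    except (OSError, ValueError):
        return host  # not cgroup v2 (e.g. macOS or cgroup v1): fall back to the host count


def worker_count(cpus: int, env=os.environ) -> int:
    """Apply the 2n+1 rule, letting WEB_CONCURRENCY override it."""
    return int(env.get("WEB_CONCURRENCY", cpus * 2 + 1))


# In app/config/gunicorn.py this would become:
# workers = worker_count(effective_cpus())
```

<p>The 2n+1 rule itself is a starting point aimed at synchronous workers; with async Uvicorn workers, fewer processes per CPU are often enough, which is another reason to keep the count overridable.</p>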
<h3 id="cloud-deployment-aws">Cloud Deployment (AWS)</h3>
<h4 id="aws-cloudformation-template">AWS CloudFormation Template</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">YAML</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># aws/cloudformation.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'MCP OpenAI-Ollama Hybrid System'

Parameters:
  Environment:
    Description: Deployment environment
    Type: String
    Default: Production
    AllowedValues:
      - Development
      - Staging
      - Production
    
  ECRRepositoryName:
    Description: ECR Repository name
    Type: String
    Default: mcp-api
  
  VpcId:
    Description: VPC ID
    Type: AWS::EC2::VPC::Id
  
  SubnetIds:
    Description: Subnet IDs for the ECS tasks
    Type: List&lt;AWS::EC2::Subnet::Id&gt;
  
  OllamaInstanceType:
    Description: EC2 instance type for Ollama
    Type: String
    Default: g4dn.xlarge
    AllowedValues:
      - g4dn.xlarge
      - g5.xlarge
      - p3.2xlarge
      - c5.2xlarge  # CPU-only option
  
  ApiInstanceCount:
    Description: Number of API instances
    Type: Number
    Default: 2
    MinValue: 1
    MaxValue: 10

Resources:
  # ECR Repository
  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: !Ref ECRRepositoryName
      ImageScanningConfiguration:
        ScanOnPush: true
      LifecyclePolicy:
        LifecyclePolicyText: |
          {
            "rules": [
              {
                "rulePriority": 1,
                "description": "Keep only the 10 most recent images",
                "selection": {
                  "tagStatus": "any",
                  "countType": "imageCountMoreThan",
                  "countNumber": 10
                },
                "action": {
                  "type": "expire"
                }
              }
            ]
          }

  # ElastiCache Redis
  RedisSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Redis cluster
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 6379
          ToPort: 6379
          SourceSecurityGroupId: !GetAtt APISecurityGroup.GroupId

  RedisSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnet group for Redis
      SubnetIds: !Ref SubnetIds

  RedisCluster:
    Type: AWS::ElastiCache::CacheCluster
    Properties:
      Engine: redis
      CacheNodeType: cache.t3.medium
      NumCacheNodes: 1
      VpcSecurityGroupIds:
        - !GetAtt RedisSecurityGroup.GroupId
      CacheSubnetGroupName: !Ref RedisSubnetGroup
      AutoMinorVersionUpgrade: true

  # Ollama EC2 Instance
  OllamaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Ollama EC2 instance
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 11434
          ToPort: 11434
          SourceSecurityGroupId: !GetAtt APISecurityGroup.GroupId
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0  # Restrict this in production

  OllamaInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

  OllamaInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - !Ref OllamaInstanceRole

  OllamaEBSVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: !GetAtt OllamaInstance.AvailabilityZone  # Must match the instance's AZ to attach
      Size: 100
      VolumeType: gp3
      Encrypted: true
      Tags:
        - Key: Name
          Value: OllamaVolume

  OllamaInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref OllamaInstanceType
      ImageId: '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2}}'  # Latest Amazon Linux 2 AMI in the stack's region
      SecurityGroupIds:
        - !GetAtt OllamaSecurityGroup.GroupId
      SubnetId: !Select [0, !Ref SubnetIds]
      IamInstanceProfile: !Ref OllamaInstanceProfile
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: 30
            VolumeType: gp3
            DeleteOnTermination: true
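      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # Install Docker
          amazon-linux-extras install docker -y
          systemctl start docker
          systemctl enable docker
          
          # Run Ollama in Docker only; also installing the native Ollama service
          # (install.sh) would start a second server competing for port 11434.
          # GPU instance types additionally need NVIDIA drivers and the NVIDIA
          # Container Toolkit on the host, plus --gpus all on the docker run line.
          docker run -d --name ollama --restart unless-stopped \
            -p 11434:11434 \
            -v ollama:/root/.ollama \
            ollama/ollama
          
          # Wait until the Ollama API responds before pulling models
          until curl -sf http://localhost:11434/api/tags > /dev/null; do sleep 2; done
          
          # Pull models
          docker exec ollama ollama pull llama2
          docker exec ollama ollama pull mistral
          docker exec ollama ollama pull codellama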
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-ollama"

  OllamaVolumeAttachment:
    Type: AWS::EC2::VolumeAttachment
    Properties:
      InstanceId: !Ref OllamaInstance
      VolumeId: !Ref OllamaEBSVolume
      Device: /dev/sdf

  # API ECS Cluster
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Sub "${AWS::StackName}-cluster"
      CapacityProviders:
        - FARGATE
      DefaultCapacityProviderStrategy:
        - CapacityProvider: FARGATE
          Weight: 1

  APISecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for API ECS tasks
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 8000
          ToPort: 8000
          SourceSecurityGroupId: !GetAtt ALBSecurityGroup.GroupId  # Only the ALB can reach the tasks

  # ECS Task Definition
  ECSTaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

  ECSTaskRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

  APITaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub "${AWS::StackName}-api"
      Cpu: '1024'
      Memory: '2048'
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !GetAtt ECSTaskExecutionRole.Arn
      TaskRoleArn: !GetAtt ECSTaskRole.Arn
      ContainerDefinitions:
        - Name: api
          Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ECRRepositoryName}:latest"
          Essential: true
          PortMappings:
            - ContainerPort: 8000
          Environment:
            - Name: REDIS_URL
              Value: !Sub "redis://${RedisCluster.RedisEndpoint.Address}:${RedisCluster.RedisEndpoint.Port}/0"
            - Name: OLLAMA_HOST
              Value: !Sub "http://${OllamaInstance.PrivateIp}:11434"
            - Name: APP_ENV
              Value: !Ref Environment
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref APILogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: api
          HealthCheck:
            Command:
              - CMD-SHELL
              - curl -f http://localhost:8000/api/health || exit 1
            Interval: 30
            Timeout: 5
            Retries: 3

  APILogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/ecs/${AWS::StackName}-api"
      RetentionInDays: 7

  # ECS Service
  APIService:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: !Sub "${AWS::StackName}-api"
      Cluster: !Ref ECSCluster
      TaskDefinition: !Ref APITaskDefinition
      DesiredCount: !Ref ApiInstanceCount
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !GetAtt APISecurityGroup.GroupId
          Subnets: !Ref SubnetIds
      LoadBalancers:
        - TargetGroupArn: !Ref ALBTargetGroup
          ContainerName: api
          ContainerPort: 8000
    DependsOn: ALBListener

  # Application Load Balancer
  ALB:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: !Sub "${AWS::StackName}-alb"
      Type: application
      Scheme: internet-facing
      SecurityGroups:
        - !GetAtt ALBSecurityGroup.GroupId
      Subnets: !Ref SubnetIds
      LoadBalancerAttributes:
        - Key: idle_timeout.timeout_seconds
          Value: '60'

  ALBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for ALB
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0

  ALBTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: !Sub "${AWS::StackName}-target-group"
      Port: 8000
      Protocol: HTTP
      TargetType: ip
      VpcId: !Ref VpcId
      HealthCheckPath: /api/health
      HealthCheckIntervalSeconds: 30
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 3
      UnhealthyThresholdCount: 3

  ALBListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ALB
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref ALBTargetGroup

Outputs:
  APIEndpoint:
    Description: URL for API
    Value: !Sub "http://${ALB.DNSName}"
  
  OllamaEndpoint:
    Description: Ollama Server Private IP
    Value: !GetAtt OllamaInstance.PrivateIp
  
  ECRRepository:
    Description: ECR Repository URL
    Value: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ECRRepositoryName}"
  
  RedisEndpoint:
    Description: Redis Endpoint
    Value: !Sub "${RedisCluster.RedisEndpoint.Address}:${RedisCluster.RedisEndpoint.Port}"
</code></pre></div></pre>
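<p>The template declares <code>VpcId</code> and <code>SubnetIds</code> without defaults, so every <code>create-stack</code> call must supply them, and <code>SubnetIds</code> is a list parameter whose value is one comma-separated string. One way to sidestep shell quoting is to generate a parameters file. The hypothetical helper below builds the JSON list form (<code>ParameterKey</code>/<code>ParameterValue</code> objects) from a Python dict; the VPC and subnet IDs shown are placeholders.</p>

```python
import json


def cfn_parameters(values: dict) -> list:
    """Build the JSON parameter list accepted by `aws cloudformation create-stack --parameters`.

    List-typed template parameters (SubnetIds here) take a single comma-separated string.
    """
    params = []
    for key, value in values.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params.append({"ParameterKey": key, "ParameterValue": str(value)})
    return params


if __name__ == "__main__":
    # Placeholder IDs for illustration; substitute your own VPC and subnets.
    params = cfn_parameters({
        "Environment": "Production",
        "VpcId": "vpc-0123456789abcdef0",
        "SubnetIds": ["subnet-0aaa1111", "subnet-0bbb2222"],
        "OllamaInstanceType": "g4dn.xlarge",
        "ApiInstanceCount": 2,
    })
    print(json.dumps(params, indent=2))
```

<p>Saved as, say, <code>cfn_params.py</code>, it could be used as <code>python cfn_params.py > params.json</code> followed by <code>aws cloudformation create-stack --stack-name mcp-hybrid-system --template-body file://aws/cloudformation.yaml --capabilities CAPABILITY_IAM --parameters file://params.json</code>.</p>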
<h4 id="aws-deployment-script">AWS Deployment Script</h4>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">#!/bin/bash
# aws_deploy.sh - AWS deployment script

set -e  # Exit on error

# Check required AWS CLI
if ! command -v aws &> /dev/null; then
    echo "AWS CLI is required but not installed. Aborting."
    exit 1
fi

# AWS configuration
AWS_REGION="us-east-1"
STACK_NAME="mcp-hybrid-system"
CFN_TEMPLATE="aws/cloudformation.yaml"
IMAGE_TAG=$(git rev-parse --short HEAD)

# VpcId and SubnetIds have no defaults in the template, so they must be supplied
if [ -z "$VPC_ID" ] || [ -z "$SUBNET_IDS" ]; then
    echo "Error: set VPC_ID and SUBNET_IDS (comma-separated, e.g. subnet-a,subnet-b)."
    exit 1
fi
# The CLI shorthand syntax requires commas inside a list value to be escaped as "\,"
SUBNET_IDS_ESCAPED=$(echo "$SUBNET_IDS" | sed 's/,/\\,/g')

CFN_PARAMS=(
    ParameterKey=Environment,ParameterValue=Production
    ParameterKey=OllamaInstanceType,ParameterValue=g4dn.xlarge
    ParameterKey=ApiInstanceCount,ParameterValue=2
    "ParameterKey=VpcId,ParameterValue=$VPC_ID"
    "ParameterKey=SubnetIds,ParameterValue=$SUBNET_IDS_ESCAPED"
)

# Check if stack exists
if aws cloudformation describe-stacks --stack-name $STACK_NAME --region $AWS_REGION &> /dev/null; then
    STACK_ACTION="update"
else
    STACK_ACTION="create"
fi

# Deploy CloudFormation stack
if [ "$STACK_ACTION" = "create" ]; then
    echo "Creating CloudFormation stack..."
    aws cloudformation create-stack \
        --stack-name $STACK_NAME \
        --template-body file://$CFN_TEMPLATE \
        --capabilities CAPABILITY_IAM \
        --parameters "${CFN_PARAMS[@]}" \
        --region $AWS_REGION
    
    # Wait for stack creation
    echo "Waiting for stack creation to complete..."
    aws cloudformation wait stack-create-complete \
        --stack-name $STACK_NAME \
        --region $AWS_REGION
else
    echo "Updating CloudFormation stack..."
    # update-stack exits non-zero when nothing changed; treat that case as
    # success so the image below is still rebuilt and redeployed
    if UPDATE_OUTPUT=$(aws cloudformation update-stack \
        --stack-name $STACK_NAME \
        --template-body file://$CFN_TEMPLATE \
        --capabilities CAPABILITY_IAM \
        --parameters "${CFN_PARAMS[@]}" \
        --region $AWS_REGION 2>&1); then
        # Wait for stack update
        echo "Waiting for stack update to complete..."
        aws cloudformation wait stack-update-complete \
            --stack-name $STACK_NAME \
            --region $AWS_REGION
    elif echo "$UPDATE_OUTPUT" | grep -q "No updates are to be performed"; then
        echo "No stack changes to apply."
    else
        echo "$UPDATE_OUTPUT"
        exit 1
    fi
fi

# Get stack outputs
echo "Getting stack outputs..."
ECR_REPOSITORY=$(aws cloudformation describe-stacks \
    --stack-name $STACK_NAME \
    --query "Stacks[0].Outputs[?OutputKey=='ECRRepository'].OutputValue" \
    --output text \
    --region $AWS_REGION)

API_ENDPOINT=$(aws cloudformation describe-stacks \
    --stack-name $STACK_NAME \
    --query "Stacks[0].Outputs[?OutputKey=='APIEndpoint'].OutputValue" \
    --output text \
    --region $AWS_REGION)

# Build and push Docker image
echo "Building and pushing Docker image to ECR..."
# Log in to the ECR registry host (the repository URI without its /repository path)
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin "${ECR_REPOSITORY%%/*}"

# Build and push
docker build -t $ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REPOSITORY:latest -f Dockerfile.prod .
docker push $ECR_REPOSITORY:$IMAGE_TAG
docker push $ECR_REPOSITORY:latest

# Update ECS service to force deployment
echo "Updating ECS service..."
ECS_CLUSTER="${STACK_NAME}-cluster"
ECS_SERVICE="${STACK_NAME}-api"

aws ecs update-service \
    --cluster $ECS_CLUSTER \
    --service $ECS_SERVICE \
    --force-new-deployment \
    --region $AWS_REGION

echo "Deployment complete!"
echo "API Endpoint: $API_ENDPOINT"
</code></pre></div></pre>
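<p>The script above runs <code>describe-stacks</code> once per output it needs. As an alternative sketch (not part of the original tooling), the stack can be described once and its <code>Outputs</code> flattened into a dictionary keyed by <code>OutputKey</code>, which also makes the ECR registry host easy to derive for <code>docker login</code>. The function names are hypothetical, and <code>load_outputs</code> shells out to the AWS CLI, assuming credentials are already configured.</p>

```python
import json
import subprocess


def outputs_by_key(describe_stacks: dict) -> dict:
    """Flatten `aws cloudformation describe-stacks` JSON into {OutputKey: OutputValue}."""
    stack = describe_stacks["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def load_outputs(stack_name: str, region: str = "us-east-1") -> dict:
    """Describe the stack with a single CLI call and return its outputs."""
    raw = subprocess.run(
        ["aws", "cloudformation", "describe-stacks", "--stack-name", stack_name,
         "--region", region, "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return outputs_by_key(json.loads(raw))


def registry_host(repository_uri: str) -> str:
    """ECR login targets the registry host, i.e. the repository URI without its path."""
    return repository_uri.split("/", 1)[0]
```

<p>With the template's outputs, <code>load_outputs("mcp-hybrid-system")["APIEndpoint"]</code> would give the ALB URL, and <code>registry_host(outputs["ECRRepository"])</code> the host to pass to <code>docker login</code>.</p>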
<h1 id="optimization-and-deployment-strategies-for-openai-ollama-hybrid-ai-system-continued">Optimization and Deployment Strategies for OpenAI-Ollama Hybrid AI System (Continued)</h1>
<h2 id="monitoring-and-observability-configuration">Monitoring and Observability Configuration</h2>
<h3 id="prometheus-and-grafana-setup-for-metrics">Prometheus and Grafana Setup for Metrics</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">YAML</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># monitoring/prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'mcp-api'
        metrics_path: /metrics
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            regex: mcp-api
            action: keep

      - job_name: 'ollama'
        metrics_path: /metrics
        static_configs:
          - targets: ['ollama-service:11434']

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
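---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
# Pod discovery (kubernetes_sd_configs with role: pod) needs permission to list and watch pods
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default  # Set to the namespace these manifests are applied in
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        fsGroup: 65534  # The prom/prometheus image runs as nobody (65534); allows writes to the PVC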
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: prometheus-data
              mountPath: /prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
            - "--web.enable-lifecycle"
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: prometheus-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
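    spec:
      securityContext:
        fsGroup: 472  # The grafana/grafana image runs as UID/GID 472; allows writes to the PVC
      containers: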
        - name: grafana
          image: grafana/grafana:9.4.7
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: grafana-data
              mountPath: /var/lib/grafana
          env:
            - name: GF_SECURITY_ADMIN_USER
              valueFrom:
                secretKeyRef:
                  name: grafana-secrets
                  key: admin_user
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-secrets
                  key: admin_password
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
  type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
</code></pre></div></pre>
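<p>The <code>mcp-api</code> scrape job and the dashboard queries below assume the API serves a counter named <code>api_requests_total</code> with a <code>provider</code> label at <code>/metrics</code>. To show what Prometheus expects to find there, the stdlib-only sketch below keeps per-provider counts and renders them in the Prometheus text exposition format. The class name is hypothetical; in a real service the <code>prometheus_client</code> library's <code>Counter</code> would normally do this.</p>

```python
from collections import Counter


class RequestCounter:
    """Counts requests per provider and renders them in the Prometheus text format."""

    def __init__(self, name: str = "api_requests_total"):
        self.name = name
        self.counts = Counter()

    def inc(self, provider: str) -> None:
        self.counts[provider] += 1

    def render(self) -> str:
        lines = [
            f"# HELP {self.name} Total API requests by provider.",
            f"# TYPE {self.name} counter",
        ]
        for provider in sorted(self.counts):
            # Label values must escape backslash, double quote and newline
            escaped = provider.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            lines.append(f'{self.name}{{provider="{escaped}"}} {self.counts[provider]}')
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    counter = RequestCounter()
    counter.inc("ollama")
    counter.inc("openai")
    print(counter.render(), end="")
```

<p>A related caveat for the response-time panel: its queries select a <code>quantile</code> label, which only Summary metrics that compute quantiles provide. The Python <code>prometheus_client</code> Summary exports only <code>_count</code> and <code>_sum</code>, so if the API is instrumented with that library, a Histogram queried through <code>histogram_quantile()</code> is the usual substitute.</p>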
<h3 id="grafana-dashboard-configuration">Grafana Dashboard Configuration</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 1,
  "links": [],
  "panels": [
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {}
        },
        "overrides": []
      },
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 2,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "7.2.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum by (provider) (rate(api_requests_total[5m]))",
          "interval": "",
          "legendFormat": "Requests ({{provider}})",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Request Rate by Provider",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "Requests/sec",
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {}
        },
        "overrides": []
      },
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 3,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "7.2.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "histogram_quantile(0.5, sum(rate(api_response_time_seconds_bucket[5m])) by (le, provider))",
          "interval": "",
          "legendFormat": "p50 ({{provider}})",
          "refId": "A"
        },
        {
          "expr": "histogram_quantile(0.9, sum(rate(api_response_time_seconds_bucket[5m])) by (le, provider))",
          "interval": "",
          "legendFormat": "p90 ({{provider}})",
          "refId": "B"
        },
        {
          "expr": "histogram_quantile(0.99, sum(rate(api_response_time_seconds_bucket[5m])) by (le, provider))",
          "interval": "",
          "legendFormat": "p99 ({{provider}})",
          "refId": "C"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Response Time by Provider",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "s",
          "label": "Response Time",
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {},
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 8,
        "x": 0,
        "y": 8
      },
      "id": 4,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "7.2.0",
      "targets": [
        {
          "expr": "sum(api_requests_total{provider=\"openai\"})",
          "interval": "",
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "OpenAI Total Requests",
      "type": "stat"
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {},
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 8,
        "x": 8,
        "y": 8
      },
      "id": 5,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "7.2.0",
      "targets": [
        {
          "expr": "sum(api_requests_total{provider=\"ollama\"})",
          "interval": "",
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Ollama Total Requests",
      "type": "stat"
    },
    {
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {},
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "currencyUSD"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 8,
        "x": 16,
        "y": 8
      },
      "id": 6,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "7.2.0",
      "targets": [
        {
          "expr": "sum(api_openai_cost_total)",
          "interval": "",
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "OpenAI Cost",
      "type": "stat"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {}
        },
        "overrides": []
      },
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 16
      },
      "hiddenSeries": false,
      "id": 7,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "7.2.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(rate(api_token_usage_total{type=\"prompt\"}[5m])) by (provider)",
          "interval": "",
          "legendFormat": "Prompt ({{provider}})",
          "refId": "A"
        },
        {
          "expr": "sum(rate(api_token_usage_total{type=\"completion\"}[5m])) by (provider)",
          "interval": "",
          "legendFormat": "Completion ({{provider}})",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Token Usage Rate by Type",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "Tokens/sec",
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "custom": {}
        },
        "overrides": []
      },
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 16
      },
      "hiddenSeries": false,
      "id": 8,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "7.2.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(rate(api_cache_hits_total[5m])) by (cache_type)",
          "interval": "",
          "legendFormat": "Cache Hits ({{cache_type}})",
          "refId": "A"
        },
        {
          "expr": "sum(rate(api_cache_misses_total[5m]))",
          "interval": "",
          "legendFormat": "Cache Misses",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Cache Performance",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "Rate",
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "10s",
  "schemaVersion": 26,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ]
  },
  "timezone": "",
  "title": "MCP Hybrid System Dashboard",
  "uid": "mcp-dashboard",
  "version": 1
}
</code></pre></div></pre>
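The Response Time panel reads percentiles from `api_response_time_seconds`, which the middleware below defines as a prometheus_client `Histogram`. Histograms export cumulative `_bucket` counters rather than precomputed quantiles, so percentiles have to be computed with PromQL's `histogram_quantile()`, which interpolates linearly inside the bucket that contains the requested rank. The following is a minimal Python sketch of that interpolation, useful for sanity-checking dashboard values; `estimate_quantile` is a hypothetical helper written for illustration, and it omits some edge cases the real PromQL function handles (such as repairing non-monotonic buckets).

```python
import bisect
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Sketch of histogram_quantile's linear interpolation; the buckets must be
    cumulative (as in *_bucket series) and include a +Inf upper bound.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)  # order by upper bound; +Inf sorts last
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window

    rank = q * total
    counts = [count for _, count in buckets]
    # Index of the first bucket whose cumulative count reaches the rank
    b = bisect.bisect_left(counts, rank)

    if b == len(buckets) - 1:
        # Rank lies in the +Inf bucket: report the highest finite bound
        return buckets[-2][0]

    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower, prev_count = (0.0, 0.0) if b == 0 else buckets[b - 1]
    if count == prev_count:
        return lower  # only reachable when q == 0 and the first bucket is empty
    # Assume observations are spread uniformly inside the bucket
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
```

For example, with cumulative buckets `[(0.1, 50), (0.5, 80), (1.0, 95), (inf, 100)]` the median lands on the 0.1 s bound, while the 90th percentile interpolates to about 0.83 s inside the (0.5, 1.0] bucket. Bucket boundaries therefore bound the precision of every percentile shown on the panel.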
<h3 id="implementing-metrics-collection-in-api">Implementing Metrics Collection in the API</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/middleware/metrics.py
from fastapi import Request
import time
from prometheus_client import Counter, Histogram, Gauge
import logging

# Initialize metrics
REQUEST_COUNT = Counter(
    'api_requests_total', 
    'Total count of API requests',
    ['method', 'endpoint', 'provider', 'model', 'status']
)

RESPONSE_TIME = Histogram(
    'api_response_time_seconds',
    'Response time in seconds',
    ['method', 'endpoint', 'provider']
)

TOKEN_USAGE = Counter(
    'api_token_usage_total',
    'Total token usage',
    ['provider', 'model', 'type']  # type: prompt or completion
)

OPENAI_COST = Counter(
    'api_openai_cost_total',
    'Total OpenAI API cost in USD',
    ['model']
)

ACTIVE_REQUESTS = Gauge(
    'api_active_requests',
    'Number of active requests',
    ['method']
)

CACHE_HITS = Counter(
    'api_cache_hits_total',
    'Total cache hits',
    ['cache_type']  # exact or semantic
)

CACHE_MISSES = Counter(
    'api_cache_misses_total',
    'Total cache misses',
    []
)

logger = logging.getLogger(__name__)

async def metrics_middleware(request: Request, call_next):
    """Middleware to collect metrics for API requests.

    Register with ``app.middleware("http")(metrics_middleware)``.
    """
    # Track in-flight requests (exported as api_active_requests)
    ACTIVE_REQUESTS.labels(method=request.method).inc()
    
    # perf_counter() is monotonic, so durations are unaffected by wall-clock changes
    start_time = time.perf_counter()
    
    # Defaults recorded if the handler raises before producing a response
    status_code = 500
    provider = "unknown"
    model = "unknown"
    
    try:
        # Process the request
        response = await call_next(request)
        status_code = response.status_code
        
        # Provider and model come from X-Provider / X-Model response headers
        # when the handler sets them; otherwise "unknown" is recorded
        provider = response.headers.get("X-Provider", "unknown")
        model = response.headers.get("X-Model", "unknown")
        
        return response
    except Exception:
        logger.exception("Unhandled exception in request")
        raise
    finally:
        # Calculate response time
        response_time = time.perf_counter() - start_time
        
        # NOTE: request.url.path is the raw path, so URLs that embed IDs create
        # one time series per distinct value; normalize paths if that matters
        REQUEST_COUNT.labels(
            method=request.method,
            endpoint=request.url.path,
            provider=provider,
            model=model,
            status=str(status_code)
        ).inc()
        
        RESPONSE_TIME.labels(
            method=request.method,
            endpoint=request.url.path,
            provider=provider
        ).observe(response_time)
        
        # Decrement active requests
        ACTIVE_REQUESTS.labels(method=request.method).dec()
</code></pre></div></pre>
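The `TOKEN_USAGE` and `OPENAI_COST` counters above are declared but not incremented by this middleware, so something in the OpenAI call path has to convert token counts into dollars. A minimal sketch of that conversion, assuming a per-million-token price table loaded from configuration; the model names and prices below are placeholders, not actual OpenAI pricing, and `estimate_cost_usd` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Placeholder USD prices per 1M tokens as (prompt, completion).
# These are NOT real OpenAI prices; load current values from configuration.
PRICE_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "example-large-model": (5.00, 15.00),
    "example-small-model": (0.15, 0.60),
}


def estimate_cost_usd(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    prices: Dict[str, Tuple[float, float]] = PRICE_PER_MILLION,
) -> Optional[float]:
    """Return the estimated cost of one request in USD.

    Returns None for models missing from the price table, so callers can log
    the gap instead of silently recording a cost of zero.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if model not in prices:
        return None
    prompt_price, completion_price = prices[model]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000
```

In a handler, the raw counts could feed `TOKEN_USAGE.labels(provider="openai", model=model, type="prompt").inc(prompt_tokens)` (and likewise for `"completion"`), and a non-None cost could feed `OPENAI_COST.labels(model=model).inc(cost)` as well as the budget service's `record_usage(cost, "openai", model)`.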
<h2 id="scaling-strategies">Scaling Strategies</h2>
<h3 id="optimizing-ollama-scaling-for-high-loads">Optimizing Ollama Scaling for High Loads</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/ollama_scaling.py
import asyncio
import logging
import random
from typing import List, Optional

import httpx

logger = logging.getLogger(__name__)

class OllamaScalingService:
    """
    Manages load balancing and scaling for multiple Ollama instances.
    """
    
    def __init__(self):
        self.ollama_instances = []
        self.instance_status = {}
        self.model_availability = {}
        self.health_check_interval = 60  # seconds
        self.enable_scaling = False
        self.min_instances = 1
        self.max_instances = 5
        self.health_check_task = None
    
    async def initialize(self, instances: List[str]):
        """Initialize the service with a list of Ollama instances."""
        self.ollama_instances = instances
        self.instance_status = {instance: False for instance in instances}
        self.model_availability = {instance: [] for instance in instances}
        
        # Start health checking
        self.health_check_task = asyncio.create_task(self._health_check_loop())
        
        # Perform initial health check
        await self._check_all_instances()
        
        logger.info(f"Initialized Ollama scaling with {len(instances)} instances")
    
    async def shutdown(self):
        """Shutdown the service."""
        if self.health_check_task:
            self.health_check_task.cancel()
            try:
                await self.health_check_task
            except asyncio.CancelledError:
                pass
    
    async def _health_check_loop(self):
        """Periodically check health of all instances."""
        while True:
            try:
                await self._check_all_instances()
                await asyncio.sleep(self.health_check_interval)
            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.error(f"Error in health check loop: {str(e)}")
                await asyncio.sleep(5)  # Shorter retry on error
    
    async def _check_all_instances(self):
        """Check health and model availability for all instances."""
        tasks = []
        for instance in self.ollama_instances:
            tasks.append(self._check_instance(instance))
        
        # Run all checks in parallel
        await asyncio.gather(*tasks, return_exceptions=True)
        
        # Log status
        healthy_count = sum(1 for status in self.instance_status.values() if status)
        logger.debug(f"Ollama health check: {healthy_count}/{len(self.ollama_instances)} instances healthy")
    
    async def _check_instance(self, instance: str):
        """Check health and model availability for a single instance."""
        try:
            async with httpx.AsyncClient(timeout=5.0) as client:
                response = await client.get(f"{instance}/api/version")
                
                if response.status_code == 200:
                    # Instance is healthy
                    self.instance_status[instance] = True
                    
                    # Check available models
                    models_response = await client.get(f"{instance}/api/tags")
                    if models_response.status_code == 200:
                        data = models_response.json()
                        models = [model["name"] for model in data.get("models", [])]
                        self.model_availability[instance] = models
                else:
                    self.instance_status[instance] = False
        except Exception as e:
            logger.warning(f"Health check failed for {instance}: {str(e)}")
            self.instance_status[instance] = False
    
    @staticmethod
    def _normalize_model_name(model: str) -> str:
        """Add the implicit ":latest" tag, matching how /api/tags reports names."""
        # Simplistic rule: a name that already contains ":" is assumed to be tagged
        return model if ":" in model else f"{model}:latest"
    
    def get_instance_for_model(self, model: str) -> Optional[str]:
        """Get the best instance for a specific model."""
        model = self._normalize_model_name(model)
        
        # Filter to healthy instances that have the model
        candidates = [
            instance for instance, status in self.instance_status.items()
            if status and model in self.model_availability.get(instance, [])
        ]
        
        if not candidates:
            return None
        
        # Use random selection for basic load balancing
        # A more sophisticated version would track load, response times, etc.
        return random.choice(candidates)
    
    def get_healthy_instance(self) -> Optional[str]:
        """Get any healthy instance."""
        candidates = [
            instance for instance, status in self.instance_status.items()
            if status
        ]
        
        if not candidates:
            return None
            
        return random.choice(candidates)
    
    async def ensure_model_availability(self, model: str) -> bool:
        """
        Ensure at least one instance has the required model.
        Returns True if model is available or successfully pulled.
        """
        model = self._normalize_model_name(model)
        
        # Check if any instance already has this model
        for instance, models in self.model_availability.items():
            if self.instance_status.get(instance, False) and model in models:
                return True
        
        # Try to pull the model on a healthy instance
        instance = self.get_healthy_instance()
        if not instance:
            logger.error(f"No healthy Ollama instances available to pull model {model}")
            return False
        
        try:
            # Longer timeout for model pulls; "stream": False makes Ollama reply
            # once with the final status instead of streaming progress updates
            async with httpx.AsyncClient(timeout=300.0) as client:
                response = await client.post(
                    f"{instance}/api/pull",
                    json={"model": model, "stream": False}
                )
                
                if response.status_code == 200:
                    logger.info(f"Successfully pulled model {model} on {instance}")
                    # Record the new model without creating duplicates
                    models = self.model_availability.setdefault(instance, [])
                    if model not in models:
                        models.append(model)
                    return True
                else:
                    logger.error(f"Failed to pull model {model} on {instance}: {response.text}")
                    return False
        except Exception as e:
            logger.error(f"Error pulling model {model} on {instance}: {str(e)}")
            return False
</code></pre></div></pre>
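`get_instance_for_model` picks randomly among healthy candidates, and its comment notes that a more sophisticated version would track load. A minimal sketch of one such policy, least outstanding requests, assuming the caller wraps each Ollama call in `track()` so in-flight counts stay accurate; `LeastLoadedSelector` is a hypothetical class, not part of the service above, and its counts cover only the current process.

```python
import random
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


class LeastLoadedSelector:
    """Pick the candidate instance with the fewest in-flight requests."""

    def __init__(self) -> None:
        # instance URL -> requests currently in flight from this process
        self.in_flight: Dict[str, int] = {}

    def choose(self, candidates: List[str]) -> Optional[str]:
        if not candidates:
            return None
        lowest = min(self.in_flight.get(c, 0) for c in candidates)
        # Break ties randomly so idle instances share the load
        tied = [c for c in candidates if self.in_flight.get(c, 0) == lowest]
        return random.choice(tied)

    @contextmanager
    def track(self, instance: str) -> Iterator[str]:
        """Count a request against `instance` for the duration of the block."""
        self.in_flight[instance] = self.in_flight.get(instance, 0) + 1
        try:
            yield instance
        finally:
            self.in_flight[instance] -= 1
```

A caller would filter candidates the same way `get_instance_for_model` does, pass them to `choose()`, and hold `track(url)` open while the request runs. Because asyncio runs on a single thread, a plain dictionary is enough here; sharing load information across API replicas would need an external store.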
<h3 id="autoscaling-configuration-for-cloud-deployments">Autoscaling Configuration for Cloud Deployments</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">YAML</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># kubernetes/autoscaler-config.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: mcp-api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: mcp-api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 250m
          memory: 256Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mcp-api-scaler
spec:
  scaleTargetRef:
    name: mcp-api
  minReplicaCount: 2
  maxReplicaCount: 20
  pollingInterval: 15
  cooldownPeriod: 300  # only applies when scaling to zero; no effect while minReplicaCount > 0
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-service:9090
      metricName: api_active_requests
      threshold: '10'
      query: sum(api_active_requests)
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-service:9090
      metricName: api_response_time_p90
      threshold: '2.0'
      query: histogram_quantile(0.9, sum(rate(api_response_time_seconds_bucket[2m])) by (le))
</code></pre></div></pre>
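KEDA implements the ScaledObject above by creating a HorizontalPodAutoscaler, and with KEDA's default `AverageValue` metric type each trigger's `threshold` acts as a per-replica target. A simplified sketch of the resulting replica arithmetic, useful when choosing thresholds: each trigger proposes `ceil(query_result / threshold)` replicas, the HPA keeps the largest proposal, and the result is clamped to `minReplicaCount`/`maxReplicaCount`. It assumes the HPA's default 10% tolerance and ignores stabilization windows and pod readiness; `desired_replicas` is a hypothetical helper, not a KEDA API.

```python
import math
from typing import Sequence, Tuple


def desired_replicas(
    current_replicas: int,
    metrics: Sequence[Tuple[float, float]],  # (query result, trigger threshold)
    min_replicas: int = 2,
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Approximate the replica count an HPA would request for AverageValue targets."""
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    proposals = []
    for value, threshold in metrics:
        # Ratio of the per-replica value to the per-replica target
        ratio = value / (threshold * current_replicas)
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current_replicas)  # within tolerance: no change
        else:
            proposals.append(math.ceil(value / threshold))
    desired = max(proposals, default=current_replicas)
    return max(min_replicas, min(max_replicas, desired))
```

With the thresholds above and 2 running replicas, 45 concurrent requests propose 5 replicas, while a p90 of 1.5 s proposes only 1, so the request trigger wins. Because the latency query also passes through the per-replica division, it is worth checking a candidate latency threshold with this arithmetic before relying on it.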
<h2 id="cost-optimization---monthly-budget-tracking">Cost Optimization: Monthly Budget Tracking</h2>
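The reset check in `_check_budget_reset` below compares calendar months against the configured reset day, so a reset day that a month lacks (for example 31) never fires in that month and two budget periods merge. As a minimal sketch of an alternative, the helpers here compute when the current budget period began, clamping the reset day to the month's length, and report whether a stored last-reset time predates it. `current_period_start` and `needs_reset` are hypothetical names, and like the service they use naive local datetimes.

```python
import calendar
from datetime import datetime


def current_period_start(now: datetime, reset_day: int) -> datetime:
    """Return midnight on the most recent budget reset date at or before `now`.

    Reset days past the end of a short month (e.g. 31 in February) are
    clamped to that month's last day.
    """
    if not 1 <= reset_day <= 31:
        raise ValueError("reset_day must be between 1 and 31")

    def reset_in(year: int, month: int) -> datetime:
        last_day = calendar.monthrange(year, month)[1]
        return datetime(year, month, min(reset_day, last_day))

    this_month = reset_in(now.year, now.month)
    if now >= this_month:
        return this_month
    # Before this month's reset date: the period began in the previous month
    year, month = (now.year - 1, 12) if now.month == 1 else (now.year, now.month - 1)
    return reset_in(year, month)


def needs_reset(last_reset: datetime, now: datetime, reset_day: int) -> bool:
    """True when the last reset happened before the current period began."""
    return last_reset < current_period_start(now, reset_day)
```

Comparing against a computed period start also removes the need to reason about year boundaries separately, since the January case falls back to December of the previous year.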
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python"># app/services/budget_service.py
import logging
import time
from datetime import datetime, timedelta
import aioredis  # aioredis 1.x API (create_redis_pool, set(..., expire=, exist=))
import json
from typing import Dict, Any, Optional

logger = logging.getLogger(__name__)

class BudgetService:
    """
    Manages API budget tracking and quota enforcement.
    """
    
    def __init__(self, redis_url: str):
        self.redis = None
        self.redis_url = redis_url
        self.monthly_budget = 0.0
        self.daily_budget = 0.0
        self.alert_threshold = 0.8  # Alert at 80% of budget
        self.budget_lock_key = "budget:lock"
        self.last_reset_check = 0
    
    async def initialize(self, monthly_budget: float = 0.0):
        """Initialize the budget service."""
        self.redis = await aioredis.create_redis_pool(self.redis_url)
        self.monthly_budget = monthly_budget
        self.daily_budget = monthly_budget / 30 if monthly_budget > 0 else 0
        
        # Initialize monthly budget in Redis if not already set
        if not await self.redis.exists("budget:monthly:total"):
            await self.redis.set("budget:monthly:total", str(monthly_budget))
        
        # Initialize current usage if not already set
        if not await self.redis.exists("budget:monthly:used"):
            await self.redis.set("budget:monthly:used", "0.0")
        
        # Set the reset day (1st of month)
        if not await self.redis.exists("budget:reset_day"):
            await self.redis.set("budget:reset_day", "1")
        
        # Check if we need to reset the budget
        await self._check_budget_reset()
        
        logger.info(f"Budget service initialized with monthly budget: ${monthly_budget:.2f}")
    
    async def close(self):
        """Close the Redis connection."""
        if self.redis:
            self.redis.close()
            await self.redis.wait_closed()
    
    async def _check_budget_reset(self):
        """Check if the budget needs to be reset (new month)."""
        now = time.time()
        # Only check once per hour to avoid excessive checks
        if now - self.last_reset_check < 3600:
            return
            
        self.last_reset_check = now
        
        # Try to acquire a short-lived lock so only one process performs the reset
        lock = await self.redis.set(
            self.budget_lock_key, "1", 
            expire=60, exist="SET_IF_NOT_EXIST"
        )
        
        if not lock:
            return  # Another process is handling the reset; leave its lock alone
        
        try:
            # Get the reset day (default to 1st of month)
            reset_day = int(await self.redis.get("budget:reset_day") or "1")
            
            # Get last reset timestamp
            last_reset = float(await self.redis.get("budget:last_reset") or "0")
            
            # Check if we're in a new month since last reset
            last_reset_date = datetime.fromtimestamp(last_reset)
            now_date = datetime.now()
            
            # If it's a new month and we've passed the reset day
            if (now_date.year > last_reset_date.year or 
                (now_date.year == last_reset_date.year and now_date.month > last_reset_date.month)) and \
                now_date.day >= reset_day:
                
                # Atomically read the finished month's usage and zero it, so the
                # archived value is the real total rather than the reset value
                prev_usage = await self.redis.getset("budget:monthly:used", "0.0") or "0.0"
                prev_month = last_reset_date.strftime("%Y-%m")
                await self.redis.set(f"budget:archive:{prev_month}", prev_usage)
                
                # Update last reset timestamp
                await self.redis.set("budget:last_reset", str(now))
                
                logger.info("Monthly budget reset performed")
        finally:
            # Release the lock; only reached when this process acquired it
            await self.redis.delete(self.budget_lock_key)
    
    async def record_usage(self, cost: float, provider: str, model: str):
        """Record API usage cost."""
        if cost <= 0:
            return
            
        # Only track costs for OpenAI
        if provider != "openai":
            return
        
        # Check if we need to reset first
        await self._check_budget_reset()
        
        # Update monthly usage
        await self.redis.incrbyfloat("budget:monthly:used", cost)
        
        # Update model-specific usage
        await self.redis.incrbyfloat(f"budget:model:{model}", cost)
        
        # Update daily usage
        today = datetime.now().strftime("%Y-%m-%d")
        await self.redis.incrbyfloat(f"budget:daily:{today}", cost)
        
        # Log high-cost operations
        if cost > 0.1:  # Log individual requests that cost more than 10 cents
            logger.info(f"High-cost API request: ${cost:.4f} for {provider}:{model}")
            
        # Check if we've exceeded the alert threshold
        usage = float(await self.redis.get("budget:monthly:used") or "0")
        budget = float(await self.redis.get("budget:monthly:total") or "0")
        
        if budget > 0 and usage >= budget * self.alert_threshold:
            # Check if we've already alerted for this threshold
            alerted = await self.redis.get(f"budget:alerted:{int(self.alert_threshold * 100)}")
            
            if not alerted:
                percentage = (usage / budget) * 100
                logger.warning(f"Budget alert: Used ${usage:.2f} of ${budget:.2f} ({percentage:.1f}%)")
                
                # Mark as alerted for this threshold
                await self.redis.set(
                    f"budget:alerted:{int(self.alert_threshold * 100)}", "1",
                    ex=86400  # Expire after 1 day (redis-py uses `ex`; aioredis 1.x used `expire`)
                )
    
    async def check_budget_available(self, estimated_cost: float) -> bool:
        """
        Check if there's enough budget for an estimated operation.
        Returns True if operation is allowed, False if it would exceed budget.
        """
        if estimated_cost <= 0:
            return True
            
        if self.monthly_budget <= 0:
            return True  # No budget constraints
        
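        # Apply any pending monthly reset so last month's usage doesn't block requests
        await self._check_budget_reset()
        
        # Get current usage; fall back to the configured budget if Redis has no total
        usage = float(await self.redis.get("budget:monthly:used") or "0")
        budget = float(await self.redis.get("budget:monthly:total") or self.monthly_budget)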
        
        # Check if operation would exceed budget
        return (usage + estimated_cost) <= budget
    
    async def get_usage_stats(self) -> Dict[str, Any]:
        """Get current budget usage statistics."""
        usage = float(await self.redis.get("budget:monthly:used") or "0")
        budget = float(await self.redis.get("budget:monthly:total") or "0")
        
        # Get daily usage for the last 30 days
        daily_usage = {}
        today = datetime.now()
        
        for i in range(30):
            date = (today - timedelta(days=i)).strftime("%Y-%m-%d")
            day_usage = float(await self.redis.get(f"budget:daily:{date}") or "0")
            daily_usage[date] = day_usage
        
        # Get usage by model
        model_keys = await self.redis.keys("budget:model:*")
        model_usage = {}
        
        for key in model_keys:
            model = key.decode('utf-8').replace("budget:model:", "")
            model_cost = float(await self.redis.get(key) or "0")
            model_usage[model] = model_cost
        
        # Calculate percentage used
        percentage_used = (usage / budget) * 100 if budget > 0 else 0
        
        return {
            "current_usage": usage,
            "monthly_budget": budget,
            "percentage_used": percentage_used,
            "daily_usage": daily_usage,
            "model_usage": model_usage,
            "remaining_budget": budget - usage if budget > 0 else 0
        }
</code></pre></div></pre>
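<p>The month-rollover condition in <code>_check_budget_reset</code> is easy to misread, so it can help to isolate it as a pure function that can be tested without Redis. The sketch below is illustrative and not part of the original tracker; the name <code>should_reset_monthly_budget</code> is hypothetical, and it assumes <code>reset_day</code> is the configured day of the month on which the budget rolls over.</p>

```python
from datetime import datetime


def should_reset_monthly_budget(last_reset: datetime, now: datetime, reset_day: int) -> bool:
    """Return True when a new calendar month has started since last_reset
    and today's day-of-month has reached reset_day.

    Hypothetical helper mirroring the condition in _check_budget_reset.
    """
    # Comparing (year, month) tuples handles the December -> January rollover.
    new_month = (now.year, now.month) > (last_reset.year, last_reset.month)
    return new_month and now.day >= reset_day
```

<p>Comparing <code>(year, month)</code> tuples is equivalent to the two-part year/month test in the tracker. One consequence of the <code>now.day &gt;= reset_day</code> check: with a <code>reset_day</code> above 28, a short month such as February never reaches that day, so its reset is deferred until a later month that does.</p>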
<h2 id="conclusion-5">Conclusion</h2>
<p>The optimization and deployment strategies outlined in this document provide a comprehensive framework for implementing an efficient, cost-effective, and highly accurate hybrid AI system that leverages both OpenAI's cloud capabilities and Ollama's local inference.</p>
<p>Key aspects of this implementation include:</p>
<ol>
<li>
<p><strong>Performance Optimization</strong>:</p>
<ul>
<li>Query routing optimization based on complexity analysis</li>
<li>Semantic response caching for frequent queries</li>
<li>Parallel processing for complex queries</li>
<li>Dynamic batching for high-load scenarios</li>
<li>Model-specific prompt optimization</li>
</ul>
</li>
<li>
<p><strong>Cost Reduction</strong>:</p>
<ul>
<li>Intelligent token usage optimization</li>
<li>Tiered model selection based on task requirements</li>
<li>Local model prioritization for development</li>
<li>Request batching and rate limiting</li>
<li>Memory and context compression</li>
</ul>
</li>
<li>
<p><strong>Response Accuracy</strong>:</p>
<ul>
<li>Advanced prompt templating for different scenarios</li>
<li>Chain-of-thought reasoning for complex queries</li>
<li>Self-verification and error correction</li>
<li>Domain-specific knowledge integration</li>
<li>Dynamic few-shot learning with examples</li>
</ul>
</li>
<li>
<p><strong>Deployment Options</strong>:</p>
<ul>
<li>Local development environment with Docker Compose</li>
<li>Production Kubernetes deployment with autoscaling</li>
<li>AWS cloud deployment with CloudFormation</li>
<li>Comprehensive monitoring with Prometheus and Grafana</li>
<li>Budget tracking and cost optimization</li>
</ul>
</li>
</ol>
<p>These strategies work together to balance the trade-offs between performance, cost, and accuracy, adapting to the requirements and constraints of each deployment scenario.</p>
<p>By implementing this hybrid approach, organizations can significantly reduce API costs while maintaining high-quality responses, with the added benefits of enhanced privacy for sensitive data and reduced dependency on external services. Local inference also provides resilience against API outages and rate limiting, helping to keep the service available.</p>
<h1 id="mcp-modern-computational-paradigm-system">MCP (Modern Computational Paradigm) System</h1>
<h2 id="comprehensive-documentation">Comprehensive Documentation</h2>
<p>This documentation provides a complete guide to understanding, installing, configuring, and using the MCP system, a hybrid architecture that integrates OpenAI's API capabilities with Ollama's local inference to create an optimized, cost-effective AI solution.</p>
<hr/>
<h1 id="table-of-contents">Table of Contents</h1>
<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#system-architecture">System Architecture</a></li>
<li><a href="#installation-guide">Installation Guide</a>
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#local-development-setup">Local Development Setup</a></li>
<li><a href="#docker-deployment">Docker Deployment</a></li>
<li><a href="#kubernetes-deployment">Kubernetes Deployment</a></li>
<li><a href="#aws-deployment">AWS Deployment</a></li>
</ul>
</li>
<li><a href="#configuration">Configuration</a>
<ul>
<li><a href="#environment-variables">Environment Variables</a></li>
<li><a href="#advanced-configuration">Advanced Configuration</a></li>
<li><a href="#model-selection">Model Selection</a></li>
</ul>
</li>
<li><a href="#api-reference">API Reference</a>
<ul>
<li><a href="#authentication">Authentication</a></li>
<li><a href="#chat-endpoints">Chat Endpoints</a></li>
<li><a href="#agent-endpoints">Agent Endpoints</a></li>
<li><a href="#model-management-endpoints">Model Management Endpoints</a></li>
<li><a href="#system-endpoints">System Endpoints</a></li>
</ul>
</li>
<li><a href="#usage-examples">Usage Examples</a>
<ul>
<li><a href="#basic-chat-interaction">Basic Chat Interaction</a></li>
<li><a href="#working-with-agents">Working with Agents</a></li>
<li><a href="#customizing-model-selection">Customizing Model Selection</a></li>
<li><a href="#tool-integration">Tool Integration</a></li>
</ul>
</li>
<li><a href="#performance-optimization">Performance Optimization</a>
<ul>
<li><a href="#caching-strategies">Caching Strategies</a></li>
<li><a href="#query-optimization">Query Optimization</a></li>
<li><a href="#parallel-processing">Parallel Processing</a></li>
</ul>
</li>
<li><a href="#cost-optimization">Cost Optimization</a>
<ul>
<li><a href="#budget-management">Budget Management</a></li>
<li><a href="#provider-selection">Provider Selection</a></li>
<li><a href="#token-optimization">Token Optimization</a></li>
</ul>
</li>
<li><a href="#monitoring-and-observability">Monitoring and Observability</a>
<ul>
<li><a href="#metrics-overview">Metrics Overview</a></li>
<li><a href="#grafana-dashboard">Grafana Dashboard</a></li>
<li><a href="#alerting">Alerting</a></li>
</ul>
</li>
<li><a href="#troubleshooting">Troubleshooting</a>
<ul>
<li><a href="#common-issues">Common Issues</a></li>
<li><a href="#diagnostics">Diagnostics</a></li>
<li><a href="#log-management">Log Management</a></li>
</ul>
</li>
<li><a href="#contributing">Contributing</a></li>
<li><a href="#license">License</a></li>
</ol>
<hr/>
<h1 id="readmemd">README.md</h1>
<h1 id="mcp-modern-computational-paradigm">MCP - Modern Computational Paradigm</h1>
<p><img alt="MCP Status" src="https://img.shields.io/badge/status-stable-green"/>
<img alt="Python 3.11+" src="https://img.shields.io/badge/python-3.11+-blue.svg"/>
<img alt="License MIT" src="https://img.shields.io/badge/license-MIT-green.svg"/></p>
<p>MCP is a hybrid AI system that intelligently integrates OpenAI's cloud capabilities with Ollama's local inference. This architecture optimizes for cost, performance, and privacy while maintaining response quality.</p>
<h2 id="key-features">Key Features</h2>
<ul>
<li><strong>Intelligent Query Routing</strong>: Automatically selects between OpenAI and Ollama based on query complexity, privacy requirements, and performance needs</li>
<li><strong>Advanced Agent Framework</strong>: Configurable AI agents with specialized capabilities</li>
<li><strong>Cost Optimization</strong>: Reduces API costs by up to 70% through local model usage, caching, and token optimization</li>
<li><strong>Privacy Control</strong>: Keeps sensitive information local when appropriate</li>
<li><strong>Performance Optimization</strong>: Parallel processing, response caching, and dynamic batching for high throughput</li>
<li><strong>Comprehensive Monitoring</strong>: Built-in metrics and observability</li>
</ul>
<h2 id="quick-start">Quick Start</h2>
<h3 id="prerequisites-1">Prerequisites</h3>
<ul>
<li>Python 3.11+</li>
<li>Docker and Docker Compose (for containerized deployment)</li>
<li>Ollama (for local model inference)</li>
<li>OpenAI API key</li>
</ul>
<h3 id="installation">Installation</h3>
<ol>
<li>
<p>Clone the repository:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git clone https://github.com/yourusername/mcp.git
cd mcp
</code></pre></div></pre>
</li>
</ol>
<ol start="2">
<li>
<p>Create and activate a virtual environment:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
</code></pre></div></pre>
</li>
<li>
<p>Install dependencies:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">pip install -r requirements.txt
</code></pre></div></pre>
</li>
<li>
<p>Set up environment variables:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">cp .env.example .env
# Edit .env with your configuration
</code></pre></div></pre>
</li>
<li>
<p>Start Ollama (if not already running):</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ollama serve
</code></pre></div></pre>
</li>
<li>
<p>Start the application:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">uvicorn app.main:app --reload
</code></pre></div></pre>
</li>
</ol>
<p>The API will be available at <a href="http://localhost:8000">http://localhost:8000</a>.</p>
<h3 id="docker-deployment">Docker Deployment</h3>
<p>For containerized deployment:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">docker-compose up -d
</code></pre></div></pre>
<h2 id="documentation">Documentation</h2>
<p>For complete documentation, see:</p>
<ul>
<li><a href="docs/installation.md">Installation Guide</a></li>
<li><a href="docs/api-reference.md">API Reference</a></li>
<li><a href="docs/configuration.md">Configuration Guide</a></li>
<li><a href="docs/troubleshooting.md">Troubleshooting</a></li>
</ul>
<h2 id="architecture">Architecture</h2>
<p>MCP passes each request through a routing layer that decides whether OpenAI or Ollama should serve it:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│                 │     │                  │     │             │
│  Client Request │────▶│ Routing Decision │────▶│ OpenAI API  │
│                 │     │                  │     │             │
└─────────────────┘     └──────────────────┘     └─────────────┘
                                │
                                │
                                ▼
                        ┌─────────────┐
                        │             │
                        │  Ollama API │
                        │             │
                        └─────────────┘
</code></pre>
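<p>The diagram shows a single routing decision in front of both providers. As an illustration only, that decision could be sketched as below; the keyword list, the length-based complexity score, and the 0.5 threshold are hypothetical placeholders, not MCP's actual heuristics.</p>

```python
from dataclasses import dataclass

# Hypothetical keyword hints used only for illustration.
COMPLEX_HINTS = ("analyze", "compare", "prove", "design", "step by step")


@dataclass
class RoutingDecision:
    provider: str  # "openai" or "ollama"
    reason: str


def estimate_complexity(query: str) -> float:
    """Crude 0..1 complexity score from word count and keyword hints (an assumption)."""
    text = query.lower()
    length_score = min(len(text.split()) / 200, 1.0)
    hint_score = 0.5 if any(hint in text for hint in COMPLEX_HINTS) else 0.0
    return min(length_score + hint_score, 1.0)


def route(query: str, sensitive: bool = False, threshold: float = 0.5) -> RoutingDecision:
    """Pick an inference provider for a single request."""
    if sensitive:
        # Privacy requirements win: sensitive data stays on the local machine.
        return RoutingDecision("ollama", "sensitive data kept local")
    score = estimate_complexity(query)
    if score >= threshold:
        return RoutingDecision("openai", f"complexity {score:.2f} >= {threshold}")
    return RoutingDecision("ollama", f"complexity {score:.2f} < {threshold}")
```

<p>For example, <code>route("What is 2+2?")</code> selects <code>ollama</code>, while <code>route("Analyze and compare these two architectures")</code> crosses the threshold and selects <code>openai</code>. Checking the privacy flag before complexity reflects the goal of keeping sensitive data local regardless of how demanding the query is.</p>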
<h2 id="license">License</h2>
<p>MIT License - see <a href="LICENSE">LICENSE</a> for details.</p>
<h2 id="contributing">Contributing</h2>
<p>Contributions are welcome! Please see <a href="CONTRIBUTING.md">CONTRIBUTING.md</a> for details.</p>
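<hr/>
<h1 id="installation-guide">Installation Guide</h1>
<h2 id="prerequisites">Prerequisites</h2>
<p>Before installing the MCP system, ensure your environment meets the following requirements:</p>
<h3 id="system-requirements">System Requirements</h3>
<ul>
<li><strong>Operating System</strong>: Linux (recommended), macOS, or Windows</li>
<li><strong>CPU</strong>: 4+ cores recommended</li>
<li><strong>RAM</strong>: Minimum 8GB, 16GB+ recommended</li>
<li><strong>Disk Space</strong>: 10GB minimum for installation, 50GB+ recommended for model storage</li>
<li><strong>GPU</strong>: Optional but recommended for Ollama (NVIDIA with CUDA support)</li>
</ul>
<h3 id="software-requirements">Software Requirements</h3>
<ul>
<li><strong>Python</strong>: Version 3.11 or higher</li>
<li><strong>Docker</strong>: Version 20.10 or higher (for containerized deployment)</li>
<li><strong>Docker Compose</strong>: Version 2.0 or higher</li>
<li><strong>Kubernetes</strong>: Version 1.21+ (for Kubernetes deployment)</li>
<li><strong>Ollama</strong>: Latest version (for local model inference)</li>
<li><strong>Redis</strong>: Version 6.0+ (for caching and rate limiting)</li>
</ul>
<h3 id="required-api-keys">Required API Keys</h3>
<ul>
<li><strong>OpenAI API Key</strong>: Register at <a href="https://platform.openai.com/">OpenAI Platform</a></li>
</ul>
<h2 id="local-development-setup">Local Development Setup</h2>
<p>Follow these steps to set up a local development environment:</p>
<h3 id="1-clone-the-repository">1. Clone the Repository</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git clone https://github.com/yourusername/mcp.git
cd mcp
</code></pre></div></pre>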
<h3 id="2-set-up-virtual-environment">2. Set Up Virtual Environment</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Create virtual environment
python -m venv venv

# Activate virtual environment
# On Linux/macOS:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
</code></pre></div></pre>
<h3 id="3-install-dependencies">3. Install Dependencies</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt  # For development tools
</code></pre></div></pre>
<h3 id="4-install-and-configure-ollama">4. Install and Configure Ollama</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># macOS (using Homebrew)
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve
</code></pre></div></pre>
<h3 id="5-pull-required-models">5. Pull Required Models</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Pull basic models
ollama pull llama2
ollama pull mistral
ollama pull codellama
</code></pre></div></pre>
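<p>To confirm the pulls succeeded, you can query Ollama's local REST API, which by default listens on port 11434 and lists pulled models at <code>GET /api/tags</code>. The helper names below are hypothetical; the sketch uses only the Python standard library.</p>

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def parse_model_names(payload: dict) -> set[str]:
    """Entries look like {"name": "llama2:latest", ...}; keep only the part before ':'."""
    return {m["name"].split(":", 1)[0] for m in payload.get("models", [])}


def installed_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> set[str]:
    """Return base names of locally pulled models, e.g. {"llama2", "mistral"}."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return parse_model_names(payload)


def missing_models(required: Iterable[str], available: set[str]) -> list[str]:
    """List required models that are not yet pulled, preserving the given order."""
    return [name for name in required if name not in available]
```

<p>With Ollama running, <code>missing_models(["llama2", "mistral", "codellama"], installed_models())</code> returns an empty list once all three models are pulled. Because tags are stripped, any variant (for example <code>llama2:13b</code>) counts as satisfying <code>llama2</code>; compare full names instead if a specific tag is required.</p>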
<h3 id="6-set-up-environment-variables">6. Set Up Environment Variables</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Copy the example environment file
cp .env.example .env

# Edit the file with your configuration
# At minimum, set OPENAI_API_KEY
nano .env
</code></pre></div></pre>
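<p>Before starting the server, it can help to check that the required keys are actually set. The sketch below parses simple <code>KEY=VALUE</code> lines (comments and blank lines are skipped, an optional <code>export </code> prefix and one pair of surrounding quotes are removed). It is not a full dotenv implementation. Only <code>OPENAI_API_KEY</code> is named as required by the step above; anything else you add to <code>REQUIRED_KEYS</code> is your own assumption.</p>

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY",)  # the step above names this as the minimum


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; not a full dotenv implementation."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def missing_keys(env: dict, required: tuple = REQUIRED_KEYS) -> list:
    """Return required keys that are absent or set to an empty string."""
    return [k for k in required if not env.get(k)]


def check_env_file(path: str = ".env") -> list:
    """Read a .env file from disk and report missing required keys."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))
```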
<h3 id="7-initialize-local-services">7. Initialize Local Services</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Start Redis using Docker
docker-compose up -d redis

# Initialize database (if applicable)
python scripts/init_db.py
</code></pre>
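<p>To confirm the Redis container is reachable without installing a client library, a script can send <code>PING</code> using the Redis serialization protocol (RESP), in which a command is an array of bulk strings. This sketch assumes the compose file publishes Redis on <code>localhost:6379</code> without a password; if a password is configured, the server answers with an authentication error and the check returns <code>False</code>.</p>

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode("utf-8")
        # Bulk string: $<byte length>\r\n<bytes>\r\n
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server answers PING with the simple string +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("PING"))
        reply = sock.recv(64)
    return reply.startswith(b"+PONG")
```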
<h3 id="8-start-development-server">8. Start Development Server</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Start with auto-reload for development
uvicorn app.main:app --reload --port 8000
</code></pre>
<h3 id="9-verify-installation">9. Verify Installation</h3>
<p>Open your browser and navigate to:</p>
<ul>
<li>API documentation: <a href="http://localhost:8000/docs">http://localhost:8000/docs</a></li>
<li>Health check: <a href="http://localhost:8000/api/health">http://localhost:8000/api/health</a></li>
</ul>
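<p>In scripts or CI, the same check can be automated by polling the health endpoint until it responds. The sketch below uses only the standard library and treats any HTTP 200 from <code>/api/health</code> as healthy; the response body format is not documented here, so it is not inspected. The <code>fetch</code> parameter is injectable so the retry loop can be exercised without a running server.</p>

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/health"


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a status
        return err.code
    except OSError:  # connection refused, DNS failure, timeout
        return 0


def wait_until_healthy(
    url: str = HEALTH_URL,
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it returns 200, sleeping delay seconds between attempts."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```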
<h2 id="docker-deployment-1">Docker Deployment</h2>
<p>For a containerized deployment using Docker Compose:</p>
<h3 id="1-ensure-docker-and-docker-compose-are-installed">1. Ensure Docker and Docker Compose are Installed</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Verify installation
docker --version
docker-compose --version
</code></pre>
<h3 id="2-configure-environment-variables">2. Configure Environment Variables</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Copy and edit environment variables
cp .env.example .env
nano .env
</code></pre>
<h3 id="3-start-services-with-docker-compose">3. Start Services with Docker Compose</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Build and start all services
docker-compose up -d

# View logs
docker-compose logs -f
</code></pre>
<p>The application will be available at <a href="http://localhost:8000">http://localhost:8000</a>.</p>
<h3 id="4-stopping-the-services">4. Stopping the Services</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash">docker-compose down
</code></pre>
<h2 id="kubernetes-deployment">Kubernetes Deployment</h2>
<p>For production deployment on Kubernetes:</p>
<h3 id="1-prerequisites">1. Prerequisites</h3>
<ul>
<li>Access to a running Kubernetes cluster</li>
<li><code>kubectl</code> configured to use that cluster</li>
<li>Helm (optional, for Redis deployment)</li>
</ul>
<h3 id="2-set-up-namespace-and-secrets">2. Set Up Namespace and Secrets</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Create namespace
kubectl create namespace mcp

# Create secrets
kubectl create secret generic mcp-secrets \
  --from-literal=openai-api-key=YOUR_OPENAI_API_KEY \
  --from-literal=redis-password=YOUR_REDIS_PASSWORD \
  -n mcp
</code></pre>
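<p>For declarative workflows, the same secret can be written as a manifest. <code>kubectl create secret generic</code> with <code>--from-literal</code> produces a <code>v1</code> Secret of type <code>Opaque</code> whose <code>data</code> values are base64-encoded; the sketch below builds that object as a Python dict (the helper name is illustrative). Base64 is an encoding, not encryption, so a generated manifest containing real keys should not be committed to version control.</p>

```python
import base64
import json


def build_secret(name: str, namespace: str, literals: dict) -> dict:
    """Build a v1 Opaque Secret equivalent to a set of --from-literal flags."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Secret.data values must be base64-encoded strings.
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


manifest = build_secret(
    "mcp-secrets",
    "mcp",
    {"openai-api-key": "YOUR_OPENAI_API_KEY", "redis-password": "YOUR_REDIS_PASSWORD"},
)
# kubectl also accepts JSON manifests: save this output and pass the file to kubectl apply -f.
print(json.dumps(manifest, indent=2))
```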
<h3 id="3-deploy-redis-if-needed">3. Deploy Redis (if needed)</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Using Helm
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace mcp \
  --set auth.password=YOUR_REDIS_PASSWORD \
  --set master.persistence.size=8Gi
</code></pre>
<h3 id="4-deploy-mcp-components">4. Deploy MCP Components</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Apply Kubernetes manifests
kubectl apply -f kubernetes/deployment.yaml -n mcp
kubectl apply -f kubernetes/service.yaml -n mcp
kubectl apply -f kubernetes/ingress.yaml -n mcp
</code></pre>
<h3 id="5-set-up-autoscaling-optional">5. Set Up Autoscaling (Optional)</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash">kubectl apply -f kubernetes/hpa.yaml -n mcp
</code></pre>
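<p>When choosing targets for <code>kubernetes/hpa.yaml</code> (its contents are not shown here), it helps to know the HorizontalPodAutoscaler's core rule, which the Kubernetes documentation gives as <code>desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)</code>, with no change when the ratio is within a tolerance of 0.1 by default. The sketch below reproduces that calculation for a single metric and clamps the result like the HPA's <code>minReplicas</code>/<code>maxReplicas</code> fields; the default bounds of 1 and 10 are illustrative, and stabilization windows and pod-readiness handling are not modeled.</p>

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,    # illustrative; use your HPA's minReplicas
    max_replicas: int = 10,   # illustrative; use your HPA's maxReplicas
    tolerance: float = 0.1,   # Kubernetes default HPA tolerance
) -> int:
    """Approximate the HPA replica calculation for one metric."""
    ratio = current_metric / target_metric
    if abs(1.0 - ratio) <= tolerance:
        proposed = current_replicas  # within tolerance: no scaling
    else:
        proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))
```

<p>For example, 3 replicas averaging 90% CPU against a 60% target scale to <code>ceil(3 * 1.5) = 5</code>.</p>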
<h3 id="6-check-deployment-status">6. Check Deployment Status</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash">kubectl get pods -n mcp
kubectl get services -n mcp
kubectl get ingress -n mcp
</code></pre>
<h2 id="aws-deployment">AWS Deployment</h2>
<p>For deployment on AWS:</p>
<h3 id="1-prerequisites-1">1. Prerequisites</h3>
<ul>
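<li>AWS CLI installed and configured with credentials for the target account</li>
<li>IAM permissions to create CloudFormation stacks (including IAM resources, since the template is deployed with <code>CAPABILITY_IAM</code>), push images to ECR, and update ECS services</li>
<li>Docker, to build and push the API image</li>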
</ul>
<h3 id="2-cloudformation-deployment">2. CloudFormation Deployment</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Deploy using CloudFormation template
aws cloudformation create-stack \
  --stack-name mcp-hybrid-system \
  --template-body file://aws/cloudformation.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=Environment,ParameterValue=Production \
    ParameterKey=OllamaInstanceType,ParameterValue=g4dn.xlarge

# Check deployment status
aws cloudformation describe-stacks --stack-name mcp-hybrid-system
</code></pre>
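<p><code>describe-stacks</code> returns the full stack description. For scripting, <code>--query "Stacks[0].StackStatus" --output text</code> narrows it to the status string, and <code>aws cloudformation wait stack-create-complete --stack-name mcp-hybrid-system</code> blocks until creation finishes. If you poll manually instead, the sketch below classifies the status of a newly created stack: <code>CREATE_COMPLETE</code> means success, any <code>*_IN_PROGRESS</code> status means keep waiting, and <code>CREATE_FAILED</code>, <code>ROLLBACK_COMPLETE</code>, or <code>ROLLBACK_FAILED</code> mean creation failed. The function name is hypothetical.</p>

```python
def create_stack_outcome(status: str) -> str:
    """Classify a StackStatus for a stack launched with create-stack.

    Returns "in_progress", "succeeded", or "failed".
    """
    if status.endswith("_IN_PROGRESS"):
        return "in_progress"
    if status == "CREATE_COMPLETE":
        return "succeeded"
    # CREATE_FAILED, ROLLBACK_COMPLETE and ROLLBACK_FAILED mean the initial
    # creation did not succeed; any other status is unexpected for a new
    # stack and is also reported as "failed" here.
    return "failed"
```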
<h3 id="3-deploy-api-image-to-ecr">3. Deploy API Image to ECR</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Log in to ECR (the token's region must match the registry's region)
aws ecr get-login-password --region YOUR_REGION | docker login --username AWS --password-stdin YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com

# Build and push image
docker build -t YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-api:latest -f Dockerfile.prod .
docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-api:latest
</code></pre>
<h3 id="4-update-ecs-service">4. Update ECS Service</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-bash"># Force new deployment to use the updated image
aws ecs update-service --cluster mcp-hybrid-system-cluster --service mcp-hybrid-system-api --force-new-deployment
</code></pre>
<hr/>
<h1 id="api-reference">API Reference</h1>
<h2 id="authentication">Authentication</h2>
<p>The MCP API uses API key authentication. Include your API key in all requests using either:</p>
<h3 id="bearer-token-authentication">Bearer Token Authentication</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>Authorization: Bearer YOUR_API_KEY
</code></pre>
<h3 id="query-parameter">Query Parameter</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>?api_key=YOUR_API_KEY
</code></pre>
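<p>A minimal client sketch showing both authentication styles with the Python standard library. <code>BASE_URL</code> assumes the local development server from the installation steps, and <code>authed_request</code> is an illustrative helper, not part of the MCP codebase. The header form is generally preferable, because query strings tend to be recorded in server and proxy access logs.</p>

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: the local development server


def authed_request(
    path: str,
    api_key: str,
    use_query: bool = False,
    data: Optional[bytes] = None,
) -> urllib.request.Request:
    """Build a GET (or POST, when data is given) request carrying the API key."""
    url = BASE_URL + path
    headers = {}
    if data is not None:
        headers["Content-Type"] = "application/json"
    if use_query:
        # Append ?api_key=... (or &api_key=... if the path already has a query).
        url += ("&" if "?" in url else "?") + urllib.parse.urlencode({"api_key": api_key})
    else:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=data, headers=headers, method="POST" if data is not None else "GET"
    )
```

<p>Send the request with <code>urllib.request.urlopen(authed_request(...))</code>.</p>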
<h2 id="chat-endpoints">Chat Endpoints</h2>
<h3 id="create-chat-completion">Create Chat Completion</h3>
<p>Generates a completion for a given conversation.</p>
<p><strong>Endpoint:</strong> <code>POST /api/v1/chat/completions</code></p>
<p><strong>Request Body:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-json">{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"}
  ],
  "model": "auto",
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "routing_preferences": {
    "force_provider": null,
    "privacy_level": "standard",
    "latency_preference": "balanced"
  },
  "tools": []
}
</code></pre>
<p><strong>Parameters:</strong></p>
<table><thead><tr><th>Name</th><th>Type</th><th>Description</th></tr></thead><tbody><tr><td>messages</td><td>array</td><td>Array of message objects representing the conversation history</td></tr><tr><td>model</td><td>string</td><td>The model to use, or "auto" for automatic selection</td></tr><tr><td>temperature</td><td>number</td><td>Controls randomness (0-1)</td></tr><tr><td>max_tokens</td><td>integer</td><td>Maximum tokens in response</td></tr><tr><td>stream</td><td>boolean</td><td>Whether to stream the response</td></tr><tr><td>routing_preferences</td><td>object</td><td>Preferences for provider selection</td></tr><tr><td>tools</td><td>array</td><td>List of tools the assistant can use</td></tr></tbody></table>
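<p>The sketch below assembles a request body from the parameters in the table and performs a few client-side checks before sending. The temperature bounds follow the 0-1 range stated above and the default <code>routing_preferences</code> are copied from the example request; requiring at least one message and a positive integer <code>max_tokens</code> are assumptions about what the server accepts. <code>build_chat_request</code> is a hypothetical helper.</p>

```python
import json


def build_chat_request(messages, model="auto", temperature=0.7, max_tokens=1024,
                       stream=False, routing_preferences=None, tools=None) -> bytes:
    """Validate parameters and serialize a /api/v1/chat/completions body."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for msg in messages:
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"message missing role/content: {msg!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    body = {
        "messages": messages,
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
        # Defaults mirror the example request above.
        "routing_preferences": routing_preferences or {
            "force_provider": None,
            "privacy_level": "standard",
            "latency_preference": "balanced",
        },
        "tools": tools or [],
    }
    return json.dumps(body).encode("utf-8")
```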
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "id": "resp_abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "provider": "openai",
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 325,
    "total_tokens": 381
  },
  "message": {
    "role": "assistant",
    "content": "Hello! I'm an AI assistant...",
    "tool_calls": []
  },
  "routing_metrics": {
    "complexity_score": 0.78,
    "privacy_impact": "low",
    "decision_factors": ["complexity", "tool_requirements"]
  }
}
</code></pre></div></pre>
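<p>Most clients need only a few fields from this response. The sketch below parses the documented example into a small typed record and cross-checks the token accounting (56 + 325 = 381). <code>ChatResult</code> and <code>parse_chat_response</code> are hypothetical client-side helpers, not part of the API, and the sketch treats <code>routing_metrics</code> as optional, which the reference does not confirm either way.</p>

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ChatResult:
    """Hypothetical client-side view of a chat completion response."""
    provider: str
    model: str
    content: str
    total_tokens: int
    tool_calls: list = field(default_factory=list)
    routing_metrics: Optional[dict] = None


def parse_chat_response(body: dict[str, Any]) -> ChatResult:
    usage = body["usage"]
    # Sanity check: prompt + completion tokens should equal the reported total.
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("inconsistent token usage in response")
    message = body["message"]
    return ChatResult(
        provider=body["provider"],
        model=body["model"],
        content=message["content"],
        total_tokens=usage["total_tokens"],
        tool_calls=message.get("tool_calls", []),
        routing_metrics=body.get("routing_metrics"),  # may be absent (assumption)
    )


# The documented example response.
example = json.loads("""
{"id": "resp_abc123", "object": "chat.completion", "created": 1677858242,
 "provider": "openai", "model": "gpt-4o",
 "usage": {"prompt_tokens": 56, "completion_tokens": 325, "total_tokens": 381},
 "message": {"role": "assistant", "content": "Hello! I'm an AI assistant...",
             "tool_calls": []},
 "routing_metrics": {"complexity_score": 0.78, "privacy_impact": "low",
                     "decision_factors": ["complexity", "tool_requirements"]}}
""")
result = parse_chat_response(example)
```

<p>Keeping <code>provider</code> and <code>routing_metrics</code> on the record makes it easy to log which backend answered each request.</p>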
<h3 id="stream-chat-completion">Stream Chat Completion</h3>
<p>Stream a completion for a conversation.</p>
<p><strong>Endpoint:</strong> <code>POST /api/v1/chat/streaming</code></p>
<p><strong>Request Body:</strong> Same as for <code>/api/v1/chat/completions</code>, except that <code>stream</code> must be set to <code>true</code>.</p>
<p><strong>Response:</strong> Server-sent events (SSE) stream of partial completions.</p>
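<p>The reference does not specify what each event carries. The sketch below therefore handles only the SSE framing defined in the HTML standard: an event ends at a blank line, multiple <code>data:</code> lines in one event are joined with newlines, and lines starting with a colon are comments. It additionally assumes, without confirmation from this API, a <code>[DONE]</code> end-of-stream sentinel in the style of OpenAI's streaming responses. <code>iter_sse_data</code> is a hypothetical helper.</p>

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in `lines`."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                payload = "\n".join(data_lines)
                data_lines = []
                if payload == "[DONE]":  # assumed end-of-stream sentinel
                    return
                yield payload
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE spec strips a single leading space
        if name == "data":
            data_lines.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # Per the SSE spec, an event not terminated by a blank line is discarded.
```

<p>With <code>urllib.request.urlopen</code>, decode each line of the response body first, for example <code>iter_sse_data(line.decode("utf-8") for line in resp)</code>, then pass each payload to <code>json.loads</code> if the events turn out to be JSON.</p>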
<h3 id="hybrid-chat">Hybrid Chat</h3>
<p>Routes each request to OpenAI or Ollama based on the characteristics of the query, such as its estimated complexity and privacy sensitivity.</p>
<p><strong>Endpoint:</strong> <code>POST /api/v1/chat/hybrid</code></p>
<p><strong>Request Body:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "mode": "auto",
  "options": {
    "prioritize_privacy": false,
    "prioritize_speed": false
  }
}
</code></pre></div></pre>
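<p><strong>Response:</strong> Same format as for <code>/api/v1/chat/completions</code>, including the <code>provider</code> that handled the request and the <code>routing_metrics</code> behind the routing decision.</p>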
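<p>As an illustration, the following sketch builds a hybrid request with only the Python standard library. It assumes the server is reachable at <code>http://localhost:8000</code> (the default <code>PORT</code> from the Configuration section) and that no authentication header is required, which this reference does not state. <code>build_hybrid_request</code> is a hypothetical helper.</p>

```python
import json
import urllib.request


def build_hybrid_request(
    base_url: str,
    prompt: str,
    *,
    prioritize_privacy: bool = False,
    prioritize_speed: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/chat/hybrid."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "mode": "auto",
        "options": {
            "prioritize_privacy": prioritize_privacy,
            "prioritize_speed": prioritize_speed,
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/chat/hybrid",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_hybrid_request("http://localhost:8000", "Explain quantum computing")
# Sending it requires a running server:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     body = json.load(resp)
#     print(body["provider"], body.get("routing_metrics"))
```

<p>Separating request construction from sending keeps the payload easy to inspect and test before any network call is made.</p>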
<h2 id="agent-endpoints">Agent Endpoints</h2>
<h3 id="run-agent">Run Agent</h3>
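<p>Start an agent run with the given configuration. Runs execute asynchronously: the response returns a <code>run_id</code> and a <code>polling_url</code> for checking progress.</p>
<p><strong>Endpoint:</strong> <code>POST /api/v1/agents/run</code></p>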
<p><strong>Request Body:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "agent_config": {
    "instructions": "You are a research assistant...",
    "model": "gpt-4o",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "search_knowledge_base",
          "description": "Search for information",
          "parameters": {
            "type": "object",
            "properties": {
              "query": {
                "type": "string"
              }
            },
            "required": ["query"]
          }
        }
      }
    ]
  },
  "messages": [
    {"role": "user", "content": "Find information about renewable energy"}
  ],
  "metadata": {
    "session_id": "user_session_123"
  }
}
</code></pre></div></pre>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "run_id": "run_abc123",
  "status": "in_progress",
  "created_at": 1677858242,
  "estimated_completion_time": 1677858260,
  "polling_url": "/api/v1/agents/status/run_abc123"
}
</code></pre></div></pre>
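<p>The <code>tools</code> entries in the Run Agent request follow the function-tool shape used by OpenAI's Chat Completions API: <code>"type": "function"</code> plus a name, a description, and a JSON Schema object describing the parameters. When tool definitions are generated in code, a small builder keeps them consistent. The hypothetical <code>make_function_tool_spec</code> below handles only flat parameter lists with primitive types; it is unrelated to the Agents SDK's own <code>function_tool</code> decorator.</p>

```python
from typing import Any


def make_function_tool_spec(
    name: str,
    description: str,
    params: dict[str, str],
    required: list[str],
) -> dict[str, Any]:
    """Return a function-tool definition for an agent_config "tools" list.

    `params` maps each parameter name to a primitive JSON Schema type
    such as "string", "number", "integer", or "boolean".
    """
    undeclared = set(required) - set(params)
    if undeclared:
        raise ValueError(f"required parameters not declared: {sorted(undeclared)}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": t} for p, t in params.items()},
                "required": list(required),
            },
        },
    }


# Reproduces the tool from the Run Agent request example.
search_tool = make_function_tool_spec(
    "search_knowledge_base", "Search for information", {"query": "string"}, ["query"]
)
```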
<h3 id="get-agent-status">Get Agent Status</h3>
<p>Check the status of an agent run and, once it has completed, retrieve its result.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/agents/status/{run_id}</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "run_id": "run_abc123",
  "status": "completed",
  "result": {
    "output": "Renewable energy comes from sources that are...",
    "tool_calls": []
  },
  "created_at": 1677858242,
  "completed_at": 1677858260
}
</code></pre></div></pre>
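<p>Since a run starts as <code>in_progress</code> and completes asynchronously, clients poll the <code>polling_url</code> returned when the run was created. A minimal polling loop is sketched below. The reference documents only the <code>in_progress</code> and <code>completed</code> statuses, so any other value is treated as terminal; the interval and timeout are arbitrary, and the status fetcher, sleep, and clock are injected so the loop can be exercised without a server. <code>wait_for_run</code> is a hypothetical helper.</p>

```python
import time
from typing import Any, Callable

StatusFetcher = Callable[[str], dict[str, Any]]


def wait_for_run(
    fetch_status: StatusFetcher,
    polling_url: str,
    *,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll `polling_url` until the run leaves the in_progress state."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(polling_url)
        if status["status"] != "in_progress":
            return status  # "completed", or an undocumented terminal state
        if clock() + interval > deadline:
            raise TimeoutError(f"run still in progress after {timeout} s")
        sleep(interval)
```

<p>In a real client, <code>fetch_status</code> would issue <code>GET</code> on the base URL joined with <code>polling_url</code> and decode the JSON body.</p>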
<h3 id="list-available-agents">List Available Agents</h3>
<p>List all available agent configurations.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/agents</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "agents": [
    {
      "id": "research",
      "name": "Research Assistant",
      "description": "Specialized in finding and synthesizing information"
    },
    {
      "id": "coding",
      "name": "Code Assistant",
      "description": "Helps with programming tasks"
    }
  ]
}
</code></pre></div></pre>
<h2 id="model-management-endpoints">Model Management Endpoints</h2>
<h3 id="list-models">List Models</h3>
<p>List all available models, grouped by provider.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/models</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "openai_models": [
    {
      "id": "gpt-4o",
      "name": "GPT-4o",
      "capabilities": ["general", "code", "reasoning"],
      "context_window": 128000
    },
    {
      "id": "gpt-3.5-turbo",
      "name": "GPT-3.5 Turbo",
      "capabilities": ["general"],
      "context_window": 16000
    }
  ],
  "ollama_models": [
    {
      "id": "llama2",
      "name": "Llama 2",
      "capabilities": ["general"],
      "context_window": 4096
    },
    {
      "id": "mistral",
      "name": "Mistral",
      "capabilities": ["general", "reasoning"],
      "context_window": 8192
    }
  ]
}
</code></pre></div></pre>
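<p>A client that chooses a model itself can use this listing to check capabilities and context windows before sending a request. The sketch below implements one hypothetical policy, not the server's routing logic: prefer a local Ollama model, and among the models that fit, take the one with the smallest context window as a rough proxy for the lightest model. The caller is assumed to have estimated the prompt's token count; <code>pick_model</code> is a hypothetical helper.</p>

```python
from typing import Any, Optional

# The example catalog from GET /api/v1/models, trimmed to the fields used here.
CATALOG: dict[str, list[dict[str, Any]]] = {
    "openai_models": [
        {"id": "gpt-4o", "capabilities": ["general", "code", "reasoning"], "context_window": 128000},
        {"id": "gpt-3.5-turbo", "capabilities": ["general"], "context_window": 16000},
    ],
    "ollama_models": [
        {"id": "llama2", "capabilities": ["general"], "context_window": 4096},
        {"id": "mistral", "capabilities": ["general", "reasoning"], "context_window": 8192},
    ],
}


def pick_model(
    catalog: dict[str, list[dict[str, Any]]],
    capability: str,
    prompt_tokens: int,
    max_output_tokens: int = 1024,
) -> Optional[tuple[str, str]]:
    """Return (provider, model_id) for the smallest fitting model, local first."""
    # The context window has to hold both the prompt and the generated output.
    needed = prompt_tokens + max_output_tokens
    for provider, key in (("ollama", "ollama_models"), ("openai", "openai_models")):
        fitting = [
            m for m in catalog.get(key, [])
            if capability in m["capabilities"] and m["context_window"] >= needed
        ]
        if fitting:
            best = min(fitting, key=lambda m: m["context_window"])
            return provider, best["id"]
    return None  # no listed model can handle the request
```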
<h3 id="get-model-details">Get Model Details</h3>
<p>Get detailed information about a specific model.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/models/{model_id}</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "id": "mistral",
  "name": "Mistral",
  "provider": "ollama",
  "capabilities": ["general", "reasoning"],
  "context_window": 8192,
  "recommended_usage": "General purpose tasks with reasoning requirements",
  "performance_characteristics": {
    "average_response_time": 2.4,
    "tokens_per_second": 45
  }
}
</code></pre></div></pre>
<h3 id="pull-ollama-model">Pull Ollama Model</h3>
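<p>Pull a model into the local Ollama instance. The download continues in the background: the response reports a <code>pulling</code> status and an estimated completion time.</p>
<p><strong>Endpoint:</strong> <code>POST /api/v1/models/ollama/pull</code></p>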
<p><strong>Request Body:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "model": "wizard-math"
}
</code></pre></div></pre>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "status": "pulling",
  "model": "wizard-math",
  "estimated_time": 120
}
</code></pre></div></pre>
<h2 id="system-endpoints">System Endpoints</h2>
<h3 id="health-check">Health Check</h3>
<p>Check system health.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/health</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "status": "ok",
  "version": "1.0.0",
  "providers": {
    "openai": "connected",
    "ollama": "connected"
  },
  "uptime": 3600
}
</code></pre></div></pre>
<h3 id="system-configuration">System Configuration</h3>
<p>Get current system configuration.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/config</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "routing": {
    "complexity_threshold": 0.65,
    "privacy_sensitive_patterns": ["password", "secret", "key"],
    "default_provider": "auto"
  },
  "caching": {
    "enabled": true,
    "ttl": 3600
  },
  "optimization": {
    "token_optimization": true,
    "parallel_processing": true
  },
  "monitoring": {
    "metrics_collection": true,
    "log_level": "info"
  }
}
</code></pre></div></pre>
<h3 id="update-configuration">Update Configuration</h3>
<p>Update system configuration. Only the fields present in the request body are changed; the response lists the updated fields as dotted paths.</p>
<p><strong>Endpoint:</strong> <code>POST /api/v1/config</code></p>
<p><strong>Request Body:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "routing": {
    "complexity_threshold": 0.7
  },
  "caching": {
    "ttl": 7200
  }
}
</code></pre></div></pre>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "status": "updated",
  "updated_fields": ["routing.complexity_threshold", "caching.ttl"]
}
</code></pre></div></pre>
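<p>The Update Configuration example implies partial-update semantics: only the supplied fields change, and each is reported as a dotted path. The server-side sketch below reproduces that behaviour. It is not the project's implementation, and rejecting unknown fields and shape changes, rather than silently accepting them, is an assumption. <code>apply_config_patch</code> is a hypothetical name.</p>

```python
import copy
from typing import Any


def apply_config_patch(
    current: dict[str, Any], patch: dict[str, Any]
) -> tuple[dict[str, Any], list[str]]:
    """Deep-merge `patch` into a copy of `current`.

    Returns the merged configuration and the dotted paths of the fields
    that the patch set, e.g. "routing.complexity_threshold".
    """
    merged = copy.deepcopy(current)
    updated: list[str] = []

    def merge(target: dict[str, Any], changes: dict[str, Any], prefix: str) -> None:
        for key, value in changes.items():
            path = prefix + key
            if key not in target:
                raise KeyError(f"unknown configuration field: {path}")
            old = target[key]
            if isinstance(old, dict) and isinstance(value, dict):
                merge(old, value, path + ".")  # descend into a nested section
            elif isinstance(old, dict) or isinstance(value, dict):
                raise TypeError(f"cannot change the shape of {path}")
            else:
                target[key] = value
                updated.append(path)

    merge(merged, patch, "")
    return merged, updated
```

<p>Because the merge works on a deep copy, a rejected patch leaves the live configuration untouched.</p>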
<h3 id="system-metrics">System Metrics</h3>
<p>Get system performance metrics.</p>
<p><strong>Endpoint:</strong> <code>GET /api/v1/metrics</code></p>
<p><strong>Response:</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "requests": {
    "total": 15420,
    "last_minute": 42,
    "last_hour": 1254
  },
  "routing": {
    "openai_requests": 6210,
    "ollama_requests": 9210,
    "auto_routing_accuracy": 0.94
  },
  "performance": {
    "average_response_time": 2.3,
    "p95_response_time": 6.1,
    "cache_hit_rate": 0.37
  },
  "cost": {
    "total_openai_cost": 135.42,
    "estimated_savings": 98.67,
    "cost_per_request": 0.0088
  }
}
</code></pre></div></pre>
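<p>The figures in this example are internally consistent, which helps when reading them: 6,210 OpenAI requests plus 9,210 Ollama requests equal the 15,420 total, and <code>cost_per_request</code> matches the total OpenAI cost divided by all requests, local ones included (135.42 / 15,420 ≈ 0.0088), rather than by OpenAI requests alone (≈ 0.0218). That reading is inferred from the numbers, not stated by the reference. The sketch below recomputes these derived values from a metrics payload; <code>derived_metrics</code> is a hypothetical helper.</p>

```python
from typing import Any


def derived_metrics(m: dict[str, Any]) -> dict[str, float]:
    """Recompute ratios from a /api/v1/metrics payload (interpretation assumed)."""
    total = m["requests"]["total"]
    openai = m["routing"]["openai_requests"]
    ollama = m["routing"]["ollama_requests"]
    if openai + ollama != total:
        raise ValueError("routing counts do not add up to the request total")
    cost = m["cost"]["total_openai_cost"]
    return {
        "openai_share": openai / total if total else 0.0,
        "cost_per_request": cost / total if total else 0.0,          # over all requests
        "cost_per_openai_request": cost / openai if openai else 0.0,  # over paid requests only
    }


metrics = {
    "requests": {"total": 15420, "last_minute": 42, "last_hour": 1254},
    "routing": {"openai_requests": 6210, "ollama_requests": 9210, "auto_routing_accuracy": 0.94},
    "cost": {"total_openai_cost": 135.42, "estimated_savings": 98.67, "cost_per_request": 0.0088},
}
d = derived_metrics(metrics)
```

<p>Tracking cost per OpenAI request separately shows whether savings come from routing more traffic locally or from cheaper remote calls.</p>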
<hr/>
<h1 id="configuration">Configuration</h1>
<h2 id="environment-variables">Environment Variables</h2>
<p>The MCP system can be configured using the following environment variables:</p>
<h3 id="core-configuration">Core Configuration</h3>
<table><thead><tr><th>Variable</th><th>Description</th><th>Default Value</th></tr></thead><tbody><tr><td><code>OPENAI_API_KEY</code></td><td>OpenAI API key</td><td>(Required)</td></tr><tr><td><code>OPENAI_ORG_ID</code></td><td>OpenAI organization ID</td><td>(Optional)</td></tr><tr><td><code>OPENAI_MODEL</code></td><td>Default OpenAI model</td><td>gpt-4o</td></tr><tr><td><code>OLLAMA_HOST</code></td><td>Ollama host URL</td><td><code>http://localhost:11434</code></td></tr><tr><td><code>OLLAMA_MODEL</code></td><td>Default Ollama model</td><td>llama2</td></tr><tr><td><code>APP_ENV</code></td><td>Environment (development, staging, production)</td><td>development</td></tr><tr><td><code>LOG_LEVEL</code></td><td>Logging level</td><td>INFO</td></tr><tr><td><code>PORT</code></td><td>API server port</td><td>8000</td></tr></tbody></table>
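<p>A minimal sketch of loading the core variables at startup, using the defaults from the table above. The <code>CoreSettings</code> class is hypothetical; failing fast when the required <code>OPENAI_API_KEY</code> is missing and rejecting unrecognised <code>APP_ENV</code> values are design choices of the sketch, not documented behaviour.</p>

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

APP_ENVS = ("development", "staging", "production")


@dataclass(frozen=True)
class CoreSettings:
    openai_api_key: str
    openai_org_id: Optional[str]
    openai_model: str
    ollama_host: str
    ollama_model: str
    app_env: str
    log_level: str
    port: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "CoreSettings":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError("OPENAI_API_KEY is required")
        app_env = env.get("APP_ENV", "development")
        if app_env not in APP_ENVS:
            raise ValueError(f"APP_ENV must be one of {APP_ENVS}, got {app_env!r}")
        return cls(
            openai_api_key=api_key,
            openai_org_id=env.get("OPENAI_ORG_ID"),  # optional, no default
            openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
            ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/"),
            ollama_model=env.get("OLLAMA_MODEL", "llama2"),
            app_env=app_env,
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            port=int(env.get("PORT", "8000")),
        )
```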
<h3 id="redis-configuration">Redis Configuration</h3>
<table><thead><tr><th>Variable</th><th>Description</th><th>Default Value</th></tr></thead><tbody><tr><td><code>REDIS_URL</code></td><td>Redis connection URL</td><td>redis://localhost:6379/0</td></tr><tr><td><code>REDIS_PASSWORD</code></td><td>Redis password</td><td>(Optional)</td></tr><tr><td><code>ENABLE_CACHING</code></td><td>Enable response caching</td><td>true</td></tr><tr><td><code>CACHE_TTL</code></td><td>Cache TTL in seconds</td><td>3600</td></tr></tbody></table>
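<p>Caching only pays off if identical requests map to the same key. One common approach, sketched here as an assumption rather than the project's actual scheme, is to hash a canonical JSON serialisation of the fields that determine the answer (provider, model, and messages). The <code>mcp:cache:</code> prefix and the <code>cache_key</code> and <code>cache_ttl</code> helpers are hypothetical.</p>

```python
import hashlib
import json
import os
from typing import Any, Mapping, Optional


def cache_key(provider: str, model: str, messages: list[dict[str, Any]]) -> str:
    """Derive a stable cache key; dict key order does not affect the result."""
    canonical = json.dumps(
        {"provider": provider, "model": model, "messages": messages},
        sort_keys=True,            # sorts nested dict keys as well
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return "mcp:cache:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_ttl(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the TTL in seconds from CACHE_TTL, or None when caching is disabled."""
    if env.get("ENABLE_CACHING", "true").strip().lower() not in {"1", "true", "yes"}:
        return None
    return int(env.get("CACHE_TTL", "3600"))
```

<p>With the redis-py client, a response could then be stored via <code>redis.Redis.from_url(REDIS_URL, password=REDIS_PASSWORD).set(key, body, ex=ttl)</code>. If sampling parameters such as temperature vary between requests, they belong in the key as well.</p>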
<h3 id="routing-configuration">Routing Configuration</h3>
<table><thead><tr><th>Variable</th><th>Description</th><th>Default Value</th></tr></thead><tbody><tr><td><code>COMPLEXITY_THRESHOLD</code></td><td>Complexity score above which requests are routed to OpenAI</td><td>0.65</td></tr><tr><td><code>PRIVACY_SENSITIVE_TOKENS</code></td><td>Comma-separated list of privacy-sensitive tokens</td><td>password,secret,key</td></tr><tr><td><code>DEFAULT_PROVIDER</code></td><td>Provider used when a request does not specify one</td><td>auto</td></tr><tr><td><code>FORCE_OLLAMA</code></td><td>Force using Ollama for all requests</td><td>false</td></tr><tr><td><code>FORCE_OPENAI</code></td><td>Force using OpenAI for all requests</td><td>false</td></tr></tbody></table>
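<p>The sketch below shows one plausible way these variables could combine into a routing decision. The precedence order (force flags, then an explicitly requested or default provider, then privacy tokens, then complexity) is an assumption, as is how the complexity score is computed, so the score is taken as an input. Privacy tokens are matched as whole words so that, for example, "key" does not match "keyboard". <code>RoutingConfig</code> and <code>choose_provider</code> are hypothetical names.</p>

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes"}


@dataclass(frozen=True)
class RoutingConfig:
    complexity_threshold: float = 0.65
    privacy_tokens: tuple[str, ...] = ("password", "secret", "key")
    default_provider: str = "auto"
    force_ollama: bool = False
    force_openai: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RoutingConfig":
        tokens = env.get("PRIVACY_SENSITIVE_TOKENS", "password,secret,key")
        return cls(
            complexity_threshold=float(env.get("COMPLEXITY_THRESHOLD", "0.65")),
            privacy_tokens=tuple(t.strip().lower() for t in tokens.split(",") if t.strip()),
            default_provider=env.get("DEFAULT_PROVIDER", "auto"),
            force_ollama=env.get("FORCE_OLLAMA", "false").lower() in TRUE_VALUES,
            force_openai=env.get("FORCE_OPENAI", "false").lower() in TRUE_VALUES,
        )


def choose_provider(
    text: str,
    complexity_score: float,
    cfg: RoutingConfig,
    requested: Optional[str] = None,
) -> str:
    """Return "openai" or "ollama" for a request (hypothetical policy)."""
    if cfg.force_ollama and cfg.force_openai:
        raise ValueError("FORCE_OLLAMA and FORCE_OPENAI cannot both be true")
    if cfg.force_ollama:
        return "ollama"
    if cfg.force_openai:
        return "openai"
    provider = requested or cfg.default_provider
    if provider not in ("openai", "ollama", "auto"):
        raise ValueError(f"unknown provider: {provider!r}")
    if provider != "auto":
        return provider
    lowered = text.lower()
    for token in cfg.privacy_tokens:
        if re.search(rf"\b{re.escape(token)}\b", lowered):
            return "ollama"  # keep privacy-sensitive text on the local model
    return "openai" if complexity_score > cfg.complexity_threshold else "ollama"
```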
<h3 id="performance-configuration">Performance Configuration</h3>
<table><thead><tr><th>Variable</th><th>Description</th><th>Default Value</th></tr></thead><tbody><tr><td><code>ENABLE_PARALLEL_PROCESSING</code></td><td>Enable parallel processing for complex queries</td><td>true</td></tr><tr><td><code>MAX_PARALLEL_REQUESTS</code></td><td>Maximum number of parallel requests</td><td>4</td></tr><tr><td><code>ENABLE_BATCHING</code></td><td>Enable request batching</td><td>true</td></tr><tr><td><code>MAX_BATCH_SIZE</code></td><td>Maximum batch size</td><td>5</td></tr><tr><td><code>REQUEST_TIMEOUT</code></td><td>Request timeout in seconds</td><td>120</td></tr></tbody></table>
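<p>A sketch of how <code>MAX_PARALLEL_REQUESTS</code>, <code>MAX_BATCH_SIZE</code>, and <code>REQUEST_TIMEOUT</code> could be enforced with <code>asyncio</code>: a semaphore caps in-flight calls and <code>asyncio.wait_for</code> bounds each one. The helper names and the assumption that the timeout applies per request (rather than per batch) are illustrative only.</p>

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_batches(items: Sequence[T], max_batch_size: int = 5) -> List[List[T]]:
    """Split items into consecutive batches of at most max_batch_size (MAX_BATCH_SIZE)."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [list(items[i:i + max_batch_size]) for i in range(0, len(items), max_batch_size)]


async def run_bounded(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    max_parallel: int = 4,
    timeout: float = 120.0,
) -> List[R]:
    """Apply worker to every item, at most max_parallel at a time, each within timeout seconds."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(item: T) -> R:
        async with semaphore:
            # wait_for raises a timeout error if a single call exceeds the limit.
            return await asyncio.wait_for(worker(item), timeout=timeout)

    # gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(guarded(item) for item in items)))
```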
<h3 id="cost-optimization">Cost Optimization</h3>
<table><thead><tr><th>Variable</th><th>Description</th><th>Default Value</th></tr></thead><tbody><tr><td><code>MONTHLY_BUDGET</code></td><td>Monthly budget cap for OpenAI usage (USD)</td><td>0 (no limit)</td></tr><tr><td><code>ENABLE_TOKEN_OPTIMIZATION</code></td><td>Enable token usage optimization</td><td>true</td></tr><tr><td><code>TOKEN_BUDGET</code></td><td>Token budget per request</td><td>0 (no limit)</td></tr><tr><td><code>DEV_MODE_TOKEN_LIMIT</code></td><td>Token limit in development mode</td><td>1000</td></tr></tbody></table>
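<p>The budget variables could be applied as below. This sketch assumes <code>0</code> means "no limit" (as the defaults state), that <code>DEV_MODE_TOKEN_LIMIT</code> applies only when <code>APP_ENV</code> is <code>development</code>, and that the tighter of two per-request limits wins. <code>BudgetSettings</code>, <code>effective_token_limit</code>, and <code>openai_allowed</code> are hypothetical names; tracking the month's spend is left to the caller.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BudgetSettings:
    monthly_budget_usd: float = 0.0   # MONTHLY_BUDGET; 0 means no limit
    token_budget: int = 0             # TOKEN_BUDGET per request; 0 means no limit
    dev_mode_token_limit: int = 1000  # DEV_MODE_TOKEN_LIMIT
    app_env: str = "development"      # APP_ENV


def effective_token_limit(s: BudgetSettings) -> Optional[int]:
    """Tightest applicable per-request token cap, or None when unlimited."""
    limits: List[int] = []
    if s.token_budget > 0:
        limits.append(s.token_budget)
    if s.app_env == "development" and s.dev_mode_token_limit > 0:
        limits.append(s.dev_mode_token_limit)
    return min(limits) if limits else None


def openai_allowed(s: BudgetSettings, spent_usd: float, estimated_cost_usd: float) -> bool:
    """False if this request would push the month's OpenAI spend past MONTHLY_BUDGET."""
    if s.monthly_budget_usd <= 0:
        return True  # no monthly cap configured
    return spent_usd + estimated_cost_usd <= s.monthly_budget_usd
```

<p>When <code>openai_allowed</code> returns <code>False</code>, a router could fall back to Ollama rather than rejecting the request; whether this project does so is not specified here.</p>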
<h3 id="monitoring">Monitoring</h3>
<table><thead><tr><th>Variable</th><th>Description</th><th>Default Value</th></tr></thead><tbody><tr><td><code>ENABLE_METRICS</code></td><td>Enable metrics collection</td><td>true</td></tr><tr><td><code>METRICS_PORT</code></td><td>Prometheus metrics port</td><td>9090</td></tr><tr><td><code>ENABLE_TRACING</code></td><td>Enable distributed tracing</td><td>false</td></tr><tr><td><code>SENTRY_DSN</code></td><td>Sentry DSN for error tracking</td><td>(Optional)</td></tr></tbody></table>
<h2 id="advanced-configuration">Advanced Configuration</h2>
<h3 id="configuration-file">Configuration File</h3>
<p>For more advanced configuration, create a YAML configuration file at <code>config/config.yaml</code>:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">YAML</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml">routing:
  # Complexity assessment weights
  complexity_weights:
    length: 0.3
    specialized_terms: 0.4
    sentence_structure: 0.3
  
  # Ollama model routing
  ollama_routing:
    code_generation: "codellama"
    mathematical: "wizard-math"
    creative: "dolphin-mistral"
    general: "mistral"
    
  # OpenAI model routing
  openai_routing:
    complex_reasoning: "gpt-4o"
    general: "gpt-3.5-turbo"

caching:
  # Semantic caching configuration
  semantic:
    enabled: true
    similarity_threshold: 0.92
    max_cached_items: 1000
    
  # Exact match caching
  exact:
    enabled: true
    max_cached_items: 500

optimization:
  # Chain of thought settings
  chain_of_thought:
    enabled: true
    task_types: ["reasoning", "math", "decision"]
    
  # Response verification
  verification:
    enabled: true
    high_risk_categories: ["financial", "legal", "medical"]

monitoring:
  # Logging configuration
  logging:
    format: "json"
    include_request_body: false
    mask_sensitive_data: true
    
  # Alert thresholds
  alerts:
    high_latency_threshold: 5.0  # seconds
    error_rate_threshold: 0.05   # 5%
    budget_warning_threshold: 0.8  # 80% of budget
</code></pre></div></pre>
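<p>The <code>complexity_weights</code> block implies a weighted sum of three features. The document does not define how each feature is measured, so the normalisations below (word count capped at 200, up to three specialised terms, average sentence length capped at 30 words) and the <code>SPECIALIZED_TERMS</code> vocabulary are assumptions. Because the weights sum to 1 and each feature lies in [0, 1], the score also lies in [0, 1] and can be compared directly against <code>COMPLEXITY_THRESHOLD</code>.</p>

```python
import re

# Weights copied from routing.complexity_weights in config/config.yaml.
WEIGHTS = {"length": 0.3, "specialized_terms": 0.4, "sentence_structure": 0.3}

# Hypothetical domain vocabulary; a real deployment would supply its own list.
SPECIALIZED_TERMS = {"algorithm", "asymptotic", "derivative", "regression", "theorem"}


def complexity_score(text: str, max_words: int = 200, max_avg_sentence: int = 30) -> float:
    """Weighted sum of three features, each normalised to the range [0, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    features = {
        # Longer prompts score higher, saturating at max_words.
        "length": min(len(words) / max_words, 1.0),
        # Three or more specialised terms saturate this feature.
        "specialized_terms": min(sum(w in SPECIALIZED_TERMS for w in words) / 3, 1.0),
        # Longer average sentences are treated as structurally more complex.
        "sentence_structure": min(len(words) / max(len(sentences), 1) / max_avg_sentence, 1.0),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())
```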
<h3 id="custom-provider-configuration">Custom Provider Configuration</h3>
<p>To configure additional inference providers, add a <code>providers.yaml</code> file:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">YAML</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml">providers:
  - name: azure-openai
    type: openai-compatible
    base_url: https://your-resource-name.openai.azure.com
    api_key_env: AZURE_OPENAI_API_KEY
    models:
      - id: gpt-4
        deployment_id: your-gpt4-deployment
      - id: gpt-35-turbo
        deployment_id: your-gpt35-deployment
        
  - name: local-inference
    type: ollama-compatible
    base_url: http://localhost:8080
    models:
      - id: local-model
        capabilities: ["general"]
</code></pre></div></pre>
<h2 id="model-selection">Model Selection</h2>
<h3 id="model-tiers">Model Tiers</h3>
<p>MCP uses a tiered approach to model selection:</p>
<table><thead><tr><th>Tier</th><th>OpenAI Models</th><th>Ollama Models</th><th>Use Cases</th></tr></thead><tbody><tr><td>High</td><td>gpt-4o, gpt-4</td><td>llama2:70b, codellama:34b</td><td>Complex reasoning, creative tasks, code generation</td></tr><tr><td>Medium</td><td>gpt-3.5-turbo</td><td>mistral, codellama</td><td>General purpose, standard code tasks</td></tr><tr><td>Low</td><td>gpt-3.5-turbo</td><td>llama2, phi</td><td>Simple queries, development testing</td></tr></tbody></table>
<h3 id="task-specific-model-mapping">Task-Specific Model Mapping</h3>
<p>MCP maps specific task types to appropriate models:</p>
<table><thead><tr><th>Task Type</th><th>High Tier</th><th>Medium Tier</th><th>Low Tier</th></tr></thead><tbody><tr><td>Code Generation</td><td>gpt-4o</td><td>codellama</td><td>codellama</td></tr><tr><td>Creative Writing</td><td>gpt-4o</td><td>mistral</td><td>mistral</td></tr><tr><td>Mathematical</td><td>gpt-4o</td><td>gpt-3.5-turbo</td><td>wizard-math</td></tr><tr><td>General Knowledge</td><td>gpt-3.5-turbo</td><td>mistral</td><td>llama2</td></tr><tr><td>Summarization</td><td>gpt-3.5-turbo</td><td>mistral</td><td>llama2</td></tr></tbody></table>
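<p>The task-to-model table can be expressed as a lookup. This sketch returns the <code>provider:model</code> form used in the override examples; the snake_case task keys, the fallback to the general-knowledge row for unknown tasks, and inferring the provider from a fixed set of OpenAI model names are assumptions for illustration. <code>select_model</code> is a hypothetical name.</p>

```python
# Rows transcribed from the task-specific model mapping table.
MODEL_MAP = {
    "code_generation": {"high": "gpt-4o", "medium": "codellama", "low": "codellama"},
    "creative_writing": {"high": "gpt-4o", "medium": "mistral", "low": "mistral"},
    "mathematical": {"high": "gpt-4o", "medium": "gpt-3.5-turbo", "low": "wizard-math"},
    "general_knowledge": {"high": "gpt-3.5-turbo", "medium": "mistral", "low": "llama2"},
    "summarization": {"high": "gpt-3.5-turbo", "medium": "mistral", "low": "llama2"},
}

OPENAI_MODELS = {"gpt-4o", "gpt-4", "gpt-3.5-turbo"}
TIERS = ("high", "medium", "low")


def select_model(task_type: str, tier: str) -> str:
    """Return a "provider:model" string for a task type and tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")
    # Unknown task types fall back to the general-knowledge row.
    row = MODEL_MAP.get(task_type, MODEL_MAP["general_knowledge"])
    model = row[tier]
    provider = "openai" if model in OPENAI_MODELS else "ollama"
    return f"{provider}:{model}"
```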
<p>To override automatic model selection, set <code>model</code> explicitly in your request, prefixed with the provider name. For example, to force OpenAI GPT-4o:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "model": "openai:gpt-4o"  // Force OpenAI GPT-4o
}
</code></pre></div></pre>
<p>Or, to force Ollama's Mistral model:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "model": "ollama:mistral"  // Force Ollama Mistral
}
</code></pre></div></pre>
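<p>A sketch of how a server might interpret the <code>model</code> field. It assumes <code>auto</code> defers entirely to the router, that only <code>openai</code> and <code>ollama</code> are recognised prefixes, and that anything else is a bare model name. Splitting on the first colon only keeps Ollama tags such as <code>llama2:70b</code> intact. <code>parse_model_spec</code> is a hypothetical name.</p>

```python
from typing import Optional, Tuple

KNOWN_PROVIDERS = {"openai", "ollama"}


def parse_model_spec(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a model string such as "ollama:mistral" into (provider, model)."""
    if spec == "auto":
        return None, None  # let the router pick both provider and model
    provider, sep, model = spec.partition(":")
    if sep and provider in KNOWN_PROVIDERS:
        if not model:
            raise ValueError(f"missing model name in {spec!r}")
        return provider, model
    # No recognised prefix: treat the whole string as a bare model name,
    # which keeps Ollama tags such as "llama2:70b" intact.
    return None, spec
```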
<hr/>
<h1 id="usage-examples">Usage Examples</h1>
<h2 id="basic-chat-interaction">Basic Chat Interaction</h2>
<h3 id="python-example">Python Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python">import requests
import json

API_URL = "http://localhost:8000/api/v1"
API_KEY = "your_api_key_here"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

# Basic chat completion
def chat(message, history=None):
    history = history or []
    history.append({"role": "user", "content": message})
    
    response = requests.post(
        f"{API_URL}/chat/completions",
        headers=headers,
        json={
            "messages": history,
            "model": "auto",  # Let the system decide
            "temperature": 0.7
        }
    )
    
    if response.status_code == 200:
        result = response.json()
        assistant_message = result["message"]["content"]
        history.append({"role": "assistant", "content": assistant_message})
        
        print(f"Model used: {result['model']} via {result['provider']}")
        return assistant_message, history
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None, history

# Example conversation
history = []
response, history = chat("Hello! What can you tell me about artificial intelligence?", history)
print(f"Assistant: {response}\n")

response, history = chat("What are some practical applications?", history)
print(f"Assistant: {response}")
</code></pre></div></pre>
<h3 id="curl-example">cURL Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Simple completion
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain how photosynthesis works"}
    ],
    "model": "auto",
    "temperature": 0.7
  }'

# Streaming response (-N turns off curl's output buffering so chunks appear as they arrive)
curl -N -X POST http://localhost:8000/api/v1/chat/streaming \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a short poem about robots"}
    ],
    "model": "auto",
    "stream": true
  }'
</code></pre></div></pre>
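<p>The streaming endpoint's wire format is not documented in this section. Assuming it emits OpenAI-style server-sent events (<code>data: {...}</code> lines ending with <code>data: [DONE]</code>), a Python client could decode it as sketched below; <code>iter_sse_data</code> is a hypothetical helper and the event payload shape is left opaque.</p>

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from server-sent-event "data:" lines.

    Assumes one JSON object per "data:" line and an OpenAI-style
    "data: [DONE]" sentinel marking the end of the stream.
    """
    for line in lines:
        if not line or not line.startswith("data:"):
            # Skip blank separators, ": comment" keep-alives, and other fields.
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


# Example wiring with requests (not executed here):
#
#   with requests.post(f"{API_URL}/chat/streaming", headers=headers,
#                      json=payload, stream=True, timeout=120) as resp:
#       for event in iter_sse_data(resp.iter_lines(decode_unicode=True)):
#           print(event)
```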
<h2 id="working-with-agents">Working with Agents</h2>
<h3 id="python-example-1">Python Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python">import requests
import json
import time

API_URL = "http://localhost:8000/api/v1"
API_KEY = "your_api_key_here"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

# Run an agent with tools
def run_research_agent(query):
    # Define agent configuration with tools
    agent_config = {
        "instructions": "You are a research assistant specialized in finding information.",
        "model": "gpt-4o",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "search_web",
                    "description": "Search the web for information",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "Search query"
                            },
                            "num_results": {
                                "type": "integer",
                                "description": "Number of results to return"
                            }
                        },
                        "required": ["query"]
                    }
                }
            }
        ]
    }
    
    # Run the agent
    response = requests.post(
        f"{API_URL}/agents/run",
        headers=headers,
        json={
            "agent_config": agent_config,
            "messages": [
                {"role": "user", "content": query}
            ]
        }
    )
    
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None
    
    result = response.json()
    run_id = result["run_id"]
    
    # Poll for completion
    while True:
        status_response = requests.get(
            f"{API_URL}/agents/status/{run_id}",
            headers=headers
        )
        
        if status_response.status_code != 200:
            print(f"Error checking status: {status_response.status_code}")
            return None
        
        status_data = status_response.json()
        
        if status_data["status"] == "completed":
            return status_data["result"]["output"]
        elif status_data["status"] == "failed":
            print(f"Agent run failed: {status_data.get('error')}")
            return None
        
        time.sleep(1)  # Poll every second

# Example usage
result = run_research_agent("What are the latest advancements in fusion energy?")
print(result)
</code></pre></div></pre>
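<p>The polling loop above checks once per second with no upper bound on how long it waits. A variant with exponential backoff and an overall deadline is sketched below. <code>wait_for_run</code> is a hypothetical helper, and its 120-second default simply mirrors the <code>REQUEST_TIMEOUT</code> default rather than any documented agent limit. The sleep and clock functions are injectable so the logic can be tested without real delays.</p>

```python
import time
from typing import Callable, Optional


def wait_for_run(
    fetch_status: Callable[[], dict],
    max_wait: float = 120.0,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll fetch_status() with exponential backoff until "completed"/"failed" or max_wait.

    Returns the final status dict, or None if the deadline passes first.
    """
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the deadline; double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

<p>With the API above, <code>fetch_status</code> could be <code>lambda: requests.get(f"{API_URL}/agents/status/{run_id}", headers=headers, timeout=30).json()</code>.</p>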
<h3 id="curl-example-1">cURL Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Run an agent
curl -X POST http://localhost:8000/api/v1/agents/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "agent_config": {
      "instructions": "You are a coding assistant.",
      "model": "gpt-4o",
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "generate_code",
            "description": "Generate code in a specific language",
            "parameters": {
              "type": "object",
              "properties": {
                "language": {
                  "type": "string",
                  "description": "Programming language"
                },
                "task": {
                  "type": "string",
                  "description": "Task description"
                }
              },
              "required": ["language", "task"]
            }
          }
        }
      ]
    },
    "messages": [
      {"role": "user", "content": "Write a Python function to detect palindromes"}
    ]
  }'

# Check status (replace run_abc123 with the run_id returned by the call above)
curl -X GET http://localhost:8000/api/v1/agents/status/run_abc123 \
  -H "Authorization: Bearer your_api_key_here"
</code></pre></div></pre>
<h2 id="customizing-model-selection">Customizing Model Selection</h2>
<h3 id="python-example-2">Python Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python">import requests

API_URL = "http://localhost:8000/api/v1"
API_KEY = "your_api_key_here"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

# Custom routing preferences
def custom_routing_chat(message, routing_preferences):
    response = requests.post(
        f"{API_URL}/chat/completions",
        headers=headers,
        json={
            "messages": [
                {"role": "user", "content": message}
            ],
            "routing_preferences": routing_preferences
        }
    )
    
    if response.status_code == 200:
        result = response.json()
        print(f"Provider: {result['provider']}, Model: {result['model']}")
        return result["message"]["content"]
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Examples with different routing preferences
response = custom_routing_chat(
    "What is the capital of France?",
    {
        "force_provider": "ollama",  # Force Ollama
        "privacy_level": "standard",
        "latency_preference": "balanced"
    }
)
print(f"Response: {response}\n")

response = custom_routing_chat(
    "Analyze the philosophical implications of artificial general intelligence.",
    {
        "force_provider": "openai",  # Force OpenAI
        "privacy_level": "standard",
        "latency_preference": "quality"  # Prefer quality over speed
    }
)
print(f"Response: {response}\n")

response = custom_routing_chat(
    "What is my personal password?",
    {
        "force_provider": None,  # Auto-select
        "privacy_level": "high",  # Privacy-sensitive query
        "latency_preference": "balanced"
    }
)
print(f"Response: {response}")
</code></pre></div></pre>
<h3 id="curl-example-2">cURL Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Force Ollama for this request
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of Sweden?"}
    ],
    "routing_preferences": {
      "force_provider": "ollama",
      "privacy_level": "standard",
      "latency_preference": "speed"
    }
  }'

# Force specific model
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write Python code to implement merge sort"}
    ],
    "model": "ollama:codellama"
  }'
</code></pre></div></pre>
<h2 id="tool-integration">Tool Integration</h2>
<h3 id="python-example-3">Python Example</h3>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Python</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-python">import requests
import json

API_URL = "http://localhost:8000/api/v1"
API_KEY = "your_api_key_here"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}


def execute_tool(function_name, arguments):
    """Simulated tool execution; replace with calls to your real tools."""
    if function_name == "get_weather":
        return f"Weather in {arguments['location']}: Sunny, 22°C"
    if function_name == "search_database":
        return f"Database results for {arguments['query']}: 3 records found"
    return f"Unknown tool: {function_name}"


def post_chat(messages, tools):
    """Send one chat completion request; return the parsed JSON or None on error."""
    response = requests.post(
        f"{API_URL}/chat/completions",
        headers=headers,
        json={"messages": messages, "tools": tools},
        timeout=120,
    )
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None
    return response.json()


# Chat with tool integration
def chat_with_tools(message, tools):
    messages = [{"role": "user", "content": message}]
    result = post_chat(messages, tools)
    if result is None:
        return None

    assistant_message = result["message"]
    tool_calls = assistant_message.get("tool_calls") or []

    # If the model did not request any tools, return the direct response
    if not tool_calls:
        return assistant_message["content"]

    print(f"Tool calls requested: {len(tool_calls)}")

    # Add the assistant turn (with its tool calls) to the conversation history
    messages.append({
        "role": "assistant",
        "content": assistant_message.get("content"),
        "tool_calls": tool_calls,
    })

    # Execute every requested tool and add one "tool" message per call
    for tool_call in tool_calls:
        function_name = tool_call["function"]["name"]
        arguments = tool_call["function"]["arguments"]
        # OpenAI-style responses encode arguments as a JSON string;
        # some backends (e.g. Ollama's native API) return a dict instead
        if isinstance(arguments, str):
            arguments = json.loads(arguments)

        print(f"Executing tool: {function_name}")
        print(f"Arguments: {arguments}")

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.get("id"),
            "content": execute_tool(function_name, arguments),
        })

    # Send all tool results back in a single follow-up request
    final_result = post_chat(messages, tools)
    if final_result is None:
        return None
    return final_result["message"]["content"]

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search a database for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Maximum number of results"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Example usage
response = chat_with_tools("What's the weather like in Paris?", tools)
print(f"Final response: {response}")
</code></pre></div></pre>
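<p>The example above dispatches tools with an <code>if</code>/<code>elif</code> chain. As the number of tools grows, a registry keyed by the tool <code>name</code> keeps each schema next to its implementation. The following is a minimal sketch, not part of the OpenAI or Ollama SDKs: <code>TOOL_REGISTRY</code>, <code>register_tool</code>, and <code>dispatch_tool_call</code> are hypothetical helpers, and <code>get_weather</code> returns canned data.</p>

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> (implementation, JSON schema)
TOOL_REGISTRY: Dict[str, tuple] = {}


def register_tool(schema: Dict[str, Any]) -> Callable:
    """Decorator that registers a function under the name given in its tool schema."""
    def decorator(func: Callable) -> Callable:
        TOOL_REGISTRY[schema["function"]["name"]] = (func, schema)
        return func
    return decorator


@register_tool({
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
})
def get_weather(location: str, unit: str = "celsius") -> str:
    # Canned data for illustration; a real tool would call a weather service
    return f"Weather in {location}: Sunny, 22 degrees ({unit})"


def dispatch_tool_call(tool_call: Dict[str, Any]) -> str:
    """Run one tool call and return its result as the content of a 'tool' message."""
    name = tool_call["function"]["name"]
    arguments = tool_call["function"]["arguments"]
    # Arguments may arrive as a JSON string (OpenAI format) or as a dict
    if isinstance(arguments, str):
        arguments = json.loads(arguments)

    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"Unknown tool: {name}"})

    func, schema = TOOL_REGISTRY[name]
    required = schema["function"]["parameters"].get("required", [])
    missing = [field for field in required if field not in arguments]
    if missing:
        return json.dumps({"error": f"Missing required arguments: {missing}"})

    return func(**arguments)
```

<p>Returning errors as JSON strings rather than raising means the result can still be sent back as a <code>tool</code> message, so the model sees the problem and can retry; whether that is preferable to failing fast is a design choice.</p>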
<hr/>
<h1 id="troubleshooting">Troubleshooting</h1>
<h2 id="common-issues">Common Issues</h2>
<h3 id="installation-issues">Installation Issues</h3>
<h4 id="ollama-installation-fails">Ollama Installation Fails</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>Error messages during Ollama installation</li>
<li><code>ollama serve</code> command not found</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Check system requirements (at least 8 GB of RAM is recommended)</li>
<li>For Linux, ensure you have the required dependencies:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">sudo apt-get update
sudo apt-get install -y ca-certificates curl
</code></pre></div></pre>
</li>
<li>Try the manual installation from <a href="https://ollama.ai/download">ollama.ai</a></li>
<li>Check if Ollama is running:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ps aux | grep ollama
</code></pre></div></pre>
</li>
</ol>
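<p>Before reinstalling, it can help to confirm the two basics from the list above in one place: whether an <code>ollama</code> executable is on your <code>PATH</code>, and how much memory the machine reports. A minimal sketch using only the Python standard library; <code>total_ram_gb</code> and <code>preflight</code> are hypothetical helper names, and <code>os.sysconf</code> is unavailable on Windows, where the RAM fields come back as <code>None</code>.</p>

```python
import os
import shutil
from typing import Optional

MIN_RAM_GB = 8  # Recommendation from the list above


def total_ram_gb() -> Optional[float]:
    """Total physical memory in GiB, or None where os.sysconf is unavailable (e.g. Windows)."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * page_count / 1024 ** 3


def preflight() -> dict:
    """Collect basic facts for diagnosing a failed Ollama installation."""
    ram = total_ram_gb()
    return {
        # None here corresponds to the 'ollama: command not found' symptom
        "ollama_binary": shutil.which("ollama"),
        "ram_gb": None if ram is None else round(ram, 1),
        # Allow ~10% slack: the OS reports a little less than the installed RAM
        "ram_ok": None if ram is None else ram >= MIN_RAM_GB * 0.9,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```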
<h4 id="python-dependency-errors">Python Dependency Errors</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li><code>pip install</code> fails with compatibility errors</li>
<li>Import errors when starting the application</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Ensure you're using Python 3.11 or higher:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">python --version
</code></pre></div></pre>
</li>
<li>Try creating a fresh virtual environment:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">rm -rf venv
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
</code></pre></div></pre>
</li>
<li>Install the listed packages without resolving their dependencies, which helps isolate a package that fails to install:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">pip install -r requirements.txt --no-deps
</code></pre></div></pre>
</li>
<li>Check for conflicts with pip:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">pip check
</code></pre></div></pre>
</li>
</ol>
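<p><code>python --version</code> reports whichever interpreter is first on your <code>PATH</code>, which is not necessarily the one <code>pip</code> installs into. A small sketch that reports from inside the interpreter actually running it, assuming the 3.11 minimum stated above; <code>environment_report</code> is a hypothetical name.</p>

```python
import sys

REQUIRED = (3, 11)  # Minimum version stated above


def environment_report() -> dict:
    """Describe the interpreter that is actually running this code."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "version_ok": sys.version_info[:2] >= REQUIRED,
        # Inside a venv, sys.prefix points at the venv; sys.base_prefix at the base install
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    report = environment_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    if not report["version_ok"]:
        print(f"Warning: Python {REQUIRED[0]}.{REQUIRED[1]} or newer is required")
```

<p>Run it with the same interpreter you use to start the application; if <code>in_virtualenv</code> is <code>False</code> after activating <code>venv</code>, the activation did not take effect in that shell.</p>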
<h3 id="api-connection-issues">API Connection Issues</h3>
<h4 id="openai-api-key-invalid">OpenAI API Key Invalid</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>Error messages about authentication</li>
<li>"Invalid API key" errors</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Verify your API key is correct and active in the OpenAI dashboard</li>
<li>Check if the key is properly set in your <code>.env</code> file or environment variables</li>
<li>Ensure there are no spaces or unexpected characters in the key</li>
<li>Test the key with a simple OpenAI API request:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
</code></pre></div></pre>
</li>
</ol>
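<p>Formatting problems such as a trailing newline or quotes copied into a <code>.env</code> value are easy to miss by eye. A minimal offline sketch; it assumes the key is read from <code>OPENAI_API_KEY</code> (the variable the official OpenAI Python SDK reads by default), and the <code>sk-</code> prefix check is a heuristic, not a validation rule. It cannot tell whether a key is active, so the <code>curl</code> request above remains the real test.</p>

```python
import os
from typing import List


def api_key_problems(raw_key: str) -> List[str]:
    """Return human-readable formatting problems in an API key (empty list if none found)."""
    if not raw_key:
        return ["key is empty or not set"]
    problems = []
    if raw_key != raw_key.strip():
        problems.append("leading or trailing whitespace (often a stray newline from copy/paste)")
    key = raw_key.strip()
    if key[:1] in {'"', "'"} or key[-1:] in {'"', "'"}:
        problems.append("surrounding quotes were kept as part of the value")
    if any(ch.isspace() for ch in key):
        problems.append("whitespace inside the key")
    if not key.strip("\"'").startswith("sk-"):
        # Heuristic only: OpenAI secret keys have historically started with 'sk-'
        problems.append("does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    issues = api_key_problems(os.environ.get("OPENAI_API_KEY", ""))
    print("\n".join(issues) if issues else "No obvious formatting problems found")
```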
<h4 id="ollama-connection-failed">Ollama Connection Failed</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>"Connection refused" errors when connecting to Ollama</li>
<li>API requests to Ollama time out</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Verify Ollama is running:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ollama list  # Should show available models
</code></pre></div></pre>
</li>
<li>If not running, start the Ollama service:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ollama serve
</code></pre></div></pre>
</li>
<li>Check if the Ollama port is accessible:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">curl http://localhost:11434/api/tags
</code></pre></div></pre>
</li>
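<li>Verify your <code>OLLAMA_HOST</code> setting in the configuration</li>
<li>If using Docker, remember that <code>localhost</code> inside a container refers to that container itself; point <code>OLLAMA_HOST</code> at the Ollama service name (for example <code>http://ollama:11434</code>) and make sure both containers share a network</li>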
</ol>
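<p>The checks above can be scripted so they use the exact host your configuration points at. A sketch with the standard library only; it reads <code>OLLAMA_HOST</code> from the environment (your project may load it from a config file instead), calls Ollama's <code>/api/tags</code> endpoint, and the rule of appending port 11434 to a bare hostname is an assumption modelled on Ollama's default port. <code>resolve_ollama_url</code> and <code>check_ollama</code> are hypothetical helper names.</p>

```python
import http.client
import json
import os
import urllib.request

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def resolve_ollama_url(host: str = "") -> str:
    """Turn an OLLAMA_HOST-style value into a base URL (sketch; IPv6 not handled)."""
    host = (host or "").strip().rstrip("/") or DEFAULT_OLLAMA_URL
    if "://" in host:
        return host  # explicit scheme: use the value as given
    if ":" not in host:
        host = f"{host}:11434"  # bare hostname: assume Ollama's default port
    return f"http://{host}"


def check_ollama(base_url: str, timeout: float = 5.0) -> tuple:
    """Query /api/tags; return (True, model names) or (False, error message)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # Connection refused, DNS failure, timeout, HTTP error, or a non-JSON body
        return False, str(exc)
    return True, [model.get("name") for model in payload.get("models", [])]


if __name__ == "__main__":
    url = resolve_ollama_url(os.environ.get("OLLAMA_HOST", ""))
    ok, detail = check_ollama(url)
    print(f"{url}: {'reachable' if ok else 'NOT reachable'} -> {detail}")
```

<p>With Docker, run the script from the application's container rather than the host, since that is where the connection has to succeed.</p>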
<h3 id="performance-issues">Performance Issues</h3>
<h4 id="high-latency-with-ollama">High Latency with Ollama</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>Very slow responses from Ollama models</li>
<li>Timeouts during inference</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Check if you have GPU support enabled:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">nvidia-smi  # Should show GPU usage
</code></pre></div></pre>
</li>
<li>Try a smaller model:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ollama pull tinyllama
</code></pre></div></pre>
</li>
<li>Adjust model parameters in your request:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">JSON</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "model": "ollama:llama2",
  "max_tokens": 512,
  "temperature": 0.7
}
</code></pre></div></pre>
</li>
<li>Check system resource usage:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">htop
</code></pre></div></pre>
</li>
<li>Increase the timeout in your configuration</li>
</ol>
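<p>The JSON above targets this project's API. If you call Ollama's native <code>/api/generate</code> endpoint directly, generation settings go in an <code>options</code> object and the output-length cap is called <code>num_predict</code> rather than <code>max_tokens</code>. A minimal standard-library sketch with an explicit client-side timeout; the helper names and the 300-second default are choices of this sketch, not settings taken from the project.</p>

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str, max_tokens: int = 512,
                           temperature: float = 0.7) -> dict:
    """Request body for Ollama's native /api/generate endpoint (non-streaming)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
        "options": {
            "num_predict": max_tokens,  # Ollama's name for the output-token cap
            "temperature": temperature,
        },
    }


def generate(base_url: str, payload: dict, timeout: float = 300.0) -> str:
    """POST the payload and return the generated text; raises on HTTP errors or timeout."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Socket-level timeout; with stream=False the server replies only once generation
    # finishes (the first call may also include loading the model), so keep it generous
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

<p>For example, <code>generate("http://localhost:11434", build_generate_payload("llama2", "Explain merge sort in one sentence."))</code>. Lowering <code>max_tokens</code> shortens worst-case latency because generation stops earlier.</p>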
<h4 id="memory-usage-too-high">Memory Usage Too High</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>Out of memory errors</li>
<li>System becomes unresponsive</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Use smaller models (e.g., <code>mistral:7b</code> instead of larger variants)</li>
<li>Reduce batch sizes in configuration</li>
<li>Implement memory limits:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">YAML</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># In docker-compose.yml
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
</code></pre></div></pre>
</li>
<li>Enable context window optimization:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code node="[object Object]">ENABLE_TOKEN_OPTIMIZATION=true
</code></pre>
</li>
</ol>
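<p>As a rough guide to why a smaller model helps, the memory needed for a model's weights alone is its parameter count times the bits stored per weight, divided by 8. The sketch below is illustrative arithmetic only: the function name is hypothetical, and the estimate ignores the KV cache (which grows with the context window) and runtime overhead, so real usage is higher. The 4-bit case assumes a quantized model.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights in gigabytes (10**9 bytes).

    Weights only: the KV cache and runtime overhead come on top of this.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Compare the same 7B-parameter model at different precisions.
for bits in (4, 8, 16):
    print(f"7B @ {bits}-bit: {estimate_weight_gb(7e9, bits):.1f} GB")
</code></pre>
<p>A 7B model needs about 3.5 GB for 4-bit weights but about 14 GB at 16 bits, which is why model size and quantization matter more than any other setting when memory is tight.</p>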
<h3 id="routing-and-model-issues">Routing and Model Issues</h3>
<h4 id="all-requests-going-to-one-provider">All Requests Going to One Provider</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>All requests route to OpenAI despite configuration</li>
<li>All requests route to Ollama regardless of complexity</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>Make sure no environment variable is forcing a provider; both flags should be <code>false</code>:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>FORCE_OLLAMA=false
FORCE_OPENAI=false
</code></pre>
</li>
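<li>Verify the complexity threshold. Each request's complexity score is compared with this value, so a threshold set too low or too high can send nearly all traffic to one provider:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>COMPLEXITY_THRESHOLD=0.65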
</code></pre>
</li>
<li>Check that requests do not pin a provider; <code>force_provider</code> should be <code>null</code> unless you deliberately want to bypass routing:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-json">{
  "routing_preferences": {
    "force_provider": null
  }
}
</code></pre></div></pre>
</li>
<li>Check logs for routing decisions</li>
</ol>
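<p>To see how these settings interact, here is a minimal sketch of threshold-based routing. The function <code>route_request</code> and its comparison rule are hypothetical: it assumes complexity scores lie in [0, 1] and that scores at or above the threshold go to OpenAI, and the actual router may score or compare differently. It illustrates why a forced provider, or a threshold at either extreme, sends all traffic one way.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">from typing import Literal, Optional

Provider = Literal["ollama", "openai"]


def route_request(
    complexity: float,
    threshold: float = 0.65,
    force_provider: Optional[Provider] = None,
) -> Provider:
    """Choose a provider for one request (hypothetical sketch, not MCP's router).

    Assumes complexity scores lie in [0, 1] and that scores at or above the
    threshold go to OpenAI, while lower scores stay on local Ollama.
    """
    # A forced provider (env flag or per-request preference) skips scoring.
    if force_provider is not None:
        return force_provider
    return "openai" if complexity >= threshold else "ollama"


scores = [0.1, 0.4, 0.7, 0.9]
print([route_request(s) for s in scores])                 # mixed: two local, two remote
print([route_request(s, threshold=0.0) for s in scores])  # every request goes to OpenAI
print([route_request(s, threshold=1.0) for s in scores])  # every request stays on Ollama
print(route_request(0.9, force_provider="ollama"))        # ollama, despite the high score
</code></pre>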
<h4 id="model-not-found">Model Not Found</h4>
<p><strong>Symptoms:</strong></p>
<ul>
<li>"Model not found" errors</li>
<li>Models available but not being used</li>
</ul>
<p><strong>Possible Solutions:</strong></p>
<ol>
<li>List available models:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ollama list
</code></pre></div></pre>
</li>
<li>Pull the missing model:
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">ollama pull mistral
</code></pre></div></pre>
</li>
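<li>Verify that the requested model name, including its tag, exactly matches an entry in <code>ollama list</code> (an untagged name such as <code>mistral</code> refers to <code>mistral:latest</code>)</li>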
<li>Check model mapping in configuration</li>
</ol>
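<p>To run the same check from a script, the sketch below asks the Ollama server for its installed models through the <code>GET /api/tags</code> endpoint (the data behind <code>ollama list</code>) on Ollama's default port 11434, and compares names after applying the <code>:latest</code> default. The helper names are illustrative, not part of MCP; adjust the base URL if Ollama runs elsewhere, for example as the <code>ollama</code> service in Docker Compose.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">import json
import urllib.request
from typing import Iterable, List


def normalize_model_name(name: str) -> str:
    """Ollama treats an untagged name such as "mistral" as "mistral:latest"."""
    return name if ":" in name else f"{name}:latest"


def is_model_available(requested: str, installed: Iterable[str]) -> bool:
    """Return True if the requested model, tag included, is installed."""
    wanted = normalize_model_name(requested)
    return any(normalize_model_name(n) == wanted for n in installed)


def fetch_installed_models(base_url: str = "http://localhost:11434") -> List[str]:
    """List local model names via Ollama's GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


# Example (requires a running Ollama server):
#   print(is_model_available("mistral", fetch_installed_models()))
</code></pre>
<p>Note that <code>mistral:7b</code> is reported as missing when only <code>mistral:latest</code> was pulled: Ollama matches names and tags literally, so request the tag you actually pulled.</p>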
<h2 id="diagnostics">Diagnostics</h2>
<h3 id="log-analysis">Log Analysis</h3>
<p>MCP logs contain valuable diagnostic information. Use the following commands to analyze logs:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># View API logs
docker-compose logs -f app

# View Ollama logs
docker-compose logs -f ollama

# Search for errors
docker-compose logs | grep -i error

# Check routing decisions
docker-compose logs app | grep "Routing decision"
</code></pre></div></pre>
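<p>When diagnosing skewed routing, a tally of routing decisions is often more useful than scrolling through them. The sketch below assumes, since the exact log format is not specified here, that each routing line contains the text <code>Routing decision</code> and names the chosen provider as <code>ollama</code> or <code>openai</code>; adapt the pattern to your actual log lines.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">import re
from collections import Counter
from typing import Iterable

# Assumption: routing log lines contain "Routing decision" and name the chosen
# provider as "ollama" or "openai" (any case) somewhere on the line.
PROVIDER_PATTERN = re.compile(r"\b(ollama|openai)\b", re.IGNORECASE)


def count_routing_decisions(lines: Iterable[str]) -> Counter:
    """Tally which provider each "Routing decision" log line mentions."""
    counts: Counter = Counter()
    for line in lines:
        if "Routing decision" not in line:
            continue
        match = PROVIDER_PATTERN.search(line)
        counts[match.group(1).lower() if match else "unknown"] += 1
    return counts
</code></pre>
<p>For example, save the logs with <code>docker-compose logs app &gt; app.log</code> and pass <code>open("app.log")</code> to <code>count_routing_decisions</code>; a result with only one provider confirms the symptom described under Routing and Model Issues.</p>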
<h3 id="health-check-1">Health Check</h3>
<p>Use the health check endpoint to verify system status:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">curl http://localhost:8000/api/v1/health

# For more detailed health information
curl http://localhost:8000/api/v1/health/details
</code></pre></div></pre>
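<p>In scripts or CI it is convenient to wait for the API to come up before running further checks. A minimal sketch using only the Python standard library to poll the <code>/api/v1/health</code> endpoint shown above; the helper names, retry count, and delay are illustrative choices, and an HTTP 200 response is assumed to mean healthy.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx statuses) and timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    check: Callable[[], bool], attempts: int = 10, delay: float = 2.0
) -> bool:
    """Call check() until it succeeds or attempts run out; return the final result."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt != attempts - 1:
            time.sleep(delay)  # no sleep after the last failed attempt
    return False


# Example (requires the API to be running):
#   wait_until_healthy(lambda: http_ok("http://localhost:8000/api/v1/health"))
</code></pre>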
<h3 id="debug-mode">Debug Mode</h3>
<p>Enable debug logging for more detailed information:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Set environment variable
export LOG_LEVEL=DEBUG

# Or modify in .env file
LOG_LEVEL=DEBUG
</code></pre></div></pre>
<h3 id="performance-testing">Performance Testing</h3>
<p>Use the built-in benchmark tool to test system performance:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">python scripts/benchmark.py --provider both --queries 10 --complexity mixed
</code></pre></div></pre>
<h2 id="log-management">Log Management</h2>
<h3 id="log-levels">Log Levels</h3>
<p>MCP uses the following log levels:</p>
<ul>
<li><code node="[object Object]">ERROR</code>: Critical errors that require immediate attention</li>
<li><code node="[object Object]">WARNING</code>: Non-critical issues that might indicate problems</li>
<li><code node="[object Object]">INFO</code>: General operational information</li>
<li><code node="[object Object]">DEBUG</code>: Detailed information for debugging purposes</li>
</ul>
<h3 id="log-formats">Log Formats</h3>
<p>Logs can be formatted as text or JSON:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Set JSON logging
export LOG_FORMAT=json

# Set text logging (default)
export LOG_FORMAT=text
</code></pre></div></pre>
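<p>The sketch below shows one way a Python service can honor <code>LOG_LEVEL</code> and <code>LOG_FORMAT</code> with the standard <code>logging</code> module. It is illustrative rather than MCP's actual logging setup: the JSON field names and the <code>INFO</code> default level are assumptions.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging() -> None:
    """Apply LOG_LEVEL and LOG_FORMAT from the environment (defaults: INFO, text)."""
    level = os.getenv("LOG_LEVEL", "INFO").upper()
    handler = logging.StreamHandler()
    if os.getenv("LOG_FORMAT", "text").lower() == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously installed handlers
    root.setLevel(level)  # setLevel accepts level names such as "DEBUG"
</code></pre>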
<h3 id="external-log-management">External Log Management</h3>
<p>For production environments, consider forwarding logs to an external system:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Using Fluentd
docker-compose -f docker-compose.yml -f docker-compose.logging.yml up -d
</code></pre></div></pre>
<p>Alternatively, configure Docker's logging driver. The <code>json-file</code> driver below keeps logs on the host rather than forwarding them, but rotates them (at most three files of 10 MB each) so they cannot fill the disk:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-yaml"># In docker-compose.yml
services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
</code></pre></div></pre>
<hr/>
<h1 id="contributing-1">Contributing</h1>
<p>Contributions to the MCP system are welcome! Please follow these guidelines:</p>
<h2 id="getting-started">Getting Started</h2>
<ol>
<li>
<p><strong>Fork the Repository</strong></p>
<p>Fork the repository on GitHub, clone your fork locally, and register the main repository as the <code>upstream</code> remote (used later when rebasing):</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git clone https://github.com/YOUR-USERNAME/mcp.git
cd mcp
# Replace ORIGINAL-OWNER with the account that owns the main repository
git remote add upstream https://github.com/ORIGINAL-OWNER/mcp.git
</code></pre></div></pre>
</li>
<li>
<p><strong>Set Up Development Environment</strong></p>
<p>Follow the installation instructions in the <a href="#installation-guide">Installation Guide</a> section.</p>
</li>
<li>
<p><strong>Create a Branch</strong></p>
<p>Create a branch for your feature or bugfix:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git checkout -b feature/your-feature-name
# or
git checkout -b fix/your-bugfix-name
</code></pre></div></pre>
</li>
</ol>
<h2 id="development-guidelines">Development Guidelines</h2>
<h3 id="code-style">Code Style</h3>
<ul>
<li>Follow PEP 8 style guidelines for Python code</li>
<li>Use type hints for all function definitions</li>
<li>Format code with Black</li>
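<li>Verify style with flake8, configured to agree with Black (for example <code>max-line-length = 88</code> and <code>extend-ignore = E203</code>)</li>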
</ul>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Install development tools
pip install black flake8 mypy

# Format code
black app tests

# Check style
flake8 app tests

# Run type checking
mypy app
</code></pre></div></pre>
<h3 id="testing">Testing</h3>
<ul>
<li>Write unit tests for all new functionality</li>
<li>Ensure existing tests pass before submitting a PR</li>
<li>Maintain or improve code coverage</li>
</ul>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash"># Run tests
pytest

# Run tests with coverage
pytest --cov=app tests/

# Run only unit tests
pytest tests/unit/

# Run integration tests
pytest tests/integration/
</code></pre></div></pre>
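<p>The style, documentation, and testing guidelines combine roughly as follows. The helper <code>parse_bool_flag</code> is hypothetical, and it sits next to its tests only to keep the example self-contained; in the repository the function would live under <code>app/</code> and the test module under <code>tests/unit/</code>. Plain <code>assert</code> statements in <code>test_*</code> functions are all pytest needs to discover and run them.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python"># tests/unit/test_env_flags.py (hypothetical module and helper, for illustration)


def parse_bool_flag(value: str) -> bool:
    """Interpret an environment flag such as FORCE_OLLAMA.

    Args:
        value: Raw string read from the environment.

    Returns:
        True for "1", "true", "yes" or "on" (case-insensitive), otherwise False.
    """
    return value.strip().lower() in {"1", "true", "yes", "on"}


def test_truthy_values_are_accepted() -> None:
    for raw in ("true", "TRUE", " 1 ", "yes", "on"):
        assert parse_bool_flag(raw) is True


def test_other_values_are_rejected() -> None:
    for raw in ("false", "0", "", "off", "maybe"):
        assert parse_bool_flag(raw) is False
</code></pre>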
<h3 id="documentation-1">Documentation</h3>
<ul>
<li>Update documentation for any new features or changes</li>
<li>Document all public APIs with docstrings</li>
<li>Keep the README and guides up to date</li>
</ul>
<h2 id="submitting-changes">Submitting Changes</h2>
<ol>
<li>
<p><strong>Commit Your Changes</strong></p>
<p>Make focused commits with meaningful commit messages:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git add .
git commit -m "Add feature: detailed description of changes"
</code></pre></div></pre>
</li>
<li>
<p><strong>Pull Latest Changes</strong></p>
<p>Rebase your branch on the latest main:</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git checkout main
git pull upstream main
git checkout your-branch
git rebase main
</code></pre></div></pre>
</li>
<li>
<p><strong>Push to Your Fork</strong></p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><div class="relative group my-6"><div class="absolute top-0 left-0 right-0 flex items-center justify-between px-4 py-2 bg-zinc-800/80 dark:bg-zinc-900/80 rounded-t-lg border-b border-border/50 z-10"><span class="text-xs font-medium text-zinc-400 uppercase tracking-wider">Bash</span><span class="flex-1"></span><button data-slot="button" data-variant="ghost" data-size="icon-xs" class="inline-flex items-center justify-center gap-2 whitespace-nowrap text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive hover:text-accent-foreground dark:hover:bg-accent/50 size-6 rounded-md [&_svg:not([class*='size-'])]:size-3 h-7 w-7 hover:bg-zinc-700/50" aria-label="Copy code"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-copy h-3.5 w-3.5 text-zinc-400" aria-hidden="true"><rect width="14" height="14" x="8" y="8" rx="2" ry="2"></rect><path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"></path></svg></button></div><pre class="block pt-12 pb-4 px-4 bg-zinc-950 dark:bg-zinc-900 rounded-lg overflow-x-auto"><code class="language-bash">git push origin your-branch
</code></pre></div></pre>
</li>
<li>
<p><strong>Create a Pull Request</strong></p>
<p>Open a pull request from your fork to the main repository:</p>
<ul>
<li>Provide a clear title and description</li>
<li>Reference any related issues</li>
<li>Describe testing performed</li>
<li>Include screenshots for UI changes</li>
</ul>
</li>
</ol>
<h2 id="code-of-conduct">Code of Conduct</h2>
<ul>
<li>Be respectful and inclusive in all interactions</li>
<li>Provide constructive feedback</li>
<li>Focus on the issues, not the people</li>
<li>Welcome contributors of all backgrounds and experience levels</li>
</ul>
<h2 id="license-1">License</h2>
<p>By contributing to this project, you agree that your contributions will be licensed under the project's MIT License.</p>
<hr/>
<h1 id="license-2">License</h1>
<h2 id="mit-license">MIT License</h2>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code>Copyright (c) 2023 MCP Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
</code></pre>
<h2 id="third-party-licenses">Third-Party Licenses</h2>
<p>This project incorporates several third-party open-source libraries, each with its own license:</p>
<ul>
<li><strong>FastAPI</strong>: MIT License</li>
<li><strong>Pydantic</strong>: MIT License</li>
<li><strong>Uvicorn</strong>: BSD 3-Clause License</li>
<li><strong>OpenAI Python</strong>: Apache License 2.0</li>
<li><strong>Redis-py</strong>: MIT License</li>
<li><strong>Prometheus Client</strong>: Apache License 2.0</li>
<li><strong>Ollama</strong>: MIT License</li>
</ul>
<p>Full license texts are included in the LICENSE-3RD-PARTY file in the repository.</p>
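<p>To cross-check the list above against the packages actually installed in your environment, the following is a minimal sketch that reads each distribution's declared license from its installed metadata with the standard-library <code>importlib.metadata</code> module. The PyPI distribution names in <code>PACKAGES</code> and the helper <code>declared_license</code> are illustrative assumptions, not part of the project. Declared metadata can be missing or abbreviated, so treat the output as a sanity check rather than a substitute for the LICENSE-3RD-PARTY file.</p>
<pre class="bg-secondary border border-border rounded-lg p-4 overflow-x-auto"><code class="language-python">from importlib import metadata

# Hypothetical list: PyPI distribution names assumed to match the libraries above.
PACKAGES = ["fastapi", "pydantic", "uvicorn", "openai", "redis", "prometheus-client", "ollama"]


def declared_license(dist_name: str) -> str:
    """Return the license a distribution declares in its installed metadata."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"
    # Newer packages use License-Expression (PEP 639); older ones use License.
    for field in ("License-Expression", "License"):
        value = meta.get(field)
        if value and value.strip() and value.strip().upper() != "UNKNOWN":
            # Some packages embed the full license text; keep only the first line.
            return value.strip().splitlines()[0]
    # Fall back to trove classifiers such as "License :: OSI Approved :: MIT License".
    classifiers = [c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")]
    if classifiers:
        return classifiers[0].split("::")[-1].strip()
    return "not declared"


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name:20} {declared_license(name)}")
</code></pre>
<p>Because the script only inspects the active environment, a package reported as "not installed" is simply absent there; run it inside the project's virtual environment after installing the requirements.</p>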
<h2 id="usage-restrictions">Usage Restrictions</h2>
<p>While the MCP system itself is open source, use of the OpenAI API is subject to OpenAI's terms of service and usage policies. Please ensure your use of the API complies with these terms.</p></div></div><aside class="hidden lg:block"><div class="sticky top-24"><nav class="space-y-2"><a 
href="#optimization-and-deployment-strategies-for-openai-ollama-hybrid-ai-system" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Optimization and Deployment Strategies for OpenAI-Ollama Hybrid AI System</a><a href="#strategic-optimization-framework" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Strategic Optimization Framework</a><a href="#performance-optimization-strategies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Performance Optimization Strategies</a><a href="#1-query-routing-optimization" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Query Routing Optimization</a><a href="#appservicesroutingoptimizerpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/routing_optimizer.py</a><a href="#2-response-caching-with-semantic-search" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. Response Caching with Semantic Search</a><a href="#appservicescacheservicepy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/cache_service.py</a><a href="#3-parallel-query-processing" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. Parallel Query Processing</a><a href="#appservicesparallelprocessorpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/parallel_processor.py</a><a href="#4-dynamic-batching-for-high-load-scenarios" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. 
Dynamic Batching for High-Load Scenarios</a><a href="#appservicesbatchprocessorpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/batch_processor.py</a><a href="#5-model-specific-prompt-optimization" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">5. Model-Specific Prompt Optimization</a><a href="#appservicespromptoptimizerpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/prompt_optimizer.py</a><a href="#cost-reduction-strategies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Cost Reduction Strategies</a><a href="#1-token-usage-optimization" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Token Usage Optimization</a><a href="#appservicestokenoptimizerpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/token_optimizer.py</a><a href="#2-model-tier-selection" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. Model Tier Selection</a><a href="#appservicesmodeltierservicepy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/model_tier_service.py</a><a href="#3-local-model-prioritization-for-development" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. Local Model Prioritization for Development</a><a href="#appservicesdevmodeservicepy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/dev_mode_service.py</a><a href="#4-request-batching-and-rate-limiting" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. 
Request Batching and Rate Limiting</a><a href="#appservicesratelimiterpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/rate_limiter.py</a><a href="#5-memory-and-context-compression" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">5. Memory and Context Compression</a><a href="#appservicescontextcompressionpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/context_compression.py</a><a href="#response-accuracy-optimization-strategies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Response Accuracy Optimization Strategies</a><a href="#1-prompt-engineering-templates" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Prompt Engineering Templates</a><a href="#appservicesprompttemplatespy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/prompt_templates.py</a><a href="#2-context-aware-chain-of-thought" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. Context-Aware Chain of Thought</a><a href="#appserviceschainofthoughtpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/chain_of_thought.py</a><a href="#reasoning-process" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Reasoning Process</a><a href="#conclusion" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Conclusion</a><a href="#3-self-verification-and-error-correction" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. 
Self-Verification and Error Correction</a><a href="#appservicesverificationservicepy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/verification_service.py</a><a href="#4-domain-specific-knowledge-integration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. Domain-Specific Knowledge Integration</a><a href="#appservicesdomainknowledgepy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/domain_knowledge.py</a><a href="#5-dynamic-few-shot-learning" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">5. Dynamic Few-Shot Learning</a><a href="#appservicesfewshotexamplespy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/few_shot_examples.py</a><a href="#deployment-strategies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Deployment Strategies</a><a href="#local-development-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Local Development Environment</a><a href="#localsetupsh-set-up-local-development-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">local_setup.sh - Set up local development environment</a><a href="#check-for-required-tools" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check for required tools</a><a href="#create-virtual-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Create virtual environment</a><a href="#install-dependencies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Install dependencies</a><a href="#set-up-environment-file" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Set up environment file</a><a 
href="#check-if-ollama-is-installed" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check if Ollama is installed</a><a href="#pull-required-ollama-models" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Pull required Ollama models</a><a href="#start-redis-for-development" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Start Redis for development</a><a href="#initialize-database" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Initialize database</a><a href="#run-tests-to-verify-setup" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run tests to verify setup</a><a href="#docker-composeyml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">docker-compose.yml</a><a href="#dockerfiledev" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Dockerfile.dev</a><a href="#install-system-dependencies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Install system dependencies</a><a href="#install-python-dependencies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Install Python dependencies</a><a href="#copy-application-code" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Copy application code</a><a href="#set-development-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Set development environment</a><a href="#make-scripts-executable" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Make scripts executable</a><a href="#default-command" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Default command</a><a href="#appconfiglocalpy" class="block text-sm text-muted-foreground hover:text-foreground 
transition-colors ">app/config/local.py</a><a href="#api-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">API configuration</a><a href="#openai-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">OpenAI configuration</a><a href="#ollama-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Ollama configuration</a><a href="#app-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">App configuration</a><a href="#feature-flags" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Feature flags</a><a href="#development-specific-settings" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Development-specific settings</a><a href="#redis-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Redis configuration</a><a href="#production-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Production Deployment</a><a href="#kubernetesdeploymentyaml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">kubernetes/deployment.yaml</a><a href="#kuberneteshpayaml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">kubernetes/hpa.yaml</a><a href="#deploysh-production-deployment-script" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">deploy.sh - Production deployment script</a><a href="#check-required-environment-variables" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check required environment variables</a><a href="#build-and-push-docker-image" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Build and push Docker image</a><a 
href="#apply-kubernetes-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Apply Kubernetes configuration</a><a href="#create-namespace-if-it-doesnt-exist" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Create namespace if it doesn't exist</a><a href="#apply-secrets" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Apply secrets</a><a href="#deploy-redis-if-needed" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Deploy Redis if needed</a><a href="#deploy-application" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Deploy application</a><a href="#replace-variables-in-deployment-file" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Replace variables in deployment file</a><a href="#apply-hpa" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Apply HPA</a><a href="#verify-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Verify deployment</a><a href="#initialize-ollama-models-if-needed" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Initialize Ollama models if needed</a><a href="#dockerfileprod" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Dockerfile.prod</a><a href="#install-build-dependencies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Install build dependencies</a><a href="#install-python-dependencies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Install Python dependencies</a><a href="#final-stage" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Final stage</a><a href="#copy-wheels-from-builder-stage" class="block text-sm text-muted-foreground 
hover:text-foreground transition-colors ">Copy wheels from builder stage</a><a href="#copy-application-code" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Copy application code</a><a href="#create-non-root-user" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Create non-root user</a><a href="#set-production-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Set production environment</a><a href="#expose-port" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Expose port</a><a href="#run-using-gunicorn-in-production" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run using Gunicorn in production</a><a href="#appconfiggunicornpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/config/gunicorn.py</a><a href="#bind-to-00008000" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Bind to 0.0.0.0:8000</a><a href="#worker-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Worker configuration</a><a href="#logging" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Logging</a><a href="#security" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Security</a><a href="#process-naming" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Process naming</a><a href="#cloud-deployment-aws" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Cloud Deployment (AWS)</a><a href="#awscloudformationyaml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">aws/cloudformation.yaml</a><a href="#awsdeploysh-aws-deployment-script" class="block text-sm text-muted-foreground hover:text-foreground 
transition-colors ">aws_deploy.sh - AWS deployment script</a><a href="#check-required-aws-cli" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check required AWS CLI</a><a href="#aws-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">AWS configuration</a><a href="#check-if-stack-exists" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check if stack exists</a><a href="#deploy-cloudformation-stack" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Deploy CloudFormation stack</a><a href="#get-stack-outputs" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Get stack outputs</a><a href="#build-and-push-docker-image" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Build and push Docker image</a><a href="#login-to-ecr" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Login to ECR</a><a href="#build-and-push" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Build and push</a><a href="#update-ecs-service-to-force-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Update ECS service to force deployment</a><a href="#optimization-and-deployment-strategies-for-openai-ollama-hybrid-ai-system-continued" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Optimization and Deployment Strategies for OpenAI-Ollama Hybrid AI System (Continued)</a><a href="#monitoring-and-observability-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Monitoring and Observability Configuration</a><a href="#prometheus-and-grafana-setup-for-metrics" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Prometheus and Grafana Setup for 
Metrics</a><a href="#monitoringprometheus-configyaml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">monitoring/prometheus-config.yaml</a><a href="#grafana-dashboard-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Grafana Dashboard Configuration</a><a href="#implementing-metrics-collection-in-api" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Implementing Metrics Collection in API</a><a href="#appmiddlewaremetricspy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/middleware/metrics.py</a><a href="#initialize-metrics" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Initialize metrics</a><a href="#scaling-strategies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Scaling Strategies</a><a href="#optimizing-ollama-scaling-for-high-loads" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Optimizing Ollama Scaling for High Loads</a><a href="#appservicesollamascalingpy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">app/services/ollama_scaling.py</a><a href="#autoscaling-configuration-for-cloud-deployments" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Autoscaling Configuration for Cloud Deployments</a><a href="#kubernetesautoscaler-configyaml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">kubernetes/autoscaler-config.yaml</a><a href="#cost-optimization-monthly-budget-tracking" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Cost Optimization - Monthly Budget Tracking</a><a href="#appservicesbudgetservicepy" class="block text-sm text-muted-foreground hover:text-foreground transition-colors 
">app/services/budget_service.py</a><a href="#conclusion" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Conclusion</a><a href="#mcp-modern-computational-paradigm-system" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">MCP (Modern Computational Paradigm) System</a><a href="#comprehensive-documentation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Comprehensive Documentation</a><a href="#table-of-contents" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Table of Contents</a><a href="#readmemd" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">README.md</a><a href="#mcp-modern-computational-paradigm" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">MCP - Modern Computational Paradigm</a><a href="#key-features" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Key Features</a><a href="#quick-start" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Quick Start</a><a href="#prerequisites" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Prerequisites</a><a href="#installation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Installation</a><a href="#docker-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Docker Deployment</a><a href="#documentation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Documentation</a><a href="#architecture" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Architecture</a><a href="#license" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">License</a><a href="#contributing" class="block 
text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Contributing</a><a href="#installation-guide" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Installation Guide</a><a href="#prerequisites" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Prerequisites</a><a href="#system-requirements" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">System Requirements</a><a href="#software-requirements" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Software Requirements</a><a href="#required-api-keys" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Required API Keys</a><a href="#local-development-setup" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Local Development Setup</a><a href="#1-clone-the-repository" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Clone the Repository</a><a href="#2-set-up-virtual-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. Set Up Virtual Environment</a><a href="#create-virtual-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Create virtual environment</a><a href="#activate-virtual-environment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Activate virtual environment</a><a href="#on-linuxmacos" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">On Linux/macOS:</a><a href="#on-windows" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">On Windows:</a><a href="#3-install-dependencies" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. 
Install Dependencies</a><a href="#4-install-and-configure-ollama" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. Install and Configure Ollama</a><a href="#macos-using-homebrew" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">macOS (using Homebrew)</a><a href="#linux" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Linux</a><a href="#start-ollama-service" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Start Ollama service</a><a href="#5-pull-required-models" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">5. Pull Required Models</a><a href="#pull-basic-models" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Pull basic models</a><a href="#6-set-up-environment-variables" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">6. Set Up Environment Variables</a><a href="#copy-the-example-environment-file" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Copy the example environment file</a><a href="#edit-the-file-with-your-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Edit the file with your configuration</a><a href="#at-minimum-set-openaiapikey" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">At minimum, set OPENAI_API_KEY</a><a href="#7-initialize-local-services" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">7. 
Initialize Local Services</a><a href="#start-redis-using-docker" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Start Redis using Docker</a><a href="#initialize-database-if-applicable" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Initialize database (if applicable)</a><a href="#8-start-development-server" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">8. Start Development Server</a><a href="#start-with-auto-reload-for-development" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Start with auto-reload for development</a><a href="#9-verify-installation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">9. Verify Installation</a><a href="#docker-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Docker Deployment</a><a href="#1-ensure-docker-and-docker-compose-are-installed" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Ensure Docker and Docker Compose are Installed</a><a href="#verify-installation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Verify installation</a><a href="#2-configure-environment-variables" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. Configure Environment Variables</a><a href="#copy-and-edit-environment-variables" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Copy and edit environment variables</a><a href="#3-start-services-with-docker-compose" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. 
Start Services with Docker Compose</a><a href="#build-and-start-all-services" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Build and start all services</a><a href="#view-logs" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">View logs</a><a href="#4-stopping-the-services" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. Stopping the Services</a><a href="#kubernetes-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Kubernetes Deployment</a><a href="#1-prerequisites" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Prerequisites</a><a href="#2-set-up-namespace-and-secrets" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. Set Up Namespace and Secrets</a><a href="#create-namespace" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Create namespace</a><a href="#create-secrets" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Create secrets</a><a href="#3-deploy-redis-if-needed" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. Deploy Redis (if needed)</a><a href="#using-helm" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Using Helm</a><a href="#4-deploy-mcp-components" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. Deploy MCP Components</a><a href="#apply-kubernetes-manifests" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Apply Kubernetes manifests</a><a href="#5-set-up-autoscaling-optional" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">5. 
Set Up Autoscaling (Optional)</a><a href="#6-check-deployment-status" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">6. Check Deployment Status</a><a href="#aws-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">AWS Deployment</a><a href="#1-prerequisites" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">1. Prerequisites</a><a href="#2-cloudformation-deployment" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">2. CloudFormation Deployment</a><a href="#deploy-using-cloudformation-template" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Deploy using CloudFormation template</a><a href="#check-deployment-status" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check deployment status</a><a href="#3-deploy-api-image-to-ecr" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">3. Deploy API Image to ECR</a><a href="#log-in-to-ecr" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Log in to ECR</a><a href="#build-and-push-image" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Build and push image</a><a href="#4-update-ecs-service" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">4. 
Update ECS Service</a><a href="#force-new-deployment-to-use-the-updated-image" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Force new deployment to use the updated image</a><a href="#api-reference" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">API Reference</a><a href="#authentication" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Authentication</a><a href="#bearer-token-authentication" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Bearer Token Authentication</a><a href="#query-parameter" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Query Parameter</a><a href="#chat-endpoints" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Chat Endpoints</a><a href="#create-chat-completion" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Create Chat Completion</a><a href="#stream-chat-completion" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Stream Chat Completion</a><a href="#hybrid-chat" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Hybrid Chat</a><a href="#agent-endpoints" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Agent Endpoints</a><a href="#run-agent" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Run Agent</a><a href="#get-agent-status" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Get Agent Status</a><a href="#list-available-agents" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">List Available Agents</a><a href="#model-management-endpoints" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Model 
Management Endpoints</a><a href="#list-models" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">List Models</a><a href="#get-model-details" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Get Model Details</a><a href="#pull-ollama-model" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Pull Ollama Model</a><a href="#system-endpoints" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">System Endpoints</a><a href="#health-check" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Health Check</a><a href="#system-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">System Configuration</a><a href="#update-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Update Configuration</a><a href="#system-metrics" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">System Metrics</a><a href="#configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Configuration</a><a href="#environment-variables" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Environment Variables</a><a href="#core-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Core Configuration</a><a href="#redis-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Redis Configuration</a><a href="#routing-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Routing Configuration</a><a href="#performance-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Performance Configuration</a><a 
href="#cost-optimization" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Cost Optimization</a><a href="#monitoring" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Monitoring</a><a href="#advanced-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Advanced Configuration</a><a href="#configuration-file" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Configuration File</a><a href="#custom-provider-configuration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Custom Provider Configuration</a><a href="#model-selection" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Model Selection</a><a href="#model-tiers" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Model Tiers</a><a href="#task-specific-model-mapping" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Task-Specific Model Mapping</a><a href="#usage-examples" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Usage Examples</a><a href="#basic-chat-interaction" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Basic Chat Interaction</a><a href="#python-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Python Example</a><a href="#basic-chat-completion" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Basic chat completion</a><a href="#example-conversation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Example conversation</a><a href="#curl-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">cURL Example</a><a href="#simple-completion" 
class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Simple completion</a><a href="#streaming-response" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Streaming response</a><a href="#working-with-agents" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Working with Agents</a><a href="#python-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Python Example</a><a href="#run-an-agent-with-tools" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run an agent with tools</a><a href="#example-usage" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Example usage</a><a href="#curl-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">cURL Example</a><a href="#run-an-agent" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run an agent</a><a href="#check-status" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check status</a><a href="#customizing-model-selection" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Customizing Model Selection</a><a href="#python-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Python Example</a><a href="#custom-routing-preferences" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Custom routing preferences</a><a href="#examples-with-different-routing-preferences" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Examples with different routing preferences</a><a href="#curl-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">cURL Example</a><a href="#force-ollama-for-this-request" class="block text-sm 
text-muted-foreground hover:text-foreground transition-colors ">Force Ollama for this request</a><a href="#force-specific-model" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Force specific model</a><a href="#tool-integration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Tool Integration</a><a href="#python-example" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Python Example</a><a href="#chat-with-tool-integration" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Chat with tool integration</a><a href="#define-available-tools" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Define available tools</a><a href="#example-usage" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Example usage</a><a href="#troubleshooting" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Troubleshooting</a><a href="#common-issues" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Common Issues</a><a href="#installation-issues" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Installation Issues</a><a href="#api-connection-issues" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">API Connection Issues</a><a href="#performance-issues" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Performance Issues</a><a href="#routing-and-model-issues" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Routing and Model Issues</a><a href="#diagnostics" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Diagnostics</a><a href="#log-analysis" class="block text-sm text-muted-foreground hover:text-foreground 
transition-colors pl-4">Log Analysis</a><a href="#view-api-logs" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">View API logs</a><a href="#view-ollama-logs" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">View Ollama logs</a><a href="#search-for-errors" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Search for errors</a><a href="#check-routing-decisions" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check routing decisions</a><a href="#health-check" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Health Check</a><a href="#for-more-detailed-health-information" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">For more detailed health information</a><a href="#debug-mode" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Debug Mode</a><a href="#set-environment-variable" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Set environment variable</a><a href="#or-modify-in-env-file" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Or modify in .env file</a><a href="#performance-testing" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Performance Testing</a><a href="#log-management" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Log Management</a><a href="#log-levels" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Log Levels</a><a href="#log-formats" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Log Formats</a><a href="#set-json-logging" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Set JSON logging</a><a href="#set-text-logging-default" 
class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Set text logging (default)</a><a href="#external-log-management" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">External Log Management</a><a href="#using-fluentd" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Using Fluentd</a><a href="#in-docker-composeyml" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">In docker-compose.yml</a><a href="#contributing" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Contributing</a><a href="#getting-started" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Getting Started</a><a href="#development-guidelines" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Development Guidelines</a><a href="#code-style" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Code Style</a><a href="#install-development-tools" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Install development tools</a><a href="#format-code" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Format code</a><a href="#check-style" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Check style</a><a href="#run-type-checking" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run type checking</a><a href="#testing" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Testing</a><a href="#run-tests" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run tests</a><a href="#run-tests-with-coverage" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run tests with coverage</a><a 
href="#run-only-unit-tests" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run only unit tests</a><a href="#run-integration-tests" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">Run integration tests</a><a href="#documentation" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4">Documentation</a><a href="#submitting-changes" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Submitting Changes</a><a href="#code-of-conduct" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Code of Conduct</a><a href="#license" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">License</a><a href="#license" class="block text-sm text-muted-foreground hover:text-foreground transition-colors ">License</a><a href="#mit-license" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">MIT License</a><a href="#third-party-licenses" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Third-Party Licenses</a><a href="#usage-restrictions" class="block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0">Usage Restrictions</a></nav></div></aside></div></article><!--$--><!--/$--></main><script src="/_next/static/chunks/ca7c464890d0e8da.js" id="_R_" 
async=""></script><script>(self.__next_f=self.__next_f||[]).push([0])</script><script>self.__next_f.push([1,"1:\"$Sreact.fragment\"\n2:I[2355,[\"/_next/static/chunks/a2b8d4a1cdff0364.js\",\"/_next/static/chunks/e3b299ee8c272c5f.js\",\"/_next/static/chunks/5b18bef1a05fed16.js\",\"/_next/static/chunks/8d9ce281e2699520.js\",\"/_next/static/chunks/f2bcd5b1588de598.js\"],\"Analytics\"]\n3:I[55068,[\"/_next/static/chunks/a2b8d4a1cdff0364.js\",\"/_next/static/chunks/e3b299ee8c272c5f.js\",\"/_next/static/chunks/5b18bef1a05fed16.js\",\"/_next/static/chunks/8d9ce281e2699520.js\",\"/_next/static/chunks/f2bcd5b1588de598.js\"],\"SiteNav\"]\n4:I[39756,[\"/_next/static/chunks/ff1a16fafef87110.js\",\"/_next/static/chunks/bd395c328f5600f0.js\"],\"default\"]\n5:I[37457,[\"/_next/static/chunks/ff1a16fafef87110.js\",\"/_next/static/chunks/bd395c328f5600f0.js\"],\"default\"]\n9:I[68027,[],\"default\"]\n:HL[\"/_next/static/chunks/5bd76f0f07e1e8f5.css\",\"style\"]\n:HL[\"/_next/static/chunks/7d997037a4f8c17b.css\",\"style\"]\n:HL[\"/_next/static/media/2a65768255d6b625-s.p.d19752fb.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n:HL[\"/_next/static/media/797e433ab948586e-s.p.dbea232f.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n:HL[\"/_next/static/media/caa3a2e1cccd8315-s.p.853070df.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n"])</script><script>self.__next_f.push([1,"0:{\"P\":null,\"b\":\"6uZ8qrgjcJrZScikhKOfx\",\"c\":[\"\",\"blog\",\"2025-03-12-integrating-openai-agents-sdk-ollama\"],\"q\":\"\",\"i\":false,\"f\":[[[\"\",{\"children\":[\"blog\",{\"children\":[[\"slug\",\"2025-03-12-integrating-openai-agents-sdk-ollama\",\"d\"],{\"children\":[\"__PAGE__\",{}]}]}]},\"$undefined\",\"$undefined\",true],[[\"$\",\"$1\",\"c\",{\"children\":[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/chunks/5bd76f0f07e1e8f5.css\",\"precedence\":\"next\",\"crossOrigin\":\"$undefined\",\"nonce\":\"$undefined\"}],[\"$\",\"link\
",\"1\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/chunks/7d997037a4f8c17b.css\",\"precedence\":\"next\",\"crossOrigin\":\"$undefined\",\"nonce\":\"$undefined\"}],[\"$\",\"script\",\"script-0\",{\"src\":\"/_next/static/chunks/a2b8d4a1cdff0364.js\",\"async\":true,\"nonce\":\"$undefined\"}],[\"$\",\"script\",\"script-1\",{\"src\":\"/_next/static/chunks/e3b299ee8c272c5f.js\",\"async\":true,\"nonce\":\"$undefined\"}],[\"$\",\"script\",\"script-2\",{\"src\":\"/_next/static/chunks/5b18bef1a05fed16.js\",\"async\":true,\"nonce\":\"$undefined\"}],[\"$\",\"script\",\"script-3\",{\"src\":\"/_next/static/chunks/8d9ce281e2699520.js\",\"async\":true,\"nonce\":\"$undefined\"}],[\"$\",\"script\",\"script-4\",{\"src\":\"/_next/static/chunks/f2bcd5b1588de598.js\",\"async\":true,\"nonce\":\"$undefined\"}]],[\"$\",\"html\",null,{\"lang\":\"en\",\"className\":\"dark\",\"suppressHydrationWarning\":true,\"children\":[[\"$\",\"head\",null,{\"children\":[[\"$\",\"script\",null,{\"type\":\"application/ld+json\",\"dangerouslySetInnerHTML\":{\"__html\":\"[{\\\"@type\\\":\\\"WebSite\\\",\\\"name\\\":\\\"Daniel Kliewer\\\",\\\"url\\\":\\\"https://danielkliewer.com\\\",\\\"potentialAction\\\":{\\\"@type\\\":\\\"SearchAction\\\",\\\"target\\\":{\\\"@type\\\":\\\"EntryPoint\\\",\\\"urlTemplate\\\":\\\"https://danielkliewer.com/blog?q={search_term_string}\\\"},\\\"query-input\\\":\\\"required name=search_term_string\\\"}},{\\\"@type\\\":\\\"Person\\\",\\\"name\\\":\\\"Daniel Kliewer\\\",\\\"url\\\":\\\"https://danielkliewer.com\\\",\\\"jobTitle\\\":\\\"Software Engineer \u0026 AI Practitioner\\\",\\\"address\\\":{\\\"@type\\\":\\\"PostalAddress\\\",\\\"addressLocality\\\":\\\"Austin\\\",\\\"addressRegion\\\":\\\"TX\\\",\\\"addressCountry\\\":\\\"US\\\"},\\\"sameAs\\\":[\\\"https://github.com/kliewerdaniel\\\",\\\"https://linkedin.com/in/danielkliewer\\\"],\\\"knowsAbout\\\":[\\\"Artificial Intelligence\\\",\\\"Large Language Models\\\",\\\"Retrieval-Augmented Generation\\\",\\\"Autonomous 
Agents\\\",\\\"Next.js\\\",\\\"React\\\",\\\"Python\\\"]}]\"}}],[\"$\",\"link\",null,{\"rel\":\"canonical\",\"href\":\"https://danielkliewer.com\"}],[\"$\",\"meta\",null,{\"name\":\"apple-mobile-web-app-capable\",\"content\":\"yes\"}],[\"$\",\"meta\",null,{\"name\":\"apple-mobile-web-app-status-bar-style\",\"content\":\"black-translucent\"}]]}],[\"$\",\"body\",null,{\"className\":\"geist_a71539c9-module__T19VSG__variable geist_mono_8d43a2aa-module__8Li5zG__variable playfair_display_d5eda251-module__JGL7aG__variable antialiased min-h-screen bg-background text-foreground\",\"children\":[[\"$\",\"$L2\",null,{}],[\"$\",\"$L3\",null,{}],[\"$\",\"main\",null,{\"className\":\"min-h-screen\",\"children\":[\"$\",\"$L4\",null,{\"parallelRouterKey\":\"children\",\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L5\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 
0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":404}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],[]],\"forbidden\":\"$undefined\",\"unauthorized\":\"$undefined\"}]}]]}]]}]]}],{\"children\":[[\"$\",\"$1\",\"c\",{\"children\":[null,[\"$\",\"$L4\",null,{\"parallelRouterKey\":\"children\",\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L5\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"forbidden\":\"$undefined\",\"unauthorized\":\"$undefined\"}]]}],{\"children\":[[\"$\",\"$1\",\"c\",{\"children\":[null,[\"$\",\"$L4\",null,{\"parallelRouterKey\":\"children\",\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":\"$L6\",\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"forbidden\":\"$undefined\",\"unauthorized\":\"$undefined\"}]]}],{\"children\":[\"$L7\",{},null,false,false]},null,false,false]},null,false,false]},null,false,false],\"$L8\",false]],\"m\":\"$undefined\",\"G\":[\"$9\",[]],\"S\":true}\n"])</script><script>self.__next_f.push([1,"b:I[97367,[\"/_next/static/chunks/ff1a16fafef87110.js\",\"/_next/static/chunks/bd395c328f5600f0.js\"],\"OutletBoundary\"]\nc:\"$Sreact.suspense\"\ne:I[97367,[\"/_next/static/chunks/ff1a16fafef87110.js\",\"/_next/static/chunks/bd395c328f5600f0.js\"],\"ViewportBoundary\"]\n10:I[97367,[\"/_next/static/chunks/ff1a16fafef87110.js\",\"/_next/static/chunks/bd395c328f5600f0.js\"],\"MetadataBoundary\"]\n6:[\"$\",\"$L5\",null,{}]\n7:[\"$\",\"$1\",\"c\",{\"children\":[\"$La\",[[\"$\",\"script\",\"script-0\",{\"src\":\"/_next/static/chunks/8178183a3ca3d9dc.js\",\"async\":true,\"nonce\":\"
$undefined\"}],[\"$\",\"script\",\"script-1\",{\"src\":\"/_next/static/chunks/345e895c375d62af.js\",\"async\":true,\"nonce\":\"$undefined\"}]],[\"$\",\"$Lb\",null,{\"children\":[\"$\",\"$c\",null,{\"name\":\"Next.MetadataOutlet\",\"children\":\"$@d\"}]}]]}]\n8:[\"$\",\"$1\",\"h\",{\"children\":[null,[\"$\",\"$Le\",null,{\"children\":\"$Lf\"}],[\"$\",\"div\",null,{\"hidden\":true,\"children\":[\"$\",\"$L10\",null,{\"children\":[\"$\",\"$c\",null,{\"name\":\"Next.Metadata\",\"children\":\"$L11\"}]}]}],[\"$\",\"meta\",null,{\"name\":\"next-size-adjust\",\"content\":\"\"}]]}]\n"])</script><script>self.__next_f.push([1,":HL[\"/images/ComfyUI_00211_.png\",\"image\"]\n"])</script><script>self.__next_f.push([1,"a:[\"$\",\"article\",null,{\"className\":\"container px-4 py-16 mx-auto max-w-6xl min-h-screen bg-background\",\"children\":[[\"$\",\"div\",null,{\"className\":\"lg:grid lg:grid-cols-[1fr_250px] lg:gap-12\",\"children\":[[\"$\",\"div\",null,{\"children\":[[\"$\",\"header\",null,{\"className\":\"mb-12\",\"children\":[[\"$\",\"h1\",null,{\"className\":\"text-4xl md:text-5xl font-bold mb-4 text-foreground tracking-wide\",\"style\":{\"fontFamily\":\"var(--font-playfair), Georgia, serif\"},\"children\":\"OpenAI Agents SDK \u0026 Ollama Integration: Complete Architecture Guide\"}],[\"$\",\"div\",null,{\"className\":\"flex flex-wrap items-center gap-4 text-muted-foreground mb-6\",\"children\":[[\"$\",\"time\",null,{\"dateTime\":\"2025-03-12 11:42:44 -0500\",\"children\":\"March 12, 2025\"}],\"$undefined\",[[\"$\",\"span\",null,{\"children\":\"•\"}],[\"$\",\"span\",null,{\"className\":\"flex items-center gap-1\",\"children\":[[\"$\",\"svg\",null,{\"ref\":\"$undefined\",\"xmlns\":\"http://www.w3.org/2000/svg\",\"width\":24,\"height\":24,\"viewBox\":\"0 0 24 24\",\"fill\":\"none\",\"stroke\":\"currentColor\",\"strokeWidth\":2,\"strokeLinecap\":\"round\",\"strokeLinejoin\":\"round\",\"className\":\"lucide lucide-clock w-4 
h-4\",\"aria-hidden\":\"true\",\"children\":[[\"$\",\"path\",\"mmk7yg\",{\"d\":\"M12 6v6l4 2\"}],[\"$\",\"circle\",\"1mglay\",{\"cx\":\"12\",\"cy\":\"12\",\"r\":\"10\"}],\"$undefined\"]}],253,\" min read\"]}]]]}],\"$undefined\"]}],\"$undefined\",[\"$\",\"div\",null,{\"className\":\"prose prose-lg dark:prose-invert max-w-none  prose-headings:text-foreground prose-headings:font-bold prose-headings:tracking-wide prose-p:text-muted-foreground prose-p:leading-relaxed prose-a:text-foreground prose-a:underline hover:prose-a:opacity-70 prose-strong:text-foreground prose-code:text-foreground prose-code:bg-secondary prose-code:px-1 prose-code:py-0.5 prose-code:rounded prose-pre:bg-secondary prose-pre:border prose-pre:border-border prose-blockquote:border-l-foreground prose-blockquote:text-muted-foreground prose-li:text-muted-foreground prose-img:rounded-lg prose-pre:relative prose-pre:overflow-hidden \",\"children\":[[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"img\",\"img-0\",{\"src\":\"/images/ComfyUI_00211_.png\",\"alt\":\"Image\"}]}],\"\\n\",[\"$\",\"h1\",\"h1-0\",{\"id\":\"architectural-synthesis-integrating-openais-agents-sdk-with-ollama\",\"children\":\"Architectural Synthesis: Integrating OpenAI's Agents SDK with Ollama\"}],\"\\n\",[\"$\",\"h2\",\"h2-0\",{\"id\":\"a-convergence-of-contemporary-ai-paradigms\",\"children\":\"A Convergence of Contemporary AI Paradigms\"}],\"\\n\",[\"$\",\"p\",\"p-1\",{\"children\":\"In the evolving landscape of artificial intelligence systems, the architectural integration of OpenAI's Agents SDK with Ollama represents a sophisticated approach to creating hybrid, responsive computational entities. 
This synthesis enables a dialectical interaction between cloud-based intelligence and local computational resources, creating what might be conceptualized as a Modern Computational Paradigm (MCP) system.\"}],\"\\n\",[\"$\",\"h2\",\"h2-1\",{\"id\":\"theoretical-framework-and-architectural-considerations\",\"children\":\"Theoretical Framework and Architectural Considerations\"}],\"\\n\",[\"$\",\"p\",\"p-2\",{\"children\":\"The foundational architecture of this integration leverages the strengths of both paradigms: OpenAI's Agents SDK provides a structured framework for creating autonomous agents capable of orchestrating complex, multi-step reasoning processes, while Ollama offers localized execution of large language models with reduced latency and enhanced privacy guarantees.\"}],\"\\n\",[\"$\",\"p\",\"p-3\",{\"children\":\"At its epistemological core, this architecture addresses the fundamental tension between computational capability and data sovereignty. The implementation creates a fluid boundary between local and remote processing, determined by contextual parameters including:\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Computational complexity thresholds\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Privacy requirements of specific data 
domains\"}],\"\\n\",\"$L12\",\"\\n\",\"$L13\",\"\\n\"]}],\"\\n\",\"$L14\",\"\\n\",\"$L15\",\"\\n\",\"$L16\",\"\\n\",\"$L17\",\"\\n\",\"$L18\",\"\\n\",\"$L19\",\"\\n\",\"$L1a\",\"\\n\",\"$L1b\",\"\\n\",\"$L1c\",\"\\n\",\"$L1d\",\"\\n\",\"$L1e\",\"\\n\",\"$L1f\",\"\\n\",\"$L20\",\"\\n\",\"$L21\",\"\\n\",\"$L22\",\"\\n\",\"$L23\",\"\\n\",\"$L24\",\"\\n\",\"$L25\",\"\\n\",\"$L26\",\"\\n\",\"$L27\",\"\\n\",\"$L28\",\"\\n\",\"$L29\",\"\\n\",\"$L2a\",\"\\n\",\"$L2b\",\"\\n\",\"$L2c\",\"\\n\",\"$L2d\",\"\\n\",\"$L2e\",\"\\n\",\"$L2f\",\"\\n\",\"$L30\",\"\\n\",\"$L31\",\"\\n\",\"$L32\",\"\\n\",\"$L33\",\"\\n\",\"$L34\",\"\\n\",\"$L35\",\"\\n\",\"$L36\",\"\\n\",\"$L37\",\"\\n\",\"$L38\",\"\\n\",\"$L39\",\"\\n\",\"$L3a\",\"\\n\",\"$L3b\",\"\\n\",\"$L3c\",\"\\n\",\"$L3d\",\"\\n\",\"$L3e\",\"\\n\",\"$L3f\",\"\\n\",\"$L40\",\"\\n\",\"$L41\",\"\\n\",\"$L42\",\"\\n\",\"$L43\",\"\\n\",\"$L44\",\"\\n\",\"$L45\",\"\\n\",\"$L46\",\"\\n\",\"$L47\",\"\\n\",\"$L48\",\"\\n\",\"$L49\",\"\\n\",\"$L4a\",\"\\n\",\"$L4b\",\"\\n\",\"$L4c\",\"\\n\",\"$L4d\",\"\\n\",\"$L4e\",\"\\n\",\"$L4f\",\"\\n\",\"$L50\",\"\\n\",\"$L51\",\"\\n\",\"$L52\",\"\\n\",\"$L53\",\"\\n\",\"$L54\",\"\\n\",\"$L55\",\"\\n\",\"$L56\",\"\\n\",\"$L57\",\"\\n\",\"$L58\",\"\\n\",\"$L59\",\"\\n\",\"$L5a\",\"\\n\",\"$L5b\",\"\\n\",\"$L5c\",\"\\n\",\"$L5d\",\"\\n\",\"$L5e\",\"\\n\",\"$L5f\",\"\\n\",\"$L60\",\"\\n\",\"$L61\",\"\\n\",\"$L62\",\"\\n\",\"$L63\",\"\\n\",\"$L64\",\"\\n\",\"$L65\",\"\\n\",\"$L66\",\"\\n\",\"$L67\",\"\\n\",\"$L68\",\"\\n\",\"$L69\",\"\\n\",\"$L6a\",\"\\n\",\"$L6b\",\"\\n\",\"$L6c\",\"\\n\",\"$L6d\",\"\\n\",\"$L6e\",\"\\n\",\"$L6f\",\"\\n\",\"$L70\",\"\\n\",\"$L71\",\"\\n\",\"$L72\",\"\\n\",\"$L73\",\"\\n\",\"$L74\",\"\\n\",\"$L75\",\"\\n\",\"$L76\",\"\\n\",\"$L77\",\"\\n\",\"$L78\",\"\\n\",\"$L79\",\"\\n\",\"$L7a\",\"\\n\",\"$L7b\",\"\\n\",\"$L7c\",\"\\n\",\"$L7d\",\"\\n\",\"$L7e\",\"\\n\",\"$L7f\",\"\\n\",\"$L80\",\"\\n\",\"$L81\",\"\\n\",\"$L82\",\"\\n\",\"$L83\",\"\\n\",\"$L84\",\"\\n\",\"$L85\",\"\\n
\",\"$L86\",\"\\n\",\"$L87\",\"\\n\",\"$L88\",\"\\n\",\"$L89\",\"\\n\",\"$L8a\",\"\\n\",\"$L8b\",\"\\n\",\"$L8c\",\"\\n\",\"$L8d\",\"\\n\",\"$L8e\",\"\\n\",\"$L8f\",\"\\n\",\"$L90\",\"\\n\",\"$L91\",\"\\n\",\"$L92\",\"\\n\",\"$L93\",\"\\n\",\"$L94\",\"\\n\",\"$L95\",\"\\n\",\"$L96\",\"\\n\",\"$L97\",\"\\n\",\"$L98\",\"\\n\",\"$L99\",\"\\n\",\"$L9a\",\"\\n\",\"$L9b\",\"\\n\",\"$L9c\",\"\\n\",\"$L9d\",\"\\n\",\"$L9e\",\"\\n\",\"$L9f\",\"\\n\",\"$La0\",\"\\n\",\"$La1\",\"\\n\",\"$La2\",\"\\n\",\"$La3\",\"\\n\",\"$La4\",\"\\n\",\"$La5\",\"\\n\",\"$La6\",\"\\n\",\"$La7\",\"\\n\",\"$La8\",\"\\n\",\"$La9\",\"\\n\",\"$Laa\",\"\\n\",\"$Lab\",\"\\n\",\"$Lac\",\"\\n\",\"$Lad\",\"\\n\",\"$Lae\",\"\\n\",\"$Laf\",\"\\n\",\"$Lb0\",\"\\n\",\"$Lb1\",\"\\n\",\"$Lb2\",\"\\n\",\"$Lb3\",\"\\n\",\"$Lb4\",\"\\n\",\"$Lb5\",\"\\n\",\"$Lb6\",\"\\n\",\"$Lb7\",\"\\n\",\"$Lb8\",\"\\n\",\"$Lb9\",\"\\n\",\"$Lba\",\"\\n\",\"$Lbb\",\"\\n\",\"$Lbc\",\"\\n\",\"$Lbd\",\"\\n\",\"$Lbe\",\"\\n\",\"$Lbf\",\"\\n\",\"$Lc0\",\"\\n\",\"$Lc1\",\"\\n\",\"$Lc2\",\"\\n\",\"$Lc3\",\"\\n\",\"$Lc4\",\"\\n\",\"$Lc5\",\"\\n\",\"$Lc6\",\"\\n\",\"$Lc7\",\"\\n\",\"$Lc8\",\"\\n\",\"$Lc9\",\"\\n\",\"$Lca\",\"\\n\",\"$Lcb\",\"\\n\",\"$Lcc\",\"\\n\",\"$Lcd\",\"\\n\",\"$Lce\",\"\\n\",\"$Lcf\",\"\\n\",\"$Ld0\",\"\\n\",\"$Ld1\",\"\\n\",\"$Ld2\",\"\\n\",\"$Ld3\",\"\\n\",\"$Ld4\",\"\\n\",\"$Ld5\",\"\\n\",\"$Ld6\",\"\\n\",\"$Ld7\",\"\\n\",\"$Ld8\",\"\\n\",\"$Ld9\",\"\\n\",\"$Lda\",\"\\n\",\"$Ldb\",\"\\n\",\"$Ldc\",\"\\n\",\"$Ldd\",\"\\n\",\"$Lde\",\"\\n\",\"$Ldf\",\"\\n\",\"$Le0\",\"\\n\",\"$Le1\",\"\\n\",\"$Le2\",\"\\n\",\"$Le3\",\"\\n\",\"$Le4\",\"\\n\",\"$Le5\",\"\\n\",\"$Le6\",\"\\n\",\"$Le7\",\"\\n\",\"$Le8\",\"\\n\",\"$Le9\",\"\\n\",\"$Lea\",\"\\n\",\"$Leb\",\"\\n\",\"$Lec\",\"\\n\",\"$Led\",\"\\n\",\"$Lee\",\"\\n\",\"$Lef\",\"\\n\",\"$Lf0\",\"\\n\",\"$Lf1\",\"\\n\",\"$Lf2\",\"\\n\",\"$Lf3\",\"\\n\",\"$Lf4\",\"\\n\",\"$Lf5\",\"\\n\",\"$Lf6\",\"\\n\",\"$Lf7\",\"\\n\",\"$Lf8\",\"\\n\",\"$Lf9\",\"\\n\",\"$Lfa\",\"\\n\",\"$Lfb\"
,\"\\n\",\"$Lfc\",\"\\n\",\"$Lfd\",\"\\n\",\"$Lfe\",\"\\n\",\"$Lff\",\"\\n\",\"$L100\",\"\\n\",\"$L101\",\"\\n\",\"$L102\",\"\\n\",\"$L103\",\"\\n\",\"$L104\",\"\\n\",\"$L105\",\"\\n\",\"$L106\",\"\\n\",\"$L107\",\"\\n\",\"$L108\",\"\\n\",\"$L109\",\"\\n\",\"$L10a\",\"\\n\",\"$L10b\",\"\\n\",\"$L10c\",\"\\n\",\"$L10d\",\"\\n\",\"$L10e\",\"\\n\",\"$L10f\",\"\\n\",\"$L110\",\"\\n\",\"$L111\",\"\\n\",\"$L112\",\"\\n\",\"$L113\",\"\\n\",\"$L114\",\"\\n\",\"$L115\",\"\\n\",\"$L116\",\"\\n\",\"$L117\",\"\\n\",\"$L118\",\"\\n\",\"$L119\",\"\\n\",\"$L11a\",\"\\n\",\"$L11b\",\"\\n\",\"$L11c\",\"\\n\",\"$L11d\",\"\\n\",\"$L11e\",\"\\n\",\"$L11f\",\"\\n\",\"$L120\",\"\\n\",\"$L121\",\"\\n\",\"$L122\",\"\\n\",\"$L123\",\"\\n\",\"$L124\",\"\\n\",\"$L125\",\"\\n\",\"$L126\",\"\\n\",\"$L127\",\"\\n\",\"$L128\",\"\\n\",\"$L129\",\"\\n\",\"$L12a\",\"\\n\",\"$L12b\",\"\\n\",\"$L12c\",\"\\n\",\"$L12d\",\"\\n\",\"$L12e\",\"\\n\",\"$L12f\",\"\\n\",\"$L130\",\"\\n\",\"$L131\",\"\\n\",\"$L132\",\"\\n\",\"$L133\",\"\\n\",\"$L134\",\"\\n\",\"$L135\",\"\\n\",\"$L136\",\"\\n\",\"$L137\",\"\\n\",\"$L138\",\"\\n\",\"$L139\",\"\\n\",\"$L13a\",\"\\n\",\"$L13b\",\"\\n\",\"$L13c\",\"\\n\",\"$L13d\",\"\\n\",\"$L13e\",\"\\n\",\"$L13f\",\"\\n\",\"$L140\",\"\\n\",\"$L141\",\"\\n\",\"$L142\",\"\\n\",\"$L143\",\"\\n\",\"$L144\",\"\\n\",\"$L145\",\"\\n\",\"$L146\",\"\\n\",\"$L147\",\"\\n\",\"$L148\",\"\\n\",\"$L149\",\"\\n\",\"$L14a\",\"\\n\",\"$L14b\",\"\\n\",\"$L14c\",\"\\n\",\"$L14d\",\"\\n\",\"$L14e\",\"\\n\",\"$L14f\",\"\\n\",\"$L150\",\"\\n\",\"$L151\",\"\\n\",\"$L152\",\"\\n\",\"$L153\",\"\\n\",\"$L154\",\"\\n\",\"$L155\",\"\\n\",\"$L156\",\"\\n\",\"$L157\",\"\\n\",\"$L158\",\"\\n\",\"$L159\",\"\\n\",\"$L15a\",\"\\n\",\"$L15b\",\"\\n\",\"$L15c\",\"\\n\",\"$L15d\",\"\\n\",\"$L15e\",\"\\n\",\"$L15f\",\"\\n\",\"$L160\",\"\\n\",\"$L161\",\"\\n\",\"$L162\",\"\\n\",\"$L163\",\"\\n\",\"$L164\",\"\\n\",\"$L165\",\"\\n\",\"$L166\",\"\\n\",\"$L167\",\"\\n\",\"$L168\",\"\\n\",\"$L169\",\"\\n\",\"$L16a\",\"\\n
\",\"$L16b\",\"\\n\",\"$L16c\",\"\\n\",\"$L16d\",\"\\n\",\"$L16e\",\"\\n\",\"$L16f\",\"\\n\",\"$L170\",\"\\n\",\"$L171\",\"\\n\",\"$L172\",\"\\n\",\"$L173\",\"\\n\",\"$L174\",\"\\n\",\"$L175\",\"\\n\",\"$L176\",\"\\n\",\"$L177\",\"\\n\",\"$L178\",\"\\n\",\"$L179\",\"\\n\",\"$L17a\",\"\\n\",\"$L17b\",\"\\n\",\"$L17c\",\"\\n\",\"$L17d\",\"\\n\",\"$L17e\",\"\\n\",\"$L17f\",\"\\n\",\"$L180\",\"\\n\",\"$L181\",\"\\n\",\"$L182\",\"\\n\",\"$L183\",\"\\n\",\"$L184\",\"\\n\",\"$L185\",\"\\n\",\"$L186\",\"\\n\",\"$L187\",\"\\n\",\"$L188\",\"\\n\",\"$L189\",\"\\n\",\"$L18a\",\"\\n\",\"$L18b\",\"\\n\",\"$L18c\",\"\\n\",\"$L18d\",\"\\n\",\"$L18e\",\"\\n\",\"$L18f\",\"\\n\",\"$L190\",\"\\n\",\"$L191\",\"\\n\",\"$L192\",\"\\n\",\"$L193\",\"\\n\",\"$L194\",\"\\n\",\"$L195\",\"\\n\",\"$L196\",\"\\n\",\"$L197\",\"\\n\",\"$L198\",\"\\n\",\"$L199\",\"\\n\",\"$L19a\",\"\\n\",\"$L19b\",\"\\n\",\"$L19c\",\"\\n\",\"$L19d\",\"\\n\",\"$L19e\",\"\\n\",\"$L19f\",\"\\n\",\"$L1a0\",\"\\n\",\"$L1a1\",\"\\n\",\"$L1a2\",\"\\n\",\"$L1a3\",\"\\n\",\"$L1a4\",\"\\n\",\"$L1a5\",\"\\n\",\"$L1a6\",\"\\n\",\"$L1a7\",\"\\n\",\"$L1a8\",\"\\n\",\"$L1a9\",\"\\n\",\"$L1aa\",\"\\n\",\"$L1ab\",\"\\n\",\"$L1ac\",\"\\n\",\"$L1ad\",\"\\n\",\"$L1ae\",\"\\n\",\"$L1af\",\"\\n\",\"$L1b0\",\"\\n\",\"$L1b1\",\"\\n\",\"$L1b2\",\"\\n\",\"$L1b3\",\"\\n\",\"$L1b4\",\"\\n\",\"$L1b5\",\"\\n\",\"$L1b6\",\"\\n\",\"$L1b7\",\"\\n\",\"$L1b8\",\"\\n\",\"$L1b9\",\"\\n\",\"$L1ba\",\"\\n\",\"$L1bb\",\"\\n\",\"$L1bc\",\"\\n\",\"$L1bd\",\"\\n\",\"$L1be\",\"\\n\",\"$L1bf\",\"\\n\",\"$L1c0\",\"\\n\",\"$L1c1\",\"\\n\",\"$L1c2\",\"\\n\",\"$L1c3\",\"\\n\",\"$L1c4\",\"\\n\",\"$L1c5\",\"\\n\",\"$L1c6\",\"\\n\",\"$L1c7\",\"\\n\",\"$L1c8\",\"\\n\",\"$L1c9\",\"\\n\",\"$L1ca\",\"\\n\",\"$L1cb\",\"\\n\",\"$L1cc\",\"\\n\",\"$L1cd\",\"\\n\",\"$L1ce\",\"\\n\",\"$L1cf\",\"\\n\",\"$L1d0\",\"\\n\",\"$L1d1\",\"\\n\",\"$L1d2\",\"\\n\",\"$L1d3\",\"\\n\",\"$L1d4\",\"\\n\",\"$L1d5\",\"\\n\",\"$L1d6\",\"\\n\",\"$L1d7\",\"\\n\",\"$L1d8\",\"\\n\",\"$L1d9\",\"\\n\"
,\"$L1da\",\"\\n\",\"$L1db\",\"\\n\",\"$L1dc\",\"\\n\",\"$L1dd\",\"\\n\",\"$L1de\",\"\\n\",\"$L1df\",\"\\n\",\"$L1e0\",\"\\n\",\"$L1e1\",\"\\n\",\"$L1e2\",\"\\n\",\"$L1e3\",\"\\n\",\"$L1e4\",\"\\n\",\"$L1e5\",\"\\n\",\"$L1e6\",\"\\n\",\"$L1e7\",\"\\n\",\"$L1e8\",\"\\n\",\"$L1e9\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L1ea\",\"\\n\",\"$L1eb\",\"\\n\",\"$L1ec\",\"\\n\",\"$L1ed\",\"\\n\",\"$L1ee\",\"\\n\",\"$L1ef\",\"\\n\",\"$L1f0\",\"\\n\",\"$L1f1\",\"\\n\",\"$L1f2\",\"\\n\",\"$L1f3\",\"\\n\",\"$L1f4\",\"\\n\",\"$L1f5\",\"\\n\",\"$L1f6\",\"\\n\",\"$L1f7\",\"\\n\",\"$L1f8\",\"\\n\",\"$L1f9\",\"\\n\",\"$L1fa\",\"\\n\",\"$L1fb\",\"\\n\",\"$L1fc\",\"\\n\",\"$L1fd\",\"\\n\",\"$L1fe\",\"\\n\",\"$L1ff\",\"\\n\",\"$L200\",\"\\n\",\"$L201\",\"\\n\",\"$L202\",\"\\n\",\"$L203\",\"\\n\",\"$L204\",\"\\n\",\"$L205\",\"\\n\",\"$L206\",\"\\n\",\"$L207\",\"\\n\",\"$L208\",\"\\n\",\"$L209\",\"\\n\",\"$L20a\",\"\\n\",\"$L20b\",\"\\n\",\"$L20c\",\"\\n\",\"$L20d\",\"\\n\",\"$L20e\",\"\\n\",\"$L20f\",\"\\n\",\"$L210\",\"\\n\",\"$L211\",\"\\n\",\"$L212\",\"\\n\",\"$L213\",\"\\n\",\"$L214\",\"\\n\",\"$L215\",\"\\n\",\"$L216\",\"\\n\",\"$L217\",\"\\n\",\"$L218\",\"\\n\",\"$L219\",\"\\n\",\"$L21a\",\"\\n\",\"$L21b\",\"\\n\",\"$L21c\",\"\\n\",\"$L21d\",\"\\n\",\"$L21e\",\"\\n\",\"$L21f\",\"\\n\",\"$L220\",\"\\n\",\"$L221\",\"\\n\",\"$L222\",\"\\n\",\"$L223\",\"\\n\",\"$L224\",\"\\n\",\"$L225\",\"\\n\",\"$L226\",\"\\n\",\"$L227\",\"\\n\",\"$L228\",\"\\n\",\"$L229\",\"\\n\",\"$L22a\",\"\\n\",\"$L22b\",\"\\n\",\"$L22c\",\"\\n\",\"$L22d\",\"\\n\",\"$L22e\",\"\\n\",\"$L22f\",\"\\n\",\"$L230\",\"\\n\",\"$L231\",\"\\n\",\"$L232\",\"\\n\",\"$L233\",\"\\n\",\"$L234\",\"\\n\",\"$L235\",\"\\n\",\"$L236\",\"\\n\",\"$L237\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L238\",\"\\n\",\"$L
239\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L23a\",\"\\n\",\"$L23b\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L23c\",\"\\n\",\"$L23d\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L23e\",\"\\n\",\"$L23f\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L240\",\"\\n\",\"$L241\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L242\",\"\\n\",\"$L243\",\"\\n\",\"$L244\",\"\\n\",\"$L245\",\"\\n\",\"$L246\",\"\\n\",\"$L247\",\"\\n\",\"$L248\",\"\\n\",\"$L249\",\"\\n\",\"$L24a\",\"\\n\",\"$L24b\",\"\\n\",\"$L24c\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L24d\",\"\\n\",\"$L24e\",\"\\n\",\"$L24f\",\"\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\",\"$L250\",\"\\n\",\"$L251\",\"\\n\",\"$L252\",\"\\n\",\"$L253\",\"\\n\",\"$L254\",\"\\n\",\"$L255\",\"\\n\",\"$L256\",\"\\n\",\"$L257\",\"\\n\",\"$L258\",\"\\n\",\"$L259\",\"\\n\",\"$L25a\",\"\\n\",\"$L25b\",\"\\n\",\"$L25c\",\"\\n\",\"$L25d\",\"\\n\",\"$L25e\",\"\\n\",\"$L25f\",\"\\n\",\"$L260\",\"\\n\",\"$L261\",\"\\n\",\"$L262\",\"\\n\",\"$L263\",\"\\n\",\"$L264\",\"\\n\",\"$L265\",\"\\n\",\"$L266\",\"\\n\",\"$L267\",\"\\n\",\"$L268\",\"\\n\",\"$L269\",\"\\n\",\"$L26a\",\"\\n\",\"$L26b\",\"\\n\",\"$L26c\",\"\\n\",\"$L26d\",\"\\n\",\"$L26e\",\"\\n\",\"$L26f\",\"\\n\",\"$L270\",\"\\n\",\"$L271\",\"\\n\",\"$L272\",\"\\n\",\"$L273\",\"\\n\",\"$L274\",\"\\n\",\"$L275\",\"\\n\",\"$L276\",\"\\n\",\"$L277\",\"\\n\",\"$L278\",\"\\n\",\"$L279\",\"\\n\",\"$L27a\",\"\\n\",\"$L27b\",\"\\n\",\"$L27c\",\"\\n\",\"$L27d\",\"\\n\",\"$L27e\",\"\\n\",\"$L27f\",\"\\n\",\"$L280\",\"\\n\",\"$L281\",\"\\n\",
\"$L282\",\"\\n\",\"$L283\",\"\\n\",\"$L284\",\"\\n\",\"$L285\",\"\\n\",\"$L286\",\"\\n\",\"$L287\",\"\\n\",\"$L288\",\"\\n\",\"$L289\",\"\\n\",\"$L28a\",\"\\n\",\"$L28b\",\"\\n\",\"$L28c\",\"\\n\",\"$L28d\",\"\\n\",\"$L28e\",\"\\n\",\"$L28f\",\"\\n\",\"$L290\",\"\\n\",\"$L291\",\"\\n\",\"$L292\",\"\\n\",\"$L293\",\"\\n\",\"$L294\",\"\\n\",\"$L295\",\"\\n\",\"$L296\",\"\\n\",\"$L297\",\"\\n\",\"$L298\",\"\\n\",\"$L299\",\"\\n\",\"$L29a\",\"\\n\",\"$L29b\",\"\\n\",\"$L29c\",\"\\n\",\"$L29d\",\"\\n\",\"$L29e\",\"\\n\",\"$L29f\",\"\\n\",\"$L2a0\",\"\\n\",\"$L2a1\",\"\\n\",\"$L2a2\",\"\\n\",\"$L2a3\",\"\\n\",\"$L2a4\",\"\\n\",\"$L2a5\",\"\\n\",\"$L2a6\",\"\\n\",\"$L2a7\",\"\\n\",\"$L2a8\",\"\\n\",\"$L2a9\",\"\\n\",\"$L2aa\",\"\\n\",\"$L2ab\",\"\\n\",\"$L2ac\",\"\\n\",\"$L2ad\",\"\\n\",\"$L2ae\",\"\\n\",\"$L2af\",\"\\n\",\"$L2b0\",\"\\n\",\"$L2b1\",\"\\n\",\"$L2b2\",\"\\n\",\"$L2b3\",\"\\n\",\"$L2b4\",\"\\n\",\"$L2b5\",\"\\n\",\"$L2b6\",\"\\n\",\"$L2b7\",\"\\n\",\"$L2b8\",\"\\n\",\"$L2b9\",\"\\n\",\"$L2ba\",\"\\n\",\"$L2bb\",\"\\n\",\"$L2bc\",\"\\n\",\"$L2bd\",\"\\n\",\"$L2be\",\"\\n\",\"$L2bf\",\"\\n\",\"$L2c0\",\"\\n\",\"$L2c1\",\"\\n\",\"$L2c2\",\"\\n\",\"$L2c3\",\"\\n\",\"$L2c4\",\"\\n\",\"$L2c5\",\"\\n\",\"$L2c6\",\"\\n\",\"$L2c7\",\"\\n\",\"$L2c8\",\"\\n\",\"$L2c9\",\"\\n\",\"$L2ca\",\"\\n\",\"$L2cb\",\"\\n\",\"$L2cc\",\"\\n\",\"$L2cd\",\"\\n\",\"$L2ce\"]}],\"$L2cf\"]}],\"$L2d0\"]}],\"$L2d1\"]}]\n"])</script><script>self.__next_f.push([1,"2d2:I[52947,[\"/_next/static/chunks/a2b8d4a1cdff0364.js\",\"/_next/static/chunks/e3b299ee8c272c5f.js\",\"/_next/static/chunks/5b18bef1a05fed16.js\",\"/_next/static/chunks/8d9ce281e2699520.js\",\"/_next/static/chunks/f2bcd5b1588de598.js\",\"/_next/static/chunks/8178183a3ca3d9dc.js\",\"/_next/static/chunks/345e895c375d62af.js\"],\"CodeBlock\"]\n337:I[62867,[\"/_next/static/chunks/a2b8d4a1cdff0364.js\",\"/_next/static/chunks/e3b299ee8c272c5f.js\",\"/_next/static/chunks/5b18bef1a05fed16.js\",\"/_next/static/chunks/8d9ce281e2699520.js\
",\"/_next/static/chunks/f2bcd5b1588de598.js\",\"/_next/static/chunks/8178183a3ca3d9dc.js\",\"/_next/static/chunks/345e895c375d62af.js\"],\"RelatedPosts\"]\n529:I[47331,[\"/_next/static/chunks/a2b8d4a1cdff0364.js\",\"/_next/static/chunks/e3b299ee8c272c5f.js\",\"/_next/static/chunks/5b18bef1a05fed16.js\",\"/_next/static/chunks/8d9ce281e2699520.js\",\"/_next/static/chunks/f2bcd5b1588de598.js\",\"/_next/static/chunks/8178183a3ca3d9dc.js\",\"/_next/static/chunks/345e895c375d62af.js\"],\"ScrollToTop\"]\n12:[\"$\",\"li\",\"li-2\",{\"children\":\"Latency tolerance for particular interaction modalities\"}]\n13:[\"$\",\"li\",\"li-3\",{\"children\":\"Economic considerations regarding API utilization\"}]\n14:[\"$\",\"h2\",\"h2-2\",{\"id\":\"functional-capabilities-and-implementation-vectors\",\"children\":\"Functional Capabilities and Implementation Vectors\"}]\n15:[\"$\",\"p\",\"p-4\",{\"children\":\"This architectural synthesis manifests several advanced capabilities:\"}]\n16:[\"$\",\"ol\",\"ol-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Cognitive Load Distribution\"}],\": The system intelligently routes cognitive tasks between local and remote execution environments based on complexity, resource requirements, and privacy constraints.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Tool Integration Framework\"}],\": Both OpenAI's agents and Ollama instances can leverage a unified tool ecosystem, allowing for consistent interaction patterns with external systems.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Conversational State Management\"}],\": A sophisticated state management system maintains coherent interaction context across the distributed 
computational environment.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Fallback Mechanisms\"}],\": The architecture implements graceful degradation pathways, ensuring functionality persistence when either component faces constraints.\"]}],\"\\n\"]}],\"\\n\"]}]\n17:[\"$\",\"h2\",\"h2-3\",{\"id\":\"implementation-methodology\",\"children\":\"Implementation Methodology\"}]\n18:[\"$\",\"p\",\"p-5\",{\"children\":[\"The GitHub repository (\",[\"$\",\"a\",\"a-0\",{\"href\":\"https://github.com/kliewerdaniel/OpenAIAgentsSDKOllama01\",\"children\":\"kliewerdaniel/OpenAIAgentsSDKOllama01\"}],\") provides the foundational code structure for this integration. The implementation follows a modular approach that encapsulates:\"]}]\n19:[\"$\",\"ul\",\"ul-1\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Abstraction layers for model interactions\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Contextual routing logic\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"Unified response formatting\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":\"Configurable threshold parameters for decision boundaries\"}],\"\\n\"]}]\n1a:[\"$\",\"h2\",\"h2-4\",{\"id\":\"theoretical-implications-and-future-directions\",\"children\":\"Theoretical Implications and Future Directions\"}]\n1b:[\"$\",\"p\",\"p-6\",{\"children\":\"This architectural approach represents a significant advancement in distributed AI systems theory. 
By creating a harmonious integration of cloud and edge AI capabilities, it establishes a framework for future systems that may further blur the boundaries between computational environments.\"}]\n1c:[\"$\",\"p\",\"p-7\",{\"children\":\"The integration opens avenues for research in several domains:\"}]\n1d:[\"$\",\"ul\",\"ul-2\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Optimal decision boundaries for computational routing\"}],\"\\n\","])</script><script>self.__next_f.push([1,"[\"$\",\"li\",\"li-1\",{\"children\":\"Privacy-preserving techniques for sensitive information processing\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"Economic models for hybrid AI systems\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":\"Cognitive load balancing algorithms\"}],\"\\n\"]}]\n1e:[\"$\",\"h2\",\"h2-5\",{\"id\":\"conclusion\",\"children\":\"Conclusion\"}]\n1f:[\"$\",\"p\",\"p-8\",{\"children\":\"The integration of OpenAI's Agents SDK with Ollama represents not merely a technical implementation but a philosophical statement about the future of AI architectures. 
It suggests a path toward systems that transcend binary distinctions between local and remote, private and shared, efficient and powerful—instead creating a nuanced computational environment that adapts to the specific needs of each interaction context.\"}]\n20:[\"$\",\"p\",\"p-9\",{\"children\":\"This approach invites further exploration and refinement, as the field continues to evolve toward increasingly sophisticated hybrid AI architectures that balance capability, privacy, efficiency, and cost.\"}]\n21:[\"$\",\"h1\",\"h1-1\",{\"id\":\"technical-infrastructure-establishing-the-development-environment-for-openai-ollama-integration\",\"children\":\"Technical Infrastructure: Establishing the Development Environment for OpenAI-Ollama Integration\"}]\n22:[\"$\",\"h2\",\"h2-6\",{\"id\":\"foundational-dependencies-and-technological-requisites\",\"children\":\"Foundational Dependencies and Technological Requisites\"}]\n23:[\"$\",\"p\",\"p-10\",{\"children\":\"The implementation of a sophisticated hybrid AI architecture integrating OpenAI's Agents SDK with Ollama necessitates a carefully curated technological stack. 
This infrastructure must accommodate both cloud-based intelligence and local inference capabilities within a coherent framework.\"}]\n24:[\"$\",\"h2\",\"h2-7\",{\"id\":\"core-dependencies\",\"children\":\"Core Dependencies\"}]\n25:[\"$\",\"h3\",\"h3-0\",{\"id\":\"python-environment\",\"children\":\"Python Environment\"}]\n26:[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"Python 3.10+ (3.11 recommended for optimal performance characteristics)\\n\"}],\"position\":{\"start\":{\"line\":71,\"column\":1,\"offset\":4505},\"end\":{\"line\":73,\"column\":4,\"offset\":4584}}},\"children\":\"Python 3.10+ (3.11 recommended for optimal performance characteristics)\\n\"}]}]\n27:[\"$\",\"h3\",\"h3-1\",{\"id\":\"essential-python-packages\",\"children\":\"Essential Python Packages\"}]\n"])</script><script>self.__next_f.push([1,"28:[\"$\",\"pre\",\"pre-1\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"openai\u003e=1.12.0          # OpenAI Python client\\nopenai-agents           # OpenAI Agents SDK (imported as agents)\\nollama\u003e=0.1.6           # Python client for Ollama interaction\\nfastapi\u003e=0.109.0        # API framework for service endpoints\\nuvicorn\u003e=0.27.0         # ASGI server implementation\\npydantic\u003e=2.5.0         # Data validation and settings management\\npython-dotenv\u003e=1.0.0    # Environment variable management\\nrequests\u003e=2.31.0        # HTTP requests for external service interaction\\nwebsockets\u003e=12.0        # WebSocket support for real-time communication\\ntenacity\u003e=8.2.3         # Retry logic for resilient API 
interactions\\n\"}],\"position\":{\"start\":{\"line\":76,\"column\":1,\"offset\":4616},\"end\":{\"line\":86,\"column\":4,\"offset\":5198}}},\"children\":\"openai\u003e=1.12.0          # OpenAI Python client\\nopenai-agents           # OpenAI Agents SDK (imported as agents)\\nollama\u003e=0.1.6           # Python client for Ollama interaction\\nfastapi\u003e=0.109.0        # API framework for service endpoints\\nuvicorn\u003e=0.27.0         # ASGI server implementation\\npydantic\u003e=2.5.0         # Data validation and settings management\\npython-dotenv\u003e=1.0.0    # Environment variable management\\nrequests\u003e=2.31.0        # HTTP requests for external service interaction\\nwebsockets\u003e=12.0        # WebSocket support for real-time communication\\ntenacity\u003e=8.2.3         # Retry logic for resilient API interactions\\n\"}]}]\n"])</script><script>self.__next_f.push([1,"29:[\"$\",\"h3\",\"h3-2\",{\"id\":\"external-services\",\"children\":\"External Services\"}]\n2a:[\"$\",\"pre\",\"pre-2\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"OpenAI API access (API key required)\\nOllama (local installation)\\n\"}],\"position\":{\"start\":{\"line\":89,\"column\":1,\"offset\":5222},\"end\":{\"line\":92,\"column\":4,\"offset\":5294}}},\"children\":\"OpenAI API access (API key required)\\nOllama (local installation)\\n\"}]}]\n2b:[\"$\",\"h2\",\"h2-8\",{\"id\":\"environment-configuration\",\"children\":\"Environment Configuration\"}]\n2c:[\"$\",\"h3\",\"h3-3\",{\"id\":\"installation-procedure\",\"children\":\"Installation Procedure\"}]\n"])</script><script>self.__next_f.push([1,"2d:[\"$\",\"ol\",\"ol-1\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Python Environment 
Initialization\"}]}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Create isolated environment\\npython -m venv venv\\n\\n# Activate environment\\n# On Unix/macOS:\\nsource venv/bin/activate\\n# On Windows:\\nvenv\\\\Scripts\\\\activate\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Dependency Installation\"}]}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"pip install openai openai-agents ollama fastapi uvicorn pydantic python-dotenv requests websockets tenacity\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Ollama Installation\"}]}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# macOS (using Homebrew)\\nbrew install ollama\\n\\n# Linux (using curl)\\ncurl -fsSL https://ollama.com/install.sh | sh\\n\\n# Windows\\n# Download from https://ollama.com/download/windows\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Model Initialization for Ollama\"}]}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Pull high-performance local model (e.g., Llama2)\\nollama pull llama2\\n\\n# Optional: Pull additional specialized models\\nollama 
pull mistral\\nollama pull codellama\\n\"}]}],\"\\n\"]}],\"\\n\"]}]\n"])</script><script>self.__next_f.push([1,"2e:[\"$\",\"h3\",\"h3-4\",{\"id\":\"environment-configuration-1\",\"children\":\"Environment Configuration\"}]\n2f:[\"$\",\"p\",\"p-11\",{\"children\":[\"Create a \",[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\".env\",\"position\":{\"start\":{\"line\":139,\"column\":10,\"offset\":6259},\"end\":{\"line\":139,\"column\":16,\"offset\":6265}}}],\"position\":{\"start\":{\"line\":139,\"column\":10,\"offset\":6259},\"end\":{\"line\":139,\"column\":16,\"offset\":6265}}},\"children\":\".env\"}],\" file in the project root with the following parameters:\"]}]\n30:[\"$\",\"pre\",\"pre-3\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"# OpenAI Configuration\\nOPENAI_API_KEY=sk-...\\nOPENAI_ORG_ID=org-...  # Optional\\n\\n# Model Configuration\\nOPENAI_MODEL=gpt-4o\\nOLLAMA_MODEL=llama2\\nOLLAMA_HOST=http://localhost:11434\\n\\n# System Behavior\\nTEMPERATURE=0.7\\nMAX_TOKENS=4096\\nREQUEST_TIMEOUT=120\\n\\n# Routing Configuration\\nCOMPLEXITY_THRESHOLD=0.65\\nPRIVACY_SENSITIVE_TOKENS=[\\\"password\\\", \\\"secret\\\", \\\"token\\\", \\\"key\\\", \\\"credential\\\"]\\n\\n# Logging Configuration\\nLOG_LEVEL=INFO\\n\"}],\"position\":{\"start\":{\"line\":141,\"column\":1,\"offset\":6323},\"end\":{\"line\":162,\"column\":4,\"offset\":6747}}},\"children\":\"# OpenAI Configuration\\nOPENAI_API_KEY=sk-...\\nOPENAI_ORG_ID=org-...  
# Optional\\n\\n# Model Configuration\\nOPENAI_MODEL=gpt-4o\\nOLLAMA_MODEL=llama2\\nOLLAMA_HOST=http://localhost:11434\\n\\n# System Behavior\\nTEMPERATURE=0.7\\nMAX_TOKENS=4096\\nREQUEST_TIMEOUT=120\\n\\n# Routing Configuration\\nCOMPLEXITY_THRESHOLD=0.65\\nPRIVACY_SENSITIVE_TOKENS=[\\\"password\\\", \\\"secret\\\", \\\"token\\\", \\\"key\\\", \\\"credential\\\"]\\n\\n# Logging Configuration\\nLOG_LEVEL=INFO\\n\"}]}]\n31:[\"$\",\"h2\",\"h2-9\",{\"id\":\"development-environment-setup\",\"children\":\"Development Environment Setup\"}]\n32:[\"$\",\"h3\",\"h3-5\",{\"id\":\"repository-initialization\",\"children\":\"Repository Initialization\"}]\n33:[\"$\",\"pre\",\"pre-4\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"git clone https://github.com/kliewerdaniel/OpenAIAgentsSDKOllama01.git\\ncd OpenAIAgentsSDKOllama01\\n\"}]}]\n34:[\"$\",\"h3\",\"h3-6\",{\"id\":\"project-structure-implementation\",\"children\":\"Project Structure Implementation\"}]\n35:[\"$\",\"pre\",\"pre-5\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"mkdir -p app/core app/models app/routers app/services app/utils tests\\ntouch app/__init__.py app/core/__init__.py app/models/__init__.py app/routers/__init__.py app/services/__init__.py app/utils/__init__.py\\n\"}]}]\n36:[\"$\",\"h3\",\"h3-7\",{\"id\":\"local-development-server\",\"children\":\"Local Development Server\"}]\n37:[\"$\",\"pre\",\"pre-6\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Start Ollama service\\nollama serve\\n\\n# In a separate terminal, start the application\\nuvicorn app.main:app 
--reload\\n\"}]}]\n38:[\"$\",\"h2\",\"h2-10\",{\"id\":\"containerization-optional\",\"children\":\"Containerization (Optional)\"}]\n39:[\"$\",\"p\",\"p-12\",{\"children\":\"For reproducible environments and deployment consistency:\"}]\n3a:[\"$\",\"pre\",\"pre-7\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-dockerfile\",\"children\":\"# Dockerfile\\nFROM python:3.11-slim\\n\\nWORKDIR /app\\n\\nCOPY requirements.txt .\\nRUN pip install --no-cache-dir -r requirements.txt\\n\\nCOPY . .\\n\\nCMD [\\\"uvicorn\\\", \\\"app.main:app\\\", \\\"--host\\\", \\\"0.0.0.0\\\", \\\"--port\\\", \\\"8000\\\"]\\n\"}]}]\n3b:[\"$\",\"p\",\"p-13\",{\"children\":\"With Docker Compose integration for Ollama:\"}]\n3c:[\"$\",\"pre\",\"pre-8\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\""])</script><script>self.__next_f.push([1,"# docker-compose.yml\\nversion: '3.8'\\n\\nservices:\\n  app:\\n    build: .\\n    ports:\\n      - \\\"8000:8000\\\"\\n    environment:\\n      - OLLAMA_HOST=http://ollama:11434\\n    depends_on:\\n      - ollama\\n    volumes:\\n      - .:/app\\n      \\n  ollama:\\n    image: ollama/ollama:latest\\n    ports:\\n      - \\\"11434:11434\\\"\\n    volumes:\\n      - ollama_data:/root/.ollama\\n\\nvolumes:\\n  ollama_data:\\n\"}]}]\n3d:[\"$\",\"h2\",\"h2-11\",{\"id\":\"verification-of-installation\",\"children\":\"Verification of Installation\"}]\n3e:[\"$\",\"p\",\"p-14\",{\"children\":\"To validate the environment configuration:\"}]\n3f:[\"$\",\"pre\",\"pre-9\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"python -c \\\"import openai; import ollama; print('OpenAI SDK Version:', openai.__version__); 
print('Ollama Client Version:', ollama.__version__)\\\"\\n\"}]}]\n40:[\"$\",\"p\",\"p-15\",{\"children\":\"To test Ollama connectivity:\"}]\n41:[\"$\",\"pre\",\"pre-10\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"python -c \\\"import ollama; print(ollama.list())\\\"\\n\"}]}]\n42:[\"$\",\"p\",\"p-16\",{\"children\":\"To test OpenAI API connectivity:\"}]\n43:[\"$\",\"pre\",\"pre-11\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"python -c \\\"import openai; import os; from dotenv import load_dotenv; load_dotenv(); client = openai.OpenAI(); print(client.models.list())\\\"\\n\"}]}]\n44:[\"$\",\"p\",\"p-17\",{\"children\":\"This comprehensive environment setup establishes the foundation for a sophisticated hybrid AI system that leverages both cloud-based intelligence and local inference capabilities. The configuration allows for flexible routing of requests based on privacy considerations, computational complexity, and performance requirements.\"}]\n45:[\"$\",\"h1\",\"h1-2\",{\"id\":\"integration-architecture-openai-responses-api-within-the-mcp-framework\",\"children\":\"Integration Architecture: OpenAI Responses API within the MCP Framework\"}]\n46:[\"$\",\"h2\",\"h2-12\",{\"id\":\"theoretical-framework-for-api-integration\",\"children\":\"Theoretical Framework for API Integration\"}]\n47:[\"$\",\"p\",\"p-18\",{\"children\":\"The integration of OpenAI's Responses API within our Modern Computational Paradigm (MCP) framework represents a sophisticated exercise in distributed intelligence architecture. 
This document delineates the structural components, interface definitions, and operational parameters for establishing a cohesive integration that leverages both cloud-based and local inference capabilities.\"}]\n48:[\"$\",\"h2\",\"h2-13\",{\"id\":\"api-architectural-design\",\"children\":\"API Architectural Design\"}]\n49:[\"$\",\"h3\",\"h3-8\",{\"id\":\"core-endpoints-structure\",\"children\":\"Core Endpoints Structure\"}]\n4a:[\"$\",\"p\",\"p-19\",{\"children\":\"The system exposes a carefully designed set of endpoints that abstract the underlying complexity of model routing and response generation:\"}]\n"])</script><script>self.__next_f.push([1,"4b:[\"$\",\"pre\",\"pre-12\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"/api/v1\\n├── /chat\\n│   ├── POST /completions       # Primary conversational interface\\n│   ├── POST /streaming         # Event-stream response generation\\n│   └── POST /hybrid            # Intelligent routing between OpenAI and Ollama\\n├── /tools\\n│   ├── POST /execute           # Tool execution framework\\n│   └── GET /available          # Tool discovery mechanism\\n├── /agents\\n│   ├── POST /run               # Agent execution with Agents SDK\\n│   ├── GET /status/{run_id}    # Asynchronous execution status\\n│   └── POST /cancel/{run_id}   # Execution termination\\n└── /system\\n    ├── GET /health             # Service health verification\\n    ├── GET /models             # Available model enumeration\\n    └── POST /config            # Runtime configuration adjustment\\n\"}],\"position\":{\"start\":{\"line\":269,\"column\":1,\"offset\":9627},\"end\":{\"line\":286,\"column\":4,\"offset\":10396}}},\"children\":\"/api/v1\\n├── /chat\\n│   ├── POST /completions       # Primary conversational interface\\n│   ├── POST 
/streaming         # Event-stream response generation\\n│   └── POST /hybrid            # Intelligent routing between OpenAI and Ollama\\n├── /tools\\n│   ├── POST /execute           # Tool execution framework\\n│   └── GET /available          # Tool discovery mechanism\\n├── /agents\\n│   ├── POST /run               # Agent execution with Agents SDK\\n│   ├── GET /status/{run_id}    # Asynchronous execution status\\n│   └── POST /cancel/{run_id}   # Execution termination\\n└── /system\\n    ├── GET /health             # Service health verification\\n    ├── GET /models             # Available model enumeration\\n    └── POST /config            # Runtime configuration adjustment\\n\"}]}]\n"])</script><script>self.__next_f.push([1,"4c:[\"$\",\"h3\",\"h3-9\",{\"id\":\"requestresponse-schemata\",\"children\":\"Request/Response Schemata\"}]\n4d:[\"$\",\"h4\",\"h4-0\",{\"id\":\"primary-chat-interface\",\"children\":\"Primary Chat Interface\"}]\n2d3:T48d,// POST /api/v1/chat/completions\n// Request\n{\n  \"messages\": [\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n    {\"role\": \"user\", \"content\": \"Explain quantum computing.\"}\n  ],\n  \"model\": \"auto\",  // \"auto\", \"openai:\u003cmodel_id\u003e\", or \"ollama:\u003cmodel_id\u003e\"\n  \"temperature\": 0.7,\n  \"max_tokens\": 1024,\n  \"stream\": false,\n  \"routing_preferences\": {\n    \"force_provider\": null,  // null, \"openai\", \"ollama\"\n    \"privacy_level\": \"standard\",  // \"standard\", \"high\", \"max\"\n    \"latency_preference\": \"balanced\"  // \"speed\", \"balanced\", \"quality\"\n  },\n  \"tools\": [...]  
Request/Response Schemata

Primary Chat Interface

```json
// POST /api/v1/chat/completions
// Request
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing."}
  ],
  "model": "auto",  // "auto", "openai:<model_id>", or "ollama:<model_id>"
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "routing_preferences": {
    "force_provider": null,  // null, "openai", "ollama"
    "privacy_level": "standard",  // "standard", "high", "max"
    "latency_preference": "balanced"  // "speed", "balanced", "quality"
  },
  "tools": [...]  // Optional tool definitions
}

// Response
{
  "id": "resp_abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "provider": "openai",  // The actual provider used
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 325,
    "total_tokens": 381
  },
  "message": {
    "role": "assistant",
    "content": "Quantum computing is...",
    "tool_calls": []  // Optional tool calls if requested
  },
  "routing_metrics": {
    "complexity_score": 0.78,
    "privacy_impact": "low",
    "decision_factors": ["complexity", "tool_requirements"]
  }
}
```
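The `model` field accepts either `"auto"` or a provider-qualified identifier. The following is a minimal sketch of how a server might split that field before routing. The helper name `parse_model_field` is hypothetical, and rejecting an unqualified name (rather than guessing a provider) is an assumption.

```python
from typing import Optional, Tuple

# Provider prefixes accepted in the "model" field
VALID_PROVIDERS = ("openai", "ollama")


def parse_model_field(model: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a "model" value into (provider, model_id).

    "auto" yields (None, None) so the routing engine decides; otherwise the
    text before the first colon is the provider and the rest is the model ID.
    """
    if model == "auto":
        return None, None
    provider, sep, model_id = model.partition(":")
    if not sep or provider not in VALID_PROVIDERS or not model_id:
        raise ValueError(f"unsupported model specifier: {model!r}")
    return provider, model_id
```

Splitting on the first colon only matters for Ollama, whose model names commonly carry a tag (for example `llama3.1:8b`). With this approach, `"ollama:llama3.1:8b"` yields `("ollama", "llama3.1:8b")`.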
Agent Execution Interface

```json
// POST /api/v1/agents/run
// Request
{
  "agent_config": {
    "instructions": "You are a research assistant. Help the user find information about recent AI developments.",
    "model": "gpt-4o",
    "tools": [
      // Tool definitions following OpenAI's format
    ]
  },
  "messages": [
    {"role": "user", "content": "Find recent papers on transformer efficiency."}
  ],
  "metadata": {
    "session_id": "user_session_abc123",
    "locale": "en-US"
  }
}

// Response
{
  "run_id": "run_def456",
  "status": "in_progress",
  "created_at": 1677858242,
  "estimated_completion_time": 1677858260,
  "polling_url": "/api/v1/agents/status/run_def456"
}
```

Authentication & Security Framework

Authentication Mechanisms

The system implements a layered authentication approach:

API Key Authentication: clients send their key as a bearer token in the Authorization header.

```
Authorization: Bearer {api_key}
```

OpenAI Credential Management: credentials are stored server-side with encryption at rest, and clients may optionally provide their own credentials per request.

```json
// Optional credential override
{
  "auth_override": {
    "openai_api_key": "sk-...",
    "openai_org_id": "org-..."
  }
}
```

Session-Based Authentication (Web Interface): JWT-based authentication with refresh token rotation, and the PKCE flow for authorization code exchanges.

Security Considerations

TLS 1.3 required for all communications
Request signing for high-security deployments
Content-Security-Policy headers to mitigate XSS
Rate limiting by user/IP with exponential backoff
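The security list calls for request signing but does not specify a scheme. One common approach is an HMAC over the method, path, timestamp, and a body digest. The sketch below assumes a shared secret per client, a 300-second replay window, and the hypothetical helpers `sign_request` and `verify_request`. None of these come from the original design, and the header names used to carry the timestamp and signature are left open.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

MAX_SKEW_SECONDS = 300  # assumed replay window


def _canonical(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    # Sign a digest of the body plus the request line and timestamp
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method.upper()}\n{path}\n{timestamp}\n{body_hash}".encode()


def sign_request(secret: bytes, method: str, path: str, body: bytes,
                 timestamp: Optional[int] = None) -> Tuple[int, str]:
    """Client side: return (timestamp, hex signature) to send as headers."""
    ts = int(time.time()) if timestamp is None else timestamp
    mac = hmac.new(secret, _canonical(method, path, ts, body), hashlib.sha256)
    return ts, mac.hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   timestamp: int, signature: str,
                   now: Optional[int] = None) -> bool:
    """Server side: check freshness, then compare signatures in constant time."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated: treat as a possible replay
    expected = hmac.new(secret, _canonical(method, path, timestamp, body),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed string is what makes the freshness check meaningful. An attacker who alters the timestamp header invalidates the signature.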
Error Handling Architecture

The system implements a comprehensive error handling framework:

```json
// Error Response Structure
{
  "error": {
    "code": "provider_error",
    "message": "OpenAI API returned an error",
    "details": {
      "provider": "openai",
      "status_code": 429,
      "original_message": "Rate limit exceeded",
      "request_id": "req_ghi789"
    },
    "remediation": {
      "retry_after": 30,
      "alternatives": ["switch_provider", "reduce_complexity"],
      "fallback_available": true
    }
  }
}
```

Error Categories

Provider Errors (provider_error): OpenAI API failures, Ollama execution failures, network connectivity issues
Input Validation Errors (validation_error): schema validation failures, content policy violations, requests exceeding size limits
System Errors (system_error): resource exhaustion, internal component failures, dependency service outages
Authentication Errors (auth_error): invalid credentials, expired tokens, insufficient permissions
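The following is a minimal sketch of producing the error envelope above for each category. The category-to-HTTP-status mapping is an assumption, since the original does not specify status codes. Returning 502 for upstream provider failures is one common gateway convention. The helper name `build_error` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed mapping from error code to HTTP status
STATUS_BY_CODE = {
    "validation_error": 400,
    "auth_error": 401,
    "rate_limit_exceeded": 429,
    "provider_error": 502,
    "system_error": 500,
}


def build_error(code: str, message: str,
                details: Optional[Dict[str, Any]] = None,
                retry_after: Optional[int] = None,
                alternatives: Optional[List[str]] = None,
                fallback_available: bool = False) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) following the error response structure."""
    error: Dict[str, Any] = {"code": code, "message": message, "details": details or {}}
    # Only include the remediation block when there is something to suggest
    remediation: Dict[str, Any] = {}
    if retry_after is not None:
        remediation["retry_after"] = retry_after
    if alternatives:
        remediation["alternatives"] = alternatives
    if fallback_available:
        remediation["fallback_available"] = True
    if remediation:
        error["remediation"] = remediation
    # Unknown codes fall back to a generic server error
    return STATUS_BY_CODE.get(code, 500), {"error": error}
```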
Rate Limiting Architecture

The system implements a tiered rate limiting structure:

Tiered Rate Limiting

```
Standard tier:
  - 10 requests/minute
  - 100 requests/hour
  - 1000 requests/day

Premium tier:
  - 60 requests/minute
  - 1000 requests/hour
  - 10000 requests/day
```

Dynamic Rate Adjustment

Token bucket algorithm with dynamic refill rates
Separate buckets for different endpoint categories
Priority-based token distribution

Rate Limit Response

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "You have exceeded the rate limit",
    "details": {
      "rate_limit": {
        "tier": "standard",
        "limit": "10 per minute",
        "remaining": 0,
        "reset_at": "2023-03-01T12:35:00Z",
        "retry_after": 25
      },
      "usage_statistics": {
        "current_minute": 11,
        "current_hour": 43,
        "current_day": 178
      }
    },
    "remediation": {
      "upgrade_url": "/account/upgrade",
      "alternatives": ["reduce_frequency", "batch_requests"]
    }
  }
}
```
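The Dynamic Rate Adjustment list names a token bucket with adjustable refill rates and separate buckets per endpoint category. The following is an in-memory sketch of those two ideas. The class names are hypothetical, and the clock is injectable so the logic can be tested. Modelling priority as a `reserve` that low-priority callers must leave untouched is an assumption; it is one possible reading of "priority-based token distribution".

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket whose refill rate can be changed at runtime."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock
        self.tokens = capacity  # start full
        self.updated = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last update, up to capacity
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now

    def set_refill_rate(self, rate: float) -> None:
        # Settle tokens earned at the old rate before switching to the new one
        self._refill()
        self.refill_rate = rate

    def try_consume(self, cost: float = 1.0, reserve: float = 0.0) -> bool:
        """Take `cost` tokens if at least `reserve` tokens would remain."""
        self._refill()
        if self.tokens - cost >= reserve:
            self.tokens -= cost
            return True
        return False


class CategoryLimiter:
    """Keeps a separate bucket per endpoint category (e.g. "chat", "agents")."""

    def __init__(self, make_bucket: Callable[[], TokenBucket]):
        self.make_bucket = make_bucket
        self.buckets: Dict[str, TokenBucket] = {}

    def bucket(self, category: str) -> TokenBucket:
        if category not in self.buckets:
            self.buckets[category] = self.make_bucket()
        return self.buckets[category]
```

High-priority requests would call `try_consume()` with the default `reserve` of 0. Lower-priority traffic passes a positive `reserve`, keeping the last few tokens for priority callers. A multi-process deployment would need the bucket state in shared storage such as Redis rather than in memory.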
Implementation Strategy

Provider Abstraction Layer

```python
# Pseudocode for the Provider Abstraction Layer
from abc import ABC, abstractmethod


class ModelProvider(ABC):
    @abstractmethod
    async def generate_completion(self, messages, params):
        """Return a complete response for the given messages."""

    @abstractmethod
    async def stream_completion(self, messages, params):
        """Yield response chunks as they are generated."""

    @classmethod
    def get_provider(cls, provider_name, model_id):
        # OpenAIProvider, OllamaProvider and AutoRoutingProvider are the
        # concrete subclasses, defined elsewhere
        if provider_name == "openai":
            return OpenAIProvider(model_id)
        elif provider_name == "ollama":
            return OllamaProvider(model_id)
        elif provider_name == "auto":
            return AutoRoutingProvider()
        # Fail loudly instead of silently routing a mistyped provider name
        raise ValueError(f"Unknown provider: {provider_name}")
```
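On the local side, Ollama's native chat endpoint (`POST /api/chat`, served by default at `http://localhost:11434`) takes a model name, a message list, and an `options` object. It returns the reply under `message`, with token counts in `prompt_eval_count` and `eval_count`.

The sketch below shows only the translation between this API's shapes and Ollama's, as the core of a hypothetical `OllamaProvider`. The function names are illustrative. The HTTP call uses the standard library and blocks. Streaming, tools, and error handling are omitted.

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def to_ollama_payload(model_id: str, messages: List[Dict[str, str]],
                      params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate this API's request fields into an Ollama /api/chat body."""
    options: Dict[str, Any] = {}
    if "temperature" in params:
        options["temperature"] = params["temperature"]
    if "max_tokens" in params:
        options["num_predict"] = params["max_tokens"]  # Ollama's name for the token cap
    return {"model": model_id, "messages": messages, "stream": False, "options": options}


def from_ollama_response(data: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize an Ollama reply into the chat completion response shape."""
    prompt_tokens = data.get("prompt_eval_count", 0)
    completion_tokens = data.get("eval_count", 0)
    return {
        "provider": "ollama",
        "model": data.get("model"),
        "message": {"role": "assistant", "content": data["message"]["content"]},
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


def ollama_chat(model_id: str, messages: List[Dict[str, str]],
                params: Dict[str, Any]) -> Dict[str, Any]:
    # Blocking call for brevity; an async provider would use an async HTTP client
    body = json.dumps(to_ollama_payload(model_id, messages, params)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return from_ollama_response(json.load(resp))
```

Ollama also exposes an OpenAI-compatible API under `/v1`. That route is an alternative when the goal is to reuse OpenAI client code unchanged.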
Intelligent Routing Decision Engine

```python
# Pseudocode for Routing Logic. The helper methods (_analyze_complexity,
# _assess_privacy_impact, _check_tool_compatibility) are assumed to exist.
class RoutingEngine:
    def __init__(self, config):
        self.config = config

    async def determine_route(self, request):
        # An explicit provider choice overrides every heuristic
        if request.routing_preferences.force_provider:
            return request.routing_preferences.force_provider

        # Analyze request complexity
        complexity = self._analyze_complexity(request.messages)

        # Check for privacy constraints
        privacy_impact = self._assess_privacy_impact(request.messages)

        # Consider tool requirements: True if the local model can serve them
        tools_compatible = self._check_tool_compatibility(
            request.tools, self.config.available_providers)

        # Privacy-sensitive requests stay local
        if privacy_impact == "high" and self.config.privacy_first:
            return "ollama"

        # Complex requests, or tools the local model cannot handle, go to OpenAI
        if complexity > self.config.complexity_threshold or not tools_compatible:
            return "openai"

        # Default routing logic
        return "ollama" if self.config.prefer_local else "openai"
```

Authentication Implementation

```python
# Middleware for API Key Authentication (FastAPI/Starlette HTTP middleware)
from fastapi.responses import JSONResponse


async def api_key_middleware(request, call_next):
    auth_header = request.headers.get("Authorization")

    if not auth_header or not auth_header.startswith("Bearer "):
        return JSONResponse(
            status_code=401,
            content={"error": {
                "code": "auth_error",
                "message": "Missing or invalid API key"
            }}
        )

    # Extract the token after the "Bearer " prefix and validate it
    # (validate_api_key is an assumed helper returning a user or None)
    token = auth_header[len("Bearer "):].strip()
    user = await validate_api_key(token)

    if not user:
        return JSONResponse(
            status_code=401,
            content={"error": {
                "code": "auth_error",
                "message": "Invalid API key"
            }}
        )

    # Attach user to request state
    request.state.user = user
    return await call_next(request)
```
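The middleware depends on a `validate_api_key` helper that the original does not define. The following is a minimal sketch, assuming keys are stored only as SHA-256 digests mapped to user records. The in-memory `API_KEY_STORE` and the demo key are hypothetical stand-ins for a database table.

```python
import asyncio
import hashlib
from typing import Dict, Optional


def hash_key(raw_key: str) -> str:
    """SHA-256 hex digest of an API key; only digests are stored."""
    return hashlib.sha256(raw_key.encode()).hexdigest()


# Hypothetical key store: digest -> user record (a database table in practice)
API_KEY_STORE: Dict[str, dict] = {
    hash_key("demo-key-123"): {"id": "user_1", "tier": "standard"},
}


async def validate_api_key(token: str) -> Optional[dict]:
    """Return the user record for a known key, or None."""
    # Async to match the middleware's `await`; a real lookup would hit a database
    return API_KEY_STORE.get(hash_key(token))


if __name__ == "__main__":
    print(asyncio.run(validate_api_key("demo-key-123")))
```

Storing digests rather than raw keys means a leaked key table cannot be replayed directly against the API.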
Rate Limiting Implementation

```python
# Rate Limiter Implementation: fixed-window counters in Redis
import time

# Per-window limits, taken from the tier table above
TIER_LIMITS = {
    "standard": {"minute": 10, "hour": 100, "day": 1000},
    "premium": {"minute": 60, "hour": 1000, "day": 10000},
}
WINDOW_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


class RateLimiter:
    def __init__(self, redis_client):
        # Expects an asyncio Redis client, e.g. redis.asyncio.Redis
        self.redis = redis_client

    async def check_rate_limit(self, user_id, endpoint_category):
        # Get user tier and corresponding limits (_get_user_tier is assumed)
        user_tier = await self._get_user_tier(user_id)
        tier_limits = TIER_LIMITS[user_tier]
        now = int(time.time())

        # One key per window. Embedding the window index in the key makes each
        # counter start fresh at the window boundary; re-applying EXPIRE to a
        # single fixed key on every request would keep pushing its expiry back,
        # so a steady trickle of requests would never reset the count.
        keys = {
            window: f"rate:user:{user_id}:{endpoint_category}:{window}:{now // seconds}"
            for window, seconds in WINDOW_SECONDS.items()
        }

        # Increment every counter in a single round trip
        pipe = self.redis.pipeline()
        for window, key in keys.items():
            pipe.incr(key)
            pipe.expire(key, WINDOW_SECONDS[window])
        results = await pipe.execute()
        counts = dict(zip(keys, results[::2]))  # INCR results; skip EXPIRE results

        # Check if limits are exceeded, shortest window first
        for window, count in counts.items():
            if count > tier_limits[window]:
                return {
                    "allowed": False,
                    "window": window,
                    "limit": tier_limits[window],
                    "current": count,
                    "retry_after": self._calculate_retry_after(now, WINDOW_SECONDS[window]),
                }

        return {"allowed": True}

    @staticmethod
    def _calculate_retry_after(now, window_seconds):
        # Seconds until the current window ends (at least 1)
        return max(1, window_seconds - now % window_seconds)
```
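To connect the limiter to the Rate Limit Response format, a handler can turn a rejected check into a 429 response with a `Retry-After` header. The function name `rate_limit_response` is hypothetical. Deriving `reset_at` from `retry_after` assumes both refer to the same window. The `usage_statistics` block is omitted because the limiter's result does not carry per-window counts.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional, Tuple


def rate_limit_response(result: Dict[str, Any], tier: str,
                        now: Optional[datetime] = None
                        ) -> Optional[Tuple[int, Dict[str, str], Dict[str, Any]]]:
    """Return (status, headers, body) for a rejected check, or None if allowed."""
    if result["allowed"]:
        return None
    now = now or datetime.now(timezone.utc)
    retry_after = int(result["retry_after"])
    reset_at = (now + timedelta(seconds=retry_after)).strftime("%Y-%m-%dT%H:%M:%SZ")
    body = {
        "error": {
            "code": "rate_limit_exceeded",
            "message": "You have exceeded the rate limit",
            "details": {
                "rate_limit": {
                    "tier": tier,
                    "limit": f"{result['limit']} per {result['window']}",
                    "remaining": 0,
                    "reset_at": reset_at,
                    "retry_after": retry_after,
                }
            },
            "remediation": {"alternatives": ["reduce_frequency", "batch_requests"]},
        }
    }
    # Retry-After tells HTTP clients how many seconds to wait before retrying
    return 429, {"Retry-After": str(retry_after)}, body
```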
Operational Considerations

Monitoring and Observability: structured logging with correlation IDs, Prometheus metrics for request routing decisions, and tracing with OpenTelemetry.
Fallback Mechanisms: circuit breaker pattern for provider failures, graceful degradation to simpler models, and response caching for common queries.
Deployment Strategy: containerized deployment with Kubernetes, blue/green deployment for zero-downtime updates, and regional deployment for latency optimization.
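The fallback mechanisms name a circuit breaker for provider failures and degradation to simpler models. The following is a minimal synchronous sketch of both. After repeated failures the breaker stops calling the primary provider (for example OpenAI) and serves a fallback (for example a local Ollama model) until a cool-down has passed.

The class and function names, thresholds, and injectable clock are illustrative assumptions. The half-open state here lets calls through until one result is recorded, rather than strictly limiting to a single trial call.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures. After
    `reset_timeout` seconds calls are let through again on a trial basis
    (half-open): a failure re-opens the circuit, a success closes it."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, (re)opens it
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_fallback(breaker: CircuitBreaker, primary: Callable, fallback: Callable):
    """Use the primary provider unless its circuit is open; fall back on error."""
    if breaker.allow_request():
        try:
            result = primary()
        except Exception:
            breaker.record_failure()
        else:
            breaker.record_success()
            return result
    return fallback()
```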
Conclusion

This integration architecture establishes a robust framework for leveraging both OpenAI's cloud capabilities and Ollama's local inference within a unified system. The design emphasizes flexibility, security, and resilience, with routing logic that balances cost, privacy, and performance.

The implementation allows for progressive enhancement as requirements evolve, with clear extension points for additional providers, tools, and routing strategies.

Autonomous Agent Architecture: Python Implementations for MCP Integration

Theoretical Framework for Agent Design

This collection of Python implementations establishes a comprehensive agent architecture leveraging the Modern Computational Paradigm (MCP) system. The design emphasizes cognitive capabilities including knowledge retrieval, conversation flow management, and contextual awareness through a modular approach to agent construction.

Core Agent Infrastructure

Base Agent Class

```python
# app/agents/base_agent.py
from abc import ABC, abstractmethod
from typing import Dict, List, Any, Optional
import uuid
import logging
from pydantic import BaseModel, Field

from app.services.provider_service import ProviderService
from app.models.message import Message, MessageRole
from app.models.tool import Tool

logger = logging.getLogger(__name__)

class AgentState(BaseModel):
    """Represents the internal state of an agent."""
    conversation_history: List[Message] = Field(default_factory=list)
    memory: Dict[str, Any] = Field(default_factory=dict)
    context: Dict[str, Any] = Field(default_factory=dict)
    metadata: Dict[str, Any] = Field(default_factory=dict)
    session_id: str = Field(default_factory=lambda: str(uuid.uuid4()))

class BaseAgent(ABC):
    """Abstract base class for all agents in the system."""

    def __init__(
        self,
        provider_service: ProviderService,
        system_prompt: str,
        tools: Optional[List[Tool]] = None,
        state: Optional[AgentState] = None
    ):
        self.provider_service = provider_service
        self.system_prompt = system_prompt
        self.tools = tools or []
        self.state = state or AgentState()

        # Initialize conversation with system prompt
        self._initialize_conversation()

    def _initialize_conversation(self):
        """Seed the conversation history with the system prompt.

        A restored state that already contains a system message is left
        unchanged, so the prompt is not duplicated.
        """
        history = self.state.conversation_history
        if not any(msg.role == MessageRole.SYSTEM for msg in history):
            history.insert(0, Message(role=MessageRole.SYSTEM, content=self.system_prompt))

    async def process_message(self, message: str, user_id: str) -> str:
        """Process a user message and return a response."""
        # Add user message to conversation history
        user_message = Message(role=MessageRole.USER, content=message)
        self.state.conversation_history.append(user_message)

        # Process the message and generate a response
        response = await self._generate_response(user_id)

        # Add assistant response to conversation history
        assistant_message = Message(role=MessageRole.ASSISTANT, content=response)
        self.state.conversation_history.append(assistant_message)

        return response

    @abstractmethod
    async def _generate_response(self, user_id: str) -> str:
        """Generate a response based on the conversation history."""

    async def add_context(self, key: str, value: Any):
        """Add contextual information to the agent's state."""
        self.state.context[key] = value

    def get_conversation_history(self) -> List[Message]:
        """Return the conversation history."""
        return self.state.conversation_history

    def clear_conversation(self, keep_system_prompt: bool = True):
        """Clear the conversation history."""
        if keep_system_prompt and self.state.conversation_history:
            system_messages = [
                msg for msg in self.state.conversation_history
                if msg.role == MessageRole.SYSTEM
            ]
            self.state.conversation_history = system_messages
        else:
            # Reset to a fresh history seeded with the current system prompt
            self.state.conversation_history = []
            self._initialize_conversation()
```
Specialized Agent Implementations

Research Agent with Knowledge Retrieval

```python
# app/agents/research_agent.py
from typing import List, Dict, Any, Optional
import json
import logging

from app.agents.base_agent import BaseAgent
from app.services.knowledge_service import KnowledgeService
from app.models.message import Message, MessageRole
from app.models.tool import Tool

logger = logging.getLogger(__name__)

class ResearchAgent(BaseAgent):
    """Agent specialized for research tasks with knowledge retrieval capabilities."""

    def __init__(self, *args, knowledge_service: KnowledgeService, **kwargs):
        super().__init__(*args, **kwargs)
        self.knowledge_service = knowledge_service

        # Register knowledge retrieval tools
        self.tools.extend([
            Tool(
                name="search_knowledge_base",
                description="Search the knowledge base for relevant information",
                parameters={
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The search query"
                        },
                        "max_results": {
                            "type": "integer",
                            "description": "Maximum number of results to return",
                            "default": 3
                        }
                    },
                    "required": ["query"]
                }
            ),
            Tool(
                name="retrieve_document",
                description="Retrieve a specific document by ID",
                parameters={
                    "type": "object",
                    "properties": {
                        "document_id": {
                            "type": "string",
                            "description": "The ID of the document to retrieve"
                        }
                    },
                    "required": ["document_id"]
                }
            )
        ])

    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with knowledge augmentation."""
        # Extract the last user message
        last_user_message = next(
            (msg for msg in reversed(self.state.conversation_history)
             if msg.role == MessageRole.USER),
            None
        )

        if not last_user_message:
            return "I don't have any messages to respond to."

        # Perform knowledge retrieval to augment the response
        relevant_information = await self._retrieve_relevant_knowledge(last_user_message.content)

        # Insert retrieved information just before the latest user message;
        # the augmented copy is used for this call only
        if relevant_information:
            context_message = Message(
                role=MessageRole.SYSTEM,
                content=f"Relevant information: {relevant_information}"
            )
            augmented_history = self.state.conversation_history.copy()
            augmented_history.insert(-1, context_message)
        else:
            augmented_history = self.state.conversation_history

        # Generate response using the provider service
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in augmented_history],
            tools=self.tools,
            user=user_id
        )

        # Tool calls are returned inside "message", per the chat response schema
        tool_calls = response["message"].get("tool_calls")
        if tool_calls:
            # The assistant turn that requested the tools must precede the tool
            # results (assumes Message accepts a tool_calls field)
            self.state.conversation_history.append(
                Message(
                    role=MessageRole.ASSISTANT,
                    content=response["message"].get("content") or "",
                    tool_calls=tool_calls
                )
            )
            tool_responses = await self._process_tool_calls(tool_calls)

            # Add tool responses to conversation history
            for tool_response in tool_responses:
                self.state.conversation_history.append(
                    Message(
                        role=MessageRole.TOOL,
                        content=tool_response["content"],
                        tool_call_id=tool_response["tool_call_id"]
                    )
                )

            # Generate a new response with tool results
            final_response = await self.provider_service.generate_completion(
                messages=[msg.model_dump() for msg in self.state.conversation_history],
                tools=self.tools,
                user=user_id
            )
            return final_response["message"]["content"]

        return response["message"]["content"]

    async def _retrieve_relevant_knowledge(self, query: str) -> Optional[str]:
        """Retrieve relevant information from knowledge base."""
        try:
            results = await self.knowledge_service.search(query, max_results=3)

            if not results:
                return None

            # Format the results
            formatted_results = "\n\n".join([
                f"Source: {result['title']}\n"
                f"Content: {result['content']}\n"
                f"Relevance: {result['relevance_score']}"
                for result in results
            ])

            return formatted_results
        except Exception as e:
            logger.error(f"Error retrieving knowledge: {str(e)}")
            return None

    async def _process_tool_calls(self, tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Process tool calls and return tool responses."""
        tool_responses = []

        for tool_call in tool_calls:
            tool_name = tool_call["function"]["name"]
            tool_call_id = tool_call["id"]

            try:
                # OpenAI returns arguments as a JSON string; Ollama's native API returns an object
                raw_args = tool_call["function"]["arguments"]
                tool_args = json.loads(raw_args) if isinstance(raw_args, str) else raw_args

                if tool_name == "search_knowledge_base":
                    results = await self.knowledge_service.search(
                        query=tool_args["query"],
                        max_results=tool_args.get("max_results", 3)
                    )
                    formatted_results = "\n\n".join([
                        f"Document ID: {result['id']}\n"
                        f"Title: {result['title']}\n"
                        f"Summary: {result['summary']}"
                        for result in results
                    ])

                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": formatted_results or "No results found."
                    })

                elif tool_name == "retrieve_document":
                    document = await self.knowledge_service.retrieve_document(
                        document_id=tool_args["document_id"]
                    )

                    if document:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Title: {document['title']}\n\n{document['content']}"
                        })
                    else:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": "Document not found."
                        })

                else:
                    # Every tool call needs a matching tool message, even unknown ones
                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": f"Unknown tool: {tool_name}"
                    })
            except Exception as e:
                logger.error(f"Error processing tool call {tool_name}: {str(e)}")
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": f"Error processing tool call: {str(e)}"
                })

        return tool_responses
```

Conversational Flow Manager Agent

```python
# app/agents/conversation_manager.py
from typing import Dict, List, Any, Optional
import logging
import json

from pydantic import BaseModel

from app.agents.base_agent import BaseAgent
from app.models.message import Message, MessageRole

logger = logging.getLogger(__name__)

class ConversationState(BaseModel):
    """Tracks the state of a conversation."""
    current_topic: Optional[str] = None
    topic_history: List[str] = Field(default_factory=list)
    user_preferences: Dict[str, Any] = Field(default_factory=dict)
    conversation_stage: str = "opening"  # opening, exploring, focusing, concluding
    open_questions: List[str] = Field(default_factory=list)
    satisfaction_score: Optional[float] = None

class ConversationManager(BaseAgent):
    """Agent specialized in managing conversation flow and context."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.conversation_state = ConversationState()

        # Register conversation management tools
        self.tools.extend([
            {
                "type": "function",
                "function": {
                    "name": "update_conversation_state",
                    "description": "Update the state of the conversation based on analysis",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "current_topic": {
                                "type": "string",
                                "description": "The current topic of conversation"
                            },
                            "conversation_stage": {
                                "type": "string",
                                "description": "The current stage of the conversation",
                                "enum": ["opening", "exploring", "focusing", "concluding"]
                            },
                            "detected_preferences": {
                                "type": "object",
                                "description": "Preferences detected from the user"
                            },
                            "open_questions": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": "Questions that remain unanswered"
                            },
                            "satisfaction_estimate": {
                                "type": "number",
                                "description": "Estimated user satisfaction (0-1)"
                            }
                        }
                    }
                }
            }
        ])

    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with conversation flow management."""
        # First, analyze the conversation to update state
        analysis_prompt = self._create_analysis_prompt()

        analysis_messages = [
            {"role": "system", "content": analysis_prompt},
            {"role": "user", "content": "Analyze the following conversation and update the conversation state."},
            {"role": "user", "content": self._format_conversation_history()}
        ]

        analysis_response = await self.provider_service.generate_completion(
            messages=analysis_messages,
            tools=self.tools,
            tool_choice={"type": "function", "function": {"name": "update_conversation_state"}},
            user=user_id
        )

        # Process conversation state update
        if analysis_response.get("tool_calls"):
            tool_call = analysis_response["tool_calls"][0]
            if tool_call["function"]["name"] == "update_conversation_state":
                try:
                    state_update = json.loads(tool_call["function"]["arguments"])
                    self._update_conversation_state(state_update)
                except Exception as e:
                    logger.error(f"Error updating conversation state: {str(e)}")

        # Now generate the actual response with enhanced context
        enhanced_messages = self.state.conversation_history.copy()

        # Add conversation state as context
        context_message = Message(
            role=MessageRole.SYSTEM,
            content=self._format_conversation_context()
        )
        enhanced_messages.insert(-1, context_message)

        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in enhanced_messages],
            user=user_id
        )

        return response["message"]["content"]

    def _create_analysis_prompt(self) -> str:
        """Create a prompt for conversation analysis."""
        return """
        You are a conversation analysis expert. Your task is to analyze the conversation
        and extract key information about the current state of the dialogue.

        Specifically, you should:
        1. Identify the current main topic of conversation
        2. Determine the stage of the conversation (opening, exploring, focusing, or concluding)
        3. Detect user preferences and interests from their messages
        4. Track open questions that haven't been fully addressed
        5. Estimate user satisfaction based on their engagement and responses

        Use the update_conversation_state function to provide this analysis.
        """

    def _format_conversation_history(self) -> str:
        """Format the conversation history for analysis."""
        formatted = []

        for msg in self.state.conversation_history:
            if msg.role == MessageRole.SYSTEM:
                continue
            formatted.append(f"{msg.role.value}: {msg.content}")

        return "\n\n".join(formatted)

    def _update_conversation_state(self, update: Dict[str, Any]):
        """Update the conversation state with analysis results."""
        if "current_topic" in update and update["current_topic"]:
            if self.conversation_state.current_topic != update["current_topic"]:
                if self.conversation_state.current_topic:
                    self.conversation_state.topic_history.append(
                        self.conversation_state.current_topic
                    )
                self.conversation_state.current_topic = update["current_topic"]

        if "conversation_stage" in update:
            self.conversation_state.conversation_stage = update["conversation_stage"]

        if "detected_preferences" in update:
            for key, value in update["detected_preferences"].items():
                self.conversation_state.user_preferences[key] = value

        if "open_questions" in update:
            self.conversation_state.open_questions = update["open_questions"]

        if "satisfaction_estimate" in update:
            self.conversation_state.satisfaction_score = update["satisfaction_estimate"]

    def _format_conversation_context(self) -> str:
        """Format the conversation state as context for response generation."""
        return f"""
        Current conversation context:
        - Topic: {self.conversation_state.current_topic or 'Not yet established'}
        - Conversation stage: {self.conversation_state.conversation_stage}
        - User preferences: {json.dumps(self.conversation_state.user_preferences, indent=2)}
        - Open questions: {', '.join(self.conversation_state.open_questions) if self.conversation_state.open_questions else 'None'}

        Previous topics: {', '.join(self.conversation_state.topic_history) if self.conversation_state.topic_history else 'None'}

        Adapt your response to this conversation context. If in exploring stage, ask open-ended questions.
        If in focusing stage, provide detailed information on the current topic. If in concluding stage,
        summarize key points and check if the user needs anything else.
        """

Memory-Enhanced Contextual Agent

# app/agents/contextual_agent.py
from typing import List, Dict, Any, Optional, Tuple
import logging
import time
from datetime import datetime

from app.agents.base_agent import BaseAgent
from app.services.memory_service import MemoryService
from app.models.message import Message, MessageRole

logger = logging.getLogger(__name__)

class ContextualAgent(BaseAgent):
    """Agent with enhanced contextual awareness and memory capabilities."""

    def __init__(self, *args, memory_service: MemoryService, **kwargs):
        super().__init__(*args, **kwargs)
        self.memory_service = memory_service

        # Initialize memory collections
        self.episodic_memory = []  # Stores specific interactions/events
        self.semantic_memory = {}  # Stores facts and knowledge
        self.working_memory = []   # Currently active context

        self.max_working_memory = 10  # Max items in working memory

    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with contextual memory enhancement."""
        # Update memories based on recent conversation
        await self._update_memories(user_id)

        # Retrieve relevant memories for current context
        relevant_memories = await self._retrieve_relevant_memories(user_id)

        # Create context-enhanced prompt
        context_message = Message(
            role=MessageRole.SYSTEM,
            content=self._create_context_prompt(relevant_memories)
        )

        # Insert context before the last user message
        enhanced_history = self.state.conversation_history.copy()
        user_message_index = next(
            (i for i, msg in enumerate(reversed(enhanced_history))
             if msg.role == MessageRole.USER),
            None
        )
        if user_message_index is not None:
            user_message_index = len(enhanced_history) - 1 - user_message_index
            enhanced_history.insert(user_message_index, context_message)

        # Generate response
        response = await self.provider_service.generate_completion(
            messages=[msg.model_dump() for msg in enhanced_history],
            tools=self.tools,
            user=user_id
        )

        # Process memory-related tool calls if any
        if response.get("tool_calls"):
            memory_updates = await self._process_memory_tools(response["tool_calls"])
            if memory_updates:
                # If memory was updated, regenerate with the new context. Recursing re-runs
                # _update_memories, so add a depth limit before enabling memory tools.
                return await self._generate_response(user_id)

        # Update working memory with the response
        if response["message"]["content"]:
            self.working_memory.append({
                "type": "assistant_response",
                "content": response["message"]["content"],
                "timestamp": time.time()
            })
            self._prune_working_memory()

        return response["message"]["content"]

    async def _update_memories(self, user_id: str):
        """Update the agent's memories based on recent conversation."""
        # Get last user message
        last_user_message = next(
            (msg for msg in reversed(self.state.conversation_history)
             if msg.role == MessageRole.USER),
            None
        )

        if not last_user_message:
            return

        # Add to working memory
        self.working_memory.append({
            "type": "user_message",
            "content": last_user_message.content,
            "timestamp": time.time()
        })

        # Extract potential semantic memories (facts, preferences)
        if len(self.state.conversation_history) > 2:
            extraction_messages = [
                {"role": "system", "content": "Extract key facts, preferences, or personal details from this user message that would be useful to remember for future interactions. Return in JSON format with keys: 'facts', 'preferences', 'personal_details', each containing an array of strings."},
                {"role": "user", "content": last_user_message.content}
            ]

            try:
                extraction = await self.provider_service.generate_completion(
                    messages=extraction_messages,
                    user=user_id,
                    response_format={"type": "json_object"}
                )

                content = extraction["message"]["content"]
                if content:
                    import json
                    memory_data = json.loads(content)

                    # Store in semantic memory
                    timestamp = datetime.now().isoformat()
                    for category, items in memory_data.items():
                        if not isinstance(items, list):
                            continue
                        for item in items:
                            if not item or not isinstance(item, str):
                                continue
                            memory_key = f"{category}:{self._generate_memory_key(item)}"
                            self.semantic_memory[memory_key] = {
                                "content": item,
                                "category": category,
                                "last_accessed": timestamp,
                                "created_at": timestamp,
                                "importance": self._calculate_importance(item)
                            }

                    # Store in memory service for persistence
                    await self.memory_service.store_memories(
                        user_id=user_id,
                        memories=self.semantic_memory
                    )
            except Exception as e:
                logger.error(f"Error extracting memories: {str(e)}")

        # Prune working memory if needed
        self._prune_working_memory()

    async def _retrieve_relevant_memories(self, user_id: str) -> Dict[str, List[Any]]:
        """Retrieve memories relevant to the current context."""
        # Build a retrieval query from the last few non-system messages,
        # skipping messages without text content (e.g. pure tool-call messages)
        history = self.state.conversation_history
        if len(history) <= 2:
            query = (history[-1].content or "") if history else ""
        else:
            recent_messages = history[-3:]
            query = " ".join([
                msg.content for msg in recent_messages
                if msg.role != MessageRole.SYSTEM and msg.content
            ])

        # Retrieve from memory service
        stored_memories = await self.memory_service.retrieve_memories(
            user_id=user_id,
            query=query,
            limit=5
        )

        # Combine with local semantic memory
        all_memories = {
            "facts": [],
            "preferences": [],
            "personal_details": [],
            "episodic": self.episodic_memory[-3:] if self.episodic_memory else []
        }

        # Add from semantic memory
        for key, memory in self.semantic_memory.items():
            category = memory["category"]
            if category in all_memories and len(all_memories[category]) < 5:
                all_memories[category].append(memory["content"])

        # Add from stored memories
        for memory in stored_memories:
            category = memory.get("category", "facts")
            if category in all_memories and len(all_memories[category]) < 5:
                all_memories[category].append(memory["content"])

                # Update last accessed
                if memory.get("id"):
                    memory_key = f"{category}:{memory['id']}"
                    if memory_key in self.semantic_memory:
                        self.semantic_memory[memory_key]["last_accessed"] = datetime.now().isoformat()

        return all_memories

    def _create_context_prompt(self, memories: Dict[str, List[Any]]) -> str:
        """Create a context prompt with relevant memories."""
        context_parts = ["Additional context to consider:"]

        if memories["facts"]:
            facts = "\n".join([f"- {fact}" for fact in memories["facts"]])
            context_parts.append(f"Facts about the user or relevant topics:\n{facts}")

        if memories["preferences"]:
            prefs = "\n".join([f"- {pref}" for pref in memories["preferences"]])
            context_parts.append(f"User preferences:\n{prefs}")

        if memories["personal_details"]:
            details = "\n".join([f"- {detail}" for detail in memories["personal_details"]])
            context_parts.append(f"Personal details:\n{details}")

        if memories["episodic"]:
            episodes = "\n".join([f"- {ep.get('summary', '')}" for ep in memories["episodic"]])
            context_parts.append(f"Recent interactions:\n{episodes}")

        # Add working memory summary (the list is kept in chronological order)
        if self.working_memory:
            working_context = "Current context:\n"
            for item in self.working_memory[-5:]:
                item_type = item["type"]
                content_preview = item["content"][:100] + "..." if len(item["content"]) > 100 else item["content"]
                working_context += f"- [{item_type}] {content_preview}\n"
            context_parts.append(working_context)

        context_parts.append("Use this information to personalize your response, but don't explicitly mention that you're using saved information unless directly relevant.")

        return "\n\n".join(context_parts)

    def _prune_working_memory(self):
        """Prune working memory to stay within limits."""
        if len(self.working_memory) > self.max_working_memory:
            # Keep the highest-priority items (importance first, then recency)...
            kept = sorted(
                self.working_memory,
                key=lambda x: (x.get("importance", 0.5), x["timestamp"]),
                reverse=True
            )[:self.max_working_memory]
            # ...then restore chronological order so that slices such as
            # working_memory[-5:] still return the most recent items
            self.working_memory = sorted(kept, key=lambda x: x["timestamp"])

    def _generate_memory_key(self, content: str) -> str:
        """Generate a short, deterministic key for memory storage (a truncated MD5 digest)."""
        import hashlib
        return hashlib.md5(content.encode()).hexdigest()[:10]

    def _calculate_importance(self, content: str) -> float:
        """Calculate the importance score of a memory item."""
        # Simple heuristic based on content length and presence of certain keywords
        importance_keywords = ["always", "never", "hate", "love", "favorite", "important", "must", "need"]

        base_score = min(len(content) / 100, 0.5)  # Longer items get higher base score, up to 0.5

        keyword_score = sum(0.1 for word in importance_keywords if word in content.lower())
        keyword_score = min(keyword_score, 0.5)  # Cap at 0.5

        return base_score + keyword_score

    async def _process_memory_tools(self, tool_calls: List[Dict[str, Any]]) -> bool:
        """Process memory-related tool calls."""
        # Implement if we add memory-specific tools
        return False

Advanced Tool Integration

Collaborative Task Management Agent

# app/agents/task_agent.py
from typing import List, Dict, Any, Optional
import logging
import json
import asyncio

from app.agents.base_agent import BaseAgent
from app.models.message import Message, MessageRole
from app.models.tool import Tool
from app.services.task_service import TaskService

logger = logging.getLogger(__name__)

class TaskManagementAgent(BaseAgent):
    """Agent specialized in collaborative task management."""

    def __init__(self, *args, task_service: TaskService, **kwargs):
        super().__init__(*args, **kwargs)
        self.task_service = task_service

        # Register task management tools
        self.tools.extend([
            Tool(
                name="list_tasks",
                description="List tasks for the user",
                parameters={
                    "type": "object",
                    "properties": {
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed", "all"],
                            "description": "Filter tasks by status"
                        },
                        "limit": {
                            "type": "integer",
                            "description": "Maximum number of tasks to return",
                            "default": 10
                        }
                    }
                }
            ),
            Tool(
                name="create_task",
                description="Create a new task",
                parameters={
                    "type": "object",
                    "properties": {
                        "title": {
                            "type": "string",
                            "description": "Title of the task"
                        },
                        "description": {
                            "type": "string",
                            "description": "Detailed description of the task"
                        },
                        "due_date": {
                            "type": "string",
                            "description": "Due date in ISO format (YYYY-MM-DD)"
                        },
                        "priority": {
                            "type": "string",
                            "enum": ["low", "medium", "high"],
                            "description": "Priority level of the task"
                        }
                    },
                    "required": ["title"]
                }
            ),
            Tool(
                name="update_task",
                description="Update an existing task",
                parameters={
                    "type": "object",
                    "properties": {
                        "task_id": {
                            "type": "string",
                            "description": "ID of the task to update"
                        },
                        "title": {
                            "type": "string",
                            "description": "New title of the task"
                        },
                        "description": {
                            "type": "string",
                            "description": "New description of the task"
                        },
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                            "description": "New status of the task"
                        },
                        "due_date": {
                            "type": "string",
                            "description": "New due date in ISO format (YYYY-MM-DD)"
                        },
                        "priority": {
                            "type": "string",
                            "enum": ["low", "medium", "high"],
                            "description": "New priority level of the task"
                        }
                    },
                    "required": ["task_id"]
                }
            ),
            Tool(
                name="delete_task",
                description="Delete a task",
                parameters={
                    "type": "object",
                    "properties": {
                        "task_id": {
                            "type": "string",
                            "description": "ID of the task to delete"
                        },
                        "confirm": {
                            "type": "boolean",
                            "description": "Confirmation to delete the task",
                            "default": False
                        }
                    },
                    "required": ["task_id", "confirm"]
                }
            )
        ])

    async def _generate_response(self, user_id: str) -> str:
        """Generate a response with task management capabilities."""
        # Prepare messages for completion
        messages = [msg.model_dump() for msg in self.state.conversation_history]

        # Generate initial response
        response = await self.provider_service.generate_completion(
            messages=messages,
            tools=self.tools,
            user=user_id
        )

        # Process tool calls if any
        if response.get("tool_calls"):
            tool_responses = await self._process_tool_calls(response["tool_calls"], user_id)

            # Add tool responses to conversation history.
            # NOTE: OpenAI's Chat Completions API only accepts tool messages that follow the
            # assistant message carrying the matching tool_calls; append that message first
            # if the Message model can represent it.
            for tool_response in tool_responses:
                self.state.conversation_history.append(
                    Message(
                        role=MessageRole.TOOL,
                        content=tool_response["content"],
                        tool_call_id=tool_response["tool_call_id"]
                    )
                )

            # Generate new response with tool results
            updated_messages = [msg.model_dump() for msg in self.state.conversation_history]
            final_response = await self.provider_service.generate_completion(
                messages=updated_messages,
                tools=self.tools,
                user=user_id
            )

            # Handle one additional round of tool calls, if the model requests it
            if final_response.get("tool_calls"):
                # For simplicity, we limit this to one extra round
                return await self._handle_recursive_tool_calls(final_response, user_id)

            return final_response["message"]["content"]

        return response["message"]["content"]

    async def _handle_recursive_tool_calls(self, response: Dict[str, Any], user_id: str) -> str:
        """Handle one additional round of tool calls (no further recursion)."""
        tool_responses = await self._process_tool_calls(response["tool_calls"], user_id)

        # Add tool responses to conversation history
        for tool_response in tool_responses:
            self.state.conversation_history.append(
                Message(
                    role=MessageRole.TOOL,
                    content=tool_response["content"],
                    tool_call_id=tool_response["tool_call_id"]
                )
            )

        # Generate final response with all tool results
        updated_messages = [msg.model_dump() for msg in self.state.conversation_history]
        final_response = await self.provider_service.generate_completion(
            messages=updated_messages,
            tools=self.tools,
            user=user_id
        )

        return final_response["message"]["content"]

    async def _process_tool_calls(self, tool_calls: List[Dict[str, Any]], user_id: str) -> List[Dict[str, Any]]:
        """Process tool calls and return tool responses."""
        tool_responses = []

        for tool_call in tool_calls:
            tool_name = tool_call["function"]["name"]
            tool_args_json = tool_call["function"]["arguments"]
            tool_call_id = tool_call["id"]

            try:
                # Parse arguments as JSON
                tool_args = json.loads(tool_args_json)

                # Process based on tool name
                if tool_name == "list_tasks":
                    result = await self.task_service.list_tasks(
                        user_id=user_id,
                        status=tool_args.get("status", "all"),
                        limit=tool_args.get("limit", 10)
                    )

                    if result:
                        tasks_formatted = "\n\n".join([
                            f"ID: {task['id']}\n"
                            f"Title: {task['title']}\n"
                            f"Status: {task['status']}\n"
                            f"Priority: {task['priority']}\n"
                            f"Due Date: {task['due_date']}\n"
                            f"Description: {task['description']}"
                            for task in result
                        ])
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Found {len(result)} tasks:\n\n{tasks_formatted}"
                        })
                    else:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": "No tasks found matching your criteria."
                        })

                elif tool_name == "create_task":
                    result = await self.task_service.create_task(
                        user_id=user_id,
                        title=tool_args["title"],
                        description=tool_args.get("description", ""),
                        due_date=tool_args.get("due_date"),
                        priority=tool_args.get("priority", "medium")
                    )

                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": f"Task created successfully.\n\nID: {result['id']}\nTitle: {result['title']}"
                    })

                elif tool_name == "update_task":
                    update_data = {k: v for k, v in tool_args.items() if k != "task_id"}
                    result = await self.task_service.update_task(
                        user_id=user_id,
                        task_id=tool_args["task_id"],
                        **update_data
                    )

                    if result:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Task updated successfully.\n\nID: {result['id']}\nTitle: {result['title']}\nStatus: {result['status']}"
                        })
                    else:
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": f"Task with ID {tool_args['task_id']} not found or you don't have permission to update it."
                        })

                elif tool_name == "delete_task":
                    if not tool_args.get("confirm", False):
                        tool_responses.append({
                            "tool_call_id": tool_call_id,
                            "content": "Task deletion requires confirmation. Please set 'confirm' to true to proceed."
                        })
                    else:
                        result = await self.task_service.delete_task(
                            user_id=user_id,
                            task_id=tool_args["task_id"]
                        )

                        if result:
                            tool_responses.append({
                                "tool_call_id": tool_call_id,
                                "content": f"Task with ID {tool_args['task_id']} has been deleted successfully."
                            })
                        else:
                            tool_responses.append({
                                "tool_call_id": tool_call_id,
                                "content": f"Task with ID {tool_args['task_id']} not found or you don't have permission to delete it."
                            })

                else:
                    # Every tool call needs a matching tool message, even for unknown tools
                    tool_responses.append({
                        "tool_call_id": tool_call_id,
                        "content": f"Error: Unknown tool '{tool_name}'."
                    })

            except json.JSONDecodeError:
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": "Error: Invalid JSON in tool arguments."
                })
            except KeyError as e:
                tool_responses.append({
                    "tool_call_id": tool_call_id,
                    "content": f"Error: Missing required parameter: {str(e)}"
                })
            except Exception as e:
                logger.error(f"Error processing tool call {tool_name}: 
{str(e)}\")\n                tool_responses.append({\n                    \"tool_call_id\": tool_call_id,\n                    \"content\": f\"Error executing {tool_name}: {str(e)}\"\n                })\n        \n        return tool_responses\n"])</script><script>self.__next_f.push([1,"81:[\"$\",\"pre\",\"pre-26\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2d9\"}]}]\n82:[\"$\",\"h2\",\"h2-26\",{\"id\":\"agent-factory-and-orchestration\",\"children\":\"Agent Factory and Orchestration\"}]\n2da:Tb2d,"])</script><script>self.__next_f.push([1,"# app/agents/agent_factory.py\nfrom typing import Dict, Any, Optional, List, Type\nimport logging\n\nfrom app.agents.base_agent import BaseAgent\nfrom app.agents.research_agent import ResearchAgent\nfrom app.agents.conversation_manager import ConversationManager\nfrom app.agents.contextual_agent import ContextualAgent\nfrom app.agents.task_agent import TaskManagementAgent\n\nfrom app.services.provider_service import ProviderService\nfrom app.services.knowledge_service import KnowledgeService\nfrom app.services.memory_service import MemoryService\nfrom app.services.task_service import TaskService\n\nlogger = logging.getLogger(__name__)\n\nclass AgentFactory:\n    \"\"\"Factory for creating agent instances based on requirements.\"\"\"\n    \n    def __init__(self, \n                 provider_service: ProviderService,\n                 knowledge_service: Optional[KnowledgeService] = None,\n                 memory_service: Optional[MemoryService] = None,\n                 task_service: Optional[TaskService] = None):\n        self.provider_service = provider_service\n        self.knowledge_service = knowledge_service\n        self.memory_service = memory_service\n        self.task_service = task_service\n        \n        # Register available agent types\n        self.agent_types: Dict[str, 
Type[BaseAgent]] = {\n            \"research\": ResearchAgent,\n            \"conversation\": ConversationManager,\n            \"contextual\": ContextualAgent,\n            \"task\": TaskManagementAgent\n        }\n    \n    def create_agent(self, \n                    agent_type: str, \n                    system_prompt: str, \n                    tools: Optional[List[Dict[str, Any]]] = None,\n                    **kwargs) -\u003e BaseAgent:\n        \"\"\"Create and return an agent instance of the specified type.\"\"\"\n        if agent_type not in self.agent_types:\n            raise ValueError(f\"Unknown agent type: {agent_type}. Available types: {list(self.agent_types.keys())}\")\n        \n        agent_class = self.agent_types[agent_type]\n        \n        # Prepare required services based on agent type\n        agent_kwargs = {\n            \"provider_service\": self.provider_service,\n            \"system_prompt\": system_prompt,\n            \"tools\": tools\n        }\n        \n        # Add specialized services based on agent type\n        if agent_type == \"research\" and self.knowledge_service:\n            agent_kwargs[\"knowledge_service\"] = self.knowledge_service\n        \n        if agent_type == \"contextual\" and self.memory_service:\n            agent_kwargs[\"memory_service\"] = self.memory_service\n            \n        if agent_type == \"task\" and self.task_service:\n            agent_kwargs[\"task_service\"] = self.task_service\n        \n        # Add any additional kwargs\n        agent_kwargs.update(kwargs)\n        \n        # Create and return the agent instance\n        return agent_class(**agent_kwargs)\n"])</script><script>self.__next_f.push([1,"83:[\"$\",\"pre\",\"pre-27\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2da\"}]}]\n84:[\"$\",\"h2\",\"h2-27\",{\"id\":\"metaframework-for-agent-composition\",\"children\":\"Metaframework for Agent Composition\"}]\n2db:T27f7,"])</script><script>self.__next_f.push([1,"# app/agents/meta_agent.py\nfrom typing import Dict, List, Any, Optional\nimport logging\nimport asyncio\nimport json\n\nfrom app.agents.base_agent import BaseAgent, AgentState\nfrom app.models.message import Message, MessageRole\nfrom app.services.provider_service import ProviderService\n\nlogger = logging.getLogger(__name__)\n\nclass AgentSubsystem:\n    \"\"\"Represents a specialized agent within the MetaAgent.\"\"\"\n    \n    def __init__(self, name: str, agent: BaseAgent, role: str):\n        self.name = name\n        self.agent = agent\n        self.role = role\n        self.active = True\n\nclass MetaAgent(BaseAgent):\n    \"\"\"A meta-agent that coordinates multiple specialized agents.\"\"\"\n    \n    def __init__(self, \n                 provider_service: ProviderService,\n                 system_prompt: str,\n                 subsystems: Optional[List[AgentSubsystem]] = None,\n                 state: Optional[AgentState] = None):\n        super().__init__(provider_service, system_prompt, [], state)\n        self.subsystems = subsystems or []\n        \n        # Tools specific to the meta-agent\n        self.tools.extend([\n            {\n                \"type\": \"function\",\n                \"function\": {\n                    \"name\": \"route_to_subsystem\",\n                    \"description\": \"Route a task to a specific subsystem agent\",\n                    \"parameters\": {\n                        \"type\": \"object\",\n                        \"properties\": {\n                            \"subsystem\": {\n                                \"type\": \"string\",\n                                \"description\": \"The name of the subsystem to 
route to\"\n                            },\n                            \"task\": {\n                                \"type\": \"string\",\n                                \"description\": \"The task to be performed by the subsystem\"\n                            },\n                            \"context\": {\n                                \"type\": \"object\",\n                                \"description\": \"Additional context for the subsystem\"\n                            }\n                        },\n                        \"required\": [\"subsystem\", \"task\"]\n                    }\n                }\n            },\n            {\n                \"type\": \"function\",\n                \"function\": {\n                    \"name\": \"parallel_processing\",\n                    \"description\": \"Process a task in parallel across multiple subsystems\",\n                    \"parameters\": {\n                        \"type\": \"object\",\n                        \"properties\": {\n                            \"task\": {\n                                \"type\": \"string\",\n                                \"description\": \"The task to process in parallel\"\n                            },\n                            \"subsystems\": {\n                                \"type\": \"array\",\n                                \"items\": {\n                                    \"type\": \"string\"\n                                },\n                                \"description\": \"List of subsystems to involve\"\n                            }\n                        },\n                        \"required\": [\"task\", \"subsystems\"]\n                    }\n                }\n            }\n        ])\n    \n    def add_subsystem(self, subsystem: AgentSubsystem):\n        \"\"\"Add a new subsystem to the meta-agent.\"\"\"\n        # Check for duplicate names\n        if any(sys.name == subsystem.name for sys in self.subsystems):\n            raise 
ValueError(f\"Subsystem with name '{subsystem.name}' already exists\")\n        \n        self.subsystems.append(subsystem)\n    \n    def get_subsystem(self, name: str) -\u003e Optional[AgentSubsystem]:\n        \"\"\"Get a subsystem by name.\"\"\"\n        for subsystem in self.subsystems:\n            if subsystem.name == name:\n                return subsystem\n        return None\n    \n    async def _generate_response(self, user_id: str) -\u003e str:\n        \"\"\"Generate a response using the meta-agent architecture.\"\"\"\n        # Extract the last user message\n        last_user_message = next(\n            (msg for msg in reversed(self.state.conversation_history) \n             if msg.role == MessageRole.USER),\n            None\n        )\n        \n        if not last_user_message:\n            return \"I don't have any messages to respond to.\"\n        \n        # First, determine routing strategy using the coordinator\n        coordinator_messages = [\n            {\"role\": \"system\", \"content\": f\"\"\"\n            You are the coordinator of a multi-agent system with the following subsystems:\n            \n            {self._format_subsystems()}\n            \n            Your job is to analyze the user's message and determine the optimal processing strategy:\n            1. If the query is best handled by a single specialized subsystem, use route_to_subsystem\n            2. 
If the query would benefit from multiple perspectives, use parallel_processing\n            \n            Choose the most appropriate strategy based on the complexity and nature of the request.\n            \"\"\"},\n            {\"role\": \"user\", \"content\": last_user_message.content}\n        ]\n        \n        routing_response = await self.provider_service.generate_completion(\n            messages=coordinator_messages,\n            tools=self.tools,\n            tool_choice=\"auto\",\n            user=user_id\n        )\n        \n        # Process based on the routing decision\n        if routing_response.get(\"tool_calls\"):\n            tool_call = routing_response[\"tool_calls\"][0]\n            function_name = tool_call[\"function\"][\"name\"]\n            \n            try:\n                function_args = json.loads(tool_call[\"function\"][\"arguments\"])\n                \n                if function_name == \"route_to_subsystem\":\n                    return await self._handle_single_subsystem_route(\n                        function_args[\"subsystem\"],\n                        function_args[\"task\"],\n                        function_args.get(\"context\", {}),\n                        user_id\n                    )\n                \n                elif function_name == \"parallel_processing\":\n                    return await self._handle_parallel_processing(\n                        function_args[\"task\"],\n                        function_args[\"subsystems\"],\n                        user_id\n                    )\n            \n            except json.JSONDecodeError:\n                logger.error(\"Error parsing function arguments\")\n            except KeyError as e:\n                logger.error(f\"Missing required parameter: {e}\")\n            except Exception as e:\n                logger.error(f\"Error in routing: {e}\")\n        \n        # Fallback to direct response\n        return await self._handle_direct_response(user_id)\n 
   \n    async def _handle_single_subsystem_route(self, \n                                           subsystem_name: str, \n                                           task: str,\n                                           context: Dict[str, Any],\n                                           user_id: str) -\u003e str:\n        \"\"\"Handle routing to a single subsystem.\"\"\"\n        subsystem = self.get_subsystem(subsystem_name)\n        \n        if not subsystem or not subsystem.active:\n            return f\"Error: Subsystem '{subsystem_name}' not found or not active. Please try a different approach.\"\n        \n        # Process with the selected subsystem\n        response = await subsystem.agent.process_message(task, user_id)\n        \n        # Format the response to indicate the source\n        return f\"[{subsystem.name} - {subsystem.role}] {response}\"\n    \n    async def _handle_parallel_processing(self,\n                                        task: str,\n                                        subsystem_names: List[str],\n                                        user_id: str) -\u003e str:\n        \"\"\"Handle parallel processing across multiple subsystems.\"\"\"\n        # Validate subsystems\n        valid_subsystems = []\n        for name in subsystem_names:\n            subsystem = self.get_subsystem(name)\n            if subsystem and subsystem.active:\n                valid_subsystems.append(subsystem)\n        \n        if not valid_subsystems:\n            return \"Error: None of the specified subsystems are available.\"\n        \n        # Process in parallel\n        tasks = [subsystem.agent.process_message(task, user_id) for subsystem in valid_subsystems]\n        responses = await asyncio.gather(*tasks)\n        \n        # Format responses\n        formatted_responses = [\n            f\"## {subsystem.name} ({subsystem.role}):\\n{response}\"\n            for subsystem, response in zip(valid_subsystems, responses)\n        ]\n        \n  
      # Synthesize a final response\n        synthesis_prompt = f\"\"\"\n        The user's request was processed by multiple specialized agents:\n        \n        {\"\".join(formatted_responses)}\n        \n        Synthesize a comprehensive response that incorporates these perspectives.\n        Highlight areas of agreement and provide a balanced view where there are differences.\n        \"\"\"\n        \n        synthesis_messages = [\n            {\"role\": \"system\", \"content\": \"You are a synthesis agent that combines multiple specialized perspectives into a coherent response.\"},\n            {\"role\": \"user\", \"content\": synthesis_prompt}\n        ]\n        \n        synthesis = await self.provider_service.generate_completion(\n            messages=synthesis_messages,\n            user=user_id\n        )\n        \n        return synthesis[\"message\"][\"content\"]\n    \n    async def _handle_direct_response(self, user_id: str) -\u003e str:\n        \"\"\"Handle direct response when no routing is determined.\"\"\"\n        # Generate a response directly using the provider service\n        response = await self.provider_service.generate_completion(\n            messages=[msg.model_dump() for msg in self.state.conversation_history],\n            user=user_id\n        )\n        \n        return response[\"message\"][\"content\"]\n    \n    def _format_subsystems(self) -\u003e str:\n        \"\"\"Format subsystem information for the coordinator prompt.\"\"\"\n        return \"\\n\".join([\n            f\"- {subsystem.name}: {subsystem.role}\" \n            for subsystem in self.subsystems if subsystem.active\n        ])\n"])</script><script>self.__next_f.push([1,"85:[\"$\",\"pre\",\"pre-28\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2db\"}]}]\n86:[\"$\",\"h2\",\"h2-28\",{\"id\":\"sample-agent-usage-implementation\",\"children\":\"Sample Agent Usage Implementation\"}]\n2dc:T16d5,"])</script><script>self.__next_f.push([1,"# app/main.py\nimport asyncio\nimport logging\nimport uuid\nfrom fastapi import FastAPI, HTTPException, Depends, Header\nfrom pydantic import BaseModel\nfrom typing import List, Optional, Dict, Any\n\nfrom app.agents.agent_factory import AgentFactory\nfrom app.agents.meta_agent import MetaAgent, AgentSubsystem\nfrom app.services.provider_service import ProviderService\nfrom app.services.knowledge_service import KnowledgeService\nfrom app.services.memory_service import MemoryService\nfrom app.services.task_service import TaskService\n\n# Configure logging\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\napp = FastAPI(title=\"MCP Agent System\")\n\n# Initialize services\nprovider_service = ProviderService()\nknowledge_service = KnowledgeService()\nmemory_service = MemoryService()\ntask_service = TaskService()\n\n# Initialize agent factory\nagent_factory = AgentFactory(\n    provider_service=provider_service,\n    knowledge_service=knowledge_service,\n    memory_service=memory_service,\n    task_service=task_service\n)\n\n# Agent session storage\nagent_sessions = {}\n\n# Define request/response models\nclass MessageRequest(BaseModel):\n    message: str\n    session_id: Optional[str] = None\n    agent_type: Optional[str] = None\n\nclass MessageResponse(BaseModel):\n    response: str\n    session_id: str\n\n# Auth dependency\nasync def verify_api_key(authorization: Optional[str] = Header(None)):\n    if not authorization or not authorization.startswith(\"Bearer \"):\n        raise HTTPException(status_code=401, detail=\"Invalid or missing API key\")\n    \n    # Simple validation for demo purposes\n    token = authorization.replace(\"Bearer \", \"\")\n    
if token != \"demo_api_key\":  # In production, validate against secure storage\n        raise HTTPException(status_code=401, detail=\"Invalid API key\")\n    \n    return token\n\n# Routes\n@app.post(\"/api/v1/chat\", response_model=MessageResponse)\nasync def chat(\n    request: MessageRequest,\n    api_key: str = Depends(verify_api_key)\n):\n    user_id = \"demo_user\"  # In production, extract from API key or auth token\n    \n    # Create or retrieve session\n    session_id = request.session_id\n    if not session_id or session_id not in agent_sessions:\n        # Create a new agent instance if session doesn't exist\n        session_id = f\"session_{uuid.uuid4().hex}\"  # random, not guessable\n        \n        # Determine agent type\n        agent_type = request.agent_type or \"meta\"\n        \n        if agent_type == \"meta\":\n            # Create a meta-agent with multiple specialized subsystems\n            research_agent = agent_factory.create_agent(\n                agent_type=\"research\",\n                system_prompt=\"You are a research specialist that provides in-depth, accurate information based on available knowledge.\"\n            )\n            \n            conversation_agent = agent_factory.create_agent(\n                agent_type=\"conversation\",\n                system_prompt=\"You are a conversation expert that helps maintain engaging, relevant, and structured discussions.\"\n            )\n            \n            task_agent = agent_factory.create_agent(\n                agent_type=\"task\",\n                system_prompt=\"You are a task management specialist that helps organize, track, and complete tasks efficiently.\"\n            )\n            \n            meta_agent = MetaAgent(\n                provider_service=provider_service,\n                system_prompt=\"You are an advanced assistant that coordinates multiple specialized systems to provide optimal responses.\"\n            )\n            \n            # Add subsystems to meta-agent\n 
           meta_agent.add_subsystem(AgentSubsystem(\n                name=\"research\",\n                agent=research_agent,\n                role=\"Knowledge and information retrieval specialist\"\n            ))\n            \n            meta_agent.add_subsystem(AgentSubsystem(\n                name=\"conversation\",\n                agent=conversation_agent,\n                role=\"Conversation flow and engagement specialist\"\n            ))\n            \n            meta_agent.add_subsystem(AgentSubsystem(\n                name=\"task\",\n                agent=task_agent,\n                role=\"Task management and organization specialist\"\n            ))\n            \n            agent = meta_agent\n        else:\n            # Create a specialized agent; an unknown type is a client error (400)\n            try:\n                agent = agent_factory.create_agent(\n                    agent_type=agent_type,\n                    system_prompt=f\"You are a helpful assistant specializing in {agent_type} tasks.\"\n                )\n            except ValueError as e:\n                raise HTTPException(status_code=400, detail=str(e))\n        \n        agent_sessions[session_id] = agent\n    else:\n        agent = agent_sessions[session_id]\n    \n    # Process the message\n    try:\n        response = await agent.process_message(request.message, user_id)\n        return MessageResponse(response=response, session_id=session_id)\n    except Exception as e:\n        logger.exception(\"Error processing message\")\n        raise HTTPException(status_code=500, detail=f\"Error processing message: {str(e)}\")\n\n# Startup event\n@app.on_event(\"startup\")\nasync def startup_event():\n    # Initialize services\n    await provider_service.initialize()\n    await knowledge_service.initialize()\n    await memory_service.initialize()\n    await task_service.initialize()\n    \n    logger.info(\"All services initialized\")\n\n# Shutdown event\n@app.on_event(\"shutdown\")\nasync def 
shutdown_event():\n    # Cleanup\n    await provider_service.cleanup()\n    await knowledge_service.cleanup()\n    await memory_service.cleanup()\n    await 
task_service.cleanup()\n    \n    logger.info(\"All services shut down\")\n\nif __name__ == \"__main__\":\n    import uvicorn\n    uvicorn.run(app, host=\"0.0.0.0\", port=8000)\n"])</script><script>self.__next_f.push([1,"87:[\"$\",\"pre\",\"pre-29\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2dc\"}]}]\n88:[\"$\",\"h2\",\"h2-29\",{\"id\":\"conclusion-2\",\"children\":\"Conclusion\"}]\n89:[\"$\",\"p\",\"p-26\",{\"children\":\"This implementation integrates OpenAI's Responses API into a modular agent architecture, with specialized agents for knowledge retrieval, conversation management, contextual awareness, and task coordination.\"}]\n8a:[\"$\",\"p\",\"p-27\",{\"children\":\"Key architectural features include:\"}]\n8b:[\"$\",\"ol\",\"ol-5\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Abstraction Layers\"}],\": The system maintains clean separation between provider services, agent logic, and specialized capabilities.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Contextual Enhancement\"}],\": Agents use memory systems and knowledge retrieval to maintain context and provide more relevant responses.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Tool Integration\"}],\": The implementation leverages OpenAI's function calling capabilities to integrate with external systems and 
services.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Meta-Agent Architecture\"}],\": The meta-agent pattern composes specialized agents into a single system in which a coordinator routes each query to one subsystem or distributes it across several in parallel.\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Stateful Conversations\"}],\": All agents maintain conversation state, preserving continuity and context across interactions.\"]}],\"\\n\"]}],\"\\n\"]}]\n8c:[\"$\",\"p\",\"p-28\",{\"children\":\"This architecture provides a foundation for building AI applications that combine OpenAI's cloud models with local Ollama models, relying on the MCP system's routing layer to decide which model serves each request.\"}]\n8d:[\"$\",\"h1\",\"h1-4\",{\"id\":\"hybrid-intelligence-architecture-integrating-ollama-with-openais-agent-sdk\",\"children\":\"Hybrid Intelligence Architecture: Integrating Ollama with OpenAI's Agent SDK\"}]\n8e:[\"$\",\"h2\",\"h2-30\",{\"id\":\"theoretical-framework-for-hybrid-model-inference\",\"children\":\"Theoretical Framework for Hybrid Model Inference\"}]\n8f:[\"$\",\"p\",\"p-29\",{\"children\":\"Integrating Ollama with OpenAI's Agent SDK yields a hybrid architecture in which inference can run on either cloud-hosted or locally hosted models. 
This document describes an orchestration layer that routes each inference task to either cloud-based or local computational resources according to contextual parameters such as task complexity, data privacy requirements, latency tolerance, and API cost.\"}]\n90:[\"$\",\"h2\",\"h2-31\",{\"id\":\"ollama-integration-architecture\",\"children\":\"Ollama Integration Architecture\"}]\n91:[\"$\",\"h3\",\"h3-23\",{\"id\":\"core-integration-components\",\"children\":\"Core Integration Components\"}]\n2dd:T4173,"])</script><script>self.__next_f.push([1,"# app/services/ollama_service.py\nimport os\nimport json\nimport logging\nfrom typing import List, Dict, Any, Optional, Union\nimport aiohttp\nimport asyncio\nfrom tenacity import retry, stop_after_attempt, wait_exponential\n\nfrom app.models.message import Message, MessageRole\nfrom app.config import settings\n\nlogger = logging.getLogger(__name__)\n\nclass OllamaService:\n    \"\"\"Service for interacting with Ollama's local inference capabilities.\"\"\"\n    \n    def __init__(self):\n        self.base_url = settings.OLLAMA_HOST\n        self.default_model = settings.OLLAMA_MODEL\n        self.timeout = aiohttp.ClientTimeout(total=settings.REQUEST_TIMEOUT)\n        self.session = None\n        \n        # Capability mapping for different models\n        self.model_capabilities = {\n            \"llama2\": {\n                \"supports_tools\": False,\n                \"context_window\": 4096,\n                \"strengths\": [\"general_knowledge\", \"reasoning\"],\n                \"max_tokens\": 2048\n            },\n            \"codellama\": {\n                \"supports_tools\": False,\n                \"context_window\": 8192,\n                \"strengths\": [\"code_generation\", \"technical_explanation\"],\n                \"max_tokens\": 2048\n            },\n            \"mistral\": {\n                \"supports_tools\": False,\n                \"context_window\": 8192,\n                \"strengths\": [\"instruction_following\", 
\"reasoning\"],\n                \"max_tokens\": 2048\n            },\n            \"dolphin-mistral\": {\n                \"supports_tools\": False,\n                \"context_window\": 8192,\n                \"strengths\": [\"conversational\", \"creative_writing\"],\n                \"max_tokens\": 2048\n            }\n        }\n    \n    async def initialize(self):\n        \"\"\"Initialize the Ollama service.\"\"\"\n        self.session = aiohttp.ClientSession(timeout=self.timeout)\n        \n        # Verify connectivity\n        try:\n            await self.list_models()\n            logger.info(\"Ollama service initialized successfully\")\n        except Exception as e:\n            logger.error(f\"Failed to initialize Ollama service: {str(e)}\")\n            raise\n    \n    async def cleanup(self):\n        \"\"\"Clean up resources.\"\"\"\n        if self.session:\n            await self.session.close()\n            self.session = None\n    \n    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))\n    async def list_models(self) -\u003e List[Dict[str, Any]]:\n        \"\"\"List available models in Ollama.\"\"\"\n        if not self.session:\n            self.session = aiohttp.ClientSession(timeout=self.timeout)\n            \n        async with self.session.get(f\"{self.base_url}/api/tags\") as response:\n            if response.status != 200:\n                error_text = await response.text()\n                raise Exception(f\"Failed to list models: {error_text}\")\n            \n            data = await response.json()\n            return data.get(\"models\", [])\n    \n    async def generate_completion(\n        self,\n        messages: List[Dict[str, str]],\n        model: Optional[str] = None,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        stream: bool = False,\n        **kwargs\n    ) -\u003e Dict[str, Any]:\n        
\"\"\"Generate a completion using Ollama.\"\"\"\n        model_name = model or self.default_model\n        \n        # Check if specified model is available. Ollama reports tagged names\n        # (e.g. \"llama2:latest\"), so an untagged name is also matched with \":latest\".\n        try:\n            available_models = await self.list_models()\n            model_names = {m.get(\"name\") for m in available_models}\n            \n            if model_name not in model_names and f\"{model_name}:latest\" not in model_names:\n                fallback_model = self.default_model\n                logger.warning(\n                    f\"Model '{model_name}' not available in Ollama. \"\n                    f\"Using fallback model '{fallback_model}'.\"\n                )\n                model_name = fallback_model\n        except Exception as e:\n            logger.error(f\"Error checking model availability: {str(e)}\")\n            model_name = self.default_model\n        \n        # Get model capabilities\n        model_base_name = model_name.split(':')[0] if ':' in model_name else model_name\n        capabilities = self.model_capabilities.get(\n            model_base_name, \n            {\"supports_tools\": False, \"context_window\": 4096, \"max_tokens\": 2048}\n        )\n        \n        # Check if tools are requested but not supported\n        if tools and not capabilities[\"supports_tools\"]:\n            logger.warning(\n                f\"Model '{model_name}' does not support tools. 
\"\n                \"Tool functionality will be simulated with prompt engineering.\"\n            )\n            # We'll handle this by incorporating tool descriptions into the prompt\n        \n        # Format messages for Ollama\n        prompt = self._format_messages_for_ollama(messages, tools)\n        \n        # Set max_tokens based on capabilities if not provided\n        if max_tokens is None:\n            max_tokens = capabilities[\"max_tokens\"]\n        else:\n            max_tokens = min(max_tokens, capabilities[\"max_tokens\"])\n        \n        # Prepare request payload\n        payload = {\n            \"model\": model_name,\n            \"prompt\": prompt,\n            \"stream\": stream,\n            \"options\": {\n                \"temperature\": temperature,\n                \"num_predict\": max_tokens\n            }\n        }\n        \n        if stream:\n            return await self._stream_completion(payload)\n        else:\n            return await self._generate_completion_sync(payload)\n    \n    async def _generate_completion_sync(self, payload: Dict[str, Any]) -\u003e Dict[str, Any]:\n        \"\"\"Generate a completion synchronously.\"\"\"\n        if not self.session:\n            self.session = aiohttp.ClientSession(timeout=self.timeout)\n            \n        try:\n            async with self.session.post(\n                f\"{self.base_url}/api/generate\", \n                json=payload\n            ) as response:\n                if response.status != 200:\n                    error_text = await response.text()\n                    raise Exception(f\"Ollama generate error: {error_text}\")\n                \n                result = await response.json()\n                \n                # Format the response to match OpenAI's format for consistency\n                formatted_response = self._format_ollama_response(result, payload)\n                return formatted_response\n                \n        except Exception as e:\n  
          logger.error(f\"Error generating completion: {str(e)}\")\n            raise\n    \n    async def _stream_completion(self, payload: Dict[str, Any]):\n        \"\"\"Stream a completion.\"\"\"\n        if not self.session:\n            self.session = aiohttp.ClientSession(timeout=self.timeout)\n            \n        try:\n            async with self.session.post(\n                f\"{self.base_url}/api/generate\", \n                json=payload, \n                timeout=aiohttp.ClientTimeout(total=60)\n            ) as response:\n                if response.status != 200:\n                    error_text = await response.text()\n                    raise Exception(f\"Ollama generate error: {error_text}\")\n                \n                # Stream the response\n                full_text = \"\"\n                async for line in response.content:\n                    if not line:\n                        continue\n                    \n                    try:\n                        chunk = json.loads(line)\n                        text_chunk = chunk.get(\"response\", \"\")\n                        full_text += text_chunk\n                        \n                        # Yield formatted chunk for streaming\n                        yield self._format_ollama_stream_chunk(text_chunk)\n                        \n                        # Check if done\n                        if chunk.get(\"done\", False):\n                            break\n                    except json.JSONDecodeError:\n                        logger.warning(f\"Invalid JSON in stream: {line}\")\n                \n                # Send the final done chunk\n                yield self._format_ollama_stream_chunk(\"\", done=True, full_text=full_text)\n                \n        except Exception as e:\n            logger.error(f\"Error streaming completion: {str(e)}\")\n            raise\n    \n    def _format_messages_for_ollama(\n        self, \n        messages: List[Dict[str, str]],\n    
    tools: Optional[List[Dict[str, Any]]] = None\n    ) -\u003e str:\n        \"\"\"Format messages for Ollama.\"\"\"\n        formatted_messages = []\n        \n        # Add tools descriptions if provided\n        if tools:\n            tools_description = self._format_tools_description(tools)\n            formatted_messages.append(f\"[System]\\n{tools_description}\\n\")\n        \n        for msg in messages:\n            role = msg[\"role\"]\n            content = msg[\"content\"] or \"\"\n            \n            if role == \"system\":\n                formatted_messages.append(f\"[System]\\n{content}\")\n            elif role == \"user\":\n                formatted_messages.append(f\"[User]\\n{content}\")\n            elif role == \"assistant\":\n                formatted_messages.append(f\"[Assistant]\\n{content}\")\n            elif role == \"tool\":\n                # Format tool responses\n                tool_call_id = msg.get(\"tool_call_id\", \"unknown\")\n                formatted_messages.append(f\"[Tool Result: {tool_call_id}]\\n{content}\")\n        \n        # Add final prompt for assistant response\n        formatted_messages.append(\"[Assistant]\\n\")\n        \n        return \"\\n\\n\".join(formatted_messages)\n    \n    def _format_tools_description(self, tools: List[Dict[str, Any]]) -\u003e str:\n        \"\"\"Format tools description for inclusion in the prompt.\"\"\"\n        tools_text = [\"You have access to the following tools:\"]\n        \n        for tool in tools:\n            if tool.get(\"type\") == \"function\":\n                function = tool[\"function\"]\n                function_name = function[\"name\"]\n                function_description = function.get(\"description\", \"\")\n                \n                tools_text.append(f\"Tool: {function_name}\")\n                tools_text.append(f\"Description: {function_description}\")\n                \n                # Format parameters if available\n                if 
\"parameters\" in function:\n                    parameters = function[\"parameters\"]\n                    if \"properties\" in parameters:\n                        tools_text.append(\"Parameters:\")\n                        for param_name, param_details in parameters[\"properties\"].items():\n                            param_type = param_details.get(\"type\", \"unknown\")\n                            param_desc = param_details.get(\"description\", \"\")\n                            required = \"Required\" if param_name in parameters.get(\"required\", []) else \"Optional\"\n                            tools_text.append(f\"  - {param_name} ({param_type}, {required}): {param_desc}\")\n                \n                tools_text.append(\"\")  # Empty line between tools\n        \n        tools_text.append(\"\"\"\nWhen you need to use a tool, specify it clearly using the format:\n\n\u003ctool\u003e\n{\n  \"name\": \"tool_name\",\n  \"parameters\": {\n    \"param1\": \"value1\",\n    \"param2\": \"value2\"\n  }\n}\n\u003c/tool\u003e\n\nWait for the tool result before continuing.\n\"\"\")\n        \n        return \"\\n\".join(tools_text)\n    \n    def _format_ollama_response(self, result: Dict[str, Any], request: Dict[str, Any]) -\u003e Dict[str, Any]:\n        \"\"\"Format Ollama response to match OpenAI's format.\"\"\"\n        response_text = result.get(\"response\", \"\")\n        \n        # Check for tool calls in the response\n        tool_calls = self._extract_tool_calls(response_text)\n        \n        # Calculate token counts (approximate)\n        prompt_tokens = len(request[\"prompt\"]) // 4  # Rough approximation\n        completion_tokens = len(response_text) // 4  # Rough approximation\n        \n        response = {\n            \"id\": f\"ollama-{result.get('id', 'unknown')}\",\n            \"object\": \"chat.completion\",\n            \"created\": int(result.get(\"created_at\", 0)),\n            \"model\": request[\"model\"],\n            
\"provider\": \"ollama\",\n            \"usage\": {\n                \"prompt_tokens\": prompt_tokens,\n                \"completion_tokens\": completion_tokens,\n                \"total_tokens\": prompt_tokens + completion_tokens\n            },\n            \"message\": {\n                \"role\": \"assistant\",\n                \"content\": self._clean_tool_calls_from_text(response_text) if tool_calls else response_text,\n                \"tool_calls\": tool_calls\n            }\n        }\n        \n        return response\n    \n    def _format_ollama_stream_chunk(\n        self, \n        chunk_text: str, \n        done: bool = False,\n        full_text: Optional[str] = None\n    ) -\u003e Dict[str, Any]:\n        \"\"\"Format a streaming chunk to match OpenAI's format.\"\"\"\n        if done and full_text:\n            # Final chunk might include tool calls\n            tool_calls = self._extract_tool_calls(full_text)\n            cleaned_text = self._clean_tool_calls_from_text(full_text) if tool_calls else full_text\n            \n            return {\n                \"id\": f\"ollama-chunk-{id(chunk_text)}\",\n                \"object\": \"chat.completion.chunk\",\n                \"created\": int(time.time()),\n                \"model\": self.default_model,\n                \"choices\": [{\n                    \"index\": 0,\n                    \"delta\": {\n                        \"content\": \"\",\n                        \"tool_calls\": tool_calls if tool_calls else None\n                    },\n                    \"finish_reason\": \"stop\"\n                }]\n            }\n        else:\n            return {\n                \"id\": f\"ollama-chunk-{id(chunk_text)}\",\n                \"object\": \"chat.completion.chunk\",\n                \"created\": int(time.time()),\n                \"model\": self.default_model,\n                \"choices\": [{\n                    \"index\": 0,\n                    \"delta\": {\n                        
\"content\": chunk_text\n                    },\n                    \"finish_reason\": None\n                }]\n            }\n    \n    def _extract_tool_calls(self, text: str) -\u003e Optional[List[Dict[str, Any]]]:\n        \"\"\"Extract tool calls from response text.\"\"\"\n        import re\n        import uuid\n        \n        # Look for tool calls in the format \u003ctool\u003e...\u003c/tool\u003e\n        tool_pattern = re.compile(r'\u003ctool\u003e(.*?)\u003c/tool\u003e', re.DOTALL)\n        matches = tool_pattern.findall(text)\n        \n        if not matches:\n            return None\n        \n        tool_calls = []\n        for i, match in enumerate(matches):\n            try:\n                # Try to parse as JSON\n                tool_data = json.loads(match.strip())\n                \n                tool_calls.append({\n                    \"id\": f\"call_{uuid.uuid4().hex[:8]}\",\n                    \"type\": \"function\",\n                    \"function\": {\n                        \"name\": tool_data.get(\"name\", \"unknown_tool\"),\n                        \"arguments\": json.dumps(tool_data.get(\"parameters\", {}))\n                    }\n                })\n            except json.JSONDecodeError:\n                # If not valid JSON, try to extract name and arguments using regex\n                name_match = re.search(r'\"name\"\\s*:\\s*\"([^\"]+)\"', match)\n                args_match = re.search(r'\"parameters\"\\s*:\\s*(\\{.*\\})', match)\n                \n                if name_match:\n                    tool_name = name_match.group(1)\n                    tool_args = \"{}\" if not args_match else args_match.group(1)\n                    \n                    tool_calls.append({\n                        \"id\": f\"call_{uuid.uuid4().hex[:8]}\",\n                        \"type\": \"function\",\n                        \"function\": {\n                            \"name\": tool_name,\n                            \"arguments\": 
tool_args\n                        }\n                    })\n        \n        return tool_calls if tool_calls else None\n    \n    def _clean_tool_calls_from_text(self, text: str) -\u003e str:\n        \"\"\"Remove tool calls from response text.\"\"\"\n        import re\n        \n        # Remove \u003ctool\u003e...\u003c/tool\u003e blocks\n        cleaned_text = re.sub(r'\u003ctool\u003e.*?\u003c/tool\u003e', '', text, flags=re.DOTALL)\n        \n        # Remove any leftover tool usage instructions\n        cleaned_text = re.sub(r'I will use a tool to help with this\\.', '', cleaned_text)\n        cleaned_text = re.sub(r'Let me use the .* tool\\.', '', cleaned_text)\n        \n        # Clean up multiple newlines\n        cleaned_text = re.sub(r'\\n{3,}', '\\n\\n', cleaned_text)\n        \n        return cleaned_text.strip()\n"])</script><script>self.__next_f.push([1,"92:[\"$\",\"pre\",\"pre-30\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2dd\"}]}]\n93:[\"$\",\"h3\",\"h3-24\",{\"id\":\"provider-selection-service\",\"children\":\"Provider Selection Service\"}]\n2de:T526e,"])</script><script>self.__next_f.push([1,"# app/services/provider_service.py\nimport os\nimport json\nimport logging\nimport time\nfrom typing import List, Dict, Any, Optional, Union, AsyncGenerator\nimport asyncio\nfrom enum import Enum\nimport hashlib\n\nimport openai\nfrom openai import AsyncOpenAI\nfrom app.services.ollama_service import OllamaService\nfrom app.config import settings\n\nlogger = logging.getLogger(__name__)\n\nclass Provider(str, Enum):\n    OPENAI = \"openai\"\n    OLLAMA = \"ollama\"\n    AUTO = \"auto\"\n\nclass ModelSelectionCriteria:\n    \"\"\"Criteria for model selection in auto-routing.\"\"\"\n    def __init__(\n        self,\n        complexity_threshold: float = 0.65,\n        privacy_sensitive_tokens: List[str] = None,\n        
latency_requirement: Optional[float] = None,\n        token_budget: Optional[int] = None,\n        tool_requirements: Optional[List[str]] = None\n    ):\n        self.complexity_threshold = complexity_threshold\n        self.privacy_sensitive_tokens = privacy_sensitive_tokens or []\n        self.latency_requirement = latency_requirement\n        self.token_budget = token_budget\n        self.tool_requirements = tool_requirements\n\nclass ProviderService:\n    \"\"\"Service for routing requests to the appropriate provider.\"\"\"\n    \n    def __init__(self):\n        self.openai_client = None\n        self.ollama_service = OllamaService()\n        self.model_selection_criteria = ModelSelectionCriteria(\n            complexity_threshold=settings.COMPLEXITY_THRESHOLD,\n            privacy_sensitive_tokens=settings.PRIVACY_SENSITIVE_TOKENS.split(\",\") if hasattr(settings, \"PRIVACY_SENSITIVE_TOKENS\") else []\n        )\n        \n        # Model mappings\n        self.default_openai_model = settings.OPENAI_MODEL\n        self.default_ollama_model = settings.OLLAMA_MODEL\n        \n        # Response cache\n        self.cache_enabled = getattr(settings, \"ENABLE_RESPONSE_CACHE\", False)\n        self.cache = {}\n        self.cache_ttl = getattr(settings, \"RESPONSE_CACHE_TTL\", 3600)  # 1 hour default\n    \n    async def initialize(self):\n        \"\"\"Initialize the provider service.\"\"\"\n        # Initialize OpenAI client\n        self.openai_client = AsyncOpenAI(\n            api_key=settings.OPENAI_API_KEY,\n            organization=getattr(settings, \"OPENAI_ORG_ID\", None)\n        )\n        \n        # Initialize Ollama service\n        await self.ollama_service.initialize()\n        \n        logger.info(\"Provider service initialized\")\n    \n    async def cleanup(self):\n        \"\"\"Clean up resources.\"\"\"\n        await self.ollama_service.cleanup()\n    \n    async def generate_completion(\n        self,\n        messages: List[Dict[str, 
str]],\n        model: Optional[str] = None,\n        provider: Optional[Union[str, Provider]] = None,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        stream: bool = False,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        user: Optional[str] = None,\n        **kwargs\n    ) -\u003e Dict[str, Any]:\n        \"\"\"Generate a completion from the selected provider.\"\"\"\n        # Determine the provider and model\n        selected_provider, selected_model = await self._select_provider_and_model(\n            messages, model, provider, tools, **kwargs\n        )\n        \n        # Check cache if enabled and not streaming\n        if self.cache_enabled and not stream:\n            cache_key = self._generate_cache_key(\n                messages, selected_provider, selected_model, tools, temperature, max_tokens, kwargs\n            )\n            cached_response = self._get_from_cache(cache_key)\n            if cached_response:\n                logger.info(f\"Cache hit for {selected_provider}:{selected_model}\")\n                return cached_response\n        \n        # Generate completion based on selected provider\n        try:\n            if selected_provider == Provider.OPENAI:\n                response = await self._generate_openai_completion(\n                    messages, selected_model, tools, stream, temperature, max_tokens, user, **kwargs\n                )\n            else:  # OLLAMA\n                response = await self._generate_ollama_completion(\n                    messages, selected_model, tools, stream, temperature, max_tokens, **kwargs\n                )\n            \n            # Add provider info and cache if appropriate\n            if not stream and response:\n                response[\"provider\"] = selected_provider.value\n                if self.cache_enabled:\n                    self._add_to_cache(cache_key, response)\n            \n            return response\n        except 
Exception as e:\n            logger.error(f\"Error generating completion with {selected_provider}: {str(e)}\")\n            \n            # Try fallback if auto-routing was enabled\n            if provider == Provider.AUTO:\n                fallback_provider = Provider.OLLAMA if selected_provider == Provider.OPENAI else Provider.OPENAI\n                logger.info(f\"Attempting fallback to {fallback_provider}\")\n                \n                try:\n                    if fallback_provider == Provider.OPENAI:\n                        fallback_model = self.default_openai_model\n                        response = await self._generate_openai_completion(\n                            messages, fallback_model, tools, stream, temperature, max_tokens, user, **kwargs\n                        )\n                    else:  # OLLAMA\n                        fallback_model = self.default_ollama_model\n                        response = await self._generate_ollama_completion(\n                            messages, fallback_model, tools, stream, temperature, max_tokens, **kwargs\n                        )\n                    \n                    if not stream and response:\n                        response[\"provider\"] = fallback_provider.value\n                        # Don't cache fallback responses\n                    \n                    return response\n                except Exception as fallback_error:\n                    logger.error(f\"Fallback also failed: {str(fallback_error)}\")\n            \n            # Re-raise the original error if we couldn't fall back\n            raise\n    \n    async def stream_completion(\n        self,\n        messages: List[Dict[str, str]],\n        model: Optional[str] = None,\n        provider: Optional[Union[str, Provider]] = None,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        user: Optional[str] = None,\n        **kwargs\n    ) 
-\u003e AsyncGenerator[Dict[str, Any], None]:\n        \"\"\"Stream a completion from the selected provider.\"\"\"\n        # Always stream with this method\n        kwargs[\"stream\"] = True\n        \n        # Determine the provider and model\n        selected_provider, selected_model = await self._select_provider_and_model(\n            messages, model, provider, tools, **kwargs\n        )\n        \n        try:\n            if selected_provider == Provider.OPENAI:\n                async for chunk in self._stream_openai_completion(\n                    messages, selected_model, tools, temperature, max_tokens, user, **kwargs\n                ):\n                    chunk[\"provider\"] = selected_provider.value\n                    yield chunk\n            else:  # OLLAMA\n                async for chunk in self._stream_ollama_completion(\n                    messages, selected_model, tools, temperature, max_tokens, **kwargs\n                ):\n                    chunk[\"provider\"] = selected_provider.value\n                    yield chunk\n        except Exception as e:\n            logger.error(f\"Error streaming completion with {selected_provider}: {str(e)}\")\n            \n            # Try fallback if auto-routing was enabled\n            if provider == Provider.AUTO:\n                fallback_provider = Provider.OLLAMA if selected_provider == Provider.OPENAI else Provider.OPENAI\n                logger.info(f\"Attempting fallback to {fallback_provider}\")\n                \n                try:\n                    if fallback_provider == Provider.OPENAI:\n                        fallback_model = self.default_openai_model\n                        async for chunk in self._stream_openai_completion(\n                            messages, fallback_model, tools, temperature, max_tokens, user, **kwargs\n                        ):\n                            chunk[\"provider\"] = fallback_provider.value\n                            yield chunk\n              
      else:  # OLLAMA\n                        fallback_model = self.default_ollama_model\n                        async for chunk in self._stream_ollama_completion(\n                            messages, fallback_model, tools, temperature, max_tokens, **kwargs\n                        ):\n                            chunk[\"provider\"] = fallback_provider.value\n                            yield chunk\n                except Exception as fallback_error:\n                    logger.error(f\"Fallback streaming also failed: {str(fallback_error)}\")\n                    # Nothing more we can do here\n            \n            # For streaming, we don't re-raise since we've already started the response\n    \n    async def _select_provider_and_model(\n        self,\n        messages: List[Dict[str, str]],\n        model: Optional[str] = None,\n        provider: Optional[Union[str, Provider]] = None,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        **kwargs\n    ) -\u003e tuple[Provider, str]:\n        \"\"\"Select the provider and model based on input and criteria.\"\"\"\n        # Handle explicit provider/model specification\n        if model and \":\" in model:\n            # Format: \"provider:model\", e.g. 
\"openai:gpt-4\" or \"ollama:llama2\"\n            provider_str, model_name = model.split(\":\", 1)\n            selected_provider = Provider(provider_str.lower())\n            return selected_provider, model_name\n        \n        # Handle explicit provider with default model\n        if provider and provider != Provider.AUTO:\n            selected_provider = Provider(provider) if isinstance(provider, str) else provider\n            selected_model = model or (\n                self.default_openai_model if selected_provider == Provider.OPENAI \n                else self.default_ollama_model\n            )\n            return selected_provider, selected_model\n        \n        # If model specified without provider, infer provider\n        if model:\n            # Heuristic: OpenAI models typically start with \"gpt-\" or \"text-\"\n            if model.startswith((\"gpt-\", \"text-\")):\n                return Provider.OPENAI, model\n            else:\n                return Provider.OLLAMA, model\n        \n        # Auto-routing based on message content and requirements\n        if not provider or provider == Provider.AUTO:\n            selected_provider = await self._auto_route(messages, tools, **kwargs)\n            selected_model = (\n                self.default_openai_model if selected_provider == Provider.OPENAI \n                else self.default_ollama_model\n            )\n            return selected_provider, selected_model\n        \n        # Default fallback\n        return Provider.OPENAI, self.default_openai_model\n    \n    async def _auto_route(\n        self,\n        messages: List[Dict[str, str]],\n        tools: Optional[List[Dict[str, Any]]] = None,\n        **kwargs\n    ) -\u003e Provider:\n        \"\"\"Automatically route to the appropriate provider based on content and requirements.\"\"\"\n        # 1. 
Check for tool requirements\n        if tools:\n            # If tools are required, prefer OpenAI as Ollama's tool support is limited\n            return Provider.OPENAI\n        \n        # 2. Check for privacy concerns\n        if self._contains_sensitive_information(messages):\n            logger.info(\"Privacy sensitive information detected, routing to Ollama\")\n            return Provider.OLLAMA\n        \n        # 3. Assess complexity\n        complexity_score = await self._assess_complexity(messages)\n        logger.info(f\"Content complexity score: {complexity_score}\")\n        \n        if complexity_score \u003e self.model_selection_criteria.complexity_threshold:\n            logger.info(f\"High complexity content ({complexity_score}), routing to OpenAI\")\n            return Provider.OPENAI\n        \n        # 4. Consider token budget (if specified)\n        token_budget = kwargs.get(\"token_budget\") or self.model_selection_criteria.token_budget\n        if token_budget:\n            estimated_tokens = self._estimate_token_count(messages)\n            if estimated_tokens \u003e token_budget:\n                logger.info(f\"Token budget ({token_budget}) exceeded ({estimated_tokens}), routing to OpenAI\")\n                return Provider.OPENAI\n        \n        # Default to Ollama for standard requests\n        logger.info(\"Standard request, routing to Ollama\")\n        return Provider.OLLAMA\n    \n    def _contains_sensitive_information(self, messages: List[Dict[str, str]]) -\u003e bool:\n        \"\"\"Check if messages contain privacy-sensitive information.\"\"\"\n        sensitive_tokens = self.model_selection_criteria.privacy_sensitive_tokens\n        if not sensitive_tokens:\n            return False\n        \n        combined_text = \" \".join([msg.get(\"content\", \"\") or \"\" for msg in messages])\n        combined_text = combined_text.lower()\n        \n        for token in sensitive_tokens:\n            if token.lower() in 
combined_text:\n                return True\n        \n        return False\n    \n    async def _assess_complexity(self, messages: List[Dict[str, str]]) -\u003e float:\n        \"\"\"Assess the complexity of the messages.\"\"\"\n        # Simple heuristics for complexity:\n        # 1. Length of content\n        # 2. Presence of complex tokens (technical terms, specialized vocabulary)\n        # 3. Sentence complexity\n        \n        user_messages = [msg.get(\"content\", \"\") for msg in messages if msg.get(\"role\") == \"user\"]\n        if not user_messages:\n            return 0.0\n        \n        last_message = user_messages[-1] or \"\"\n        \n        # 1. Length factor (normalized to 0-1 range)\n        length = len(last_message)\n        length_factor = min(length / 1000, 1.0) * 0.3  # 30% weight to length\n        \n        # 2. Complexity indicators\n        complex_terms = [\n            \"analyze\", \"synthesize\", \"evaluate\", \"compare\", \"contrast\",\n            \"explain\", \"technical\", \"detailed\", \"comprehensive\", \"algorithm\",\n            \"implementation\", \"architecture\", \"design\", \"optimize\", \"complex\"\n        ]\n        \n        term_count = sum(1 for term in complex_terms if term in last_message.lower())\n        term_factor = min(term_count / 10, 1.0) * 0.4  # 40% weight to complex terms\n        \n        # 3. 
Sentence complexity (approximated by average sentence length)\n        sentences = [s.strip() for s in last_message.split(\".\") if s.strip()]\n        if sentences:\n            avg_sentence_length = sum(len(s.split()) for s in sentences) / len(sentences)\n            sentence_factor = min(avg_sentence_length / 25, 1.0) * 0.3  # 30% weight to sentence complexity\n        else:\n            sentence_factor = 0.0\n        \n        # Combined complexity score\n        complexity = length_factor + term_factor + sentence_factor\n        \n        return complexity\n    \n    def _estimate_token_count(self, messages: List[Dict[str, str]]) -\u003e int:\n        \"\"\"Estimate the token count for the messages.\"\"\"\n        # Simple approximation: 1 token ≈ 4 characters\n        combined_text = \" \".join([msg.get(\"content\", \"\") or \"\" for msg in messages])\n        return len(combined_text) // 4\n    \n    async def _generate_openai_completion(\n        self,\n        messages: List[Dict[str, str]],\n        model: str,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        stream: bool = False,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        user: Optional[str] = None,\n        **kwargs\n    ) -\u003e Dict[str, Any]:\n        \"\"\"Generate a completion using OpenAI.\"\"\"\n        completion_kwargs = {\n            \"model\": model,\n            \"messages\": messages,\n            \"temperature\": temperature,\n            \"stream\": stream\n        }\n        \n        if max_tokens:\n            completion_kwargs[\"max_tokens\"] = max_tokens\n        \n        if tools:\n            completion_kwargs[\"tools\"] = tools\n        \n        if \"tool_choice\" in kwargs:\n            completion_kwargs[\"tool_choice\"] = kwargs[\"tool_choice\"]\n        \n        if \"response_format\" in kwargs:\n            completion_kwargs[\"response_format\"] = kwargs[\"response_format\"]\n        \n        if user:\n           
 completion_kwargs[\"user\"] = user\n        \n        if stream:\n            response_stream = await self.openai_client.chat.completions.create(**completion_kwargs)\n            \n            full_response = None\n            async for chunk in response_stream:\n                if not full_response:\n                    full_response = chunk\n                yield chunk.model_dump()\n        else:\n            response = await self.openai_client.chat.completions.create(**completion_kwargs)\n            return response.model_dump()\n    \n    async def _stream_openai_completion(\n        self,\n        messages: List[Dict[str, str]],\n        model: str,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        user: Optional[str] = None,\n        **kwargs\n    ) -\u003e AsyncGenerator[Dict[str, Any], None]:\n        \"\"\"Stream a completion from OpenAI.\"\"\"\n        # This is just a wrapper around _generate_openai_completion with stream=True\n        async for chunk in self._generate_openai_completion(\n            messages, model, tools, True, temperature, max_tokens, user, **kwargs\n        ):\n            yield chunk\n    \n    async def _generate_ollama_completion(\n        self,\n        messages: List[Dict[str, str]],\n        model: str,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        stream: bool = False,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        **kwargs\n    ) -\u003e Dict[str, Any]:\n        \"\"\"Generate a completion using Ollama.\"\"\"\n        if stream:\n            # For streaming, return the first chunk to maintain API consistency\n            async for chunk in self.ollama_service.generate_completion(\n                messages=messages,\n                model=model,\n                temperature=temperature,\n                max_tokens=max_tokens,\n                tools=tools,\n                
stream=True,\n                **kwargs\n            ):\n                return chunk\n        else:\n            return await self.ollama_service.generate_completion(\n                messages=messages,\n                model=model,\n                temperature=temperature,\n                max_tokens=max_tokens,\n                tools=tools,\n                stream=False,\n                **kwargs\n            )\n    \n    async def _stream_ollama_completion(\n        self,\n        messages: List[Dict[str, str]],\n        model: str,\n        tools: Optional[List[Dict[str, Any]]] = None,\n        temperature: float = 0.7,\n        max_tokens: Optional[int] = None,\n        **kwargs\n    ) -\u003e AsyncGenerator[Dict[str, Any], None]:\n        \"\"\"Stream a completion from Ollama.\"\"\"\n        async for chunk in self.ollama_service.generate_completion(\n            messages=messages,\n            model=model,\n            temperature=temperature,\n            max_tokens=max_tokens,\n            tools=tools,\n            stream=True,\n            **kwargs\n        ):\n            yield chunk\n    \n    def _generate_cache_key(self, *args) -\u003e str:\n        \"\"\"Generate a cache key based on the input parameters.\"\"\"\n        # Convert complex objects to JSON strings first\n        args_str = json.dumps([arg if not isinstance(arg, (dict, list)) else json.dumps(arg, sort_keys=True) for arg in args])\n        return hashlib.md5(args_str.encode()).hexdigest()\n    \n    def _get_from_cache(self, key: str) -\u003e Optional[Dict[str, Any]]:\n        \"\"\"Get a response from cache if available and not expired.\"\"\"\n        if key not in self.cache:\n            return None\n            \n        cached_item = self.cache[key]\n        if time.time() - cached_item[\"timestamp\"] \u003e self.cache_ttl:\n            # Expired\n            del self.cache[key]\n            return None\n            \n        return cached_item[\"response\"]\n    \n    def 
_add_to_cache(self, key: str, response: Dict[str, Any]):\n        \"\"\"Add a response to the cache.\"\"\"\n        self.cache[key] = {\n            \"response\": response,\n            \"timestamp\": time.time()\n        }\n        \n        # Simple cache size management - remove oldest if too many items\n        max_cache_size = getattr(settings, \"RESPONSE_CACHE_MAX_ITEMS\", 1000)\n        if len(self.cache) \u003e max_cache_size:\n            # Remove oldest 10% of items\n            items_to_remove = max(1, int(max_cache_size * 0.1))\n            oldest_keys = sorted(\n                self.cache.keys(), \n                key=lambda k: self.cache[k][\"timestamp\"]\n            )[:items_to_remove]\n            \n            for old_key in oldest_keys:\n                del self.cache[old_key]\n"])</script><script>self.__next_f.push([1,"94:[\"$\",\"pre\",\"pre-31\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2de\"}]}]\n95:[\"$\",\"h2\",\"h2-32\",{\"id\":\"configuration-settings\",\"children\":\"Configuration Settings\"}]\n2df:T54d,# app/config.py\nimport os\nfrom pydantic_settings import BaseSettings\nfrom typing import List, Optional, Dict, Any\nfrom dotenv import load_dotenv\n\n# Load environment variables from .env file\nload_dotenv()\n\nclass Settings(BaseSettings):\n    # API Keys and Authentication\n    OPENAI_API_KEY: str\n    OPENAI_ORG_ID: Optional[str] = None\n    \n    # Model Configuration\n    OPENAI_MODEL: str = \"gpt-4o\"\n    OLLAMA_MODEL: str = \"llama2\"\n    OLLAMA_HOST: str = \"http://localhost:11434\"\n    \n    # System Behavior\n    TEMPERATURE: float = 0.7\n    MAX_TOKENS: int = 4096\n    REQUEST_TIMEOUT: int = 120\n    \n    # Routing Configuration\n    COMPLEXITY_THRESHOLD: float = 0.65\n    PRIVACY_SENSITIVE_TOKENS: str = \"password,secret,token,key,credential\"\n    \n    # Caching Configuration\n    
ENABLE_RESPONSE_CACHE: bool = True\n    RESPONSE_CACHE_TTL: int = 3600  # 1 hour\n    RESPONSE_CACHE_MAX_ITEMS: int = 1000\n    \n    # Logging Configuration\n    LOG_LEVEL: str = \"INFO\"\n    \n    # Database Configuration\n    DATABASE_URL: Optional[str] = None\n    \n    # Advanced Ollama Configuration\n    OLLAMA_MODELS_MAPPING: Dict[str, str] = {\n        \"gpt-3.5-turbo\": \"llama2\",\n        \"gpt-4\": \"llama2\",\n        \"gpt-4o\": \"mistral\",\n        \"code-llama\": \"codellama\"\n    }\n    \n    class Config:\n        env_file = \".env\"\n        env_file_encoding = \"utf-8\"\n\nsettings = Settings()\n96:[\"$\",\"pre\",\"pre-32\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2df\"}]}]\n97:[\"$\",\"h2\",\"h2-33\",{\"id\":\"model-selection-and-configuration\",\"children\":\"Model Selection and Configuration\"}]\n98:[\"$\",\"p\",\"p-30\",{\"children\":\"Below is a table of recommended Ollama models and their optimal use cases:\"}]\n2e0:T1562,"])</script><script>self.__next_f.push([1,"# app/models/model_catalog.py\nfrom typing import Dict, List, Any, Optional\n\nclass ModelCapability:\n    \"\"\"Represents the capabilities of a model.\"\"\"\n    def __init__(\n        self,\n        context_window: int,\n        strengths: List[str],\n        supports_tools: bool,\n        recommended_temperature: float,\n        approximate_speed: str  # \"fast\", \"medium\", \"slow\"\n    ):\n        self.context_window = context_window\n        self.strengths = strengths\n        self.supports_tools = supports_tools\n        self.recommended_temperature = recommended_temperature\n        self.approximate_speed = approximate_speed\n\n# Ollama model catalog\nOLLAMA_MODELS = {\n    \"llama2\": ModelCapability(\n        context_window=4096,\n        strengths=[\"general_knowledge\", \"reasoning\", \"instruction_following\"],\n        
supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"llama2:13b\": ModelCapability(\n        context_window=4096,\n        strengths=[\"general_knowledge\", \"reasoning\", \"instruction_following\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"llama2:70b\": ModelCapability(\n        context_window=4096,\n        strengths=[\"general_knowledge\", \"reasoning\", \"instruction_following\"],\n        supports_tools=False,\n        recommended_temperature=0.65,\n        approximate_speed=\"slow\"\n    ),\n    \"mistral\": ModelCapability(\n        context_window=8192,\n        strengths=[\"instruction_following\", \"reasoning\", \"versatility\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"mistral:7b-instruct\": ModelCapability(\n        context_window=8192,\n        strengths=[\"instruction_following\", \"chat\", \"versatility\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"codellama\": ModelCapability(\n        context_window=16384,\n        strengths=[\"code_generation\", \"code_explanation\", \"technical_writing\"],\n        supports_tools=False,\n        recommended_temperature=0.5,\n        approximate_speed=\"medium\"\n    ),\n    \"codellama:34b\": ModelCapability(\n        context_window=16384,\n        strengths=[\"code_generation\", \"code_explanation\", \"technical_writing\"],\n        supports_tools=False,\n        recommended_temperature=0.5,\n        approximate_speed=\"slow\"\n    ),\n    \"dolphin-mistral\": ModelCapability(\n        context_window=8192,\n        strengths=[\"conversational\", \"creative\", \"helpfulness\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"neural-chat\": 
ModelCapability(\n        context_window=8192,\n        strengths=[\"conversational\", \"instruction_following\", \"helpfulness\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"orca-mini\": ModelCapability(\n        context_window=4096,\n        strengths=[\"efficiency\", \"general_knowledge\", \"basic_reasoning\"],\n        supports_tools=False,\n        recommended_temperature=0.8,\n        approximate_speed=\"fast\"\n    ),\n    \"vicuna\": ModelCapability(\n        context_window=4096,\n        strengths=[\"conversational\", \"instruction_following\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"medium\"\n    ),\n    \"wizard-math\": ModelCapability(\n        context_window=4096,\n        strengths=[\"mathematics\", \"problem_solving\", \"logical_reasoning\"],\n        supports_tools=False,\n        recommended_temperature=0.5,\n        approximate_speed=\"medium\"\n    ),\n    \"phi\": ModelCapability(\n        context_window=2048,\n        strengths=[\"efficiency\", \"basic_tasks\", \"lightweight\"],\n        supports_tools=False,\n        recommended_temperature=0.7,\n        approximate_speed=\"fast\"\n    )\n}\n\n# OpenAI -\u003e Ollama model mapping for fallback scenarios\nOPENAI_TO_OLLAMA_MAPPING = {\n    \"gpt-3.5-turbo\": \"llama2\",\n    \"gpt-3.5-turbo-16k\": \"mistral:7b-instruct\",\n    \"gpt-4\": \"llama2:70b\",\n    \"gpt-4o\": \"mistral\",\n    \"gpt-4-turbo\": \"mistral\",\n    \"code-llama\": \"codellama\"\n}\n\n# Use case to model recommendations\nUSE_CASE_RECOMMENDATIONS = {\n    \"code_generation\": [\"codellama:34b\", \"codellama\"],\n    \"creative_writing\": [\"dolphin-mistral\", \"mistral:7b-instruct\"],\n    \"mathematical_reasoning\": [\"wizard-math\", \"llama2:70b\"],\n    \"conversational\": [\"neural-chat\", \"dolphin-mistral\"],\n    \"knowledge_intensive\": [\"llama2:70b\", \"mistral\"],\n    
\"resource_constrained\": [\"phi\", \"orca-mini\"]\n}\n\ndef recommend_ollama_model(use_case: str, performance_tier: str = \"medium\") -\u003e str:\n    \"\"\"Recommend an Ollama model based on use case and performance requirements.\"\"\"\n    if use_case in USE_CASE_RECOMMENDATIONS:\n        models = USE_CASE_RECOMMENDATIONS[use_case]\n        \n        # Filter by performance tier if needed\n        if performance_tier == \"high\":\n            for model in models:\n                if \":70b\" in model or \":34b\" in model:\n                    return model\n            return models[0]  # Return first if no high-tier match\n        elif performance_tier == \"low\":\n            return \"orca-mini\" if use_case != \"code_generation\" else \"codellama\"\n        else:  # medium tier\n            return models[0]\n    \n    # Default recommendations\n    if performance_tier == \"high\":\n        return \"llama2:70b\"\n    elif performance_tier == \"low\":\n        return \"orca-mini\"\n    else:\n        return \"mistral\"\n"])</script><script>self.__next_f.push([1,"99:[\"$\",\"pre\",\"pre-33\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2e0\"}]}]\n9a:[\"$\",\"h2\",\"h2-34\",{\"id\":\"agent-adapter-for-model-selection\",\"children\":\"Agent Adapter for Model Selection\"}]\n2e1:T20b3,"])</script><script>self.__next_f.push([1,"# app/agents/adaptive_agent.py\nfrom typing import List, Dict, Any, Optional\nimport logging\nfrom app.agents.base_agent import BaseAgent\nfrom app.models.message import Message, MessageRole\nfrom app.services.provider_service import ProviderService, Provider\nfrom app.models.model_catalog import recommend_ollama_model, OLLAMA_MODELS\n\nlogger = logging.getLogger(__name__)\n\nclass AdaptiveAgent(BaseAgent):\n    \"\"\"Agent that adapts its model selection based on task requirements.\"\"\"\n    \n    def 
__init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n        self.last_used_model = None\n        self.last_used_provider = None\n        self.performance_metrics = {}\n    \n    async def _generate_response(self, user_id: str) -\u003e str:\n        \"\"\"Generate a response with dynamic model selection.\"\"\"\n        # Extract the last user message\n        last_user_message = next(\n            (msg for msg in reversed(self.state.conversation_history) \n             if msg.role == MessageRole.USER), \n            None\n        )\n        \n        if not last_user_message:\n            return \"I don't have any messages to respond to.\"\n        \n        # Analyze the message to determine the best model\n        provider, model = await self._select_optimal_model(last_user_message.content)\n        \n        logger.info(f\"Selected model for response: {provider}:{model}\")\n        \n        # Track the selected model for monitoring\n        self.last_used_model = model\n        self.last_used_provider = provider\n        \n        # Get model-specific parameters\n        params = self._get_model_parameters(provider, model)\n        \n        # Start timing for performance metrics\n        import time\n        start_time = time.time()\n        \n        # Generate the response\n        response = await self.provider_service.generate_completion(\n            messages=[msg.model_dump() for msg in self.state.conversation_history],\n            model=f\"{provider}:{model}\" if provider != \"auto\" else None,\n            provider=provider,\n            tools=self.tools,\n            temperature=params.get(\"temperature\", 0.7),\n            max_tokens=params.get(\"max_tokens\"),\n            user=user_id\n        )\n        \n        # Record performance metrics\n        execution_time = time.time() - start_time\n        self._update_performance_metrics(provider, model, execution_time, response)\n        \n        if 
response.get(\"tool_calls\"):\n            # Process tool calls if needed\n            # ... (tool call handling code)\n            pass\n        \n        return response[\"message\"][\"content\"]\n    \n    async def _select_optimal_model(self, message: str) -\u003e tuple[str, str]:\n        \"\"\"Select the optimal model based on message analysis.\"\"\"\n        # 1. Analyze for use case\n        use_case = await self._determine_use_case(message)\n        \n        # 2. Determine performance needs\n        performance_tier = self._determine_performance_tier(message)\n        \n        # 3. Check if tools are required\n        tools_required = len(self.tools) \u003e 0\n        \n        # 4. Check message complexity\n        is_complex = await self._is_complex_request(message)\n        \n        # Decision logic\n        if tools_required:\n            # OpenAI is better for tool usage\n            return \"openai\", \"gpt-4o\"\n        \n        if is_complex:\n            # For complex requests, prefer OpenAI or high-tier Ollama models\n            if performance_tier == \"high\":\n                return \"openai\", \"gpt-4o\"\n            else:\n                ollama_model = recommend_ollama_model(use_case, \"high\")\n                return \"ollama\", ollama_model\n        \n        # For standard requests, use Ollama with appropriate model\n        ollama_model = recommend_ollama_model(use_case, performance_tier)\n        return \"ollama\", ollama_model\n    \n    async def _determine_use_case(self, message: str) -\u003e str:\n        \"\"\"Determine the use case based on message content.\"\"\"\n        message_lower = message.lower()\n        \n        # Simple heuristic classification\n        if any(term in message_lower for term in [\"code\", \"program\", \"function\", \"class\", \"algorithm\"]):\n            return \"code_generation\"\n        \n        if any(term in message_lower for term in [\"story\", \"creative\", \"imagine\", \"write\", 
\"novel\"]):\n            return \"creative_writing\"\n        \n        if any(term in message_lower for term in [\"math\", \"calculate\", \"equation\", \"solve\", \"formula\"]):\n            return \"mathematical_reasoning\"\n        \n        if any(term in message_lower for term in [\"chat\", \"talk\", \"discuss\", \"conversation\"]):\n            return \"conversational\"\n        \n        if len(message.split()) \u003e 50 or any(term in message_lower for term in [\"explain\", \"detail\", \"analysis\"]):\n            return \"knowledge_intensive\"\n        \n        # Default to conversational\n        return \"conversational\"\n    \n    def _determine_performance_tier(self, message: str) -\u003e str:\n        \"\"\"Determine the performance tier needed based on message characteristics.\"\"\"\n        # Length-based heuristic\n        word_count = len(message.split())\n        \n        if word_count \u003e 100 or \"detailed\" in message.lower() or \"comprehensive\" in message.lower():\n            return \"high\"\n        \n        if word_count \u003c 20 and not any(term in message.lower() for term in [\"complex\", \"difficult\", \"advanced\"]):\n            return \"low\"\n        \n        return \"medium\"\n    \n    async def _is_complex_request(self, message: str) -\u003e bool:\n        \"\"\"Determine if this is a complex request requiring more powerful models.\"\"\"\n        # Check for indicators of complexity\n        complexity_indicators = [\n            \"complex\", \"detailed\", \"thorough\", \"comprehensive\", \"in-depth\",\n            \"analyze\", \"compare\", \"synthesize\", \"evaluate\", \"technical\",\n            \"step by step\", \"advanced\", \"sophisticated\", \"nuanced\"\n        ]\n        \n        indicator_count = sum(1 for indicator in complexity_indicators if indicator in message.lower())\n        \n        # Length is also an indicator of complexity\n        is_long = len(message.split()) \u003e 50\n        \n        # 
Multiple questions indicate complexity\n        question_count = message.count(\"?\")\n        has_multiple_questions = question_count \u003e 1\n        \n        return (indicator_count \u003e= 2) or (is_long and indicator_count \u003e= 1) or has_multiple_questions\n    \n    def _get_model_parameters(self, provider: str, model: str) -\u003e Dict[str, Any]:\n        \"\"\"Get model-specific parameters.\"\"\"\n        if provider == \"ollama\":\n            if model in OLLAMA_MODELS:\n                capabilities = OLLAMA_MODELS[model]\n                return {\n                    \"temperature\": capabilities.recommended_temperature,\n                    \"max_tokens\": capabilities.context_window // 2  # Conservative estimate\n                }\n            else:\n                # Default Ollama parameters\n                return {\"temperature\": 0.7, \"max_tokens\": 2048}\n        else:\n            # OpenAI models\n            if \"gpt-4\" in model:\n                return {\"temperature\": 0.7, \"max_tokens\": 4096}\n            else:\n                return {\"temperature\": 0.7, \"max_tokens\": 2048}\n    \n    def _update_performance_metrics(\n        self, \n        provider: str, \n        model: str, \n        execution_time: float,\n        response: Dict[str, Any]\n    ):\n        \"\"\"Update performance metrics for this model.\"\"\"\n        model_key = f\"{provider}:{model}\"\n        \n        if model_key not in self.performance_metrics:\n            self.performance_metrics[model_key] = {\n                \"calls\": 0,\n                \"total_time\": 0,\n                \"avg_time\": 0,\n                \"token_usage\": {\n                    \"prompt\": 0,\n                    \"completion\": 0,\n                    \"total\": 0\n                }\n            }\n        \n        metrics = self.performance_metrics[model_key]\n        metrics[\"calls\"] += 1\n        metrics[\"total_time\"] += execution_time\n        metrics[\"avg_time\"] = 
metrics[\"total_time\"] / metrics[\"calls\"]\n        \n        # Update token usage if available\n        if \"usage\" in response:\n            usage = response[\"usage\"]\n            metrics[\"token_usage\"][\"prompt\"] += usage.get(\"prompt_tokens\", 0)\n            metrics[\"token_usage\"][\"completion\"] += usage.get(\"completion_tokens\", 0)\n            metrics[\"token_usage\"][\"total\"] += usage.get(\"total_tokens\", 0)\n"])</script><script>self.__next_f.push([1,"9b:[\"$\",\"pre\",\"pre-34\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2e1\"}]}]\n9c:[\"$\",\"h2\",\"h2-35\",{\"id\":\"agent-controller-with-model-selection\",\"children\":\"Agent Controller with Model Selection\"}]\n2e2:T180e,"])</script><script>self.__next_f.push([1,"# app/controllers/agent_controller.py\nfrom fastapi import APIRouter, Depends, HTTPException, Query, BackgroundTasks\nfrom pydantic import BaseModel, Field\nfrom typing import List, Dict, Any, Optional\nimport logging\n\nfrom app.agents.agent_factory import AgentFactory\nfrom app.agents.adaptive_agent import AdaptiveAgent\nfrom app.services.provider_service import Provider\nfrom app.services.auth_service import get_current_user\nfrom app.config import settings\n\nlogger = logging.getLogger(__name__)\n\nrouter = APIRouter(prefix=\"/api/v1/agents\", tags=[\"agents\"])\n\nclass ModelSelectionParams(BaseModel):\n    \"\"\"Parameters for model selection.\"\"\"\n    provider: Optional[str] = Field(None, description=\"Provider to use (openai, ollama, auto)\")\n    model: Optional[str] = Field(None, description=\"Specific model to use\")\n    auto_select: bool = Field(True, description=\"Whether to auto-select the optimal model\")\n    use_case: Optional[str] = Field(None, description=\"Specific use case for model recommendation\")\n    performance_tier: Optional[str] = Field(\"medium\", 
description=\"Performance tier (low, medium, high)\")\n\nclass ChatRequest(BaseModel):\n    message: str\n    session_id: Optional[str] = None\n    model_params: Optional[ModelSelectionParams] = None\n    stream: bool = False\n\nclass ChatResponse(BaseModel):\n    response: str\n    session_id: str\n    model_used: str\n    provider_used: str\n    execution_metrics: Optional[Dict[str, Any]] = None\n\n# Agent sessions storage\nagent_sessions = {}\n\n# Get agent factory instance\nagent_factory = Depends(lambda: get_agent_factory())\n\ndef get_agent_factory():\n    # Initialize and return agent factory\n    # In a real implementation, this would be properly initialized\n    return AgentFactory()\n\n@router.post(\"/chat\", response_model=ChatResponse)\nasync def chat(\n    request: ChatRequest,\n    background_tasks: BackgroundTasks,\n    current_user: Dict = Depends(get_current_user),\n    factory: AgentFactory = agent_factory\n):\n    \"\"\"Chat with an agent that intelligently selects the appropriate model.\"\"\"\n    user_id = current_user[\"id\"]\n    \n    # Create or retrieve session\n    session_id = request.session_id\n    if not session_id or session_id not in agent_sessions:\n        # Create a new adaptive agent\n        agent = factory.create_agent(\n            agent_type=\"adaptive\",\n            agent_class=AdaptiveAgent,\n            system_prompt=\"You are a helpful assistant that provides accurate, relevant information.\"\n        )\n        \n        session_id = f\"session_{user_id}_{len(agent_sessions) + 1}\"\n        agent_sessions[session_id] = agent\n    else:\n        agent = agent_sessions[session_id]\n    \n    # Apply model selection parameters if provided\n    if request.model_params:\n        if not request.model_params.auto_select:\n            # Force specific provider/model\n            provider = request.model_params.provider or \"auto\"\n            model = request.model_params.model\n            \n            if provider != 
\"auto\" and model:\n                logger.info(f\"Forcing model selection: {provider}:{model}\")\n                # Note: AdaptiveAgent._generate_response re-selects the model and overwrites\n                # these attributes, so this records the request but does not yet force it\n                agent.last_used_provider = provider\n                agent.last_used_model = model\n    \n    try:\n        # Process the message\n        if request.stream:\n            # Streaming is not implemented; fail explicitly instead of returning None,\n            # which would not validate against ChatResponse\n            raise HTTPException(status_code=501, detail=\"Streaming responses are not implemented\")\n        else:\n            response = await agent.process_message(request.message, user_id)\n            \n            # Get the model and provider that were used\n            model_used = agent.last_used_model or \"unknown\"\n            provider_used = agent.last_used_provider or \"unknown\"\n            \n            # Get execution metrics\n            model_key = f\"{provider_used}:{model_used}\"\n            execution_metrics = agent.performance_metrics.get(model_key)\n            \n            # Schedule background task to analyze performance and adjust preferences\n            background_tasks.add_task(\n                analyze_performance, \n                agent, \n                model_key, \n                execution_metrics\n            )\n            \n            return ChatResponse(\n                response=response,\n                session_id=session_id,\n                model_used=model_used,\n                provider_used=provider_used,\n                execution_metrics=execution_metrics\n            )\n    except HTTPException:\n        # Re-raise HTTP errors unchanged rather than converting them into a 500\n        raise\n    except Exception as e:\n        logger.exception(f\"Error processing message: {str(e)}\")\n        raise 
HTTPException(status_code=500, detail=f\"Error processing message: {str(e)}\")\n\n@router.get(\"/models/recommend\")\nasync def recommend_model(\n    use_case: str = Query(..., description=\"The use case (code_generation, creative_writing, etc.)\"),\n    performance_tier: str = Query(\"medium\", description=\"Performance tier (low, medium, high)\"),\n    current_user: Dict = Depends(get_current_user)\n):\n    \"\"\"Get model recommendations for a specific
use case.\"\"\"\n    from app.models.model_catalog import recommend_ollama_model, OLLAMA_MODELS\n    \n    # Get recommended Ollama model\n    recommended_model = recommend_ollama_model(use_case, performance_tier)\n    \n    # Get OpenAI equivalent\n    openai_equivalent = \"gpt-4o\" if performance_tier == \"high\" else \"gpt-3.5-turbo\"\n    \n    # Get model capabilities if available\n    capabilities = OLLAMA_MODELS.get(recommended_model, {})\n    \n    return {\n        \"ollama_recommendation\": recommended_model,\n        \"openai_recommendation\": openai_equivalent,\n        \"capabilities\": capabilities,\n        \"use_case\": use_case,\n        \"performance_tier\": performance_tier\n    }\n\nasync def analyze_performance(agent, model_key, metrics):\n    \"\"\"Analyze model performance and adjust preferences.\"\"\"\n    if not metrics or metrics[\"calls\"] \u003c 5:\n        # Not enough data to analyze\n        return\n    \n    # Analyze average response time\n    avg_time = metrics[\"avg_time\"]\n    \n    # If response time is too slow, consider adjusting default models\n    if avg_time \u003e 5.0:  # More than 5 seconds\n        logger.info(f\"Model {model_key} showing slow performance: {avg_time}s avg\")\n        \n        # In a real implementation, we might adjust preferred models here\n        pass\n"])</script><script>self.__next_f.push([1,"9d:[\"$\",\"pre\",\"pre-35\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2e2\"}]}]\n9e:[\"$\",\"h2\",\"h2-36\",{\"id\":\"dockerfile-for-local-deployment\",\"children\":\"Dockerfile for Local Deployment\"}]\n9f:[\"$\",\"pre\",\"pre-36\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-dockerfile\",\"children\":\"# Dockerfile\\nFROM python:3.11-slim\\n\\nWORKDIR /app\\n\\n# Install 
system dependencies\\nRUN apt-get update \u0026\u0026 apt-get install -y --no-install-recommends \\\\\\n    curl \\\\\\n    \u0026\u0026 rm -rf /var/lib/apt/lists/*\\n\\n# Copy requirements\\nCOPY requirements.txt .\\nRUN pip install --no-cache-dir -r requirements.txt\\n\\n# Copy application code\\nCOPY . .\\n\\n# Set up environment\\nENV PYTHONPATH=/app\\nENV OPENAI_API_KEY=\\\"your-api-key-here\\\"\\nENV OLLAMA_HOST=\\\"http://ollama:11434\\\"\\nENV OLLAMA_MODEL=\\\"llama2\\\"\\n\\n# Default command\\nCMD [\\\"uvicorn\\\", \\\"app.main:app\\\", \\\"--host\\\", \\\"0.0.0.0\\\", \\\"--port\\\", \\\"8000\\\"]\\n\"}]}]\na0:[\"$\",\"h2\",\"h2-37\",{\"id\":\"docker-compose-for-development\",\"children\":\"Docker Compose for Development\"}]\na1:[\"$\",\"pre\",\"pre-37\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"# docker-compose.yml\\nversion: '3.8'\\n\\nservices:\\n  app:\\n    build: .\\n    ports:\\n      - \\\"8000:8000\\\"\\n    volumes:\\n      - .:/app\\n    environment:\\n      - OLLAMA_HOST=http://ollama:11434\\n      - OPENAI_API_KEY=${OPENAI_API_KEY}\\n      - OPENAI_MODEL=${OPENAI_MODEL:-gpt-4o}\\n      - OLLAMA_MODEL=${OLLAMA_MODEL:-llama2}\\n    depends_on:\\n      - ollama\\n    restart: unless-stopped\\n\\n  ollama:\\n    image: ollama/ollama:latest\\n    volumes:\\n      - ollama_data:/root/.ollama\\n    ports:\\n      - \\\"11434:11434\\\"\\n    deploy:\\n      resources:\\n        reservations:\\n          devices:\\n            - driver: nvidia\\n              count: all\\n              capabilities: [gpu]\\n\\nvolumes:\\n  ollama_data:\\n\"}]}]\na2:[\"$\",\"h2\",\"h2-38\",{\"id\":\"model-preload-script\",\"children\":\"Model Preload Script\"}]\n2e3:Ta7a,"])</script><script>self.__next_f.push([1,"# scripts/preload_models.py\n#!/usr/bin/env python\nimport argparse\nimport requests\nimport time\nimport sys\nimport os\nfrom 
typing import List, Dict\n\ndef main():\n    parser = argparse.ArgumentParser(description='Preload Ollama models')\n    parser.add_argument('--host', default=\"http://localhost:11434\", help='Ollama host URL')\n    parser.add_argument('--models', default=\"llama2,mistral,codellama\", help='Comma-separated list of models to preload')\n    parser.add_argument('--timeout', type=int, default=3600, help='Timeout in seconds for each model pull')\n    args = parser.parse_args()\n\n    models = [m.strip() for m in args.models.split(',')]\n    preload_models(args.host, models, args.timeout)\n\ndef preload_models(host: str, models: List[str], timeout: int):\n    \"\"\"Preload models into Ollama.\"\"\"\n    print(f\"Preloading {len(models)} models on {host}...\")\n    \n    # Check Ollama availability\n    try:\n        response = requests.get(f\"{host}/api/tags\")\n        if response.status_code != 200:\n            print(f\"Error connecting to Ollama: Status {response.status_code}\")\n            sys.exit(1)\n            \n        available_models = [m[\"name\"] for m in response.json().get(\"models\", [])]\n        print(f\"Currently available models: {', '.join(available_models)}\")\n    except Exception as e:\n        print(f\"Error connecting to Ollama: {str(e)}\")\n        sys.exit(1)\n    \n    # Pull each model\n    for model in models:\n        # /api/tags lists untagged models with an explicit tag (e.g. \"llama2:latest\")\n        if model in available_models or f\"{model}:latest\" in available_models:\n            print(f\"Model {model} is already available, skipping...\")\n            continue\n            \n        print(f\"Pulling model: {model}\")\n        try:\n            start_time = time.time()\n            response = requests.post(\n                f\"{host}/api/pull\", \n                # stream=False returns a single JSON object once the pull finishes,\n                # so errors surface through the HTTP status instead of a progress stream\n                json={\"name\": model, \"stream\": 
False},\n                timeout=timeout\n            )\n            \n            if response.status_code != 200:\n                print(f\"Error pulling model {model}: Status {response.status_code}\")\n                print(response.text)\n                continue\n                
\n            elapsed = time.time() - start_time\n            print(f\"Successfully pulled {model} in {elapsed:.1f} seconds\")\n        except Exception as e:\n            print(f\"Error pulling model {model}: {str(e)}\")\n    \n    # Verify available models after pulling\n    try:\n        response = requests.get(f\"{host}/api/tags\")\n        if response.status_code == 200:\n            available_models = [m[\"name\"] for m in response.json().get(\"models\", [])]\n            print(f\"Available models: {', '.join(available_models)}\")\n    except Exception as e:\n        print(f\"Error checking available models: {str(e)}\")\n\nif __name__ == \"__main__\":\n    main()\n"])</script><script>self.__next_f.push([1,"a3:[\"$\",\"pre\",\"pre-38\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2e3\"}]}]\na4:[\"$\",\"h2\",\"h2-39\",{\"id\":\"implementation-guide\",\"children\":\"Implementation Guide\"}]\na5:[\"$\",\"h3\",\"h3-25\",{\"id\":\"setting-up-ollama\",\"children\":\"Setting up Ollama\"}]\na6:[\"$\",\"ol\",\"ol-6\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Installation:\"}]}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# macOS\\nbrew install ollama\\n\\n# Linux\\ncurl -fsSL https://ollama.com/install.sh | sh\\n\\n# Windows\\n# Download from https://ollama.com/download/windows\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Pull Base Models:\"}]}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
```bash
ollama pull llama2
ollama pull mistral
ollama pull codellama
```
3. Start Ollama Server (not needed if the Ollama desktop app or system service is already running):
```bash
ollama serve
```

Application Configuration

1. Create .env file:
```
OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=org-...  # Optional
OPENAI_MODEL=gpt-4o
OLLAMA_MODEL=llama2
OLLAMA_HOST=http://localhost:11434
COMPLEXITY_THRESHOLD=0.65
PRIVACY_SENSITIVE_TOKENS=password,secret,token,key,credential
```
2. Initialize Application:
```bash
# Install dependencies
pip install -r requirements.txt

# Start the application
uvicorn app.main:app --reload
```

Model Selection Criteria

The system determines which provider (OpenAI or Ollama) to use based on four criteria:

1. Complexity Analysis: Messages are scored for complexity based on length, specialized terminology, and sentence structure. Queries that score above the COMPLEXITY_THRESHOLD setting (default: 0.65) are routed to OpenAI.
2. Privacy Concerns: Messages containing sensitive terms (configured in PRIVACY_SENSITIVE_TOKENS) are preferentially routed to Ollama, which keeps matching messages on local infrastructure. Sensitive content that matches none of the configured terms is not detected by this check.
3. Tool Requirements: Requests that require tools/functions are routed to OpenAI, because Ollama's native tool support is limited. When necessary, the system simulates tool usage in Ollama through prompt engineering.
4. Resource Constraints: Token budget constraints can trigger routing to OpenAI for longer conversations, and local hardware capabilities are taken into account when selecting Ollama models.

Ollama Model Selection

The system selects an Ollama model according to the query's requirements:

1. For code generation: codellama (default) or codellama:34b (high performance)
2. For creative tasks: dolphin-mistral or neural-chat
3. For mathematical reasoning: wizard-math
4. For general knowledge: llama2 (base), llama2:13b (medium), or llama2:70b (high performance)
5. For resource-constrained environments: phi or orca-mini

Performance Optimization

1. Response Caching: Common responses are cached to improve performance; the cache TTL and maximum number of items are configurable.
2. Dynamic Temperature Adjustment: Each model has a recommended temperature setting, and the system adjusts temperature according to the task type.
3. Adaptive Routing: The system tracks performance metrics and adjusts routing preferences over time, so models with consistently poor performance receive fewer requests.

Fallback Mechanisms

The system implements fallback at three levels:

1. Provider Fallback: If OpenAI is unavailable, the system falls back to Ollama; if Ollama fails, it falls back to OpenAI.
2. Model Fallback: If a requested model is unavailable, the system selects an appropriate alternative. A fallback chain is configured for each model so that service degrades gracefully.
3. Error Handling: Network errors, timeouts, and model limitations are handled gracefully, and the system returns informative error messages once all fallbacks are exhausted.

Conclusion

The integration of Ollama with OpenAI's Agents SDK creates a hybrid architecture that combines the strengths of local and cloud-based inference. This implementation provides:

1. Enhanced privacy by keeping sensitive information local when appropriate
2. Cost optimization by routing suitable queries to local infrastructure
3. Robust fallbacks that keep the system resilient to provider failures
4. Task-appropriate model selection based on complexity, privacy, and tool-use analysis of each request
5. Seamless integration with the agent framework and its tool ecosystem

This architecture supports responsible AI deployment by balancing the capability of cloud-based models with the privacy and cost benefits of local inference. By routing each request according to its characteristics, the system aims to optimize performance while respecting critical constraints around privacy, latency, and resource utilization.

Comprehensive Testing Strategy for OpenAI-Ollama Hybrid Agent System

Theoretical Framework for Validation Methodology

The integration of cloud-based and local inference capabilities within a unified agent architecture necessitates a multifaceted testing approach that encompasses both individual components and their systemic interactions. This document establishes a rigorous testing framework that addresses the unique challenges of validating a hybrid AI system across functionality, performance, and reliability.

Strategic Testing Layers

1. Unit Testing Framework

Core Component Isolation Testing

```python
# tests/unit/test_provider_service.py
import pytest
import asyncio
from unittest.mock import AsyncMock, patch, MagicMock
import json

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

class TestProviderService:
    @pytest.fixture
    def provider_service(self):
        """Create a provider service with mocked dependencies for testing."""
        service = ProviderService()
        service.openai_client = AsyncMock()
        service.ollama_service = AsyncMock(spec=OllamaService)
        return service

    @pytest.mark.asyncio
    async def test_select_provider_and_model_explicit(self, provider_service):
        """Test explicit provider and model selection."""
        # Test explicit provider:model format
        provider, model = await provider_service._select_provider_and_model(
            messages=[{"role": "user", "content": "Hello"}],
            model="openai:gpt-4"
        )
        assert provider == Provider.OPENAI
        assert model == "gpt-4"

        # Test explicit provider with default model
        provider, model = await provider_service._select_provider_and_model(
            messages=[{"role": "user", "content": "Hello"}],
            provider="ollama"
        )
        assert provider == Provider.OLLAMA
        assert model == provider_service.default_ollama_model

    @pytest.mark.asyncio
    async def test_auto_routing_complex_content(self, provider_service):
        """Test auto-routing with complex content."""
        # Mock complexity assessment to return high complexity
        provider_service._assess_complexity = AsyncMock(return_value=0.8)
        provider_service.model_selection_criteria.complexity_threshold = 0.7

        provider = await provider_service._auto_route(
            messages=[{"role": "user", "content": "Complex technical question"}]
        )

        assert provider == Provider.OPENAI
        provider_service._assess_complexity.assert_called_once()

    @pytest.mark.asyncio
    async def test_auto_routing_privacy_sensitive(self, provider_service):
        """Test auto-routing with privacy sensitive content."""
        provider_service.model_selection_criteria.privacy_sensitive_tokens = ["password", "secret"]

        provider = await provider_service._auto_route(
            messages=[{"role": "user", "content": "What is my password?"}]
        )

        assert provider == Provider.OLLAMA

    @pytest.mark.asyncio
    async def test_auto_routing_with_tools(self, provider_service):
        """Test auto-routing with tool requirements."""
        provider = await provider_service._auto_route(
            messages=[{"role": "user", "content": "Simple question"}],
            tools=[{"type": "function", "function": {"name": "get_weather"}}]
        )

        assert provider == Provider.OPENAI

    @pytest.mark.asyncio
    async def test_generate_completion_openai(self, provider_service):
        """Test generating completion with OpenAI."""
        # Setup mock response
        mock_response = MagicMock()
        mock_response.model_dump.return_value = {
            "id": "test-id",
            "object": "chat.completion",
            "model": "gpt-4",
            "usage": {"total_tokens": 10},
            "message": {"content": "Test response"}
        }
        provider_service.openai_client.chat.completions.create = AsyncMock(return_value=mock_response)

        response = await provider_service._generate_openai_completion(
            messages=[{"role": "user", "content": "Hello"}],
            model="gpt-4"
        )

        assert response["message"]["content"] == "Test response"
        provider_service.openai_client.chat.completions.create.assert_called_once()

    @pytest.mark.asyncio
    async def test_generate_completion_ollama(self, provider_service):
        """Test generating completion with Ollama."""
        provider_service.ollama_service.generate_completion.return_value = {
            "id": "ollama-test",
            "model": "llama2",
            "provider": "ollama",
            "message": {"content": "Ollama response"}
        }

        response = await provider_service._generate_ollama_completion(
            messages=[{"role": "user", "content": "Hello"}],
            model="llama2"
        )

        assert response["message"]["content"] == "Ollama response"
        provider_service.ollama_service.generate_completion.assert_called_once()

    @pytest.mark.asyncio
    async def test_fallback_mechanism(self, provider_service):
        """Test fallback mechanism when primary provider fails."""
        # Pin routing to OpenAI so the fallback path is exercised deterministically;
        # a short "Hello" would otherwise likely be auto-routed to Ollama first
        # (assumes generate_completion resolves the route via _select_provider_and_model)
        provider_service._select_provider_and_model = AsyncMock(
            return_value=(Provider.OPENAI, "gpt-4")
        )

        # Mock the primary provider (OpenAI) to fail
        provider_service._generate_openai_completion = AsyncMock(side_effect=Exception("API error"))

        # Mock the fallback provider (Ollama) to succeed
        provider_service._generate_ollama_completion = AsyncMock(return_value={
            "id": "ollama-fallback",
            "provider": "ollama",
            "message": {"content": "Fallback response"}
        })

        # Test the generate_completion method with auto provider
        response = await provider_service.generate_completion(
            messages=[{"role": "user", "content": "Hello"}],
            provider="auto"
        )

        # Check that fallback was used
        assert response["provider"] == "ollama"
        assert response["message"]["content"] == "Fallback response"
        provider_service._generate_openai_completion.assert_called_once()
        provider_service._generate_ollama_completion.assert_called_once()
```

Model Selection Logic Testing

```python
# tests/unit/test_model_selection.py
import pytest
from unittest.mock import AsyncMock, patch
import json

from app.models.model_catalog import recommend_ollama_model, OLLAMA_MODELS
from app.agents.adaptive_agent import AdaptiveAgent

class TestModelSelection:
    @pytest.mark.parametrize("use_case,performance_tier,expected_model", [
        ("code_generation", "high", "codellama:34b"),
        ("creative_writing", "medium", "dolphin-mistral"),
        ("mathematical_reasoning", "low", "orca-mini"),
        ("conversational", "high", "neural-chat"),
        ("knowledge_intensive", "high", "llama2:70b"),
        ("resource_constrained", "low", "phi"),
    ])
    def test_model_recommendations(self, use_case, performance_tier, expected_model):
        """Test model recommendation logic for different use cases."""
        model = recommend_ollama_model(use_case, performance_tier)
        assert model == expected_model

    @pytest.mark.asyncio
    async def test_adaptive_agent_use_case_detection(self):
        """Test adaptive agent's use case detection logic."""
        provider_service = AsyncMock()
        agent = AdaptiveAgent(
            provider_service=provider_service,
            system_prompt="You are a helpful assistant."
        )

        # Test code-related message
        code_use_case = await agent._determine_use_case(
            "Can you help me write a Python function to calculate Fibonacci numbers?"
        )
        assert code_use_case == "code_generation"

        # Test creative writing message
        creative_use_case = await agent._determine_use_case(
            "Write a short story about a robot discovering emotions."
        )
        assert creative_use_case == "creative_writing"

        # Test mathematical reasoning message
        math_use_case = await agent._determine_use_case(
            "Solve this equation: 3x² + 2x - 5 = 0"
        )
        assert math_use_case == "mathematical_reasoning"

    @pytest.mark.asyncio
    async def test_complexity_assessment(self):
        """Test complexity assessment logic."""
        provider_service = AsyncMock()
        agent = AdaptiveAgent(
            provider_service=provider_service,
            system_prompt="You are a helpful assistant."
        )

        # Simple message
        simple_message = "What time is it?"
        is_complex_simple = await agent._is_complex_request(simple_message)
        assert not is_complex_simple

        # Complex message
        complex_message = "Can you provide a detailed analysis of the socioeconomic factors that contributed to the Industrial Revolution in England, and compare those with the conditions in contemporary developing economies?"
        is_complex_detailed = await agent._is_complex_request(complex_message)
        assert is_complex_detailed

        # Multiple questions
        multi_question = "What is quantum computing? How does it differ from classical computing? What are its potential applications?"
        is_complex_multi = await agent._is_complex_request(multi_question)
        assert is_complex_multi
```

Ollama Service Testing

```python
# tests/unit/test_ollama_service.py
import pytest
import json
import asyncio
from unittest.mock import AsyncMock, patch, MagicMock

from app.services.ollama_service import OllamaService

class TestOllamaService:
    @pytest.fixture
    def ollama_service(self):
        """Create an Ollama service with mocked session for testing."""
        service = OllamaService()
        service.session = AsyncMock()
        return service

    @pytest.mark.asyncio
    async def test_list_models(self, ollama_service):
        """Test listing available models."""
        mock_response = AsyncMock()
        mock_response.status = 200
        mock_response.json = AsyncMock(return_value={"models": [
            {"name": "llama2"},
            {"name": "mistral"}
        ]})

        # session.get() must return an async context manager, not a coroutine,
        # so use MagicMock (which supports __aenter__/__aexit__) instead of AsyncMock
        ollama_service.session.get = MagicMock()
        ollama_service.session.get.return_value.__aenter__.return_value = mock_response

        models = await ollama_service.list_models()

        assert len(models) == 2
        assert models[0]["name"] == "llama2"
        assert models[1]["name"] == "mistral"

    @pytest.mark.asyncio
    async def test_generate_completion(self, ollama_service):
        """Test generating a completion."""
        # Mock the response
        mock_response = AsyncMock()
        mock_response.status = 200
        mock_response.json = AsyncMock(return_value={
            "id": "test-id",
            "response": "This is a test response",
            "created_at": 1677858242
        })

        # session.post() is used as an async context manager, so mock it with MagicMock
        ollama_service.session.post = MagicMock()
        ollama_service.session.post.return_value.__aenter__.return_value = mock_response

        # Test the completion generation
        response = await ollama_service._generate_completion_sync({
            "model": "llama2",
            "prompt": "Hello, world!",
            "stream": False,
            "options": {"temperature": 0.7}
        })

        # Check the formatted response
        assert "message" in response
        assert response["message"]["content"] == "This is a test response"
        assert response["provider"] == "ollama"

    @pytest.mark.asyncio
    async def test_format_messages_for_ollama(self, ollama_service):
        """Test formatting messages for Ollama."""
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
            {"role": "assistant", "content": "Hi there!"},
            {"role": "user", "content": "How are you?"}
        ]

        formatted = ollama_service._format_messages_for_ollama(messages)

        assert "[System]" in formatted
        assert "[User]" in formatted
        assert "[Assistant]" in formatted
        assert "You are a helpful assistant." in formatted
        assert "Hello!" in formatted
        assert "How are you?" in formatted

    @pytest.mark.asyncio
    async def test_tool_call_extraction(self, ollama_service):
        """Test extracting tool calls from response text."""
        # Response with a tool call
        response_with_tool = """
        I'll help you get the weather information.

        <tool>
        {
          "name": "get_weather",
          "parameters": {
            "location": "New York",
            "unit": "celsius"
          }
        }
        </tool>

        Let me check the weather for you.
        """

        tool_calls = ollama_service._extract_tool_calls(response_with_tool)

        assert tool_calls is not None
        assert len(tool_calls) == 1
        assert tool_calls[0]["function"]["name"] == "get_weather"
        assert "New York" in tool_calls[0]["function"]["arguments"]

        # Response without a tool call
        response_without_tool = "The weather in New York is sunny."
        assert ollama_service._extract_tool_calls(response_without_tool) is None

    @pytest.mark.asyncio
    async def test_clean_tool_calls_from_text(self, ollama_service):
        """Test cleaning tool calls from response text."""
        response_with_tool = """
        I'll help you get the weather information.

        <tool>
        {
          "name": "get_weather",
          "parameters": {
            "location": "New York",
            "unit": "celsius"
          }
        }
        </tool>

        Let me check the weather for you.
        """

        cleaned = ollama_service._clean_tool_calls_from_text(response_with_tool)

        assert "<tool>" not in cleaned
        assert "get_weather" not in cleaned
        assert "I'll help you get the weather information." in cleaned
        assert "Let me check the weather for you." in cleaned
```
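The tool-call tests above define the contract for _extract_tool_calls and _clean_tool_calls_from_text but do not show the implementation. The sketch below gives standalone functions that would satisfy those assertions. It assumes the model has been prompted to wrap each call in a <tool>...</tool> block containing a JSON object with "name" and "parameters". It emits OpenAI-style tool calls whose arguments field is a JSON string. The function names, the TOOL_BLOCK pattern, and the call_{n} id scheme are hypothetical illustrations, not the project's actual OllamaService code.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Hypothetical standalone versions of OllamaService._extract_tool_calls and
# _clean_tool_calls_from_text. Assumes the model was prompted to emit
# <tool>{"name": ..., "parameters": {...}}</tool> blocks.
TOOL_BLOCK = re.compile(r"<tool>\s*(.*?)\s*</tool>", re.DOTALL)


def extract_tool_calls(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return OpenAI-style tool calls parsed from <tool> blocks, or None if there are none."""
    calls: List[Dict[str, Any]] = []
    for index, block in enumerate(TOOL_BLOCK.findall(text)):
        try:
            payload = json.loads(block)
        except json.JSONDecodeError:
            continue  # Skip blocks the model did not format as valid JSON
        if not isinstance(payload, dict) or "name" not in payload:
            continue  # Skip JSON that does not describe a named tool call
        calls.append({
            "id": f"call_{index}",  # hypothetical id scheme
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI-style tool calls carry their arguments as a JSON string
                "arguments": json.dumps(payload.get("parameters", {})),
            },
        })
    return calls or None


def clean_tool_calls_from_text(text: str) -> str:
    """Remove <tool> blocks so only the model's prose is returned to the user."""
    return TOOL_BLOCK.sub("", text).strip()
```

The pattern is non-greedy and compiled with re.DOTALL, so multiple <tool> blocks stay separate even when each one spans several lines. Blocks that do not parse as JSON are skipped rather than raising an exception.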
overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2e6\"}]}]\nc3:[\"$\",\"h4\",\"h4-5\",{\"id\":\"tool-integration-testing\",\"children\":\"Tool Integration Testing\"}]\n2e7:T19d0,"])</script><script>self.__next_f.push([1,"# tests/unit/test_tool_integration.py\nimport pytest\nfrom unittest.mock import AsyncMock, patch\nimport json\n\nfrom app.agents.task_agent import TaskManagementAgent\nfrom app.models.message import Message, MessageRole\n\nclass TestToolIntegration:\n    @pytest.fixture\n    def task_agent(self):\n        \"\"\"Create a task agent with mocked services.\"\"\"\n        provider_service = AsyncMock()\n        task_service = AsyncMock()\n        \n        agent = TaskManagementAgent(\n            provider_service=provider_service,\n            task_service=task_service,\n            system_prompt=\"You are a task management agent.\"\n        )\n        \n        return agent\n    \n    @pytest.mark.asyncio\n    async def test_process_tool_calls_list_tasks(self, task_agent):\n        \"\"\"Test processing the list_tasks tool call.\"\"\"\n        # Mock task service response\n        task_agent.task_service.list_tasks.return_value = [\n            {\n                \"id\": \"task1\",\n                \"title\": \"Complete report\",\n                \"status\": \"pending\",\n                \"priority\": \"high\",\n                \"due_date\": \"2023-04-15\",\n                \"description\": \"Finish quarterly report\"\n            }\n        ]\n        \n        # Create a tool call for list_tasks\n        tool_calls = [{\n            \"id\": \"call_123\",\n            \"function\": {\n                \"name\": \"list_tasks\",\n                \"arguments\": json.dumps({\n                    \"status\": \"pending\",\n                    \"limit\": 5\n                })\n            }\n        }]\n        \n        # Process the tool calls\n        tool_responses = await 
task_agent._process_tool_calls(tool_calls, \"user123\")\n        \n        # Verify the response\n        assert len(tool_responses) == 1\n        assert tool_responses[0][\"tool_call_id\"] == \"call_123\"\n        assert \"Complete report\" in tool_responses[0][\"content\"]\n        assert \"pending\" in tool_responses[0][\"content\"]\n        \n        # Verify service was called correctly\n        task_agent.task_service.list_tasks.assert_called_once_with(\n            user_id=\"user123\",\n            status=\"pending\",\n            limit=5\n        )\n    \n    @pytest.mark.asyncio\n    async def test_process_tool_calls_create_task(self, task_agent):\n        \"\"\"Test processing the create_task tool call.\"\"\"\n        # Mock task service response\n        task_agent.task_service.create_task.return_value = {\n            \"id\": \"new_task\",\n            \"title\": \"New test task\"\n        }\n        \n        # Create a tool call for create_task\n        tool_calls = [{\n            \"id\": \"call_456\",\n            \"function\": {\n                \"name\": \"create_task\",\n                \"arguments\": json.dumps({\n                    \"title\": \"New test task\",\n                    \"description\": \"This is a test task\",\n                    \"priority\": \"medium\"\n                })\n            }\n        }]\n        \n        # Process the tool calls\n        tool_responses = await task_agent._process_tool_calls(tool_calls, \"user123\")\n        \n        # Verify the response\n        assert len(tool_responses) == 1\n        assert tool_responses[0][\"tool_call_id\"] == \"call_456\"\n        assert \"Task created successfully\" in tool_responses[0][\"content\"]\n        assert \"New test task\" in tool_responses[0][\"content\"]\n        \n        # Verify service was called correctly\n        task_agent.task_service.create_task.assert_called_once_with(\n            user_id=\"user123\",\n            title=\"New test task\",\n            
            description="This is a test task",
            due_date=None,
            priority="medium"
        )

    @pytest.mark.asyncio
    async def test_generate_response_with_tools(self, task_agent):
        """Test the full generate_response flow with tool usage."""
        # Set up the conversation history
        task_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a task management agent."),
            Message(role=MessageRole.USER, content="List my pending tasks")
        ]

        # Mock provider service to return a response with tool calls first.
        # The same call is exposed both inside "message" and at the top level,
        # mirroring the response shape the agent is expected to handle.
        list_tasks_call = {
            "id": "call_123",
            "function": {
                "name": "list_tasks",
                "arguments": json.dumps({
                    "status": "pending",
                    "limit": 10
                })
            }
        }
        mock_response_with_tools = {
            "message": {
                "content": "I'll list your tasks",
                "tool_calls": [list_tasks_call]
            },
            "tool_calls": [list_tasks_call]
        }

        # Mock task service
        task_agent.task_service.list_tasks.return_value = [
            {
                "id": "task1",
                "title": "Complete report",
                "status": "pending",
                "priority": "high",
                "due_date": "2023-04-15",
                "description": "Finish quarterly report"
            }
        ]

        # Mock final response after tool processing
        mock_final_response = {
            "message": {
                "content": "You have 1 pending task: Complete report (high priority, due Apr 15)"
            }
        }

        # Set up the mocked provider service
        task_agent.provider_service.generate_completion = AsyncMock()
        task_agent.provider_service.generate_completion.side_effect = [
            mock_response_with_tools,  # First call returns tool calls
            mock_final_response        # Second call returns final response
        ]

        # Generate the response
        response = await task_agent._generate_response("user123")

        # Verify the final response
        assert response == "You have 1 pending task: Complete report (high priority, due Apr 15)"

        # Verify the provider service was called twice
        assert task_agent.provider_service.generate_completion.call_count == 2

        # Verify the task service was called
        task_agent.task_service.list_tasks.assert_called_once()

        # Verify tool response was added to conversation history
        tool_messages = [msg for msg in task_agent.state.conversation_history if msg.role == MessageRole.TOOL]
        assert len(tool_messages) == 1
```

2. Integration Testing Framework

API Endpoint Testing

```python
# tests/integration/test_api_endpoints.py
import pytest
from fastapi.testclient import TestClient
from unittest.mock import patch, AsyncMock

from app.main import app
from app.services.provider_service import ProviderService

client = TestClient(app)

class TestAPIEndpoints:
    @pytest.fixture(autouse=True)
    def setup_mocks(self):
        """Set up mocks for services."""
        # Patch the provider service
        with patch('app.controllers.agent_controller.get_agent_factory') as mock_factory:
            mock_provider = AsyncMock(spec=ProviderService)
            mock_factory.return_value.provider_service = mock_provider
            yield

    def test_health_endpoint(self):
        """Test the health check endpoint."""
        response = client.get("/api/health")
        assert response.status_code == 200
        assert response.json()["status"] == "ok"

    def test_chat_endpoint_auth_required(self):
        """Test that chat endpoint requires authentication."""
        response = client.post(
            "/api/v1/chat",
            json={"message": "Hello"}
        )
        assert response.status_code == 401  # Unauthorized

    def test_chat_endpoint_with_auth(self):
        """Test the chat endpoint with proper authentication."""
        # Mock the authentication.
        # Caveat: if the route receives the user via Depends(get_current_user),
        # FastAPI keeps a reference to the original function, so patching the
        # module attribute will not take effect; app.dependency_overrides is the
        # supported way to replace a dependency in tests.
        with patch('app.services.auth_service.get_current_user') as mock_auth:
            mock_auth.return_value = {"id": "test_user"}

            # Mock the agent's process_message
            with patch('app.agents.base_agent.BaseAgent.process_message') as mock_process:
                mock_process.return_value = "Hello, I'm an AI assistant."

                response = client.post(
                    "/api/v1/chat",
                    json={"message": "Hi there"},
                    headers={"Authorization": "Bearer test_token"}
                )

                assert response.status_code == 200
                assert "response" in response.json()
                assert response.json()["response"] == "Hello, I'm an AI assistant."

    def test_model_recommendation_endpoint(self):
        """Test the model recommendation endpoint."""
        # Mock the authentication
        with patch('app.services.auth_service.get_current_user') as mock_auth:
            mock_auth.return_value = {"id": "test_user"}

            response = client.get(
                "/api/v1/agents/models/recommend?use_case=code_generation&performance_tier=high",
                headers={"Authorization": "Bearer test_token"}
            )

            assert response.status_code == 200
            data = response.json()
            assert "ollama_recommendation" in data
            assert data["use_case"] == "code_generation"
            assert data["performance_tier"] == "high"

    def test_streaming_endpoint(self):
        """Test the streaming endpoint."""
        # Mock the authentication
        with patch('app.services.auth_service.get_current_user') as mock_auth:
            mock_auth.return_value = {"id": "test_user"}

            # Mock the streaming generator
            async def mock_stream_generator():
                yield {"id": "1", "content": "Hello"}
                yield {"id": "2", "content": " World"}

            # Mock the stream method
            with patch('app.services.provider_service.ProviderService.stream_completion') as mock_stream:
                mock_stream.return_value = mock_stream_generator()

                response = client.post(
                    "/api/v1/chat/streaming",
                    json={"message": "Hi", "stream": True},
                    headers={"Authorization": "Bearer test_token"}
                )

                assert response.status_code == 200
                # Starlette appends a charset parameter to text/* media types
                # (e.g. "text/event-stream; charset=utf-8"), so match the prefix
                assert response.headers["content-type"].startswith("text/event-stream")

                # Parse the streaming response
                content = response.content.decode()
                assert "data:" in content
                assert "Hello" in content
                assert "World" in content
```

End-to-End Agent Flow Testing

```python
# tests/integration/test_agent_flows.py
import pytest
from unittest.mock import AsyncMock
import json

from app.agents.meta_agent import MetaAgent, AgentSubsystem
from app.agents.research_agent import ResearchAgent
from app.agents.conversation_manager import ConversationManager
from app.models.message import Message, MessageRole

class TestAgentFlows:
    @pytest.fixture
    def meta_agent_setup(self):
        """Set up a meta agent with subsystems for testing.

        Nothing here is awaited, so a plain synchronous fixture is used; an
        async def fixture declared with @pytest.fixture is not awaited under
        pytest-asyncio's strict mode.
        """
        # Create mocked services
        provider_service = AsyncMock()
        knowledge_service = AsyncMock()

        # Create subsystem agents
        research_agent = ResearchAgent(
            provider_service=provider_service,
            knowledge_service=knowledge_service,
            system_prompt="You are a research agent."
        )

        conversation_agent = ConversationManager(
            provider_service=provider_service,
            system_prompt="You are a conversation management agent."
        )

        # Create meta agent
        meta_agent = MetaAgent(
            provider_service=provider_service,
            system_prompt="You are a meta agent that coordinates specialized agents."
        )

        # Add subsystems
        meta_agent.add_subsystem(AgentSubsystem(
            name="research",
            agent=research_agent,
            role="Knowledge retrieval specialist"
        ))

        meta_agent.add_subsystem(AgentSubsystem(
            name="conversation",
            agent=conversation_agent,
            role="Conversation flow manager"
        ))

        # Return the setup
        return {
            "meta_agent": meta_agent,
            "provider_service": provider_service,
            "knowledge_service": knowledge_service,
            "research_agent": research_agent,
            "conversation_agent": conversation_agent
        }

    @pytest.mark.asyncio
    async def test_meta_agent_routing(self, meta_agent_setup):
        """Test the meta agent's routing logic."""
        meta_agent = meta_agent_setup["meta_agent"]
        provider_service = meta_agent_setup["provider_service"]

        # Set up conversation history
        meta_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a meta agent."),
            Message(role=MessageRole.USER, content="Tell me about quantum computing")
        ]

        # Mock the routing response to use the research subsystem
        routing_response = {
            "message": {
                "content": "I'll route this to the research subsystem"
            },
            "tool_calls": [{
                "id": "call_123",
                "function": {
                    "name": "route_to_subsystem",
                    "arguments": json.dumps({
                        "subsystem": "research",
                        "task": "Tell me about quantum computing",
                        "context": {}
                    })
                }
            }]
        }

        # Mock the research agent's response
        research_response = "Quantum computing is a type of computing that uses quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data."
        meta_agent_setup["research_agent"].process_message = AsyncMock(return_value=research_response)

        # Mock the provider service responses
        provider_service.generate_completion.side_effect = [
            routing_response,  # First call for routing decision
        ]

        # Generate response
        response = await meta_agent._generate_response("user123")

        # Verify routing happened correctly
        assert "[research" in response
        assert "Quantum computing" in response

        # Verify the research agent was called
        meta_agent_setup["research_agent"].process_message.assert_called_once_with(
            "Tell me about quantum computing", "user123"
        )

    @pytest.mark.asyncio
    async def test_meta_agent_parallel_processing(self, meta_agent_setup):
        """Test the meta agent's parallel processing logic."""
        meta_agent = meta_agent_setup["meta_agent"]
        provider_service = meta_agent_setup["provider_service"]

        # Set up conversation history
        meta_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a meta agent."),
            Message(role=MessageRole.USER, content="Explain the impacts of AI on society")
        ]

        # Mock the routing response to use parallel processing
        routing_response = {
            "message": {
                "content": "I'll process this with multiple subsystems"
            },
            "tool_calls": [{
                "id": "call_456",
                "function": {
                    "name": "parallel_processing",
                    "arguments": json.dumps({
                        "task": "Explain the impacts of AI on society",
                        "subsystems": ["research", "conversation"]
                    })
                }
            }]
        }

        # Mock each agent's response
        research_response = "From a research perspective, AI impacts society through automation, economic transformation, and ethical considerations."
        conversation_response = "From a conversational perspective, AI is changing how we interact with technology and each other."

        meta_agent_setup["research_agent"].process_message = AsyncMock(return_value=research_response)
        meta_agent_setup["conversation_agent"].process_message = AsyncMock(return_value=conversation_response)

        # Mock synthesis response
        synthesis_response = {
            "message": {
                "content": "AI has multifaceted impacts on society. From a research perspective, it drives automation and economic transformation. From a conversational perspective, it changes human-technology interaction patterns."
            }
        }

        # Mock the provider service responses
        provider_service.generate_completion.side_effect = [
            routing_response,    # First call for routing decision
            synthesis_response   # Second call for synthesis
        ]

        # Generate response
        response = await meta_agent._generate_response("user123")

        # Verify synthesis happened correctly
        assert "multifaceted impacts" in response
        assert provider_service.generate_completion.call_count == 2

        # Verify both agents were called
        meta_agent_setup["research_agent"].process_message.assert_called_once()
        meta_agent_setup["conversation_agent"].process_message.assert_called_once()

    @pytest.mark.asyncio
    async def test_research_agent_knowledge_retrieval(self, meta_agent_setup):
        """Test the research agent's knowledge retrieval capabilities."""
        research_agent = meta_agent_setup["research_agent"]
        provider_service = meta_agent_setup["provider_service"]
        knowledge_service = meta_agent_setup["knowledge_service"]

        # Set up conversation history
        research_agent.state.conversation_history = [
            Message(role=MessageRole.SYSTEM, content="You are a research agent."),
            Message(role=MessageRole.USER, content="What are the latest developments in fusion energy?")
        ]

        # Mock knowledge retrieval results
        knowledge_service.search.return_value = [
            {
                "id": "doc1",
                "title": "Recent Fusion Breakthrough",
                "content": "Scientists achieved net energy gain in fusion reaction at NIF in December 2022.",
                "relevance_score": 0.95
            },
            {
                "id": "doc2",
                "title": "Commercial Fusion Startups",
                "content": "Several startups including Commonwealth Fusion Systems are working on commercial fusion reactors.",
                "relevance_score": 0.89
            }
        ]

        # Mock initial response with tool calls
        tool_call_response = {
            "message": {
                "content": "Let me search for information on fusion energy."
            },
            "tool_calls": [{
                "id": "call_789",
                "function": {
                    "name": "search_knowledge_base",
                    "arguments": json.dumps({
                        "query": "latest developments fusion energy",
                        "max_results": 3
                    })
                }
            }]
        }

        # Mock final response with knowledge incorporated
        final_response = {
            "message": {
                "content": "Recent developments in fusion energy include a breakthrough at NIF in December 2022 achieving net energy gain, and advances from startups like Commonwealth Fusion Systems working on commercial reactors."
            }
        }

        # Mock the provider service responses
        provider_service.generate_completion.side_effect = [
            tool_call_response,  # First call with tool request
            final_response       # Second call with knowledge incorporated
        ]

        # Generate response
        response = await research_agent._generate_response("user123")

        # Verify response includes knowledge
        assert "NIF" in response
        assert "Commonwealth Fusion Systems" in response

        # Verify knowledge service was called
        knowledge_service.search.assert_called_once_with(
            query="latest developments fusion energy",
            max_results=3
        )
```

Cross-Provider Integration Testing

```python
# tests/integration/test_cross_provider.py
import pytest
import pytest_asyncio
import os

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

class TestCrossProviderIntegration:
    # Async generator fixtures need pytest_asyncio.fixture under
    # pytest-asyncio's strict mode; a plain @pytest.fixture is not awaited
    @pytest_asyncio.fixture
    async def real_services(self):
        """Set up real services for integration testing."""
        # Skip tests if API keys aren't available in the environment
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")

        # Initialize real services
        ollama_service = OllamaService()
        provider_service = ProviderService()

        # Initialize the services
        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {e}")

        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }

        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()

    @pytest.mark.asyncio
    async def test_provider_selection_complex_query(self, real_services):
        """Test that complex queries route to OpenAI."""
        provider_service = real_services["provider_service"]

        # Adjust complexity threshold to ensure predictable routing
        provider_service.model_selection_criteria.complexity_threshold = 0.5

        # Complex query that should route to OpenAI
        complex_messages = [
            {"role": "user", "content": "Provide a detailed analysis of the philosophical implications of artificial general intelligence, considering perspectives from epistemology, ethics, and metaphysics."}
        ]

        # Select provider
        provider, model = await provider_service._select_provider_and_model(
            messages=complex_messages,
            provider="auto"
        )

        # Verify routing decision
        assert provider == Provider.OPENAI

    @pytest.mark.asyncio
    async def test_provider_selection_simple_query(self, real_services):
        """Test that simple queries route to Ollama."""
        provider_service = real_services["provider_service"]

        # Adjust complexity threshold to ensure predictable routing
        provider_service.model_selection_criteria.complexity_threshold = 0.5

        # Simple query that should route to Ollama
        simple_messages = [
            {"role": "user", "content": "What's the weather like today?"}
        ]

        # Select provider
        provider, model = await provider_service._select_provider_and_model(
            messages=simple_messages,
            provider="auto"
        )

        # Verify routing decision
        assert provider == Provider.OLLAMA

    @pytest.mark.asyncio
    async def test_fallback_mechanism_real(self, real_services):
        """Test the fallback mechanism with real services."""
        provider_service = real_services["provider_service"]

        # Intentionally cause OpenAI to fail by using an invalid model
        messages = [
            {"role": "user", "content": "Simple test message"}
        ]

        try:
            # This should fail with OpenAI but succeed with the Ollama fallback
            response = await provider_service.generate_completion(
                messages=messages,
                model="openai:non-existent-model",  # Invalid model
                provider="auto"  # Enable auto-fallback
            )
        except Exception as e:
            pytest.fail(f"Fallback mechanism failed: {e}")

        # Assertions stay outside the try block so an AssertionError is
        # reported as-is rather than re-raised as a fallback failure
        assert response["provider"] == "ollama"
        assert "content" in response["message"]

    @pytest.mark.asyncio
    async def test_ollama_response_format(self, real_services):
        """Test that Ollama responses are properly formatted to match OpenAI's structure."""
        ollama_service = real_services["ollama_service"]

        # Generate a basic response
        messages = [
            {"role": "user", "content": "What is 2+2?"}
        ]

        response = await ollama_service.generate_completion(
            messages=messages,
            model="llama2"  # Assumes the llama2 model has been pulled locally
        )

        # Verify response structure matches expected format
        assert "id" in response
        assert "object" in response
        assert "model" in response
        assert "usage" in response
        assert "message" in response
        assert "content" in response["message"]
        assert response["provider"] == "ollama"
```

3. Performance Testing Framework

Response Latency Benchmarking

```python
# tests/performance/test_latency.py
import pytest
import pytest_asyncio
import time
import asyncio
import statistics
from typing import List, Dict, Any
import pandas as pd
import matplotlib.pyplot as plt
import os

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

# Skip performance tests when running in a CI environment
SKIP_PERFORMANCE_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_PERFORMANCE_TESTS, reason="Performance tests skipped in CI environment")
class TestResponseLatency:
    # Async generator fixtures need pytest_asyncio.fixture under
    # pytest-asyncio's strict mode
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for latency testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")

        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()

        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {e}")

        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }

        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()

    async def measure_latency(self, provider_service, provider, model, messages):
        """Measure response latency for a given provider and model."""
        start_time = time.time()

        if provider == "openai":
            await provider_service._generate_openai_completion(
                messages=messages,
                model=model
            )
        else:  # ollama
            await provider_service._generate_ollama_completion(
                messages=messages,
                model=model
            )

        end_time = time.time()
        return end_time - start_time

    @pytest.mark.asyncio
    async def test_latency_comparison(self, services):
        """Compare latency between OpenAI and Ollama for different query types."""
        provider_service = services["provider_service"]

        # Test messages of different complexity
        test_messages = [
            {
                "name": "simple_factual",
                "messages": [{"role": "user", "content": "What is the capital of France?"}]
            },
            {
                "name": "medium_explanation",
                "messages": [{"role": "user", "content": "Explain how photosynthesis works in plants."}]
            },
            {
                "name": "complex_analysis",
                "messages": [{"role": "user", "content": "Analyze the economic factors that contributed to the 2008 financial crisis and their long-term impacts."}]
            }
        ]

        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4"],
            "ollama": ["llama2", "mistral"]
        }

        # Number of repetitions for each test
        repetitions = 3

        # Collect results
        results = []

        for message_type in test_messages:
            for provider in models:
                for model in models[provider]:
                    for i in range(repetitions):
                        try:
                            latency = await self.measure_latency(
                                provider_service,
                                provider,
                                model,
message_type[\"messages\"]\n                            )\n                            \n                            results.append({\n                                \"provider\": provider,\n                                \"model\": model,\n                                \"message_type\": message_type[\"name\"],\n                                \"repetition\": i,\n                                \"latency\": latency\n                            })\n                            \n                            # Add a small delay to avoid rate limits\n                            await asyncio.sleep(1)\n                        except Exception as e:\n                            print(f\"Error testing {provider}:{model} - {str(e)}\")\n        \n        # Analyze results\n        df = pd.DataFrame(results)\n        \n        # Calculate average latency by provider, model, and message type\n        avg_latency = df.groupby(['provider', 'model', 'message_type'])['latency'].mean().reset_index()\n        \n        # Generate summary statistics\n        summary = avg_latency.pivot_table(\n            index=['provider', 'model'],\n            columns='message_type',\n            values='latency'\n        ).reset_index()\n        \n        # Print summary\n        print(\"\\nLatency Benchmark Results (seconds):\")\n        print(summary)\n        \n        # Create visualization\n        plt.figure(figsize=(12, 8))\n        \n        for message_type in test_messages:\n            subset = avg_latency[avg_latency['message_type'] == message_type['name']]\n            x = range(len(subset))\n            labels = [f\"{row['provider']}\\n{row['model']}\" for _, row in subset.iterrows()]\n            \n            plt.subplot(1, len(test_messages), test_messages.index(message_type) + 1)\n            plt.bar(x, subset['latency'])\n            plt.xticks(x, labels, rotation=45)\n            plt.title(f\"Latency: {message_type['name']}\")\n            plt.ylabel(\"Seconds\")\n        
\n        plt.tight_layout()\n        plt.savefig('latency_benchmark.png')\n        \n        # Assert something meaningful\n        assert len(results) \u003e 0, \"No benchmark results collected\"\n"])</script><script>self.__next_f.push([1,"ce:[\"$\",\"pre\",\"pre-46\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2eb\"}]}]\ncf:[\"$\",\"h4\",\"h4-10\",{\"id\":\"memory-usage-monitoring\",\"children\":\"Memory Usage Monitoring\"}]\n2ec:T1951,"])</script><script>self.__next_f.push([1,"# tests/performance/test_memory_usage.py\nimport pytest\nimport os\nimport asyncio\nimport psutil\nimport time\nimport resource\nimport matplotlib.pyplot as plt\nimport pandas as pd\nfrom typing import List, Dict, Any\n\nfrom app.services.provider_service import ProviderService, Provider\nfrom app.services.ollama_service import OllamaService\n\n# Skip tests if it's CI environment\nSKIP_PERFORMANCE_TESTS = os.environ.get(\"CI\") == \"true\"\n\n@pytest.mark.skipif(SKIP_PERFORMANCE_TESTS, reason=\"Performance tests skipped in CI environment\")\nclass TestMemoryUsage:\n    @pytest.fixture\n    async def services(self):\n        \"\"\"Set up services for memory testing.\"\"\"\n        if not os.environ.get(\"OPENAI_API_KEY\"):\n            pytest.skip(\"OPENAI_API_KEY environment variable not set\")\n            \n        # Initialize services\n        ollama_service = OllamaService()\n        provider_service = ProviderService()\n        \n        try:\n            await ollama_service.initialize()\n            await provider_service.initialize()\n        except Exception as e:\n            pytest.skip(f\"Failed to initialize services: {str(e)}\")\n        \n        yield {\n            \"ollama_service\": ollama_service,\n            \"provider_service\": provider_service\n        }\n        \n        # Cleanup\n        await ollama_service.cleanup()\n        
await provider_service.cleanup()\n    \n    def get_memory_usage(self):\n        \"\"\"Get current memory usage of the process.\"\"\"\n        process = psutil.Process(os.getpid())\n        memory_info = process.memory_info()\n        return memory_info.rss / (1024 * 1024)  # Convert to MB\n    \n    async def monitor_memory_during_request(self, provider_service, provider, model, messages):\n        \"\"\"Monitor memory usage during a request.\"\"\"\n        memory_samples = []\n        \n        # Start a memory-monitoring task\n        monitoring = True\n        \n        async def memory_monitor():\n            start_time = time.time()\n            while monitoring:\n                memory_samples.append({\n                    \"time\": time.time() - start_time,\n                    \"memory_mb\": self.get_memory_usage()\n                })\n                await asyncio.sleep(0.1)  # Sample every 100ms\n        \n        # Start monitoring\n        monitor_task = asyncio.create_task(memory_monitor())\n        \n        # Make the request\n        start_time = time.time()\n        try:\n            if provider == \"openai\":\n                await provider_service._generate_openai_completion(\n                    messages=messages,\n                    model=model\n                )\n            else:  # ollama\n                await provider_service._generate_ollama_completion(\n                    messages=messages,\n                    model=model\n                )\n        finally:\n            end_time = time.time()\n            \n            # Stop monitoring\n            monitoring = False\n            await monitor_task\n        \n        return {\n            \"samples\": memory_samples,\n            \"duration\": end_time - start_time,\n            \"peak_memory\": max(sample[\"memory_mb\"] for sample in memory_samples) if memory_samples else 0,\n            \"mean_memory\": sum(sample[\"memory_mb\"] for sample in memory_samples) / len(memory_samples) 
if memory_samples else 0\n        }\n    \n    @pytest.mark.asyncio\n    async def test_memory_usage_comparison(self, services):\n        \"\"\"Compare memory usage between OpenAI and Ollama.\"\"\"\n        provider_service = services[\"provider_service\"]\n        \n        # Test messages\n        test_message = {\"role\": \"user\", \"content\": \"Write a detailed essay about climate change and its global impact.\"}\n        \n        # Models to test\n        models = {\n            \"openai\": [\"gpt-3.5-turbo\"],\n            \"ollama\": [\"llama2\"]\n        }\n        \n        # Collect results\n        results = []\n        memory_data = {}\n        \n        for provider in models:\n            for model in models[provider]:\n                # Collect initial memory\n                initial_memory = self.get_memory_usage()\n                \n                # Monitor during request\n                memory_result = await self.monitor_memory_during_request(\n                    provider_service,\n                    provider,\n                    model,\n                    [test_message]\n                )\n                \n                # Store results\n                key = f\"{provider}:{model}\"\n                memory_data[key] = memory_result[\"samples\"]\n                \n                results.append({\n                    \"provider\": provider,\n                    \"model\": model,\n                    \"initial_memory_mb\": initial_memory,\n                    \"peak_memory_mb\": memory_result[\"peak_memory\"],\n                    \"mean_memory_mb\": memory_result[\"mean_memory\"],\n                    \"memory_increase_mb\": memory_result[\"peak_memory\"] - initial_memory,\n                    \"duration_seconds\": memory_result[\"duration\"]\n                })\n                \n                # Wait a bit to let memory stabilize\n                await asyncio.sleep(2)\n        \n        # Analyze results\n        df = 
pd.DataFrame(results)\n        \n        # Print summary\n        print(\"\\nMemory Usage Results:\")\n        print(df.to_string(index=False))\n        \n        # Create visualization\n        plt.figure(figsize=(15, 10))\n        \n        # Plot memory over time\n        plt.subplot(2, 1, 1)\n        for key, samples in memory_data.items():\n            times = [s[\"time\"] for s in samples]\n            memory = [s[\"memory_mb\"] for s in samples]\n            plt.plot(times, memory, label=key)\n        \n        plt.xlabel(\"Time (seconds)\")\n        plt.ylabel(\"Memory Usage (MB)\")\n        plt.title(\"Memory Usage Over Time During Request\")\n        plt.legend()\n        plt.grid(True)\n        \n        # Plot peak and increase\n        plt.subplot(2, 1, 2)\n        providers = df[\"provider\"].tolist()\n        models = df[\"model\"].tolist()\n        labels = [f\"{p}\\n{m}\" for p, m in zip(providers, models)]\n        x = range(len(labels))\n        \n        plt.bar(x, df[\"memory_increase_mb\"], label=\"Memory Increase\")\n        plt.xticks(x, labels)\n        plt.ylabel(\"Memory (MB)\")\n        plt.title(\"Memory Increase by Provider/Model\")\n        plt.legend()\n        plt.grid(True)\n        \n        plt.tight_layout()\n        plt.savefig('memory_benchmark.png')\n        \n        # Assert something meaningful\n        assert len(results) \u003e 0, \"No memory benchmark results collected\"\n"])</script><script>self.__next_f.push([1,"d0:[\"$\",\"pre\",\"pre-47\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2ec\"}]}]\nd1:[\"$\",\"h4\",\"h4-11\",{\"id\":\"response-quality-benchmarking\",\"children\":\"Response Quality Benchmarking\"}]\n2ed:T1f5f,"])</script><script>self.__next_f.push([1,"# tests/performance/test_response_quality.py\nimport pytest\nimport os\nimport asyncio\nimport json\nimport pandas as 
pd\nimport matplotlib.pyplot as plt\nfrom typing import List, Dict, Any\n\nfrom app.services.provider_service import ProviderService, Provider\nfrom app.services.ollama_service import OllamaService\n\n# Skip tests if it's CI environment\nSKIP_PERFORMANCE_TESTS = os.environ.get(\"CI\") == \"true\"\n\n@pytest.mark.skipif(SKIP_PERFORMANCE_TESTS, reason=\"Performance tests skipped in CI environment\")\nclass TestResponseQuality:\n    @pytest.fixture\n    async def services(self):\n        \"\"\"Set up services for quality testing.\"\"\"\n        if not os.environ.get(\"OPENAI_API_KEY\"):\n            pytest.skip(\"OPENAI_API_KEY environment variable not set\")\n            \n        # Initialize services\n        ollama_service = OllamaService()\n        provider_service = ProviderService()\n        \n        try:\n            await ollama_service.initialize()\n            await provider_service.initialize()\n        except Exception as e:\n            pytest.skip(f\"Failed to initialize services: {str(e)}\")\n        \n        yield {\n            \"ollama_service\": ollama_service,\n            \"provider_service\": provider_service\n        }\n        \n        # Cleanup\n        await ollama_service.cleanup()\n        await provider_service.cleanup()\n    \n    async def get_response(self, provider_service, provider, model, messages):\n        \"\"\"Get a response from a specific provider and model.\"\"\"\n        if provider == \"openai\":\n            response = await provider_service._generate_openai_completion(\n                messages=messages,\n                model=model\n            )\n        else:  # ollama\n            response = await provider_service._generate_ollama_completion(\n                messages=messages,\n                model=model\n            )\n            \n        return response[\"message\"][\"content\"]\n    \n    async def evaluate_response(self, provider_service, response, criteria):\n        \"\"\"Evaluate a response using GPT-4 
as a judge.\"\"\"\n        evaluation_prompt = [\n            {\"role\": \"system\", \"content\": \"\"\"\n            You are an expert evaluator of AI responses. Evaluate the given response based on the specified criteria.\n            For each criterion, provide a score from 1-10 and a brief explanation.\n            Format your response as valid JSON with the following structure:\n            {\n                \"criteria\": {\n                    \"accuracy\": {\"score\": X, \"explanation\": \"...\"},\n                    \"completeness\": {\"score\": X, \"explanation\": \"...\"},\n                    \"coherence\": {\"score\": X, \"explanation\": \"...\"},\n                    \"relevance\": {\"score\": X, \"explanation\": \"...\"}\n                },\n                \"overall_score\": X,\n                \"summary\": \"...\"\n            }\n            \"\"\"},\n            {\"role\": \"user\", \"content\": f\"\"\"\n            Evaluate this AI response based on {', '.join(criteria)}:\n            \n            RESPONSE TO EVALUATE:\n            {response}\n            \"\"\"}\n        ]\n        \n        # Evaluate with GPT-4o\n        evaluation = await provider_service._generate_openai_completion(\n            messages=evaluation_prompt,\n            model=\"gpt-4o\",\n            response_format={\"type\": \"json_object\"}\n        )\n        \n        try:\n            return json.loads(evaluation[\"message\"][\"content\"])\n        except:\n            # Fallback if parsing fails\n            return {\n                \"criteria\": {c: {\"score\": 0, \"explanation\": \"Failed to parse\"} for c in criteria},\n                \"overall_score\": 0,\n                \"summary\": \"Failed to parse evaluation\"\n            }\n    \n    @pytest.mark.asyncio\n    async def test_response_quality_comparison(self, services):\n        \"\"\"Compare response quality between OpenAI and Ollama models.\"\"\"\n        provider_service = 
services[\"provider_service\"]\n        \n        # Test scenarios\n        test_scenarios = [\n            {\n                \"name\": \"factual_knowledge\",\n                \"query\": \"Explain the process of photosynthesis and its importance to life on Earth.\"\n            },\n            {\n                \"name\": \"reasoning\",\n                \"query\": \"A bat and ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?\"\n            },\n            {\n                \"name\": \"creative_writing\",\n                \"query\": \"Write a short story about a robot discovering emotions.\"\n            },\n            {\n                \"name\": \"code_generation\",\n                \"query\": \"Write a Python function to check if a string is a palindrome.\"\n            }\n        ]\n        \n        # Models to test\n        models = {\n            \"openai\": [\"gpt-3.5-turbo\"],\n            \"ollama\": [\"llama2\", \"mistral\"]\n        }\n        \n        # Evaluation criteria\n        criteria = [\"accuracy\", \"completeness\", \"coherence\", \"relevance\"]\n        \n        # Collect results\n        results = []\n        \n        for scenario in test_scenarios:\n            for provider in models:\n                for model in models[provider]:\n                    try:\n                        # Get response\n                        response = await self.get_response(\n                            provider_service,\n                            provider,\n                            model,\n                            [{\"role\": \"user\", \"content\": scenario[\"query\"]}]\n                        )\n                        \n                        # Evaluate response\n                        evaluation = await self.evaluate_response(\n                            provider_service,\n                            response,\n                            criteria\n                        )\n                    
    \n                        # Store results\n                        results.append({\n                            \"scenario\": scenario[\"name\"],\n                            \"provider\": provider,\n                            \"model\": model,\n                            \"overall_score\": evaluation[\"overall_score\"],\n                            **{f\"{criterion}_score\": evaluation[\"criteria\"][criterion][\"score\"] \n                              for criterion in criteria}\n                        })\n                        \n                        # Add raw responses for detailed analysis\n                        with open(f\"response_{provider}_{model}_{scenario['name']}.txt\", \"w\") as f:\n                            f.write(response)\n                        \n                        # Add a delay to avoid rate limits\n                        await asyncio.sleep(2)\n                    except Exception as e:\n                        print(f\"Error evaluating {provider}:{model} on {scenario['name']}: {str(e)}\")\n        \n        # Analyze results\n        df = pd.DataFrame(results)\n        \n        # Save results\n        df.to_csv(\"quality_benchmark_results.csv\", index=False)\n        \n        # Print summary\n        print(\"\\nResponse Quality Results:\")\n        summary = df.groupby(['provider', 'model']).mean(numeric_only=True).reset_index()\n        print(summary.to_string(index=False))\n        \n        # Create visualization\n        plt.figure(figsize=(15, 10))\n        \n        # Plot overall scores, one panel per scenario\n        for i, scenario in enumerate(test_scenarios):\n            scenario_df = df[df['scenario'] == scenario['name']]\n            providers = scenario_df[\"provider\"].tolist()\n            models = scenario_df[\"model\"].tolist()\n            labels = [f\"{p}\\n{m}\" for p, m in zip(providers, models)]\n            \n            plt.subplot(2, 2, i+1)\n            plt.bar(labels, 
scenario_df[\"overall_score\"])\n            plt.title(f\"Quality Scores: {scenario['name']}\")\n            plt.ylabel(\"Score (1-10)\")\n            plt.ylim(0, 10)\n            plt.xticks(rotation=45)\n        \n        plt.tight_layout()\n        plt.savefig('quality_benchmark.png')\n        \n        # Assert something meaningful\n        assert len(results) \u003e 0, \"No quality benchmark results collected\"\n"])</script><script>self.__next_f.push([1,"d2:[\"$\",\"pre\",\"pre-48\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2ed\"}]}]\nd3:[\"$\",\"h3\",\"h3-34\",{\"id\":\"4-reliability-testing-framework\",\"children\":\"4. Reliability Testing Framework\"}]\nd4:[\"$\",\"h4\",\"h4-12\",{\"id\":\"error-handling-and-fallback-testing\",\"children\":\"Error Handling and Fallback Testing\"}]\n2ee:T17f0,"])</script><script>self.__next_f.push([1,"# tests/reliability/test_error_handling.py\nimport pytest\nimport asyncio\nfrom unittest.mock import AsyncMock, patch, MagicMock\nimport aiohttp\nimport openai\n\nfrom app.services.provider_service import ProviderService, Provider\nfrom app.services.ollama_service import OllamaService\n\nclass TestErrorHandling:\n    @pytest.fixture\n    def provider_service(self):\n        \"\"\"Create a provider service with mocked dependencies for testing.\"\"\"\n        service = ProviderService()\n        service.openai_client = AsyncMock()\n        service.ollama_service = AsyncMock(spec=OllamaService)\n        return service\n    \n    @pytest.mark.asyncio\n    async def test_openai_connection_error(self, provider_service):\n        \"\"\"Test handling of OpenAI connection errors.\"\"\"\n        # Mock OpenAI to raise a connection error\n        provider_service._generate_openai_completion = AsyncMock(\n            side_effect=aiohttp.ClientConnectionError(\"Connection refused\")\n        )\n        \n        # Mock 
Ollama to succeed\n        provider_service._generate_ollama_completion = AsyncMock(return_value={\n            \"id\": \"ollama-fallback\",\n            \"provider\": \"ollama\",\n            \"message\": {\"content\": \"Fallback response\"}\n        })\n        \n        # Test with auto routing\n        response = await provider_service.generate_completion(\n            messages=[{\"role\": \"user\", \"content\": \"Test message\"}],\n            provider=\"auto\"\n        )\n        \n        # Verify fallback worked\n        assert response[\"provider\"] == \"ollama\"\n        assert response[\"message\"][\"content\"] == \"Fallback response\"\n        provider_service._generate_openai_completion.assert_called_once()\n        provider_service._generate_ollama_completion.assert_called_once()\n    \n    @pytest.mark.asyncio\n    async def test_ollama_connection_error(self, provider_service):\n        \"\"\"Test handling of Ollama connection errors.\"\"\"\n        # Mock the auto routing to select Ollama first\n        provider_service._auto_route = AsyncMock(return_value=Provider.OLLAMA)\n        \n        # Mock Ollama to fail\n        provider_service._generate_ollama_completion = AsyncMock(\n            side_effect=aiohttp.ClientConnectionError(\"Connection refused\")\n        )\n        \n        # Mock OpenAI to succeed\n        provider_service._generate_openai_completion = AsyncMock(return_value={\n            \"id\": \"openai-fallback\",\n            \"provider\": \"openai\",\n            \"message\": {\"content\": \"Fallback response\"}\n        })\n        \n        # Test with auto routing\n        response = await provider_service.generate_completion(\n            messages=[{\"role\": \"user\", \"content\": \"Test message\"}],\n            provider=\"auto\"\n        )\n        \n        # Verify fallback worked\n        assert response[\"provider\"] == \"openai\"\n        assert response[\"message\"][\"content\"] == \"Fallback response\"\n        
provider_service._generate_ollama_completion.assert_called_once()\n        provider_service._generate_openai_completion.assert_called_once()\n    \n    @pytest.mark.asyncio\n    async def test_rate_limit_handling(self, provider_service):\n        \"\"\"Test handling of rate limit errors.\"\"\"\n        # Simulate an OpenAI 429 error\n        rate_limit_error = MagicMock()\n        rate_limit_error.status_code = 429\n        rate_limit_error.json.return_value = {\"error\": {\"message\": \"Rate limit exceeded\"}}\n        \n        provider_service._generate_openai_completion = AsyncMock(\n            side_effect=openai.RateLimitError(\"Rate limit exceeded\", response=rate_limit_error, body=None)\n        )\n        \n        # Mock Ollama to succeed\n        provider_service._generate_ollama_completion = AsyncMock(return_value={\n            \"id\": \"ollama-fallback\",\n            \"provider\": \"ollama\",\n            \"message\": {\"content\": \"Fallback response\"}\n        })\n        \n        # Test with auto routing\n        response = await provider_service.generate_completion(\n            messages=[{\"role\": \"user\", \"content\": \"Test message\"}],\n            provider=\"auto\"\n        )\n        \n        # Verify fallback worked\n        assert response[\"provider\"] == \"ollama\"\n        assert response[\"message\"][\"content\"] == \"Fallback response\"\n    \n    @pytest.mark.asyncio\n    async def test_timeout_handling(self, provider_service):\n        \"\"\"Test handling of timeout errors.\"\"\"\n        # Mock OpenAI to raise a timeout error\n        provider_service._generate_openai_completion = AsyncMock(\n            side_effect=asyncio.TimeoutError(\"Request timed out\")\n        )\n        \n        # Mock Ollama to succeed\n        provider_service._generate_ollama_completion = AsyncMock(return_value={\n            \"id\": \"ollama-fallback\",\n            \"provider\": \"ollama\",\n            \"message\": {\"content\": \"Fallback 
response\"}\n        })\n        \n        # Test with auto routing\n        response = await provider_service.generate_completion(\n            messages=[{\"role\": \"user\", \"content\": \"Test message\"}],\n            provider=\"auto\"\n        )\n        \n        # Verify fallback worked\n        assert response[\"provider\"] == \"ollama\"\n        assert response[\"message\"][\"content\"] == \"Fallback response\"\n    \n    @pytest.mark.asyncio\n    async def test_all_providers_fail(self, provider_service):\n        \"\"\"Test case when all providers fail.\"\"\"\n        # Mock both providers to fail\n        provider_service._generate_openai_completion = AsyncMock(\n            side_effect=Exception(\"OpenAI failed\")\n        )\n        \n        provider_service._generate_ollama_completion = AsyncMock(\n            side_effect=Exception(\"Ollama failed\")\n        )\n        \n        # Test with auto routing - should raise an exception\n        with pytest.raises(Exception) as excinfo:\n            await provider_service.generate_completion(\n                messages=[{\"role\": \"user\", \"content\": \"Test message\"}],\n                provider=\"auto\"\n            )\n        \n        # Verify the original exception is re-raised\n        assert \"OpenAI failed\" in str(excinfo.value)\n        provider_service._generate_openai_completion.assert_called_once()\n        provider_service._generate_ollama_completion.assert_called_once()\n"])</script><script>self.__next_f.push([1,"d5:[\"$\",\"pre\",\"pre-49\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2ee\"}]}]\nd6:[\"$\",\"h4\",\"h4-13\",{\"id\":\"load-testing\",\"children\":\"Load Testing\"}]\n2ef:T13ff,"])</script><script>self.__next_f.push([1,"# tests/reliability/test_load.py\nimport pytest\nimport asyncio\nimport time\nimport os\nimport pandas as pd\nimport 
matplotlib.pyplot as plt\nfrom aiohttp import ClientSession, TCPConnector\n\nfrom app.services.provider_service import ProviderService, Provider\n\n# Skip tests if it's CI environment\nSKIP_LOAD_TESTS = os.environ.get(\"CI\") == \"true\"\n\n@pytest.mark.skipif(SKIP_LOAD_TESTS, reason=\"Load tests skipped in CI environment\")\nclass TestLoadHandling:\n    @pytest.fixture\n    async def provider_service(self):\n        \"\"\"Set up provider service for load testing.\"\"\"\n        if not os.environ.get(\"OPENAI_API_KEY\"):\n            pytest.skip(\"OPENAI_API_KEY environment variable not set\")\n            \n        # Initialize service\n        service = ProviderService()\n        \n        try:\n            await service.initialize()\n        except Exception as e:\n            pytest.skip(f\"Failed to initialize service: {str(e)}\")\n        \n        yield service\n        \n        # Cleanup\n        await service.cleanup()\n    \n    async def send_request(self, provider_service, provider, model, message, request_id):\n        \"\"\"Send a single request and record performance.\"\"\"\n        start_time = time.time()\n        success = False\n        error = None\n        \n        try:\n            response = await provider_service.generate_completion(\n                messages=[{\"role\": \"user\", \"content\": message}],\n                provider=provider,\n                model=model\n            )\n            success = True\n        except Exception as e:\n            error = str(e)\n        \n        end_time = time.time()\n        \n        return {\n            \"request_id\": request_id,\n            \"provider\": provider,\n            \"model\": model,\n            \"success\": success,\n            \"error\": error,\n            \"duration\": end_time - start_time\n        }\n    \n    @pytest.mark.asyncio\n    async def test_concurrent_requests(self, provider_service):\n        \"\"\"Test handling of multiple concurrent requests.\"\"\"\n        
# Test configurations\n        providers = [\"openai\", \"ollama\", \"auto\"]\n        request_count = 10  # 10 requests per provider\n        \n        # Test message (simple to avoid rate limits)\n        message = \"What is 2+2?\"\n        \n        # Create tasks for all requests\n        tasks = []\n        request_id = 0\n        \n        for provider in providers:\n            for _ in range(request_count):\n                # Determine model based on provider\n                if provider == \"openai\":\n                    model = \"gpt-3.5-turbo\"\n                elif provider == \"ollama\":\n                    model = \"llama2\"\n                else:\n                    model = None  # Auto select\n                \n                tasks.append(self.send_request(\n                    provider_service,\n                    provider,\n                    model,\n                    message,\n                    request_id\n                ))\n                request_id += 1\n                \n                # Small delay to avoid immediate rate limiting\n                await asyncio.sleep(0.1)\n        \n        # Run requests concurrently with a reasonable concurrency limit\n        concurrency_limit = 5\n        results = []\n        \n        for i in range(0, len(tasks), concurrency_limit):\n            batch = tasks[i:i+concurrency_limit]\n            batch_results = await asyncio.gather(*batch)\n            results.extend(batch_results)\n            \n            # Delay between batches to avoid rate limits\n            await asyncio.sleep(2)\n        \n        # Analyze results\n        df = pd.DataFrame(results)\n        \n        # Print summary\n        print(\"\\nConcurrent Request Test Results:\")\n        success_rate = df.groupby('provider')['success'].mean() * 100\n        mean_duration = df.groupby('provider')['duration'].mean()\n        \n        summary = pd.DataFrame({\n            'success_rate': success_rate,\n            
'mean_duration': mean_duration\n        }).reset_index()\n        \n        print(summary.to_string(index=False))\n        \n        # Create visualization\n        plt.figure(figsize=(12, 10))\n        \n        # Plot success rate\n        plt.subplot(2, 1, 1)\n        plt.bar(summary['provider'], summary['success_rate'])\n        plt.title('Success Rate by Provider')\n        plt.ylabel('Success Rate (%)')\n        plt.ylim(0, 100)\n        \n        # Plot response times\n        plt.subplot(2, 1, 2)\n        for provider in providers:\n            provider_df = df[df['provider'] == provider]\n            plt.plot(provider_df['request_id'], provider_df['duration'], marker='o', label=provider)\n        \n        plt.title('Response Time by Request')\n        plt.xlabel('Request ID')\n        plt.ylabel('Duration (seconds)')\n        plt.legend()\n        plt.grid(True)\n        \n        plt.tight_layout()\n        plt.savefig('load_test_results.png')\n        \n        # Assert reasonable success rate\n        for provider in providers:\n            provider_success = df[df['provider'] == provider]['success'].mean() * 100\n            assert provider_success \u003e= 70, f\"Success rate for {provider} is below 70%\"\n"])</script><script>self.__next_f.push([1,"d7:[\"$\",\"pre\",\"pre-50\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2ef\"}]}]\nd8:[\"$\",\"h4\",\"h4-14\",{\"id\":\"stability-testing-for-extended-sessions\",\"children\":\"Stability Testing for Extended Sessions\"}]\n2f0:T1b0b,"])</script><script>self.__next_f.push([1,"# tests/reliability/test_stability.py\nimport pytest\nimport asyncio\nimport time\nimport os\nimport random\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom typing import List, Dict, Any\n\nfrom app.services.provider_service import ProviderService, Provider\nfrom app.agents.base_agent import 
BaseAgent, AgentState
from app.agents.research_agent import ResearchAgent
from app.models.message import Message, MessageRole
import pytest_asyncio  # async fixtures need @pytest_asyncio.fixture under pytest-asyncio's strict mode

# Skip these tests when running in a CI environment
SKIP_STABILITY_TESTS = os.environ.get("CI") == "true"

@pytest.mark.skipif(SKIP_STABILITY_TESTS, reason="Stability tests skipped in CI environment")
class TestSystemStability:
    @pytest_asyncio.fixture
    async def setup(self):
        """Set up test environment with services and agents."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")

        # Initialize service
        provider_service = ProviderService()

        try:
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize service: {e}")

        # Create a test agent
        agent = ResearchAgent(
            provider_service=provider_service,
            knowledge_service=None,  # A mock would be better, but this test targets stability
            system_prompt="You are a helpful research assistant."
        )

        yield {
            "provider_service": provider_service,
            "agent": agent
        }

        # Cleanup
        await provider_service.cleanup()

    async def run_conversation_turn(self, agent, message, turn_number):
        """Run a single conversation turn and record metrics."""
        start_time = time.time()
        success = False
        error = None
        memory_before = self.get_memory_usage()

        try:
            response = await agent.process_message(message, f"test_user_{turn_number}")
            success = True
        except Exception as e:
            error = str(e)
            response = None

        end_time = time.time()
        memory_after = self.get_memory_usage()

        return {
            "turn": turn_number,
            "success": success,
            "error": error,
            "duration": end_time - start_time,
            "memory_before": memory_before,
            "memory_after": memory_after,
            "memory_increase": memory_after - memory_before,
            "history_length": len(agent.state.conversation_history),
            "response_length": len(response) if response else 0
        }

    def get_memory_usage(self):
        """Get the current resident memory usage (RSS) of this process in MB."""
        import psutil
        process = psutil.Process(os.getpid())
        memory_info = process.memory_info()
        return memory_info.rss / (1024 * 1024)  # Convert bytes to MB

    @pytest.mark.asyncio
    async def test_extended_conversation(self, setup):
        """Test system stability over an extended conversation."""
        agent = setup["agent"]

        # List of test questions for the conversation
        questions = [
            "What is machine learning?",
            "Can you explain neural networks?",
            "What is the difference between supervised and unsupervised learning?",
            "How does reinforcement learning work?",
            "What are some applications of deep learning?",
            "Explain the concept of overfitting.",
            "What is transfer learning?",
            "How does backpropagation work?",
            "What are convolutional neural networks?",
            "Explain the transformer architecture.",
            "What is BERT and how does it work?",
            "What are GANs used for?",
            "Explain the concept of attention in neural networks.",
            "What is the difference between RNNs and LSTMs?",
            "How do recommendation systems work?"
        ]

        # Run an extended conversation
        results = []
        turn_limit = min(len(questions), 15)  # Limit to 15 turns to bound test duration

        for turn in range(turn_limit):
            # For later turns, occasionally refer back to an earlier topic
            if turn > 3 and random.random() < 0.3:
                message = f"Can you explain more about what you mentioned earlier regarding {random.choice(questions[:turn]).lower().replace('?', '')}"
            else:
                message = questions[turn]

            result = await self.run_conversation_turn(agent, message, turn)
            results.append(result)

            # Print progress
            status = "✓" if result["success"] else "✗"
            print(f"Turn {turn+1}/{turn_limit} {status} - Time: {result['duration']:.2f}s")

            # Delay between turns
            await asyncio.sleep(2)

        # Analyze results
        df = pd.DataFrame(results)

        # Print summary statistics
        print("\nExtended Conversation Test Results:")
        print(f"Success rate: {df['success'].mean()*100:.1f}%")
        print(f"Average response time: {df['duration'].mean():.2f}s")
        print(f"Final conversation history length: {df['history_length'].iloc[-1]}")
        print(f"Memory usage increase: {df['memory_after'].iloc[-1] - df['memory_before'].iloc[0]:.2f} MB")

        # Create visualization
        plt.figure(figsize=(15, 12))

        # Plot response times
        plt.subplot(3, 1, 1)
        plt.plot(df['turn'], df['duration'], marker='o')
        plt.title('Response Time by Conversation Turn')
        plt.xlabel('Turn')
        plt.ylabel('Duration (seconds)')
        plt.grid(True)

        # Plot memory usage
        plt.subplot(3, 1, 2)
        plt.plot(df['turn'], df['memory_after'], marker='o')
        plt.title('Memory Usage Over Conversation')
        plt.xlabel('Turn')
        plt.ylabel('Memory (MB)')
        plt.grid(True)

        # Plot history length and response length
        plt.subplot(3, 1, 3)
        plt.plot(df['turn'], df['history_length'], marker='o', label='History Length')
        plt.plot(df['turn'], df['response_length'], marker='x', label='Response Length')
        plt.title('Conversation Metrics')
        plt.xlabel('Turn')
        plt.ylabel('Length (chars/items)')
        plt.legend()
        plt.grid(True)

        plt.tight_layout()
        plt.savefig('stability_test_results.png')
        plt.close()  # Release the figure once it has been saved

        # Assert reasonable success rate
        assert df['success'].mean() >= 0.8, "Success rate below 80%"

        # Check for memory leaks (large, consistent growth would be concerning)
        memory_growth_rate = (df['memory_after'].iloc[-1] - df['memory_before'].iloc[0]) / turn_limit
        assert memory_growth_rate < 50, f"Excessive memory growth rate: {memory_growth_rate:.2f} MB/turn"
```

Automation Framework

Test Orchestration Script

```python
#!/usr/bin/env python
# scripts/run_tests.py
import argparse
import os
import sys
import subprocess
import time
from datetime import datetime


def parse_args():
    parser = argparse.ArgumentParser(description='Run test suite for OpenAI-Ollama integration')
    parser.add_argument('--unit', action='store_true', help='Run unit tests')
    parser.add_argument('--integration', action='store_true', help='Run integration tests')
    parser.add_argument('--performance', action='store_true', help='Run performance tests')
    parser.add_argument('--reliability', action='store_true', help='Run reliability tests')
    parser.add_argument('--all', action='store_true', help='Run all tests')
    parser.add_argument('--html', action='store_true', help='Generate HTML report (requires pytest-html)')
    parser.add_argument('--output-dir', default='test_results', help='Directory for test results')

    args = parser.parse_args()

    # If no specific test type is selected, run all
    if not (args.unit or args.integration or args.performance or args.reliability or args.all):
        args.all = True

    return args


def run_test_suite(test_type, output_dir, html=False):
    """Run a specific test suite and return its success status."""
    print(f"\n{'='*80}\nRunning {test_type} tests\n{'='*80}")

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    report_file = os.path.join(output_dir, f"{test_type}_report_{timestamp}")

    # Build the pytest command with appropriate flags
    cmd = ["pytest", f"tests/{test_type}", "-v"]

    if html:
        cmd.extend(["--html", f"{report_file}.html", "--self-contained-html"])

    # Add JUnit XML report for CI integration
    cmd.extend(["--junitxml", f"{report_file}.xml"])

    # Run the tests
    start_time = time.time()
    result = subprocess.run(cmd)
    duration = time.time() - start_time

    # Print summary
    status = "PASSED" if result.returncode == 0 else "FAILED"
    print(f"\n{test_type} tests {status} in {duration:.2f} seconds")

    if html:
        print(f"HTML report saved to {report_file}.html")

    print(f"XML report saved to {report_file}.xml")

    return result.returncode == 0


def main():
    args = parse_args()

    # Create output directory if it doesn't exist
    os.makedirs(args.output_dir, exist_ok=True)

    # Track overall success
    all_passed = True

    # Run selected test suites
    if args.all or args.unit:
        unit_passed = run_test_suite("unit", args.output_dir, args.html)
        all_passed = all_passed and unit_passed

    if args.all or args.integration:
        integration_passed = run_test_suite("integration", args.output_dir, args.html)
        all_passed = all_passed and integration_passed

    if args.all or args.performance:
        # Performance tests are informational, so their result does not fail the build
        run_test_suite("performance", args.output_dir, args.html)

    if args.all or args.reliability:
        reliability_passed = run_test_suite("reliability", args.output_dir, args.html)
        all_passed = all_passed and reliability_passed

    # Print overall summary
    print(f"\n{'='*80}")
    print(f"Test Suite {'PASSED' if all_passed else 'FAILED'}")
    print(f"{'='*80}")

    # Return appropriate exit code
    return 0 if all_passed else 1


if __name__ == "__main__":
    sys.exit(main())
```

CI/CD Configuration

```yaml
# .github/workflows/test.yml
name: Test Suite

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]
  workflow_dispatch:
    inputs:
      test_type:
        description: 'Test suite to run (unit, integration, all)'
        required: true
        default: 'unit'

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      ollama:
        image: ollama/ollama:latest
        ports:
          - 11434:11434

    steps:
    - uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install -r requirements-dev.txt

    - name: Pull Ollama models
      run: |
        # Wait for the Ollama service to be ready
        timeout 60 bash -c 'until curl -s -f http://localhost:11434/api/tags > /dev/null; do sleep 1; done'
        # Pull a small quantized model for testing
        curl -X POST http://localhost:11434/api/pull -d '{"name":"llama2:7b-chat-q4_0"}'

    - name: Run unit tests
      if: ${{ github.event.inputs.test_type == 'unit' || github.event.inputs.test_type == 'all' || github.event.inputs.test_type == '' }}
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        OLLAMA_HOST: http://localhost:11434
      run: pytest tests/unit -v --junitxml=unit-test-results.xml

    - name: Run integration tests
      if: ${{ github.event.inputs.test_type == 'integration' || github.event.inputs.test_type == 'all' }}
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        OLLAMA_HOST: http://localhost:11434
      run: pytest tests/integration -v --junitxml=integration-test-results.xml

    - name: Upload test results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: test-results
        path: '*-test-results.xml'

    - name: Publish Test Report
      uses: mikepenz/action-junit-report@v3
      if: always()
      with:
        report_paths: '*-test-results.xml'
        fail_on_failure: true
```
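The "Pull Ollama models" step above waits until `GET /api/tags` answers on port 11434 before it posts to `/api/pull`. The same readiness check is sometimes useful from Python, for example in a fixture for local integration runs. The sketch below is a minimal version under stated assumptions. `wait_for_ollama` and `http_ok` are hypothetical helpers and are not part of the project or of any Ollama client library. Only the `/api/tags` path and the default port come from the workflow. The deadline counts only the time spent sleeping between probes, not the time spent inside each probe.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status (like `curl -s -f`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts are all OSErrors
        return False


def wait_for_ollama(
    base_url: str = "http://localhost:11434",
    deadline_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: poll {base_url}/api/tags until it succeeds.

    Mirrors the workflow's `until curl ... /api/tags` loop. Returns the number of
    attempts made, or raises TimeoutError once `deadline_s` of sleeping has elapsed.
    """
    url = f"{base_url.rstrip('/')}/api/tags"
    attempts = 0
    waited = 0.0  # only sleep time is budgeted; probe latency is not counted
    while True:
        attempts += 1
        if probe(url):
            return attempts
        if waited >= deadline_s:
            raise TimeoutError(f"Ollama not ready at {url} after {deadline_s:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `probe` and `sleep` in as parameters keeps the polling loop testable without a running server. A real caller would typically pass the base URL from the `OLLAMA_HOST` variable that the workflow sets, for example `wait_for_ollama(os.environ.get("OLLAMA_HOST", "http://localhost:11434"))`.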

Comparative Benchmark Framework

Response Quality Evaluation Matrix

```python
# tests/benchmarks/quality_matrix.py
import pytest
import pytest_asyncio
import asyncio
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from typing import List, Dict, Any

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

# Test questions across multiple domains
BENCHMARK_QUESTIONS = {
    "factual_knowledge": [
        "What are the main causes of climate change?",
        "Explain how vaccines work in the human body.",
        "What were the key causes of World War I?",
        "Describe the process of photosynthesis.",
        "What is the difference between DNA and RNA?"
    ],
    "reasoning": [
        "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?",
        "A bat and ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?",
        "In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?",
        "If three people can paint three fences in three hours, how many people would be needed to paint six fences in six hours?",
        "Imagine a rope that goes around the Earth at the equator, lying flat on the ground. If you add 10 meters to the length of this rope and space it evenly above the ground, how high above the ground would the rope be?"
    ],
    "creative_writing": [
        "Write a short story about a robot discovering emotions.",
        "Create a poem about the changing seasons.",
        "Write a creative dialogue between the ocean and the moon.",
        "Describe a world where humans can photosynthesize like plants.",
        "Create a character sketch of a time-traveling historian."
    ],
    "code_generation": [
        "Write a Python function to check if a string is a palindrome.",
        "Create a JavaScript function that finds the most frequent element in an array.",
        "Write a SQL query to find the top 5 customers by purchase amount.",
        "Implement a binary search algorithm in the language of your choice.",
        "Write a function to detect a cycle in a linked list."
    ],
    "instruction_following": [
        "List 5 fruits, then number them in the reverse order, then highlight the one that starts with 'a' if any.",
        "Explain quantum computing in 3 paragraphs, then summarize each paragraph in one sentence, then create a single slogan based on these summaries.",
        "Create a table comparing 3 car models based on price, fuel efficiency, and safety. Then add a row showing which model is best in each category.",
        "Write a recipe for chocolate cake, then modify it to be vegan, then list only the ingredients that changed.",
        "Translate 'Hello, how are you?' to French, Spanish, and German, then identify which language uses the most words."
    ]
}


class TestQualityMatrix:
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for benchmark testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")

        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()

        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {e}")

        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }

        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()

    async def generate_response(self, provider_service, provider, model, question):
        """Generate a response from a specific provider and model."""
        try:
            if provider == "openai":
                response = await provider_service._generate_openai_completion(
                    messages=[{"role": "user", "content": question}],
                    model=model,
                    temperature=0.7
                )
            else:  # ollama
                response = await provider_service._generate_ollama_completion(
                    messages=[{"role": "user", "content": question}],
                    model=model,
                    temperature=0.7
                )

            return {
                "success": True,
                "content": response["message"]["content"],
                "metadata": {
                    "model": model,
                    "provider": provider
                }
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "metadata": {
                    "model": model,
                    "provider": provider
                }
            }

    async def evaluate_response(self, provider_service, question, response, category):
        """Evaluate a response using a GPT-4-class model as a judge."""
        # Skip evaluation if response generation failed
        if not response.get("success", False):
            return {
                "scores": {
                    "correctness": 0,
                    "completeness": 0,
                    "coherence": 0,
                    "conciseness": 0,
                    "overall": 0
                },
                "explanation": f"Failed to generate response: {response.get('error', 'Unknown error')}"
            }

        evaluation_criteria = {
            "factual_knowledge": ["correctness", "completeness", "coherence", "citation"],
            "reasoning": ["logical_flow", "correctness", "explanation_quality", "step_by_step"],
            "creative_writing": ["originality", "coherence", "engagement", "language_use"],
            "code_generation": ["correctness", "efficiency", "readability", "explanation"],
            "instruction_following": ["accuracy", "completeness", "precision", "structure"]
        }

        # Get the four criteria for this category (the JSON template below expects exactly four)
        criteria = evaluation_criteria.get(category, ["correctness", "completeness", "coherence", "conciseness"])

        evaluation_prompt = [
            {"role": "system", "content": f"""
            You are an expert evaluator of AI responses. Evaluate the given response to the question based on the following criteria:

            {', '.join(criteria)}

            For each criterion, provide a score from 1-10 and a brief explanation.
            Also provide an overall score from 1-10.

            Format your response as valid JSON with the following structure:
            {{
                "scores": {{
                    "{criteria[0]}": X,
                    "{criteria[1]}": X,
                    "{criteria[2]}": X,
                    "{criteria[3]}": X,
                    "overall": X
                }},
                "explanation": "Your overall assessment and suggestions for improvement"
            }}
            """},
            {"role": "user", "content": f"""
            Question: {question}

            Response to evaluate:
            {response["content"]}
            """}
        ]

        # JSON mode (response_format) is not supported by the original "gpt-4" model,
        # so the judge uses a GPT-4-class model that supports it
        evaluation = await provider_service._generate_openai_completion(
            messages=evaluation_prompt,
            model="gpt-4-turbo",
            response_format={"type": "json_object"}
        )

        try:
            return json.loads(evaluation["message"]["content"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # Fallback if parsing fails
            return {
                "scores": {criterion: 0 for criterion in criteria + ["overall"]},
                "explanation": "Failed to parse evaluation"
            }

    @pytest.mark.asyncio
    async def test_quality_matrix(self, services):
        """Generate a comprehensive quality comparison matrix."""
        provider_service = services["provider_service"]

        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4-turbo"],
            "ollama": ["llama2", "mistral", "codellama"]
        }

        # Select a subset of questions for each category to keep test duration reasonable
        test_questions = {}
        for category, questions in BENCHMARK_QUESTIONS.items():
            # Take the first 2 questions per category
            test_questions[category] = questions[:2]

        # Collect results
        all_results = []

        for category, questions in test_questions.items():
            for question in questions:
                for provider in models:
                    for model in models[provider]:
                        print(f"Testing {provider}:{model} on {category} question")

                        # Generate response
                        response = await self.generate_response(
                            provider_service,
                            provider,
                            model,
                            question
                        )

                        # Save raw response
                        model_safe_name = model.replace(":", "_")
                        os.makedirs("benchmark_responses", exist_ok=True)
                        with open(f"benchmark_responses/{provider}_{model_safe_name}_{category}.txt", "a") as f:
                            f.write(f"\nQuestion: {question}\n\n")
                            f.write(f"Response: {response.get('content', 'ERROR: ' + response.get('error', 'Unknown error'))}\n")
                            f.write("-" * 80 + "\n")

                        # If successful, evaluate the response
                        if response.get("success", False):
                            evaluation = await self.evaluate_response(
                                provider_service,
                                question,
                                response,
                                category
                            )

                            # Add to results
                            result = {
                                "category": category,
                                "question": question,
                                "provider": provider,
                                "model": model,
                                "success": response["success"]
                            }

                            # Add scores (tolerate a judge reply without a "scores" object)
                            for criterion, score in evaluation.get("scores", {}).items():
                                result[f"score_{criterion}"] = score

                            all_results.append(result)
                        else:
                            # Add failed result
                            all_results.append({
                                "category": category,
                                "question": question,
                                "provider": provider,
                                "model": model,
                                "success": False,
                                "score_overall": 0
                            })

                        # Add a delay to avoid rate limits
                        await asyncio.sleep(2)

        # Analyze results
        df = pd.DataFrame(all_results)

        # Save full results
        df.to_csv("benchmark_quality_matrix.csv", index=False)

        # Create summary by model and category
        summary = df.groupby(["provider", "model", "category"])["score_overall"].mean().reset_index()
        pivot_summary = summary.pivot_table(
            index=["provider", "model"],
            columns="category",
            values="score_overall"
        ).round(2)

        # Add average across categories
        pivot_summary["average"] = pivot_summary.mean(axis=1)

        # Save summary
        pivot_summary.to_csv("benchmark_quality_summary.csv")

        # Create visualization
        plt.figure(figsize=(15, 10))

        # Heatmap of scores
        plt.subplot(1, 1, 1)
        sns.heatmap(pivot_summary, annot=True, cmap="YlGnBu", vmin=1, vmax=10)
        plt.title("Model Performance by Category (Average Score 1-10)")

        plt.tight_layout()
        plt.savefig('benchmark_quality_matrix.png')
        plt.close()  # Release the figure once it has been saved

        # Print summary to console
        print("\nQuality Benchmark Results:")
        print(pivot_summary.to_string())

        # Sanity check: at least one result was collected
        assert len(all_results) > 0, "No benchmark results collected"
```

Latency and Cost Efficiency Analysis

```python
# tests/benchmarks/efficiency_analysis.py
import pytest
import pytest_asyncio
import asyncio
import time
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Any

from app.services.provider_service import ProviderService, Provider
from app.services.ollama_service import OllamaService

# Test prompts of different lengths
BENCHMARK_PROMPTS = {
    "short": "What is artificial intelligence?",
    "medium": "Explain the differences between supervised, unsupervised, and reinforcement learning in machine learning.",
    "long": "Write a comprehensive essay on the ethical implications of artificial intelligence in healthcare, considering patient privacy, diagnostic accuracy, and accessibility issues.",
    "very_long": """
    Analyze the historical development of artificial intelligence from its conceptual origins to the present day.
    Include key milestones, technological breakthroughs, paradigm shifts in approaches, and influential researchers.
    Also discuss how AI has been portrayed in popular culture and how that has influenced public perception and research funding.
    Finally, provide a thoughtful discussion on where AI might be headed in the next 20 years and what ethical frameworks
    should be considered as we continue to advance the technology.
    """
}


class TestEfficiencyAnalysis:
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for benchmark testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")

        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()

        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {e}")

        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }

        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()

    async def measure_response_metrics(self, provider_service, provider, model, prompt, max_tokens=None):
        """Measure response time, token counts, and other metrics."""
        start_time = time.time()
        success = False
        error = None
        token_count = {"prompt": 0, "completion": 0, "total": 0}

        try:
            if provider == "openai":
                response = await provider_service._generate_openai_completion(
                    messages=[{"role": "user", "content": prompt}],
                    model=model,
                    max_tokens=max_tokens
                )
            else:  # ollama
                response = await provider_service._generate_ollama_completion(
                    messages=[{"role": "user", "content": prompt}],
                    model=model,
                    max_tokens=max_tokens
                )

            # Extract token counts from usage if available. If a provider's response has
            # no "usage" block, its counts (and tokens_per_second below) stay at 0.
            if "usage" in response:
                token_count = {
                    "prompt": response["usage"].get("prompt_tokens", 0),
                    "completion": response["usage"].get("completion_tokens", 0),
                    "total": response["usage"].get("total_tokens", 0)
                }

            response_text = response["message"]["content"]
            # Only mark success once the response has been fully parsed
            success = True

        except Exception as e:
            error = str(e)
            response_text = None

        end_time = time.time()
        duration = end_time - start_time

        # Estimate cost (OpenAI only), using approximate per-token list prices
        cost = 0.0
        if provider == "openai" and success:
            if "gpt-4" in model:
                # GPT-4 pricing (approximate): $0.03 / 1K prompt tokens, $0.06 / 1K completion tokens
                cost = token_count["prompt"] * 0.00003 + token_count["completion"] * 0.00006
            else:
                # GPT-3.5 pricing (approximate): $0.0015 / 1K prompt tokens, $0.002 / 1K completion tokens
                cost = token_count["prompt"] * 0.0000015 + token_count["completion"] * 0.000002

        return {
            "success": success,
            "error": error,
            "duration": duration,
            "token_count": token_count,
            "response_length": len(response_text) if response_text else 0,
            "cost": cost,
            "tokens_per_second": token_count["completion"] / duration if success and duration > 0 else 0
        }

    @pytest.mark.asyncio
    async def test_efficiency_benchmark(self, services):
        """Perform comprehensive efficiency analysis."""
        provider_service = services["provider_service"]

        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4"],
            "ollama": ["llama2", "mistral:7b", "llama2:13b"]
        }

        # Number of repetitions for each test
        repetitions = 2

        # Results
        results = []

        for prompt_length, prompt in BENCHMARK_PROMPTS.items():
            for provider in models:
                for model in models[provider]:
                    print(f"Testing {provider}:{model} with {prompt_length} prompt")

                    for rep in range(repetitions):
                        try:
                            metrics = await self.measure_response_metrics(
                                provider_service,
                                provider,
                                model,
                                prompt
                            )

                            results.append({
                                "provider": provider,
                                "model": model,
                                "prompt_length": prompt_length,
                                "repetition": rep + 1,
                                **metrics
                            })

                            # Add a delay to avoid rate limits
                            await asyncio.sleep(2)
                        except Exception as e:
                            print(f"Error in benchmark: {e}")

        # Create DataFrame
        df = pd.DataFrame(results)

        # Save raw results
        df.to_csv("benchmark_efficiency_raw.csv", index=False)

        # Create summary by model and prompt length
        latency_summary = df.groupby(["provider", "model", "prompt_length"])["duration"].mean().reset_index()
        latency_pivot = latency_summary.pivot_table(
            index=["provider", "model"],
            columns="prompt_length",
            values="duration"
        ).round(2)

        # Calculate efficiency metrics (tokens per second and cost per 1000 tokens)
        efficiency_df = df[df["success"]].copy()
        efficiency_df["cost_per_1k_tokens"] = efficiency_df.apply(
            lambda row: (row["cost"] * 1000 / row["token_count"]["total"])
            if row["provider"] == "openai" and row["token_count"]["total"] > 0
            else 0,
            axis=1
        )

        efficiency_summary = efficiency_df.groupby(["provider", "model"])[
            ["tokens_per_second", "cost_per_1k_tokens"]
        ].mean().round(3)

        # Save summaries
        latency_pivot.to_csv("benchmark_latency_summary.csv")
        efficiency_summary.to_csv("benchmark_efficiency_summary.csv")

        # Create visualizations
        plt.figure(figsize=(15, 10))

        # Latency by prompt length and model (top row, full width)
        plt.subplot(2, 1, 1)
        ax = plt.gca()
        latency_pivot.plot(kind='bar', ax=ax)
        plt.title("Response Time by Prompt Length")
        plt.ylabel("Time (seconds)")
        plt.xticks(rotation=45)
        plt.legend(title="Prompt Length")

        # Tokens per second by model (bottom-left panel)
        plt.subplot(2, 2, 3)
        efficiency_summary["tokens_per_second"].plot(kind='bar')
        plt.title("Generation Speed (Tokens/Second)")
        plt.ylabel("Tokens per Second")
        plt.xticks(rotation=45)
\n        # Cost per 1000 tokens (OpenAI only)\n        plt.subplot(2, 2, 4)\n        openai_efficiency = efficiency_summary.loc[\"openai\"]\n        openai_efficiency[\"cost_per_1k_tokens\"].plot(kind='bar')\n        plt.title(\"Cost per 1000 Tokens (OpenAI)\")\n        plt.ylabel(\"Cost ($)\")\n        plt.xticks(rotation=45)\n        \n        plt.tight_layout()\n        plt.savefig('benchmark_efficiency.png')\n        \n        # Print summary to console\n        print(\"\\nLatency by Prompt Length (seconds):\")\n        print(latency_pivot.to_string())\n        \n        print(\"\\nEfficiency Metrics:\")\n        print(efficiency_summary.to_string())\n        \n        # Comparison analysis\n        if \"ollama\" in df[\"provider\"].values and \"openai\" in df[\"provider\"].values:\n            # Calculate average speedup/slowdown ratio\n            openai_avg = df[df[\"provider\"] == \"openai\"][\"duration\"].mean()\n            ollama_avg = df[df[\"provider\"] == \"ollama\"][\"duration\"].mean()\n            \n            speedup = openai_avg / ollama_avg if ollama_avg \u003e 0 else float('inf')\n            \n            print(f\"\\nAverage time ratio (OpenAI/Ollama): {speedup:.2f}\")\n            if speedup \u003e 1:\n                print(f\"Ollama is {speedup:.2f}x faster on average\")\n            else:\n                print(f\"OpenAI is {1/speedup:.2f}x faster on average\")\n        \n        # Assert something meaningful\n        assert len(results) \u003e 0, \"No benchmark results collected\"\n"])</script><script>self.__next_f.push([1,"e3:[\"$\",\"pre\",\"pre-55\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$2f4\"}]}]\ne4:[\"$\",\"h3\",\"h3-39\",{\"id\":\"tool-usage-comparison\",\"children\":\"Tool Usage Comparison\"}]\n2f5:T3356,"])</script><script>self.__next_f.push([1,"# 
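The cost and throughput columns above reduce to simple arithmetic. The following is a minimal sketch that restates the benchmark's per-token constants as per-1,000-token prices. These are the approximate figures hard-coded in `measure_response_metrics`, not current published pricing, and the helper names are hypothetical:

```python
# Approximate USD prices per 1,000 tokens, equivalent to the benchmark's
# per-token constants (0.00003 == 0.03 / 1000, and so on)
PRICE_PER_1K = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Mirror the benchmark's rule: any model name containing "gpt-4" uses GPT-4 rates."""
    rates = PRICE_PER_1K["gpt-4" if "gpt-4" in model else "gpt-3.5-turbo"]
    return (prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]) / 1000


def cost_per_1k_tokens(cost: float, total_tokens: int) -> float:
    """Normalize one request's cost to 1,000 tokens; 0 when no tokens were reported."""
    return cost * 1000 / total_tokens if total_tokens > 0 else 0.0


def tokens_per_second(completion_tokens: int, duration_s: float) -> float:
    """Generation throughput; 0 for non-positive durations, as in the benchmark."""
    return completion_tokens / duration_s if duration_s > 0 else 0.0
```

For example, a gpt-4 call with 500 prompt tokens and 250 completion tokens costs about $0.015 + $0.015 = $0.03. That is $0.04 per 1,000 of the 750 tokens used. If the call took 5 seconds, throughput is 50 tokens per second. Ollama responses that carry no `usage` block report zero tokens, so their throughput shows as 0 rather than as a measured rate.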
### Tool Usage Comparison

```python
# tests/benchmarks/tool_usage_comparison.py
import asyncio
import json
import os
import time

import matplotlib.pyplot as plt
import pandas as pd
import pytest
import pytest_asyncio

from app.services.provider_service import ProviderService
from app.services.ollama_service import OllamaService

# Test tools for benchmarking
BENCHMARK_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_hotels",
            "description": "Search for hotels in a specific location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city to search in"
                    },
                    "check_in": {
                        "type": "string",
                        "description": "Check-in date in YYYY-MM-DD format"
                    },
                    "check_out": {
                        "type": "string",
                        "description": "Check-out date in YYYY-MM-DD format"
                    },
                    "guests": {
                        "type": "integer",
                        "description": "Number of guests"
                    },
                    "price_range": {
                        "type": "string",
                        "description": "Price range, e.g. '$0-$100'"
                    }
                },
                "required": ["location", "check_in", "check_out"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_mortgage",
            "description": "Calculate monthly mortgage payment",
            "parameters": {
                "type": "object",
                "properties": {
                    "loan_amount": {
                        "type": "number",
                        "description": "The loan amount in dollars"
                    },
                    "interest_rate": {
                        "type": "number",
                        "description": "Annual interest rate (percentage)"
                    },
                    "loan_term": {
                        "type": "integer",
                        "description": "Loan term in years"
                    },
                    "down_payment": {
                        "type": "number",
                        "description": "Down payment amount in dollars"
                    }
                },
                "required": ["loan_amount", "interest_rate", "loan_term"]
            }
        }
    }
]

# Tool usage queries
TOOL_QUERIES = [
    "What's the weather like in Miami right now?",
    "Find me hotels in New York for next weekend for 2 people.",
    "Calculate the monthly payment for a $300,000 mortgage with 4.5% interest over 30 years.",
    "What's the weather in Tokyo and Paris this week?",
    "I need to calculate mortgage payments for different interest rates: 3%, 4%, and 5% on a $250,000 loan."
]

class TestToolUsageComparison:
    # pytest_asyncio.fixture makes the async generator fixture work in strict mode too
    @pytest_asyncio.fixture
    async def services(self):
        """Set up services for benchmark testing."""
        if not os.environ.get("OPENAI_API_KEY"):
            pytest.skip("OPENAI_API_KEY environment variable not set")

        # Initialize services
        ollama_service = OllamaService()
        provider_service = ProviderService()

        try:
            await ollama_service.initialize()
            await provider_service.initialize()
        except Exception as e:
            pytest.skip(f"Failed to initialize services: {str(e)}")

        yield {
            "ollama_service": ollama_service,
            "provider_service": provider_service
        }

        # Cleanup
        await ollama_service.cleanup()
        await provider_service.cleanup()

    async def generate_with_tools(self, provider_service, provider, model, query, tools):
        """Generate a response with tools and measure performance."""
        start_time = time.time()
        success = False
        error = None
        message_content = None
        tools_used = False
        tool_calls = []

        try:
            if provider == "openai":
                response = await provider_service._generate_openai_completion(
                    messages=[{"role": "user", "content": query}],
                    model=model,
                    tools=tools
                )
            else:  # ollama
                response = await provider_service._generate_ollama_completion(
                    messages=[{"role": "user", "content": query}],
                    model=model,
                    tools=tools
                )

            tool_calls = response.get("tool_calls") or []
            # Content can be None when the model answers with tool calls only
            message_content = response["message"].get("content") or ""

            # Structured tool calls count as tool usage
            tools_used = len(tool_calls) > 0

            # Ollama models without native tool support may describe the call in
            # plain text, so fall back to a loose heuristic: a <tool> tag or any
            # mention of a tool's function name
            if not tools_used and provider == "ollama":
                if "<tool>" in message_content:
                    tools_used = True

                for tool in tools:
                    if tool["function"]["name"] in message_content:
                        tools_used = True
                        break

            # Mark success only after the response has been fully parsed
            success = True

        except Exception as e:
            error = str(e)
            message_content = None
            tools_used = False
            tool_calls = []

        end_time = time.time()

        return {
            "success": success,
            "error": error,
            "duration": end_time - start_time,
            "message": message_content,
            "tools_used": tools_used,
            "tool_call_count": len(tool_calls),
            "tool_calls": tool_calls
        }

    @pytest.mark.asyncio
    async def test_tool_usage_benchmark(self, services):
        """Benchmark tool usage across providers and models."""
        provider_service = services["provider_service"]

        # Models to test
        models = {
            "openai": ["gpt-3.5-turbo", "gpt-4-turbo"],
            "ollama": ["llama2", "mistral"]
        }

        # Results
        results = []

        for query in TOOL_QUERIES:
            for provider in models:
                for model in models[provider]:
                    print(f"Testing {provider}:{model} with tools query: {query[:30]}...")

                    try:
                        metrics = await self.generate_with_tools(
                            provider_service,
                            provider,
                            model,
                            query,
                            BENCHMARK_TOOLS
                        )

                        results.append({
                            "provider": provider,
                            "model": model,
                            "query": query,
                            **metrics
                        })

                        # Save raw response
                        model_safe_name = model.replace(":", "_")
                        os.makedirs("tool_benchmark_responses", exist_ok=True)
                        with open(f"tool_benchmark_responses/{provider}_{model_safe_name}.txt", "a") as f:
                            f.write(f"\nQuery: {query}\n\n")
                            # "message" is None when the request failed, so fall back to the error
                            response_text = metrics["message"]
                            if response_text is None:
                                response_text = f"ERROR: {metrics['error'] or 'Unknown error'}"
                            f.write(f"Response: {response_text}\n")
                            if metrics["tool_calls"]:
                                f.write("\nTool Calls:\n")
                                f.write(json.dumps(metrics["tool_calls"], indent=2, default=str))
                            f.write("\n" + "-" * 80 + "\n")

                        # Add a delay to avoid rate limits
                        await asyncio.sleep(2)
                    except Exception as e:
                        print(f"Error in benchmark: {str(e)}")

        # Fail early: the pandas analysis below needs at least one row
        assert len(results) > 0, "No benchmark results collected"

        # Create DataFrame
        df = pd.DataFrame(results)

        # Save raw results
        df.to_csv("benchmark_tool_usage_raw.csv", index=False)

        # Create summary
        tool_usage_summary = df.groupby(["provider", "model"])[
            ["success", "tools_used", "tool_call_count", "duration"]
        ].agg({
            "success": "mean",
            "tools_used": "mean",
            "tool_call_count": "mean",
            "duration": "mean"
        }).round(3)

        # Rename columns for clarity
        tool_usage_summary.columns = [
            "Success Rate",
            "Tool Usage Rate",
            "Avg Tool Calls",
            "Avg Duration (s)"
        ]

        # Save summary
        tool_usage_summary.to_csv("benchmark_tool_usage_summary.csv")

        # Create visualizations
        plt.figure(figsize=(15, 10))

        # Tool usage rate by model
        plt.subplot(2, 2, 1)
        tool_usage_summary["Tool Usage Rate"].plot(kind='bar', ax=plt.gca())
        plt.title("Tool Usage Rate by Model")
        plt.ylabel("Rate (0-1)")
        plt.ylim(0, 1)
        plt.xticks(rotation=45)

        # Average tool calls by model
        plt.subplot(2, 2, 2)
        tool_usage_summary["Avg Tool Calls"].plot(kind='bar', ax=plt.gca())
        plt.title("Average Tool Calls per Query")
        plt.ylabel("Count")
        plt.xticks(rotation=45)

        # Success rate by model
        plt.subplot(2, 2, 3)
        tool_usage_summary["Success Rate"].plot(kind='bar', ax=plt.gca())
        plt.title("Success Rate")
        plt.ylabel("Rate (0-1)")
        plt.ylim(0, 1)
        plt.xticks(rotation=45)

        # Average duration by model
        plt.subplot(2, 2, 4)
        tool_usage_summary["Avg Duration (s)"].plot(kind='bar', ax=plt.gca())
        plt.title("Average Response Time")
        plt.ylabel("Seconds")
        plt.xticks(rotation=45)

        plt.tight_layout()
        plt.savefig('benchmark_tool_usage.png')
        plt.close()

        # Print summary to console
        print("\nTool Usage Benchmark Results:")
        print(tool_usage_summary.to_string())

        # Qualitative analysis: compare tool usage across providers
        if df["tools_used"].any():
            print("\nQualitative Analysis of Tool Usage:")

            # Tool usage rate per provider
            openai_used = df[(df["provider"] == "openai") & (df["tools_used"])].shape[0]
            openai_total = df[df["provider"] == "openai"].shape[0]
            openai_rate = openai_used / openai_total if openai_total > 0 else 0

            ollama_used = df[(df["provider"] == "ollama") & (df["tools_used"])].shape[0]
            ollama_total = df[df["provider"] == "ollama"].shape[0]
            ollama_rate = ollama_used / ollama_total if ollama_total > 0 else 0

            print(f"OpenAI tool usage rate: {openai_rate:.2f}")
            print(f"Ollama tool usage rate: {ollama_rate:.2f}")

            if openai_rate > 0 and ollama_rate > 0:
                # Ollama's rate includes text-heuristic matches, so treat this ratio as indicative
                ratio = openai_rate / ollama_rate
                print(f"OpenAI is {ratio:.2f}x as likely to use tools")

            # Additional insights
            if "openai" in df["provider"].values and "ollama" in df["provider"].values:
                openai_time = df[df["provider"] == "openai"]["duration"].mean()
                ollama_time = df[df["provider"] == "ollama"]["duration"].mean()

                if openai_time > 0 and ollama_time > 0:
                    time_ratio = openai_time / ollama_time
                    print(f"Time ratio (OpenAI/Ollama): {time_ratio:.2f}")
                    if time_ratio > 1:
                        print(f"Ollama is {time_ratio:.2f}x faster for tool-related queries")
                    else:
                        print(f"OpenAI is {1/time_ratio:.2f}x faster for tool-related queries")
```
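The Ollama fallback above counts any mention of a function name as tool use, which can overstate Ollama's rate. A stricter check would only count a text response that contains a parseable call. The following is a minimal sketch, assuming the model has been prompted to emit a JSON object of the form `{"name": ..., "arguments": {...}}`. That format and the `extract_text_tool_call` helper are hypothetical, not part of the project's services:

```python
import json
import re
from typing import Any, Dict, List, Optional


def extract_text_tool_call(text: str, tools: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first JSON object in `text` that names a known tool
    and supplies every argument its schema marks as required."""
    # Map each tool name to the argument names listed under "required"
    required_args = {
        t["function"]["name"]: t["function"]["parameters"].get("required", [])
        for t in tools
    }
    decoder = json.JSONDecoder()
    # Try to decode a JSON object starting at every "{" in the text
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        name = obj.get("name")
        args = obj.get("arguments")
        if (
            isinstance(name, str)
            and name in required_args
            and isinstance(args, dict)
            and all(key in args for key in required_args[name])
        ):
            return {"name": name, "arguments": args}
    return None
```

With this check, `"I would call get_weather for you."` no longer counts as tool use. A response that embeds a `get_weather` object with a `location` argument still does. Recent Ollama releases can also return structured `tool_calls` for models that support tools. Whether those reach `response["tool_calls"]` depends on how `ProviderService` maps the Ollama response.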
## Pytest Configuration

```ini
# pytest.ini
[pytest]
markers =
    unit: marks tests as unit tests
    integration: marks tests as integration tests
    performance: marks tests as performance tests
    reliability: marks tests as reliability tests
    benchmark: marks tests as benchmarks

testpaths = tests

# Benchmark modules such as tests/benchmarks/tool_usage_comparison.py do not match
# test_*.py, so their name patterns must be listed here (or the files renamed)
# before pytest collects them from a directory
python_files = test_*.py *_comparison.py
python_classes = Test*
python_functions = test_*

# Don't run performance, reliability, or benchmark tests by default.
# Tests carrying these markers are deselected unless selected explicitly,
# e.g. python -m pytest tests/performance -m performance
addopts = -m "not performance and not reliability and not benchmark"

# Configure test outputs
junit_family = xunit2

# Default environment variables (the env option is provided by the pytest-env plugin)
env =
    PYTHONPATH=.
    OPENAI_MODEL=gpt-3.5-turbo
    OLLAMA_MODEL=llama2
    OLLAMA_HOST=http://localhost:11434
```

## Test Documentation

````markdown
# Testing Strategy for OpenAI-Ollama Integration

This document outlines the testing approach for the hybrid AI system that integrates OpenAI and Ollama.

## 1. Unit Testing

Unit tests verify the functionality of individual components in isolation:

- **Provider Service**: Tests for provider selection logic, auto-routing, and fallback mechanisms
- **Ollama Service**: Tests for response formatting, tool extraction, and error handling
- **Model Selection**: Tests for use case detection and model recommendation logic
- **Tool Integration**: Tests for proper handling of tool calls and responses

Run unit tests with:

```bash
python -m pytest tests/unit -v
```

## 2. Integration Testing

Integration tests verify the interaction between components:

- **API Endpoints**: Tests for proper request handling, authentication, and response formatting
- **End-to-End Agent Flows**: Tests for agent behavior across different scenarios
- **Cross-Provider Integration**: Tests for seamless integration between OpenAI and Ollama

Run integration tests with:

```bash
python -m pytest tests/integration -v
```

## 3. Performance Testing

Performance tests measure system performance characteristics:

- **Response Latency**: Compares response times across providers and models
- **Memory Usage**: Measures memory consumption during request processing
- **Response Quality**: Evaluates the quality of responses using GPT-4 as a judge

Run performance tests with:

```bash
python -m pytest tests/performance -v
```

## 4. Reliability Testing

Reliability tests verify the system's behavior under various conditions:

- **Error Handling**: Tests for proper error detection and fallback mechanisms
- **Load Testing**: Measures system performance under concurrent requests
- **Stability Testing**: Evaluates system behavior during extended conversations

Run reliability tests with:

```bash
python -m pytest tests/reliability -v
```

## 5. Benchmark Framework

Comprehensive benchmarks for comparative analysis:

- **Quality Matrix**: Compares response quality across providers and models
- **Efficiency Analysis**: Measures performance and cost characteristics
- **Tool Usage Comparison**: Evaluates tool handling capabilities

Run benchmarks with:

```bash
python -m pytest tests/benchmarks -v
```

## Running the Complete Test Suite

Use the test orchestration script to run all test suites:

```bash
python scripts/run_tests.py --all
```

## CI/CD Integration

The test suite is integrated with a GitHub Actions workflow:

```bash
# Triggered on push to main/develop or manually via workflow_dispatch
git push origin main  # Automatically runs tests
```

## Prerequisites

1. OpenAI API key in environment variables:

   ```bash
   export OPENAI_API_KEY=sk-...
   ```

2. A running Ollama instance:

   ```bash
   ollama serve
   ```

3. Required models for Ollama:

   ```bash
   ollama pull llama2
   ollama pull mistral
   # Additional tags requested by the efficiency benchmark
   ollama pull mistral:7b
   ollama pull llama2:13b
   ```
````
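Prerequisites 2 and 3 can be checked before a run, so missing models produce a clear skip instead of failures scattered through the benchmarks. The following is a minimal sketch, assuming Ollama's `GET /api/tags` endpoint (which lists locally available models) on the default host `http://localhost:11434`. The helper names are hypothetical:

```python
import json
import urllib.request
from typing import Iterable, List


def normalize_model_name(name: str) -> str:
    """Ollama stores an untagged pull such as `ollama pull llama2` as "llama2:latest"."""
    return name if ":" in name else f"{name}:latest"


def find_missing_models(required: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return the required models that are not installed, in input order."""
    available = {normalize_model_name(n) for n in installed}
    return [m for m in required if normalize_model_name(m) not in available]


def list_local_models(host: str = "http://localhost:11434", timeout: float = 2.0) -> List[str]:
    """Fetch local model names from the Ollama server; raises OSError if it is unreachable."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]
```

A session-scoped fixture in `conftest.py` could call `find_missing_models(["llama2", "mistral"], list_local_models())` and `pytest.skip(...)` with the result when it is non-empty. The name matching and the network call are kept in separate functions so the matching logic can be unit-tested without a running server.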
## Conclusion

This testing strategy provides a framework for validating the hybrid AI architecture that integrates OpenAI's cloud capabilities with Ollama's local model inference. The multi-faceted approach targets four goals:

1. **Functional Correctness**: Unit and integration tests verify that all components function as expected, both individually and when integrated.

2. **Performance Optimization**: Benchmarks and performance tests provide quantitative data to guide resource allocation and routing decisions.

3. **Reliability**: Load and stability tests check that the system remains responsive and produces consistent results under various conditions.

4. **Quality Assurance**: Response quality evaluations check that the system maintains high standards regardless of which provider handles the inference.

The test suite is designed to be extensible, so new test cases can be added as the system evolves. Running it automatically in CI/CD pipelines provides ongoing quality assurance and supports continuous improvement of the hybrid AI architecture.

# User Interface Design for Hybrid OpenAI-Ollama MCP System

## Conceptual Framework for Interface Design

The Modern Computational Paradigm (MCP) system integrates cloud-based intelligence with local inference capabilities. It needs an interface that balances simplicity with advanced functionality. This document presents a design approach for both command-line and web interfaces that expose the system's capabilities while keeping the user experience intuitive.

## Command Line Interface (CLI) Design

### CLI Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  MCP-CLI                                                    │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐    │
│  │ Core Module │  │ Config      │  │ Interactive Mode │    │
│  └─────────────┘  └─────────────┘  └──────────────────┘    │
│         │               │                   │               │
│         ▼               ▼                   ▼               │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐    │
│  │ Agent API   │  │ Model       │  │ Session          │    │
│  │ Client      │  │ Management  │  │ Management       │    │
│  └─────────────┘  └─────────────┘  └──────────────────┘    │
│         │               │                   │               │
│         └───────────────┼───────────────────┘               │
│                         │                                   │
│                         ▼                                   │
│                  ┌─────────────┐                           │
│                  │ Output      │                           │
│                  │ Formatting  │                           │
│                  └─────────────┘                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### CLI Wireframes

#### Main Help Screen

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  MCP CLI v1.0.0                                                         │
│                                                                         │
│  USAGE:                                                                 │
│    mcp [OPTIONS] COMMAND [ARGS]...                                      │
│                                                                         │
│  OPTIONS:                                                               │
│    --config PATH       Path to config file                              │
│    --verbose           Enable verbose output                            │
│    --help              Show this message and exit                       │
│                                                                         │
│  COMMANDS:                                                              │
│    chat                Start a chat session                             │
│    complete            Get a completion for a prompt                    │
│    models              List and manage available models                 │
│    config              Configure MCP settings                           │
│    agents              Manage agent profiles                            │
│    session             Manage saved sessions                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```
│\n│                                                                         │\n│  OPTIONS:                                                               │\n│    --config PATH       Path to config file                              │\n│    --verbose           Enable verbose output                            │\n│    --help              Show this message and exit                       │\n│                                                                         │\n│  COMMANDS:                                                              │\n│    chat                Start a chat session                             │\n│    complete            Get a completion for a prompt                    │\n│    models              List and manage available models                 │\n│    config              Configure MCP settings                           │\n│    agents              Manage agent profiles                            │\n│    session             Manage saved sessions                            │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘"])</script><script>self.__next_f.push([1,"10e:[\"$\",\"p\",\"p-48\",{\"children\":\"$2f9\"}]\n10f:[\"$\",\"pre\",\"pre-70\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"\\n#### Interactive Chat Mode\\n\\n\"}],\"position\":{\"start\":{\"line\":7467,\"column\":1,\"offset\":282636},\"end\":{\"line\":7471,\"column\":4,\"offset\":282672}}},\"children\":\"\\n#### Interactive Chat Mode\\n\\n\"}]}]\n2fa:Ta1d,"])</script><script>self.__next_f.push([1,"┌─────────────────────────────────────────────────────────────────────────┐\n│                                                                         │\n│  MCP Chat 
Session - ID: chat_78f3d2                                     │\n│  Model: auto-select | Provider: auto | Agent: research                  │\n│                                                                         │\n│  Type 'exit' to quit, 'help' for commands, 'models' to switch models    │\n│  ────────────────────────────────────────────────────────────────────   │\n│                                                                         │\n│  You: Tell me about quantum computing                                   │\n│                                                                         │\n│  MCP [OpenAI:gpt-4]: Quantum computing is a type of computation that    │\n│  harnesses quantum mechanical phenomena like superposition and          │\n│  entanglement to process information in ways that classical computers   │\n│  cannot.                                                                │\n│                                                                         │\n│  Unlike classical bits that exist in a state of either 0 or 1, quantum  │\n│  bits or \"qubits\" can exist in multiple states simultaneously due to    │\n│  superposition. This potentially allows quantum computers to explore    │\n│  multiple solutions to a problem at once.                               │\n│                                                                         │\n│  [Response continues for several more paragraphs...]                    │\n│                                                                         │\n│  You: Can you explain quantum entanglement more simply?                 
│\n│                                                                         │\n│  MCP [Ollama:mistral]: █                                                │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘"])</script><script>self.__next_f.push([1,"110:[\"$\",\"p\",\"p-49\",{\"children\":\"$2fa\"}]\n111:[\"$\",\"pre\",\"pre-71\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"\\n#### Model Management Screen\\n\\n\"}],\"position\":{\"start\":{\"line\":7499,\"column\":1,\"offset\":284725},\"end\":{\"line\":7503,\"column\":4,\"offset\":284763}}},\"children\":\"\\n#### Model Management Screen\\n\\n\"}]}]\n2fb:T8ad,"])</script><script>self.__next_f.push([1,"┌─────────────────────────────────────────────────────────────────────────┐\n│                                                                         │\n│  MCP Models                                                             │\n│                                                                         │\n│  AVAILABLE MODELS:                                                      │\n│                                                                         │\n│  OpenAI:                                                                │\n│    [✓] gpt-4-turbo          - Advanced reasoning, current knowledge     │\n│    [✓] gpt-3.5-turbo        - Fast, efficient for standard tasks        │\n│                                                                         │\n│  Ollama:                                                                │\n│    [✓] llama2               - General purpose local model               │\n│    [✓] mistral              - Strong reasoning, 8k context window       │\n│    [✓] 
codellama            - Specialized for code generation           │\n│    [ ] wizard-math          - Mathematical problem-solving              │\n│                                                                         │\n│  COMMANDS:                                                              │\n│                                                                         │\n│    pull MODEL_NAME          - Download a model to Ollama                │\n│    info MODEL_NAME          - Show detailed model information           │\n│    benchmark MODEL_NAME     - Run performance benchmark                 │\n│    set-default MODEL_NAME   - Set as default model                      │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘"])</script><script>self.__next_f.push([1,"112:[\"$\",\"p\",\"p-50\",{\"children\":\"$2fb\"}]\n113:[\"$\",\"pre\",\"pre-72\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"\\n#### Agent Configuration Screen\\n\\n\"}],\"position\":{\"start\":{\"line\":7528,\"column\":1,\"offset\":286588},\"end\":{\"line\":7532,\"column\":4,\"offset\":286629}}},\"children\":\"\\n#### Agent Configuration Screen\\n\\n\"}]}]\n2fc:T8ff,"])</script><script>self.__next_f.push([1,"┌─────────────────────────────────────────────────────────────────────────┐\n│                                                                         │\n│  MCP Agent Configuration                                                │\n│                                                                         │\n│  AVAILABLE AGENTS:                                                      │\n│                                                                         │\n│    [✓] general    
         - General purpose assistant                  │\n│    [✓] research            - Research specialist with knowledge tools   │\n│    [✓] coding              - Code assistant with tool integration       │\n│    [✓] creative            - Creative writing and content generation    │\n│                                                                         │\n│  CUSTOM AGENTS:                                                         │\n│                                                                         │\n│    [✓] my-math-tutor       - Mathematics teaching and problem solving   │\n│    [✓] data-analyst        - Data analysis with visualization tools     │\n│                                                                         │\n│  COMMANDS:                                                              │\n│                                                                         │\n│    create NAME             - Create a new custom agent                  │\n│    edit NAME               - Edit an existing agent                     │\n│    delete NAME             - Delete a custom agent                      │\n│    export NAME FILE        - Export agent configuration                 │\n│    import FILE             - Import agent configuration                 │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘"])</script><script>self.__next_f.push([1,"114:[\"$\",\"p\",\"p-51\",{\"children\":\"$2fc\"}]\n115:[\"$\",\"pre\",\"pre-73\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"\\n### CLI Interaction 
Flow\\n\\n\"}],\"position\":{\"start\":{\"line\":7558,\"column\":1,\"offset\":288530},\"end\":{\"line\":7562,\"column\":4,\"offset\":288564}}},\"children\":\"\\n### CLI Interaction Flow\\n\\n\"}]}]\n116:[\"$\",\"p\",\"p-52\",{\"children\":\"┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐\\n│             │     │             │     │             │     │             │\\n│  Start CLI  │────▶│ Select Mode │────▶│ Set Config  │────▶│   Session   │\\n│             │     │             │     │             │     │ Interaction │\\n└─────────────┘     └─────────────┘     └─────────────┘     └──────┬──────┘\\n│\\n┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌──────▼──────┐\\n│             │     │             │     │             │     │             │\\n│   Export    │◀────│   Session   │◀────│  Generate   │◀────│    User     │\\n│   Results   │     │ Management  │     │  Response   │     │   Prompt    │\\n│             │     │             │     │             │     │             │\\n└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘\"}]\n2fd:T3b9f,"])</script><script>self.__next_f.push([1,"\n### CLI Implementation Example\n\n```python\n# mcp_cli.py\nimport argparse\nimport os\nimport json\nimport sys\nimport time\nfrom typing import Dict, Any, List, Optional\nimport requests\nimport yaml\nimport colorama\nfrom colorama import Fore, Style\nfrom prompt_toolkit import PromptSession\nfrom prompt_toolkit.history import FileHistory\nfrom prompt_toolkit.auto_suggest import AutoSuggestFromHistory\nfrom prompt_toolkit.completion import WordCompleter\nfrom rich.console import Console\nfrom rich.markdown import Markdown\nfrom rich.panel import Panel\nfrom rich.progress import Progress\n\n# Initialize colorama for cross-platform color support\ncolorama.init()\nconsole = Console()\n\nCONFIG_PATH = os.path.expanduser(\"~/.mcp/config.yaml\")\nHISTORY_PATH = os.path.expanduser(\"~/.mcp/history\")\nAPI_URL = 
\"http://localhost:8000/api/v1\"\n\ndef ensure_config_dir():\n    \"\"\"Ensure the config directory exists.\"\"\"\n    config_dir = os.path.dirname(CONFIG_PATH)\n    os.makedirs(config_dir, exist_ok=True)\n    os.makedirs(os.path.dirname(HISTORY_PATH), exist_ok=True)\n\ndef load_config():\n    \"\"\"Load configuration from file.\"\"\"\n    ensure_config_dir()\n    \n    if not os.path.exists(CONFIG_PATH):\n        # Create default config\n        config = {\n            \"api\": {\n                \"url\": API_URL,\n                \"key\": None\n            },\n            \"defaults\": {\n                \"model\": \"auto\",\n                \"provider\": \"auto\",\n                \"agent\": \"general\"\n            },\n            \"output\": {\n                \"format\": \"markdown\",\n                \"show_model_info\": True\n            }\n        }\n        \n        with open(CONFIG_PATH, 'w') as f:\n            yaml.dump(config, f, default_flow_style=False)\n        \n        console.print(f\"Created default config at {CONFIG_PATH}\", style=\"yellow\")\n        return config\n    \n    with open(CONFIG_PATH, 'r') as f:\n        return yaml.safe_load(f)\n\ndef save_config(config):\n    \"\"\"Save configuration to file.\"\"\"\n    with open(CONFIG_PATH, 'w') as f:\n        yaml.dump(config, f, default_flow_style=False)\n\ndef get_api_key(config):\n    \"\"\"Get API key from config or environment.\"\"\"\n    if config[\"api\"][\"key\"]:\n        return config[\"api\"][\"key\"]\n    \n    env_key = os.environ.get(\"MCP_API_KEY\")\n    if env_key:\n        return env_key\n    \n    # If no key is configured, prompt the user\n    console.print(\"No API key found. Please enter your API key:\", style=\"yellow\")\n    key = input(\"\u003e \")\n    \n    if key:\n        config[\"api\"][\"key\"] = key\n        save_config(config)\n        return key\n    \n    console.print(\"No API key provided. 
Some features may not work.\", style=\"red\")\n    return None\n\ndef make_api_request(endpoint, method=\"GET\", data=None, config=None):\n    \"\"\"Make an API request to the MCP backend.\"\"\"\n    if config is None:\n        config = load_config()\n    \n    api_key = get_api_key(config)\n    headers = {\n        \"Content-Type\": \"application/json\"\n    }\n    \n    if api_key:\n        headers[\"Authorization\"] = f\"Bearer {api_key}\"\n    \n    url = f\"{config['api']['url']}/{endpoint.lstrip('/')}\"\n    \n    try:\n        if method == \"GET\":\n            response = requests.get(url, headers=headers)\n        elif method == \"POST\":\n            response = requests.post(url, headers=headers, json=data)\n        else:\n            raise ValueError(f\"Unsupported HTTP method: {method}\")\n        \n        response.raise_for_status()\n        return response.json()\n    except requests.exceptions.RequestException as e:\n        console.print(f\"API request failed: {str(e)}\", style=\"red\")\n        return None\n\ndef display_response(response_text, format_type=\"markdown\"):\n    \"\"\"Display a response with appropriate formatting.\"\"\"\n    if format_type == \"markdown\":\n        console.print(Markdown(response_text))\n    else:\n        console.print(response_text)\n\ndef chat_command(args, config):\n    \"\"\"Start an interactive chat session.\"\"\"\n    session_id = args.session_id\n    model_name = args.model or config[\"defaults\"][\"model\"]\n    provider = args.provider or config[\"defaults\"][\"provider\"]\n    agent_type = args.agent or config[\"defaults\"][\"agent\"]\n    \n    console.print(Panel(f\"Starting MCP Chat Session\\nModel: {model_name} | Provider: {provider} | Agent: {agent_type}\"))\n    console.print(\"Type 'exit' to quit, 'help' for commands\", style=\"dim\")\n    \n    # Set up prompt session with history\n    ensure_config_dir()\n    history_file = os.path.join(HISTORY_PATH, \"chat_history\")\n    session = 
PromptSession(\n        history=FileHistory(history_file),\n        auto_suggest=AutoSuggestFromHistory(),\n        completer=WordCompleter(['exit', 'help', 'models', 'clear', 'save', 'switch'])\n    )\n    \n    # Initial session data\n    if not session_id:\n        # Create a new session\n        pass\n    \n    while True:\n        try:\n            user_input = session.prompt(f\"{Fore.GREEN}You: {Style.RESET_ALL}\")\n            \n            if user_input.lower() in ('exit', 'quit'):\n                break\n            \n            if not user_input.strip():\n                continue\n            \n            # Handle special commands\n            if user_input.lower() == 'help':\n                console.print(Panel(\"\"\"\n                Available commands:\n                - exit/quit: Exit the chat session\n                - clear: Clear the current conversation\n                - save FILENAME: Save conversation to file\n                - models: List available models\n                - switch MODEL: Switch to a different model\n                - provider PROVIDER: Switch to a different provider\n                \"\"\"))\n                continue\n            \n            # For normal input, send to API\n            with Progress() as progress:\n                task = progress.add_task(\"[cyan]Generating response...\", total=None)\n                \n                data = {\n                    \"message\": user_input,\n                    \"session_id\": session_id,\n                    \"model_params\": {\n                        \"provider\": provider,\n                        \"model\": model_name,\n                        \"auto_select\": provider == \"auto\"\n                    }\n                }\n                \n                response = make_api_request(\"chat\", method=\"POST\", data=data, config=config)\n                progress.update(task, completed=100)\n            \n            if response:\n                session_id = 
response[\"session_id\"]\n                model_used = response.get(\"model_used\", model_name)\n                provider_used = response.get(\"provider_used\", provider)\n                \n                # Display provider and model info if configured\n                if config[\"output\"][\"show_model_info\"]:\n                    console.print(f\"\\n{Fore.BLUE}MCP [{provider_used}:{model_used}]:{Style.RESET_ALL}\")\n                else:\n                    console.print(f\"\\n{Fore.BLUE}MCP:{Style.RESET_ALL}\")\n                \n                display_response(response[\"response\"], config[\"output\"][\"format\"])\n                console.print()  # Empty line for readability\n        \n        except KeyboardInterrupt:\n            break\n        except EOFError:\n            break\n        except Exception as e:\n            console.print(f\"Error: {str(e)}\", style=\"red\")\n    \n    console.print(\"Chat session ended\")\n\ndef models_command(args, config):\n    \"\"\"List and manage available models.\"\"\"\n    if args.pull:\n        # Pull a new model for Ollama\n        console.print(f\"Pulling Ollama model: {args.pull}\")\n        \n        with Progress() as progress:\n            task = progress.add_task(f\"[cyan]Pulling {args.pull}...\", total=None)\n            \n            # This would actually call Ollama API\n            time.sleep(2)  # Simulating download\n            \n            progress.update(task, completed=100)\n        \n        console.print(f\"Successfully pulled {args.pull}\", style=\"green\")\n        return\n    \n    # List available models\n    console.print(Panel(\"Available Models\"))\n    \n    console.print(\"\\n[bold]OpenAI Models:[/bold]\")\n    openai_models = [\n        {\"name\": \"gpt-4-turbo\", \"description\": \"Advanced reasoning, current knowledge\"},\n        {\"name\": \"gpt-3.5-turbo\", \"description\": \"Fast, efficient for standard tasks\"}\n    ]\n    \n    for model in openai_models:\n        
console.print(f\"  • {model['name']} - {model['description']}\")\n    \n    console.print(\"\\n[bold]Ollama Models:[/bold]\")\n    \n    # In a real implementation, this would fetch from Ollama API\n    ollama_models = [\n        {\"name\": \"llama2\", \"description\": \"General purpose local model\", \"installed\": True},\n        {\"name\": \"mistral\", \"description\": \"Strong reasoning, 8k context window\", \"installed\": True},\n        {\"name\": \"codellama\", \"description\": \"Specialized for code generation\", \"installed\": True},\n        {\"name\": \"wizard-math\", \"description\": \"Mathematical problem-solving\", \"installed\": False}\n    ]\n    \n    for model in ollama_models:\n        status = \"[green]✓[/green]\" if model[\"installed\"] else \"[red]✗[/red]\"\n        console.print(f\"  {status} {model['name']} - {model['description']}\")\n    \n    console.print(\"\\nUse 'mcp models --pull MODEL_NAME' to download a model\")\n\ndef config_command(args, config):\n    \"\"\"View or edit configuration.\"\"\"\n    if args.set:\n        # Set a configuration value\n        key, value = args.set.split('=', 1)\n        keys = key.split('.')\n        \n        # Navigate to the nested key\n        current = config\n        for k in keys[:-1]:\n            if k not in current:\n                current[k] = {}\n            current = current[k]\n        \n        # Set the value (with type conversion)\n        if value.lower() == 'true':\n            current[keys[-1]] = True\n        elif value.lower() == 'false':\n            current[keys[-1]] = False\n        elif value.isdigit():\n            current[keys[-1]] = int(value)\n        else:\n            current[keys[-1]] = value\n        \n        save_config(config)\n        console.print(f\"Configuration updated: {key} = {value}\", style=\"green\")\n        return\n    \n    # Display current configuration\n    console.print(Panel(\"MCP Configuration\"))\n    console.print(yaml.dump(config))\n    
console.print(\"\\nUse 'mcp config --set key.path=value' to change settings\")\n\ndef agent_command(args, config):\n    \"\"\"Manage agent profiles.\"\"\"\n    if args.create:\n        # Create a new agent profile\n        console.print(f\"Creating agent profile: {args.create}\")\n        # Implementation would collect agent parameters\n        return\n    \n    if args.edit:\n        # Edit an existing agent profile\n        console.print(f\"Editing agent profile: {args.edit}\")\n        return\n    \n    # List available agents\n    console.print(Panel(\"Available Agents\"))\n    \n    console.print(\"\\n[bold]System Agents:[/bold]\")\n    system_agents = [\n        {\"name\": \"general\", \"description\": \"General purpose assistant\"},\n        {\"name\": \"research\", \"description\": \"Research specialist with knowledge tools\"},\n        {\"name\": \"coding\", \"description\": \"Code assistant with tool integration\"},\n        {\"name\": \"creative\", \"description\": \"Creative writing and content generation\"}\n    ]\n    \n    for agent in system_agents:\n        console.print(f\"  • {agent['name']} - {agent['description']}\")\n    \n    # In a real implementation, this would load from user config\n    custom_agents = [\n        {\"name\": \"my-math-tutor\", \"description\": \"Mathematics teaching and problem solving\"},\n        {\"name\": \"data-analyst\", \"description\": \"Data analysis with visualization tools\"}\n    ]\n    \n    if custom_agents:\n        console.print(\"\\n[bold]Custom Agents:[/bold]\")\n        for agent in custom_agents:\n            console.print(f\"  • {agent['name']} - {agent['description']}\")\n    \n    console.print(\"\\nUse 'mcp agents --create NAME' to create a new agent\")\n\ndef main():\n    \"\"\"Main entry point for the CLI.\"\"\"\n    parser = argparse.ArgumentParser(description=\"MCP Command Line Interface\")\n    parser.add_argument('--config', help=\"Path to config file\")\n    parser.add_argument('--verbose', 
action='store_true', help=\"Enable verbose output\")\n    \n    subparsers = parser.add_subparsers(dest='command', help='Command to run')\n    \n    # Chat command\n    chat_parser = subparsers.add_parser('chat', help='Start a chat session')\n    chat_parser.add_argument('--model', help='Model to use')\n    chat_parser.add_argument('--provider', choices=['openai', 'ollama', 'auto'], help='Provider to use')\n    chat_parser.add_argument('--agent', help='Agent type to use')\n    chat_parser.add_argument('--session-id', help='Resume an existing session')\n    \n    # Complete command (one-shot completion)\n    complete_parser = subparsers.add_parser('complete', help='Get a completion for a prompt')\n    complete_parser.add_argument('prompt', help='Prompt text')\n    complete_parser.add_argument('--model', help='Model to use')\n    complete_parser.add_argument('--provider', choices=['openai', 'ollama', 'auto'], help='Provider to use')\n    \n    # Models command\n    models_parser = subparsers.add_parser('models', help='List and manage available models')\n    models_parser.add_argument('--pull', metavar='MODEL_NAME', help='Download a model to Ollama')\n    models_parser.add_argument('--info', metavar='MODEL_NAME', help='Show detailed model information')\n    models_parser.add_argument('--benchmark', metavar='MODEL_NAME', help='Run performance benchmark')\n    \n    # Config command\n    config_parser = subparsers.add_parser('config', help='Configure MCP settings')\n    config_parser.add_argument('--set', metavar='KEY=VALUE', help='Set a configuration value')\n    \n    # Agents command\n    agents_parser = subparsers.add_parser('agents', help='Manage agent profiles')\n    agents_parser.add_argument('--create', metavar='NAME', help='Create a new custom agent')\n    agents_parser.add_argument('--edit', metavar='NAME', help='Edit an existing agent')\n    agents_parser.add_argument('--delete', metavar='NAME', help='Delete a custom agent')\n    \n    # Session command\n    
session_parser = subparsers.add_parser('session', help='Manage saved sessions')\n    session_parser.add_argument('--list', action='store_true', help='List saved sessions')\n    session_parser.add_argument('--delete', metavar='SESSION_ID', help='Delete a session')\n    session_parser.add_argument('--export', metavar='SESSION_ID', help='Export a session')\n    \n    args = parser.parse_args()\n    \n    # Load configuration\n    config_path = args.config if args.config else CONFIG_PATH\n    \n    if args.config and not os.path.exists(args.config):\n        console.print(f\"Config file not found: {args.config}\", style=\"red\")\n        return 1\n    \n    config = load_config()\n    \n    # Execute the appropriate command\n    if args.command == 'chat':\n        chat_command(args, config)\n    elif args.command == 'complete':\n        # Implementation for complete command\n        pass\n    elif args.command == 'models':\n        models_command(args, config)\n    elif args.command == 'config':\n        config_command(args, config)\n    elif args.command == 'agents':\n        agent_command(args, config)\n    elif args.command == 'session':\n        # Implementation for session command\n        pass\n    else:\n        # No command specified, show help\n        parser.print_help()\n    \n    return 0\n\nif __name__ == \"__main__\":\n    sys.exit(main())\n"])</script><script>self.__next_f.push([1,"2fe:T3b9f,"])</script><script>self.__next_f.push([1,"\n### CLI Implementation Example\n\n```python\n# mcp_cli.py\nimport argparse\nimport os\nimport json\nimport sys\nimport time\nfrom typing import Dict, Any, List, Optional\nimport requests\nimport yaml\nimport colorama\nfrom colorama import Fore, Style\nfrom prompt_toolkit import PromptSession\nfrom prompt_toolkit.history import FileHistory\nfrom prompt_toolkit.auto_suggest import AutoSuggestFromHistory\nfrom prompt_toolkit.completion import WordCompleter\nfrom rich.console import Console\nfrom rich.markdown import 
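The `models` command above prints a hard-coded list of Ollama models. A minimal sketch of asking the local Ollama server instead follows, assuming Ollama is listening on its default address `http://localhost:11434` and answering its documented `GET /api/tags` endpoint. The helper names `parse_tags`, `installed_ollama_models`, and `is_installed` are hypothetical, not part of the design above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address (assumed)


def parse_tags(payload):
    """Return the sorted model names from an Ollama /api/tags response body."""
    return sorted(model["name"] for model in payload.get("models", []))


def installed_ollama_models(base_url=OLLAMA_URL, timeout=5.0):
    """Query the local Ollama server for the models it has already pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_tags(json.load(resp))


def is_installed(model, installed):
    """Check a model against the installed list.

    Ollama reports tagged names such as 'mistral:latest'; a bare name implies ':latest'.
    """
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in installed
```

In `models_command`, each entry's `installed` flag could then be computed as `is_installed(model["name"], installed)`, with the `installed_ollama_models()` call wrapped in `try`/`except urllib.error.URLError` so the listing still works when Ollama is not running.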
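Similarly, the `--pull` branch only simulates a download with `time.sleep`. Ollama's `POST /api/pull` endpoint streams newline-delimited JSON status events while it downloads, and those events can drive the Rich progress bar. The sketch below assumes a recent Ollama release (earlier releases documented the request field as `name` rather than `model`); `pull_progress` and `pull_model` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address (assumed)


def pull_progress(lines):
    """Convert Ollama's newline-delimited JSON pull stream into (status, percent) pairs.

    percent is None for events that carry no byte counts (e.g. 'pulling manifest').
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        total, completed = event.get("total"), event.get("completed")
        percent = 100.0 * completed / total if total and completed is not None else None
        yield event.get("status", ""), percent


def pull_model(name, base_url=OLLAMA_URL):
    """Ask Ollama to download a model, yielding progress as it streams back."""
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": name}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields one line (one JSON event) at a time
        yield from pull_progress(response)
```

Inside the `with Progress()` block, a loop such as `for status, percent in pull_model(args.pull):` could call `progress.update(task, description=status)` and, whenever `percent` is not `None`, `progress.update(task, total=100, completed=percent)`. Keeping the parsing in `pull_progress` separate from the HTTP call lets it be tested without a running Ollama server.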
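To try the CLI before the real backend exists, a throwaway stub that speaks the JSON contract the CLI relies on is enough: a `POST` to `/api/v1/chat` that returns `session_id`, `response`, `provider_used`, and `model_used`. This is a hypothetical test fixture built on the standard library's `http.server`, not the actual MCP backend; it simply echoes the message and pretends that `auto` routing always picks Ollama.

```python
import json
import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubChatHandler(BaseHTTPRequestHandler):
    """Answers POST /api/v1/chat with a canned reply in the shape the CLI expects."""

    def do_POST(self):
        if self.path.rstrip("/") != "/api/v1/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        params = body.get("model_params", {})
        provider = params.get("provider", "auto")
        reply = {
            # Reuse the caller's session ID, or mint one on the first message
            "session_id": body.get("session_id") or f"chat_{uuid.uuid4().hex[:6]}",
            "response": f"(stub) You said: {body.get('message', '')}",
            "provider_used": "ollama" if provider == "auto" else provider,
            "model_used": params.get("model", "auto"),
        }
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging so it doesn't clutter the terminal


def make_server(port=8000):
    """Bind the stub to localhost; port 8000 matches the CLI's default API_URL."""
    return HTTPServer(("127.0.0.1", port), StubChatHandler)


def start_in_background(server):
    """Serve from a daemon thread (useful in tests) and return the thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Running `make_server().serve_forever()` in one terminal and `mcp chat` in another exercises the whole chat loop; press Enter at the API-key prompt, since the stub ignores the `Authorization` header.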
            response = make_api_request(\"chat\", method=\"POST\", data=data, config=config)\n                progress.update(task, completed=100)\n            \n            if response:\n                session_id = response[\"session_id\"]\n                model_used = response.get(\"model_used\", model_name)\n                provider_used = response.get(\"provider_used\", provider)\n                \n                # Display provider and model info if configured\n                if config[\"output\"][\"show_model_info\"]:\n                    console.print(f\"\\n{Fore.BLUE}MCP [{provider_used}:{model_used}]:{Style.RESET_ALL}\")\n                else:\n                    console.print(f\"\\n{Fore.BLUE}MCP:{Style.RESET_ALL}\")\n                \n                display_response(response[\"response\"], config[\"output\"][\"format\"])\n                console.print()  # Empty line for readability\n        \n        except KeyboardInterrupt:\n            break\n        except EOFError:\n            break\n        except Exception as e:\n            console.print(f\"Error: {str(e)}\", style=\"red\")\n    \n    console.print(\"Chat session ended\")\n\ndef models_command(args, config):\n    \"\"\"List and manage available models.\"\"\"\n    if args.pull:\n        # Pull a new model for Ollama\n        console.print(f\"Pulling Ollama model: {args.pull}\")\n        \n        with Progress() as progress:\n            task = progress.add_task(f\"[cyan]Pulling {args.pull}...\", total=None)\n            \n            # This would actually call Ollama API\n            time.sleep(2)  # Simulating download\n            \n            progress.update(task, completed=100)\n        \n        console.print(f\"Successfully pulled {args.pull}\", style=\"green\")\n        return\n    \n    # List available models\n    console.print(Panel(\"Available Models\"))\n    \n    console.print(\"\\n[bold]OpenAI Models:[/bold]\")\n    openai_models = [\n        {\"name\": \"gpt-4-turbo\", 
\"description\": \"Advanced reasoning, current knowledge\"},\n        {\"name\": \"gpt-3.5-turbo\", \"description\": \"Fast, efficient for standard tasks\"}\n    ]\n    \n    for model in openai_models:\n        console.print(f\"  • {model['name']} - {model['description']}\")\n    \n    console.print(\"\\n[bold]Ollama Models:[/bold]\")\n    \n    # In a real implementation, this would fetch from Ollama API\n    ollama_models = [\n        {\"name\": \"llama2\", \"description\": \"General purpose local model\", \"installed\": True},\n        {\"name\": \"mistral\", \"description\": \"Strong reasoning, 8k context window\", \"installed\": True},\n        {\"name\": \"codellama\", \"description\": \"Specialized for code generation\", \"installed\": True},\n        {\"name\": \"wizard-math\", \"description\": \"Mathematical problem-solving\", \"installed\": False}\n    ]\n    \n    for model in ollama_models:\n        status = \"[green]✓[/green]\" if model[\"installed\"] else \"[red]✗[/red]\"\n        console.print(f\"  {status} {model['name']} - {model['description']}\")\n    \n    console.print(\"\\nUse 'mcp models --pull MODEL_NAME' to download a model\")\n\ndef config_command(args, config):\n    \"\"\"View or edit configuration.\"\"\"\n    if args.set:\n        # Set a configuration value\n        key, value = args.set.split('=', 1)\n        keys = key.split('.')\n        \n        # Navigate to the nested key\n        current = config\n        for k in keys[:-1]:\n            if k not in current:\n                current[k] = {}\n            current = current[k]\n        \n        # Set the value (with type conversion)\n        if value.lower() == 'true':\n            current[keys[-1]] = True\n        elif value.lower() == 'false':\n            current[keys[-1]] = False\n        elif value.isdigit():\n            current[keys[-1]] = int(value)\n        else:\n            current[keys[-1]] = value\n        \n        save_config(config)\n        
console.print(f\"Configuration updated: {key} = {value}\", style=\"green\")\n        return\n    \n    # Display current configuration\n    console.print(Panel(\"MCP Configuration\"))\n    console.print(yaml.dump(config))\n    console.print(\"\\nUse 'mcp config --set key.path=value' to change settings\")\n\ndef agent_command(args, config):\n    \"\"\"Manage agent profiles.\"\"\"\n    if args.create:\n        # Create a new agent profile\n        console.print(f\"Creating agent profile: {args.create}\")\n        # Implementation would collect agent parameters\n        return\n    \n    if args.edit:\n        # Edit an existing agent profile\n        console.print(f\"Editing agent profile: {args.edit}\")\n        return\n    \n    # List available agents\n    console.print(Panel(\"Available Agents\"))\n    \n    console.print(\"\\n[bold]System Agents:[/bold]\")\n    system_agents = [\n        {\"name\": \"general\", \"description\": \"General purpose assistant\"},\n        {\"name\": \"research\", \"description\": \"Research specialist with knowledge tools\"},\n        {\"name\": \"coding\", \"description\": \"Code assistant with tool integration\"},\n        {\"name\": \"creative\", \"description\": \"Creative writing and content generation\"}\n    ]\n    \n    for agent in system_agents:\n        console.print(f\"  • {agent['name']} - {agent['description']}\")\n    \n    # In a real implementation, this would load from user config\n    custom_agents = [\n        {\"name\": \"my-math-tutor\", \"description\": \"Mathematics teaching and problem solving\"},\n        {\"name\": \"data-analyst\", \"description\": \"Data analysis with visualization tools\"}\n    ]\n    \n    if custom_agents:\n        console.print(\"\\n[bold]Custom Agents:[/bold]\")\n        for agent in custom_agents:\n            console.print(f\"  • {agent['name']} - {agent['description']}\")\n    \n    console.print(\"\\nUse 'mcp agents --create NAME' to create a new agent\")\n\ndef main():\n    
\"\"\"Main entry point for the CLI.\"\"\"\n    parser = argparse.ArgumentParser(description=\"MCP Command Line Interface\")\n    parser.add_argument('--config', help=\"Path to config file\")\n    parser.add_argument('--verbose', action='store_true', help=\"Enable verbose output\")\n    \n    subparsers = parser.add_subparsers(dest='command', help='Command to run')\n    \n    # Chat command\n    chat_parser = subparsers.add_parser('chat', help='Start a chat session')\n    chat_parser.add_argument('--model', help='Model to use')\n    chat_parser.add_argument('--provider', choices=['openai', 'ollama', 'auto'], help='Provider to use')\n    chat_parser.add_argument('--agent', help='Agent type to use')\n    chat_parser.add_argument('--session-id', help='Resume an existing session')\n    \n    # Complete command (one-shot completion)\n    complete_parser = subparsers.add_parser('complete', help='Get a completion for a prompt')\n    complete_parser.add_argument('prompt', help='Prompt text')\n    complete_parser.add_argument('--model', help='Model to use')\n    complete_parser.add_argument('--provider', choices=['openai', 'ollama', 'auto'], help='Provider to use')\n    \n    # Models command\n    models_parser = subparsers.add_parser('models', help='List and manage available models')\n    models_parser.add_argument('--pull', metavar='MODEL_NAME', help='Download a model to Ollama')\n    models_parser.add_argument('--info', metavar='MODEL_NAME', help='Show detailed model information')\n    models_parser.add_argument('--benchmark', metavar='MODEL_NAME', help='Run performance benchmark')\n    \n    # Config command\n    config_parser = subparsers.add_parser('config', help='Configure MCP settings')\n    config_parser.add_argument('--set', metavar='KEY=VALUE', help='Set a configuration value')\n    \n    # Agents command\n    agents_parser = subparsers.add_parser('agents', help='Manage agent profiles')\n    agents_parser.add_argument('--create', metavar='NAME', help='Create a new 
custom agent')\n    agents_parser.add_argument('--edit', metavar='NAME', help='Edit an existing agent')\n    agents_parser.add_argument('--delete', metavar='NAME', help='Delete a custom agent')\n    \n    # Session command\n    session_parser = subparsers.add_parser('session', help='Manage saved sessions')\n    session_parser.add_argument('--list', action='store_true', help='List saved sessions')\n    session_parser.add_argument('--delete', metavar='SESSION_ID', help='Delete a session')\n    session_parser.add_argument('--export', metavar='SESSION_ID', help='Export a session')\n    \n    args = parser.parse_args()\n    \n    # Load configuration\n    config_path = args.config if args.config else CONFIG_PATH\n    \n    if args.config and not os.path.exists(args.config):\n        console.print(f\"Config file not found: {args.config}\", style=\"red\")\n        return 1\n    \n    config = load_config()\n    \n    # Execute the appropriate command\n    if args.command == 'chat':\n        chat_command(args, config)\n    elif args.command == 'complete':\n        # Implementation for complete command\n        pass\n    elif args.command == 'models':\n        models_command(args, config)\n    elif args.command == 'config':\n        config_command(args, config)\n    elif args.command == 'agents':\n        agent_command(args, config)\n    elif args.command == 'session':\n        # Implementation for session command\n        pass\n    else:\n        # No command specified, show help\n        parser.print_help()\n    \n    return 0\n\nif __name__ == \"__main__\":\n    sys.exit(main())\n"])</script><script>self.__next_f.push([1,"117:[\"$\",\"pre\",\"pre-74\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
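
The `models` command above hard-codes its Ollama list and simulates `--pull` with `time.sleep`. A running Ollama server reports its installed models at `GET /api/tags` (by default on `http://localhost:11434`), so the real lookup could look roughly like the sketch below, using only the standard library. The helper names `parse_installed_models`, `fetch_installed_models`, and `mark_installed` are hypothetical, and the sketch relies only on the `name` field of each returned model.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def parse_installed_models(payload: Dict) -> List[str]:
    """Extract the sorted model names from an Ollama /api/tags response body."""
    return sorted(model["name"] for model in payload.get("models", []))


def fetch_installed_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> List[str]:
    """Ask a running Ollama server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_installed_models(json.load(resp))


def mark_installed(catalog: List[Dict], installed: List[str]) -> List[Dict]:
    """Return copies of the catalog entries with an 'installed' flag.

    Ollama reports untagged pulls as 'name:latest', so both forms count.
    """
    names = set(installed)
    return [
        {**entry, "installed": entry["name"] in names or f"{entry['name']}:latest" in names}
        for entry in catalog
    ]
```

In `models_command`, the static `installed` flags could then be replaced by `mark_installed(ollama_models, fetch_installed_models())`, wrapped in a `try`/`except urllib.error.URLError` so the listing still works when the Ollama server is not running. Keeping the parsing separate from the HTTP call also makes it testable without a server.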

Web Interface Design

Web Interface Architecture

┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  React Frontend                                                    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│  │ Chat         │ │ Model        │ │ Agent        │ │ Settings  │ │
│  │ Interface    │ │ Management   │ │ Configuration│ │ Manager   │ │
│  └──────────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│          │               │                │               │        │
│          └───────────────┼────────────────┼───────────────┘        │
│                          │                │                        │
│                          ▼                ▼                        │
│                    ┌─────────────┐  ┌────────────┐                │
│                    │ Auth        │  │ API Client │                │
│                    │ Management  │  │            │                │
│                    └─────────────┘  └────────────┘                │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│  FastAPI Backend                                                   │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│  │ Chat         │ │ Model        │ │ Agent        │ │ User      │ │
│  │ Controller   │ │ Controller   │ │ Controller   │ │ Controller│ │
│  └──────────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│          │               │                │               │        │
│          └───────────────┼────────────────┼───────────────┘        │
│                          │                │                        │
│                          ▼                ▼                        │
│              ┌───────────────────┐  ┌────────────────────┐        │
│              │ Provider Service  │  │ Agent Factory      │        │
│              └───────────────────┘  └────────────────────┘        │
│                       │                       │                   │
│                       ▼                       ▼                   │
│               ┌─────────────┐         ┌─────────────┐            │
│               │ OpenAI API  │         │ Ollama API  │            │
│               └─────────────┘         └─────────────┘            │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
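
In this architecture the Provider Service sits between the backend controllers and the two model APIs, which makes it the natural place to handle a failing provider. A minimal sketch of such a fallback, assuming each provider can be wrapped as a plain callable from prompt to text; `ProviderService` and `complete` are illustrative names, not part of FastAPI, the OpenAI SDK, or Ollama.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical provider interface: a callable that turns a prompt into text.
Provider = Callable[[str], str]


class ProviderService:
    """Tries providers in the given order, falling back when one fails."""

    def __init__(self, providers: Dict[str, Provider]):
        self.providers = providers

    def complete(self, prompt: str, order: Sequence[str]) -> Tuple[str, str]:
        """Return (provider_used, response_text) from the first provider that succeeds."""
        errors: Dict[str, Exception] = {}
        for name in order:
            try:
                return name, self.providers[name](prompt)
            except Exception as exc:  # record the failure, then try the next provider
                errors[name] = exc
        raise RuntimeError(f"All providers failed: {errors}")
```

Returning the name of the provider that actually answered lets the Chat Controller fill in a `provider_used` field, the same field the CLI's chat command reads from each response.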

Web Interface Wireframes

Chat Interface

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant                                           🔄 New Chat  ⚙️  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────────────────┐  ┌───────────────────────────────────────┐ │
│  │ Chat History            │  │                                       │ │
│  │                         │  │ User: Tell me about quantum computing │ │
│  │ Welcome                 │  │                                       │ │
│  │ Quantum Computing       │  │ MCP: Quantum computing is a type of   │ │
│  │ AI Ethics               │  │ computation that harnesses quantum    │ │
│  │ Python Tutorial         │  │ mechanical phenomena like super-      │ │
│  │                         │  │ position and entanglement.           │ │
│  │                         │  │                                       │ │
│  │                         │  │ Unlike classical bits that represent  │ │
│  │                         │  │ either 0 or 1, quantum bits or        │ │
│  │                         │  │ "qubits" can exist in multiple states │ │
│  │                         │  │ simultaneously due to superposition.  │ │
│  │                         │  │                                       │ │
│  │                         │  │ [Response continues...]               │ │
│  │                         │  │                                       │ │
│  │                         │  │ User: How does quantum entanglement   │ │
│  │                         │  │ work?                                 │ │
│  │                         │  │                                       │ │
│  │                         │  │ MCP is typing...                      │ │
│  │                         │  │                                       │ │
│  └─────────────────────────┘  └───────────────────────────────────────┘ │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ Type your message...                                      Send ▶ │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Model: auto (OpenAI:gpt-4) | Mode: Research | Memory: Enabled          │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
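
The status bar at the bottom of the chat view ("Model: auto (OpenAI:gpt-4)") reports which provider and model actually answered. The CLI's chat command already shows the shape of that exchange: it posts `message`, `session_id`, and `model_params` to the `chat` endpoint and reads `session_id`, `response`, `model_used`, and `provider_used` back. Below is a sketch of two helpers a web client might share with the CLI, assuming the backend echoes provider names in lowercase (`openai`, `ollama`) as the CLI's `--provider` choices do; both function names are hypothetical.

```python
from typing import Dict, Optional


def build_chat_payload(message: str, session_id: Optional[str],
                       provider: str = "auto", model: str = "auto") -> Dict:
    """Request body in the same shape the CLI's chat command sends."""
    return {
        "message": message,
        "session_id": session_id,  # None lets the backend start a new session
        "model_params": {
            "provider": provider,
            "model": model,
            "auto_select": provider == "auto",
        },
    }


def status_label(requested_model: str, response: Dict) -> str:
    """Status-bar text such as 'Model: auto (OpenAI:gpt-4)'."""
    provider = response.get("provider_used", "?")
    model = response.get("model_used", requested_model)
    # Assumed lowercase provider ids mapped to display names
    display = {"openai": "OpenAI", "ollama": "Ollama"}.get(provider, provider)
    return f"Model: {requested_model} ({display}:{model})"
```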

Model Settings Panel

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant > Settings > Models                                   ✖    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Model Selection                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ● Auto-select model (recommended)                               │    │
│  │ ○ Specify model and provider                                    │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Provider                     Model                                     │
│  ┌────────────┐               ┌────────────────────┐                    │
│  │ OpenAI   ▼ │               │ gpt-4-turbo      ▼ │                    │
│  └────────────┘               └────────────────────┘                    │
│                                                                         │
│  Auto-Selection Preferences                                             │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ Prioritize:  ● Speed   ○ Quality   ○ Privacy   ○ Cost           │    │
│  │                                                                  │    │
│  │ Complexity threshold: ███████████░░░░░░░░░  0.65                 │    │
│  │                                                                  │    │
│  │ [✓] Prefer Ollama for privacy-sensitive content                  │    │
│  │ [✓] Use OpenAI for complex reasoning                            │    │
│  │ [✓] Automatically fall back if a provider fails                  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Available Ollama Models                                                │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ✓ llama2         ✓ mistral        ✓ codellama                   │    │
│  │ ✓ wizard-math    ✓ neural-chat    ○ llama2:70b  [Download]      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  [ Save Changes ]         [ Cancel ]                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
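
The Auto-Selection Preferences in this panel imply a routing rule: compare a request's complexity score against the threshold (0.65 in the mock-up), keep privacy-sensitive content on Ollama, and optionally fall back to the other provider. A minimal sketch, assuming a complexity score already normalized to [0, 1] (how it is computed is not specified here) and using the hypothetical names `AutoSelectPrefs` and `choose_providers`; the "Prioritize" radio buttons are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AutoSelectPrefs:
    """Hypothetical mirror of the 'Auto-Selection Preferences' checkboxes."""
    complexity_threshold: float = 0.65
    prefer_ollama_for_private: bool = True
    openai_for_complex: bool = True
    fallback: bool = True


def choose_providers(complexity: float, privacy_sensitive: bool,
                     prefs: AutoSelectPrefs) -> List[str]:
    """Return provider names in the order they should be tried."""
    if privacy_sensitive and prefs.prefer_ollama_for_private:
        # Keep sensitive content local: no fallback to the cloud provider
        return ["ollama"]
    if complexity >= prefs.complexity_threshold and prefs.openai_for_complex:
        primary, secondary = "openai", "ollama"
    else:
        primary, secondary = "ollama", "openai"
    return [primary, secondary] if prefs.fallback else [primary]
```

Returning an ordered list rather than a single provider lets the caller implement fallback by trying entries left to right. Denying privacy-sensitive requests any cloud fallback is an assumption about the intended behavior, not something the mock-up states.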
┌─────────────────────────────────────────────────────────────────┐    │\n│  │ Prioritize:  ● Speed   ○ Quality   ○ Privacy   ○ Cost           │    │\n│  │                                                                  │    │\n│  │ Complexity threshold: ███████████░░░░░░░░░  0.65                 │    │\n│  │                                                                  │    │\n│  │ [✓] Prefer Ollama for privacy-sensitive content                  │    │\n│  │ [✓] Use OpenAI for complex reasoning                            │    │\n│  │ [✓] Automatically fall back if a provider fails                  │    │\n│  └─────────────────────────────────────────────────────────────────┘    │\n│                                                                         │\n│  Available Ollama Models                                                │\n│  ┌─────────────────────────────────────────────────────────────────┐    │\n│  │ ✓ llama2         ✓ mistral        ✓ codellama                   │    │\n│  │ ✓ wizard-math    ✓ neural-chat    ○ llama2:70b  [Download]      │    │\n│  └─────────────────────────────────────────────────────────────────┘    │\n│                                                                         │\n│  [ Save Changes ]         [ Cancel ]                                    │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘\n"])</script><script>self.__next_f.push([1,"11f:[\"$\",\"pre\",\"pre-77\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
Agent Configuration Panel

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant > Settings > Agents                                   ✖    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Current Agent: Research Assistant                             [Edit ✏] │
│                                                                         │
│  Agent Library                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ● Research Assistant    Knowledge-focused with search capability│    │
│  │ ○ Code Assistant        Specialized for software development    │    │
│  │ ○ Creative Writer       Content creation and storytelling       │    │
│  │ ○ Math Tutor            Step-by-step problem solving            │    │
│  │ ○ General Assistant     Versatile helper for everyday tasks     │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Agent Capabilities                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ [✓] Knowledge retrieval     [ ] Code execution                  │    │
│  │ [✓] Web search              [ ] Data visualization              │    │
│  │ [✓] Memory                  [ ] File operations                 │    │
│  │ [✓] Calendar awareness      [ ] Email integration               │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  System Instructions                                                    │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ You are a research assistant with expertise in finding and      │    │
│  │ synthesizing information. Provide comprehensive, accurate       │    │
│  │ answers with authoritative sources when available.              │    │
│  │                                                                 │    │
│  │                                                                 │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  [ Save Agent ]   [ Create New Agent ]   [ Import ]   [ Export ]        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
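Behind the panel, each agent is a small record. The sketch below is a minimal illustration, assuming the field names name, description, capabilities and system_prompt that the AgentConfiguration component uses when it creates a new agent. The capability keys and the validateAgent and exportFilename helpers are hypothetical, derived from the mockup's checkboxes and the component's export logic rather than from any documented API.

```javascript
// Hypothetical capability keys mirroring the checkboxes in the panel mockup.
const KNOWN_CAPABILITIES = new Set([
  'knowledge_retrieval', 'web_search', 'memory', 'calendar_awareness',
  'code_execution', 'data_visualization', 'file_operations', 'email_integration',
]);

// Example record for the "Research Assistant" shown as the current agent.
// Field names follow the defaults set in AgentConfiguration's handleCreateAgent.
const researchAssistant = {
  name: 'Research Assistant',
  description: 'Knowledge-focused with search capability',
  capabilities: ['knowledge_retrieval', 'web_search', 'memory', 'calendar_awareness'],
  system_prompt:
    'You are a research assistant with expertise in finding and synthesizing ' +
    'information. Provide comprehensive, accurate answers with authoritative ' +
    'sources when available.',
};

// Returns a list of problems; an empty list means the record looks usable.
function validateAgent(agent) {
  const errors = [];
  if (!agent.name || !agent.name.trim()) errors.push('name is required');
  if (!agent.system_prompt || !agent.system_prompt.trim()) {
    errors.push('system_prompt is required');
  }
  for (const cap of agent.capabilities || []) {
    if (!KNOWN_CAPABILITIES.has(cap)) errors.push(`unknown capability: ${cap}`);
  }
  return errors;
}

// Same slug rule as handleExportAgent: whitespace runs become "_", then lowercase.
function exportFilename(agent) {
  return `${agent.name.replace(/\s+/g, '_').toLowerCase()}_agent.json`;
}

console.log(validateAgent(researchAssistant)); // []
console.log(exportFilename(researchAssistant)); // research_assistant_agent.json
```

Keeping validation separate from the form means the same check could be reused for agent files loaded through the Import button, which the form itself would not cover.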
Dashboard View

┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Assistant > Dashboard                                        ⚙️      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  System Status                                   Last 24 Hours          │
│  ┌────────────────────────────┐   ┌────────────────────────────────┐    │
│  │ OpenAI: ● Connected        │   │ Requests: 143                  │    │
│  │ Ollama: ● Connected        │   │ OpenAI: 62% | Ollama: 38%      │    │
│  │ Database: ● Operational    │   │ Avg Response Time: 2.4s        │    │
│  └────────────────────────────┘   └────────────────────────────────┘    │
│                                                                         │
│  Recent Conversations                                                   │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ ● Quantum Computing Research       Today, 14:32   [Resume]      │    │
│  │ ● Python Code Debugging            Today, 10:15   [Resume]      │    │
│  │ ● Travel Planning                  Yesterday      [Resume]      │    │
│  │ ● Financial Analysis               2 days ago     [Resume]      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  Model Usage                          Agent Usage                       │
│  ┌────────────────────────────┐   ┌────────────────────────────────┐    │
│  │ ███ OpenAI:gpt-4      27%  │   │ ███ Research Assistant    42%  │    │
│  │ ███ OpenAI:gpt-3.5    35%  │   │ ███ Code Assistant        31%  │    │
│  │ ███ Ollama:mistral    20%  │   │ ███ General Assistant     18%  │    │
│  │ ███ Ollama:llama2     18%  │   │ ███ Other                  9%  │    │
│  └────────────────────────────┘   └────────────────────────────────┘    │
│                                                                         │
│  API Credits                                                            │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ OpenAI: $4.32 used this month of $10.00 budget  ████░░░░░ 43%   │    │
│  │ Estimated savings from Ollama usage: $3.87                      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  [ New Chat ]   [ View All Conversations ]   [ System Settings ]        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
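The mockup's figures are internally consistent: the per-model shares under Model Usage add up to the 62% OpenAI / 38% Ollama split in the Last 24 Hours panel, and $4.32 of a $10.00 budget is the 43% shown on the credits bar. A minimal sketch of those two calculations follows; the function names are hypothetical, and the "Provider:model" key format is simply taken from the mockup labels.

```javascript
// Per-model request shares (percent) as shown in the dashboard mockup.
const modelUsage = {
  'OpenAI:gpt-4': 27,
  'OpenAI:gpt-3.5': 35,
  'Ollama:mistral': 20,
  'Ollama:llama2': 18,
};

// Sum per-model shares into per-provider shares using the "Provider:model" prefix.
function providerShares(usage) {
  const totals = {};
  for (const [key, share] of Object.entries(usage)) {
    const provider = key.split(':')[0];
    totals[provider] = (totals[provider] || 0) + share;
  }
  return totals;
}

// Fraction of the monthly budget consumed, as a whole percentage for display.
function budgetPercent(spent, budget) {
  if (budget <= 0) throw new RangeError('budget must be positive');
  return Math.round((spent / budget) * 100);
}

console.log(providerShares(modelUsage)); // { OpenAI: 62, Ollama: 38 }
console.log(budgetPercent(4.32, 10.0)); // 43
```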
Web Interface Interaction Flow

┌──────────────┐     ┌───────────────┐     ┌────────────────┐
│              │     │               │     │                │
│  Login Page  │────▶│  Dashboard    │────▶│  Chat Interface│◀───┐
│              │     │               │     │                │    │
└──────────────┘     └───────┬───────┘     └────────┬───────┘    │
                             │                      │            │
                             ▼                      ▼            │
                     ┌───────────────┐     ┌────────────────┐    │
                     │               │     │                │    │
                     │Settings Panel │     │ User Message   │    │
                     │               │     │                │    │
                     └───┬───────────┘     └────────┬───────┘    │
                         │                          │            │
                         ▼                          ▼            │
                ┌────────────────┐         ┌────────────────┐    │
                │                │         │                │    │
                │Model Settings  │         │API Processing  │    │
                │                │         │                │    │
                └────────┬───────┘         └────────┬───────┘    │
                         │                          │            │
                         ▼                          ▼            │
                ┌────────────────┐         ┌────────────────┐    │
                │                │         │                │    │
                │Agent Settings  │         │System Response │────┘
                │                │         │                │
                └────────────────┘         └────────────────┘
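In auto-select mode, the API Processing step is where a provider has to be chosen for each message. The ProviderSelector component exposes two inputs for that decision, a complexity threshold (default 0.65) and a privacy preference. One plausible way to combine them is sketched below as a hypothetical chooseProvider function; it assumes that a complexity score in [0, 1] and a sensitivity flag are computed elsewhere, since how either would be derived is not specified.

```javascript
// Hypothetical routing rule for provider === 'auto'. Assumes the caller supplies
// a complexity score in [0, 1] and a sensitivity flag computed elsewhere.
function chooseProvider(request, settings = {}) {
  const { complexity, sensitive = false } = request;
  const {
    complexityThreshold = 0.65, // same default as ProviderSelector's slider
    prioritizePrivacy = false,
    ollamaOnline = true,
  } = settings;

  // Privacy preference wins when content is flagged and a local model is available.
  if (prioritizePrivacy && sensitive && ollamaOnline) return 'ollama';
  // Without a running Ollama instance, every request falls back to OpenAI.
  if (!ollamaOnline) return 'openai';
  // Queries at or above the threshold go to the hosted model, the rest stay local.
  return complexity >= complexityThreshold ? 'openai' : 'ollama';
}

console.log(chooseProvider({ complexity: 0.8 })); // openai
console.log(chooseProvider({ complexity: 0.3 })); // ollama
console.log(chooseProvider({ complexity: 0.9, sensitive: true }, { prioritizePrivacy: true })); // ollama
console.log(chooseProvider({ complexity: 0.3 }, { ollamaOnline: false })); // openai
```

Note the fallback order: when Ollama is offline, sensitive content still goes to OpenAI. A deployment with strict data-sovereignty requirements might reject the request in that case instead of falling back.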
Key Web Components

ProviderSelector Component

// ProviderSelector.jsx
import React, { useState, useEffect } from 'react';
import { Select, Switch, Slider, Checkbox, Card, Alert } from 'antd';
import { ApiOutlined, QuestionCircleOutlined } from '@ant-design/icons';

const ProviderSelector = ({
  onProviderChange,
  onModelChange,
  initialProvider = 'auto',
  initialModel = null,
  showAdvanced = false
}) => {
  const [provider, setProvider] = useState(initialProvider);
  const [model, setModel] = useState(initialModel);
  const [autoSelect, setAutoSelect] = useState(initialProvider === 'auto');
  // Routing preferences. They live in local state only: this component does not
  // pass them to its parent, so they are not included in chat requests.
  const [complexityThreshold, setComplexityThreshold] = useState(0.65);
  const [prioritizePrivacy, setPrioritizePrivacy] = useState(false);
  const [ollamaModels, setOllamaModels] = useState([]);
  const [ollamaStatus, setOllamaStatus] = useState('unknown'); // 'online', 'offline', 'unknown'
  const [openaiModels] = useState([
    { value: 'gpt-4o', label: 'GPT-4o' },
    { value: 'gpt-4-turbo', label: 'GPT-4 Turbo' },
    { value: 'gpt-3.5-turbo', label: 'GPT-3.5 Turbo' }
  ]);

  // Fetch available Ollama models on component mount
  useEffect(() => {
    const fetchOllamaModels = async () => {
      try {
        const response = await fetch('/api/v1/models/ollama');
        if (response.ok) {
          const data = await response.json();
          setOllamaModels(data.models.map(m => ({
            value: m.name,
            label: m.name
          })));
          setOllamaStatus('online');
        } else {
          setOllamaStatus('offline');
        }
      } catch (error) {
        console.error('Error fetching Ollama models:', error);
        setOllamaStatus('offline');
      }
    };

    fetchOllamaModels();
  }, []);

  const handleProviderChange = (value) => {
    setProvider(value);
    onProviderChange(value);

    // Reset model when changing provider
    setModel(null);
    onModelChange(null);
  };

  const handleModelChange = (value) => {
    setModel(value);
    onModelChange(value);
  };

  const handleAutoSelectChange = (checked) => {
    setAutoSelect(checked);
    if (checked) {
      setProvider('auto');
      onProviderChange('auto');
      setModel(null);
      onModelChange(null);
    } else {
      // Default to OpenAI if disabling auto-select
      setProvider('openai');
      onProviderChange('openai');
      setModel('gpt-3.5-turbo');
      onModelChange('gpt-3.5-turbo');
    }
  };

  // Manual mode picks a concrete provider; auto-select is controlled by the switch
  const providerOptions = [
    { value: 'openai', label: 'OpenAI' },
    { value: 'ollama', label: 'Ollama (Local)' }
  ];

  return (
    <Card title="Model Selection" extra={<QuestionCircleOutlined />}>
      <div className="provider-selector">
        <div className="selector-row">
          <Switch
            checked={autoSelect}
            onChange={handleAutoSelectChange}
            checkedChildren="Auto-select"
            unCheckedChildren="Manual"
          />
          <span className="selector-label">
            {autoSelect
              ? 'Automatically select the best model for each query'
              : 'Manually choose provider and model'}
          </span>
        </div>

        {!autoSelect && (
          <div className="selector-row model-selection">
            <div className="provider-dropdown">
              <span>Provider:</span>
              {/* antd's Dropdown is a menu trigger; Select is the value-picking control */}
              <Select
                options={providerOptions}
                value={provider}
                onChange={handleProviderChange}
              />
            </div>

            <div className="model-dropdown">
              <span>Model:</span>
              <Select
                options={provider === 'openai' ? openaiModels : ollamaModels}
                value={model}
                onChange={handleModelChange}
                placeholder="Select a model"
              />
            </div>
          </div>
        )}

        {provider === 'ollama' && ollamaStatus === 'offline' && (
          <Alert
            message="Ollama is currently offline"
            description="Start the Ollama service to use local models."
            type="warning"
            showIcon
          />
        )}

        {showAdvanced && (
          <div className="advanced-settings">
            <div className="setting-header">Advanced Routing Settings</div>

            <div className="setting-row">
              <span>Complexity threshold:</span>
              <Slider
                value={complexityThreshold}
                onChange={setComplexityThreshold}
                min={0}
                max={1}
                step={0.05}
                disabled={!autoSelect}
              />
              <span className="setting-value">{complexityThreshold}</span>
            </div>

            <div className="setting-row">
              <Checkbox
                checked={prioritizePrivacy}
                onChange={e => setPrioritizePrivacy(e.target.checked)}
                disabled={!autoSelect}
              >
                Prioritize privacy (prefer Ollama for sensitive content)
              </Checkbox>
            </div>

            <div className="model-status">
              {/* The OpenAI status is hard-coded; no connectivity check is made */}
              <div>
                <ApiOutlined /> OpenAI: <span className="status-online">Connected</span>
              </div>
              <div>
                <ApiOutlined /> Ollama: <span className={ollamaStatus === 'online' ? 'status-online' : 'status-offline'}>
                  {ollamaStatus === 'online' ? 'Connected' : 'Disconnected'}
                </span>
              </div>
            </div>
          </div>
        )}
      </div>
    </Card>
  );
};

export default ProviderSelector;

ChatInterface Component

// ChatInterface.jsx
import React, { useState, useEffect, useRef } from 'react';
import { Input, Button, Spin, Avatar, Tooltip, Card, Typography } from 'antd';
import { SendOutlined, UserOutlined, RobotOutlined, SettingOutlined,
         CopyOutlined, DeleteOutlined, InfoCircleOutlined } from '@ant-design/icons';
import ReactMarkdown from 'react-markdown';
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter';
import { tomorrow } from 'react-syntax-highlighter/dist/esm/styles/prism';
import ProviderSelector from './ProviderSelector';

const { TextArea } = Input;
const { Text, Title } = Typography;

const ChatInterface = () => {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);
  const [sessionId, setSessionId] = useState(null);
  const [provider, setProvider] = useState('auto');
  const [model, setModel] = useState(null);
  const [showSettings, setShowSettings] = useState(false);
  const messagesEndRef = useRef(null);

  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };

  // Scroll to bottom when messages change
  useEffect(() => {
    scrollToBottom();
  }, [messages]);

  const handleSend = async () => {
    // Capture the text before the input box is cleared
    const text = input;
    if (!text.trim()) return;

    // Add user message to chat
    const userMessage = { role: 'user', content: text, timestamp: new Date() };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setLoading(true);

    try {
      const response = await fetch('/api/v1/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: text,
          session_id: sessionId,
          model_params: {
            provider: provider,
            model: model,
            auto_select: provider === 'auto'
          }
        })
      });

      if (!response.ok) {
        throw new Error('Failed to get response');
      }

      const data = await response.json();

      // Update session ID if new
      if (data.session_id && !sessionId) {
        setSessionId(data.session_id);
      }

      // Add assistant message to chat
      const assistantMessage = {
        role: 'assistant',
        content: data.response,
        timestamp: new Date(),
        metadata: {
          model_used: data.model_used,
          provider_used: data.provider_used
        }
      };

      setMessages(prev => [...prev, assistantMessage]);

    } catch (error) {
      console.error('Error sending message:', error);
      // Add error message
      setMessages(prev => [...prev, {
        role: 'system',
        content: 'Error: Unable to get a response. Please try again.',
        error: true,
        timestamp: new Date()
      }]);
    } finally {
      setLoading(false);
    }
  };

  const handleKeyDown = (e) => {
    if (e.key === 'Enter' && !e.shiftKey) {
      e.preventDefault();
      handleSend();
    }
  };

  const handleCopyMessage = (content) => {
    navigator.clipboard.writeText(content);
    // Could show a toast notification here
  };

  const renderMessage = (message, index) => {
    const isUser = message.role === 'user';
    const isError = message.error;

    return (
      <div
        key={index}
        className={`message-container ${isUser ? 'user-message' : 'assistant-message'} ${isError ? 'error-message' : ''}`}
      >
        <div className="message-avatar">
          <Avatar
            icon={isUser ? <UserOutlined /> : <RobotOutlined />}
            style={{ backgroundColor: isUser ? '#1890ff' : '#52c41a' }}
          />
        </div>

        <div className="message-content">
          <div className="message-header">
            <Text strong>{isUser ? 'You' : 'MCP Assistant'}</Text>
            {message.metadata && (
              <Tooltip title="Model information">
                <Text type="secondary" className="model-info">
                  <InfoCircleOutlined /> {message.metadata.provider_used}:{message.metadata.model_used}
                </Text>
              </Tooltip>
            )}
            <Text type="secondary" className="message-time">
              {message.timestamp.toLocaleTimeString()}
            </Text>
          </div>

          <div className="message-body">
            <ReactMarkdown
              children={message.content}
              components={{
                // Fenced blocks with a language tag get syntax highlighting
                code({node, inline, className, children, ...props}) {
                  const match = /language-(\w+)/.exec(className || '');
                  return !inline && match ? (
                    <SyntaxHighlighter
                      children={String(children).replace(/\n$/, '')}
                      style={tomorrow}
                      language={match[1]}
                      PreTag="div"
                      {...props}
                    />
                  ) : (
                    <code className={className} {...props}>
                      {children}
                    </code>
                  );
                }
              }}
            />
          </div>

          <div className="message-actions">
            <Button
              type="text"
              size="small"
              icon={<CopyOutlined />}
              onClick={() => handleCopyMessage(message.content)}
            >
              Copy
            </Button>
          </div>
        </div>
      </div>
    );
  };

  const settingsMenu = (
    <Card className="settings-panel">
      <Title level={4}>Chat Settings</Title>

      <ProviderSelector
        onProviderChange={setProvider}
        onModelChange={setModel}
        initialProvider={provider}
        initialModel={model}
        showAdvanced={true}
      />

      <div className="settings-actions">
        <Button type="primary" onClick={() => setShowSettings(false)}>
          Close Settings
        </Button>
      </div>
    </Card>
  );

  return (
    <div className="chat-interface">
      <div className="chat-header">
        <Title level={3}>MCP Assistant</Title>

        <div className="header-actions">
          <Button icon={<DeleteOutlined />} onClick={() => setMessages([])}>
            Clear Chat
          </Button>
          <Button
            icon={<SettingOutlined />}
            type={showSettings ? 'primary' : 'default'}
            onClick={() => setShowSettings(!showSettings)}
          >
            Settings
          </Button>
        </div>
      </div>

      {showSettings && settingsMenu}

      <div className="message-list">
        {messages.length === 0 && (
          <div className="empty-state">
            <Title level={4}>Start a conversation</Title>
            <Text>Ask a question or request information</Text>
          </div>
        )}

        {messages.map(renderMessage)}

        {loading && (
          <div className="message-container assistant-message">
            <div className="message-avatar">
              <Avatar icon={<RobotOutlined />} style={{ backgroundColor: '#52c41a' }} />
            </div>
            <div className="message-content">
              <div className="message-body typing-indicator">
                <Spin /> MCP is thinking...
              </div>
            </div>
          </div>
        )}

        <div ref={messagesEndRef} />
      </div>

      <div className="chat-input">
        <TextArea
          value={input}
          onChange={e => setInput(e.target.value)}
          onKeyDown={handleKeyDown}
          placeholder="Type your message..."
          autoSize={{ minRows: 1, maxRows: 4 }}
          disabled={loading}
        />
        <Button
          type="primary"
          icon={<SendOutlined />}
          onClick={handleSend}
          disabled={loading || !input.trim()}
        >
          Send
        </Button>
      </div>

      <div className="chat-footer">
        <Text type="secondary">
          Model: {provider === 'auto' ? 'Auto-select' : `${provider}:${model || 'default'}`}
        </Text>
        {sessionId && (
          <Text type="secondary">Session ID: {sessionId}</Text>
        )}
      </div>
    </div>
  );
};

export default ChatInterface;

AgentConfiguration Component

// AgentConfiguration.jsx
import React, { useState, useEffect } from 'react';
import { Form, Input, Button, Select, Checkbox, Card, Typography, Tabs, message } from 'antd';
import { SaveOutlined, PlusOutlined, ImportOutlined, ExportOutlined } from '@ant-design/icons';

const { Title, Text } = Typography;
const { TextArea } = Input;
const { Option } = Select;
const { TabPane } = Tabs;

const AgentConfiguration = () => {
  const [form] = Form.useForm();
  const [agents, setAgents] = useState([]);
  const [currentAgent, setCurrentAgent] = useState(null);
  const [loading, setLoading] = useState(false);

  // Fetch available agents on component mount
  useEffect(() => {
    const fetchAgents = async () => {
      setLoading(true);
      try {
        const response = await fetch('/api/v1/agents');
        if (response.ok) {
          const data = await response.json();
          setAgents(data.agents);

          // Set current agent to the first one
          if (data.agents.length > 0) {
            setCurrentAgent(data.agents[0]);
            form.setFieldsValue(data.agents[0]);
          }
        } else {
          message.error('Failed to load agents');
        }
      } catch (error) {
        console.error('Error fetching agents:', error);
        message.error('Failed to load agents');
      } finally {
        setLoading(false);
      }
    };

    fetchAgents();
  }, [form]);

  const handleAgentChange = (agentId) => {
    const selected = agents.find(a => a.id === agentId);
    if (selected) {
      setCurrentAgent(selected);
      form.setFieldsValue(selected);
    }
  };

  const handleSaveAgent = async (values) => {
    // currentAgent is null after "Create New Agent", so there is no id to PUT to.
    // Creating via POST /api/v1/agents is an assumption; that route is not shown elsewhere.
    const isNew = !currentAgent;
    setLoading(true);
    try {
      const response = await fetch(
        isNew ? '/api/v1/agents' : `/api/v1/agents/${currentAgent.id}`,
        {
          method: isNew ? 'POST' : 'PUT',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(values)
        }
      );

      if (response.ok) {
        message.success('Agent configuration saved');
        if (isNew) {
          // Assumes the create call returns the stored agent, including its new id
          const created = await response.json();
          setAgents([...agents, created]);
          setCurrentAgent(created);
        } else {
          // Update local state
          const updatedAgents = agents.map(a =>
            a.id === currentAgent.id ? { ...a, ...values } : a
          );
          setAgents(updatedAgents);
          setCurrentAgent({ ...currentAgent, ...values });
        }
      } else {
        message.error('Failed to save agent configuration');
      }
    } catch (error) {
      console.error('Error saving agent:', error);
      message.error('Error saving agent configuration');
    } finally {
      setLoading(false);
    }
  };

  const handleCreateAgent = () => {
    form.resetFields();
    form.setFieldsValue({
      name: 'New Agent',
      description: 'Custom assistant',
      capabilities: [],
      system_prompt: 'You are a helpful assistant.'
    });

    setCurrentAgent(null); // Indicates we're creating a new agent
  };

  const handleExportAgent = () => {
    if (!currentAgent) return;

    const agentData = JSON.stringify(currentAgent, null, 2);
    const blob = new Blob([agentData], { type: 'application/json' });
    const url = URL.createObjectURL(blob);

    // Trigger a download through a temporary link element
    const a = document.createElement('a');
    a.href = url;
    a.download = `${currentAgent.name.replace(/\s+/g, '_').toLowerCase()}_agent.json`;
    document.body.appendChild(a);
    a.click();
    document.body.removeChild(a);
    URL.revokeObjectURL(url);
  };

  return (
    <div className="agent-configuration">
      <Card title={<Title level={4}>Agent Configuration</Title>}>
        <div className="agent-actions">
          <Button
            type="primary"
            icon={<PlusOutlined />}
            onClick={handleCreateAgent}
          >
            Create New Agent
          </Button>

          <Button
            icon={<ExportOutlined />}
            onClick={handleExportAgent}
            disabled={!currentAgent}
          >
            Export
          </Button>

          <Button
icon={\u003cImportOutlined /\u003e}\u003e\n            Import\n          \u003c/Button\u003e\n        \u003c/div\u003e\n        \n        \u003cdiv className=\"agent-selector\"\u003e\n          \u003cText strong\u003eSelect Agent:\u003c/Text\u003e\n          \u003cSelect\n            style={{ width: 300 }}\n            onChange={handleAgentChange}\n            value={currentAgent?.id}\n            loading={loading}\n          \u003e\n            {agents.map(agent =\u003e (\n              \u003cOption key={agent.id} value={agent.id}\u003e\n                {agent.name} - {agent.description}\n              \u003c/Option\u003e\n            ))}\n          \u003c/Select\u003e\n        \u003c/div\u003e\n        \n        \u003cForm\n          form={form}\n          layout=\"vertical\"\n          onFinish={handleSaveAgent}\n          className=\"agent-form\"\n        \u003e\n          \u003cTabs defaultActiveKey=\"basic\"\u003e\n            \u003cTabPane tab=\"Basic Information\" key=\"basic\"\u003e\n              \u003cForm.Item\n                name=\"name\"\n                label=\"Agent Name\"\n                rules={[{ required: true, message: 'Please enter an agent name' }]}\n              \u003e\n                \u003cInput placeholder=\"Agent name\" /\u003e\n              \u003c/Form.Item\u003e\n              \n              \u003cForm.Item\n                name=\"description\"\n                label=\"Description\"\n                rules={[{ required: true, message: 'Please enter a description' }]}\n              \u003e\n                \u003cInput placeholder=\"Brief description of this agent's purpose\" /\u003e\n              \u003c/Form.Item\u003e\n              \n              \u003cForm.Item\n                name=\"system_prompt\"\n                label=\"System Instructions\"\n                rules={[{ required: true, message: 'Please enter system instructions' }]}\n              \u003e\n                \u003cTextArea\n                  
placeholder=\"Instructions that define the agent's behavior\"\n                  autoSize={{ minRows: 4, maxRows: 8 }}\n                /\u003e\n              \u003c/Form.Item\u003e\n            \u003c/TabPane\u003e\n            \n            \u003cTabPane tab=\"Capabilities\" key=\"capabilities\"\u003e\n              \u003cForm.Item name=\"capabilities\" label=\"Agent Capabilities\"\u003e\n                \u003cCheckbox.Group\u003e\n                  \u003cdiv className=\"capabilities-grid\"\u003e\n                    \u003cCheckbox value=\"knowledge_retrieval\"\u003eKnowledge Retrieval\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"web_search\"\u003eWeb Search\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"memory\"\u003eLong-term Memory\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"calendar\"\u003eCalendar Awareness\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"code_execution\"\u003eCode Execution\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"data_visualization\"\u003eData Visualization\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"file_operations\"\u003eFile Operations\u003c/Checkbox\u003e\n                    \u003cCheckbox value=\"email\"\u003eEmail Integration\u003c/Checkbox\u003e\n                  \u003c/div\u003e\n                \u003c/Checkbox.Group\u003e\n              \u003c/Form.Item\u003e\n              \n              \u003cForm.Item name=\"preferred_models\" label=\"Preferred Models\"\u003e\n                \u003cSelect mode=\"multiple\" placeholder=\"Select preferred models\"\u003e\n                  \u003cOption value=\"openai:gpt-4\"\u003eOpenAI: GPT-4\u003c/Option\u003e\n                  \u003cOption value=\"openai:gpt-3.5-turbo\"\u003eOpenAI: GPT-3.5 Turbo\u003c/Option\u003e\n                  \u003cOption value=\"ollama:llama2\"\u003eOllama: Llama2\u003c/Option\u003e\n                  \u003cOption 
value=\"ollama:mistral\"\u003eOllama: Mistral\u003c/Option\u003e\n                  \u003cOption value=\"ollama:codellama\"\u003eOllama: CodeLlama\u003c/Option\u003e\n                \u003c/Select\u003e\n              \u003c/Form.Item\u003e\n            \u003c/TabPane\u003e\n            \n            \u003cTabPane tab=\"Advanced\" key=\"advanced\"\u003e\n              \u003cForm.Item name=\"tool_configuration\" label=\"Tool Configuration\"\u003e\n                \u003cTextArea\n                  placeholder=\"JSON configuration for tools (advanced)\"\n                  autoSize={{ minRows: 4, maxRows: 8 }}\n                /\u003e\n              \u003c/Form.Item\u003e\n              \n              \u003cForm.Item name=\"temperature\" label=\"Temperature\"\u003e\n                \u003cSelect placeholder=\"Response creativity level\"\u003e\n                  \u003cOption value=\"0.2\"\u003e0.2 - More deterministic/factual\u003c/Option\u003e\n                  \u003cOption value=\"0.5\"\u003e0.5 - Balanced\u003c/Option\u003e\n                  \u003cOption value=\"0.8\"\u003e0.8 - More creative/varied\u003c/Option\u003e\n                \u003c/Select\u003e\n              \u003c/Form.Item\u003e\n            \u003c/TabPane\u003e\n          \u003c/Tabs\u003e\n          \n          \u003cForm.Item\u003e\n            \u003cButton \n              type=\"primary\" \n              htmlType=\"submit\" \n              icon={\u003cSaveOutlined /\u003e}\n              loading={loading}\n            \u003e\n              {currentAgent ? 
'Save Changes' : 'Create Agent'}\n            \u003c/Button\u003e\n          \u003c/Form.Item\u003e\n        \u003c/Form\u003e\n      \u003c/Card\u003e\n    \u003c/div\u003e\n  );\n};\n\nexport default AgentConfiguration;\n"])</script><script>self.__next_f.push([1,"12c:[\"$\",\"pre\",\"pre-83\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-jsx\",\"children\":\"$30d\"}]}]\n12d:[\"$\",\"h2\",\"h2-55\",{\"id\":\"user-interaction-flows\",\"children\":\"User Interaction Flows\"}]\n12e:[\"$\",\"h3\",\"h3-44\",{\"id\":\"new-user-onboarding-flow\",\"children\":\"New User Onboarding Flow\"}]\n"])</script><script>self.__next_f.push([1,"12f:[\"$\",\"pre\",\"pre-84\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"┌────────────────┐     ┌────────────────┐     ┌────────────────┐\\n│                │     │                │     │                │\\n│ Welcome Screen │────▶│ Initial Setup  │────▶│ API Key Setup  │\\n│                │     │                │     │                │\\n└────────────────┘     └────────────────┘     └───────┬────────┘\\n                                                      │\\n┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐\\n│                │     │                │     │                │\\n│  First Chat    │◀────│  Ollama Setup  │◀────│ Model Download │\\n│                │     │                │     │                │\\n└────────────────┘     └────────────────┘     └────────────────┘\\n\"}],\"position\":{\"start\":{\"line\":8952,\"column\":1,\"offset\":343036},\"end\":{\"line\":8964,\"column\":4,\"offset\":343749}}},\"children\":\"┌────────────────┐     ┌────────────────┐     ┌────────────────┐\\n│                │ 
    │                │     │                │\\n│ Welcome Screen │────▶│ Initial Setup  │────▶│ API Key Setup  │\\n│                │     │                │     │                │\\n└────────────────┘     └────────────────┘     └───────┬────────┘\\n                                                      │\\n┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐\\n│                │     │                │     │                │\\n│  First Chat    │◀────│  Ollama Setup  │◀────│ Model Download │\\n│                │     │                │     │                │\\n└────────────────┘     └────────────────┘     └────────────────┘\\n\"}]}]\n"])</script><script>self.__next_f.push([1,"130:[\"$\",\"h3\",\"h3-45\",{\"id\":\"task-based-user-flow-example\",\"children\":\"Task-Based User Flow Example\"}]\n"])</script><script>self.__next_f.push([1,"131:[\"$\",\"pre\",\"pre-85\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"┌────────────────┐     ┌────────────────┐     ┌────────────────┐\\n│                │     │                │     │                │\\n│  Start Chat    │────▶│ Select Research│────▶│ Enter Research │\\n│                │     │     Agent      │     │    Query       │\\n└────────────────┘     └────────────────┘     └───────┬────────┘\\n                                                      │\\n┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐\\n│                │     │                │     │                │\\n│  Save Results  │◀────│  Refine Query  │◀────│ View Response  │\\n│                │     │                │     │ (Using OpenAI) │\\n└────────────────┘     └────────────────┘     
└────────────────┘\\n\"}],\"position\":{\"start\":{\"line\":8968,\"column\":1,\"offset\":343785},\"end\":{\"line\":8980,\"column\":4,\"offset\":344498}}},\"children\":\"┌────────────────┐     ┌────────────────┐     ┌────────────────┐\\n│                │     │                │     │                │\\n│  Start Chat    │────▶│ Select Research│────▶│ Enter Research │\\n│                │     │     Agent      │     │    Query       │\\n└────────────────┘     └────────────────┘     └───────┬────────┘\\n                                                      │\\n┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐\\n│                │     │                │     │                │\\n│  Save Results  │◀────│  Refine Query  │◀────│ View Response  │\\n│                │     │                │     │ (Using OpenAI) │\\n└────────────────┘     └────────────────┘     └────────────────┘\\n\"}]}]\n"])</script><script>self.__next_f.push([1,"132:[\"$\",\"h3\",\"h3-46\",{\"id\":\"advanced-settings-flow\",\"children\":\"Advanced Settings Flow\"}]\n"])</script><script>self.__next_f.push([1,"133:[\"$\",\"pre\",\"pre-86\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"┌────────────────┐     ┌────────────────┐     ┌────────────────┐\\n│                │     │                │     │                │\\n│  Chat Screen   │────▶│ Settings Menu  │────▶│ Model Settings │\\n│                │     │                │     │                │\\n└────────────────┘     └────────────────┘     └───────┬────────┘\\n                                                      │\\n┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐\\n│                │     │                │     │                │\\n│  Return to     │◀────│ Save Settings  │◀────│ Agent Settings │\\n│    Chat 
       │     │                │     │                │\\n└────────────────┘     └────────────────┘     └────────────────┘\\n\"}],\"position\":{\"start\":{\"line\":8984,\"column\":1,\"offset\":344528},\"end\":{\"line\":8996,\"column\":4,\"offset\":345241}}},\"children\":\"┌────────────────┐     ┌────────────────┐     ┌────────────────┐\\n│                │     │                │     │                │\\n│  Chat Screen   │────▶│ Settings Menu  │────▶│ Model Settings │\\n│                │     │                │     │                │\\n└────────────────┘     └────────────────┘     └───────┬────────┘\\n                                                      │\\n┌────────────────┐     ┌────────────────┐     ┌───────▼────────┐\\n│                │     │                │     │                │\\n│  Return to     │◀────│ Save Settings  │◀────│ Agent Settings │\\n│    Chat        │     │                │     │                │\\n└────────────────┘     └────────────────┘     └────────────────┘\\n\"}]}]\n"])</script><script>self.__next_f.push([1,"134:[\"$\",\"h2\",\"h2-56\",{\"id\":\"implementation-recommendations\",\"children\":\"Implementation Recommendations\"}]\n135:[\"$\",\"ol\",\"ol-16\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Responsive Design:\"}],\" Ensure the web interface is mobile-friendly using responsive design principles\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Accessibility:\"}],\" Implement proper ARIA attributes and keyboard navigation for accessibility\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Progressive Enhancement:\"}],\" Build with a progressive enhancement approach where core functionality works without JavaScript\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"State Management:\"}],\" Use context API or Redux for global state 
in more complex implementations\"]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Offline Support:\"}],\" Consider adding service workers for offline functionality in the web interface\"]}],\"\\n\",[\"$\",\"li\",\"li-5\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"CLI Shortcuts:\"}],\" Implement tab completion and command history in the CLI for improved usability\"]}],\"\\n\"]}]\n136:[\"$\",\"h2\",\"h2-57\",{\"id\":\"conclusion-4\",\"children\":\"Conclusion\"}]\n137:[\"$\",\"p\",\"p-53\",{\"children\":\"The proposed user interface designs for the MCP system provide a balance between simplicity and power, enabling users to leverage the hybrid OpenAI-Ollama architecture effectively. The CLI offers a lightweight, scriptable interface for technical users and automation scenarios, while the web interface provides a rich, interactive experience for broader adoption.\"}]\n138:[\"$\",\"p\",\"p-54\",{\"children\":\"Both interfaces expose the key capabilities of the system:\"}]\n139:[\"$\",\"ol\",\"ol-17\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Intelligent Model Routing:\"}],\" Users can leverage automatic model selection or manually choose specific models\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Agent Specialization:\"}],\" Configurable agents enable task-specific optimization\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Privacy Controls:\"}],\" Explicit options for privacy-sensitive content\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Performance Analytics:\"}],\" Visibility into system usage, costs, and efficiency\"]}],\"\\n\"]}]\n13a:[\"$\",\"p\",\"p-55\",{\"children\":\"These interfaces serve as the critical touchpoint between users and the sophisticated underlying 
architecture, making complex AI capabilities accessible and manageable.\"}]\n13b:[\"$\",\"h1\",\"h1-6\",{\"id\":\"optimization-and-deployment-strategies-for-openai-ollama-hybrid-ai-system\",\"children\":\"Optimization and Deployment Strategies for OpenAI-Ollama Hybrid AI System\"}]\n13c:[\"$\",\"h2\",\"h2-58\",{\"id\":\"strategic-optimization-framework\",\"children\":\"Strategic Optimization Framework\"}]\n13d:[\"$\",\"p\",\"p-56\",{\"children\":\"Combining cloud-based and local inference in a single architecture creates optimization opportunities along three dimensions: performance, operational cost, and response accuracy. The following sections describe strategies for each, then cover deployment for both local and cloud environments.\"}]\n13e:[\"$\",\"h2\",\"h2-59\",{\"id\":\"performance-optimization-strategies\",\"children\":\"Performance Optimization Strategies\"}]\n13f:[\"$\",\"h3\",\"h3-47\",{\"id\":\"1-query-routing-optimization\",\"children\":\"1. 
Query Routing Optimization\"}]\n30e:T1f74,"])</script><script>self.__next_f.push([1,"# app/services/routing_optimizer.py\nimport logging\nimport numpy as np\nfrom typing import Dict, List, Any, Optional\nfrom app.config import settings\n\nlogger = logging.getLogger(__name__)\n\nclass RoutingOptimizer:\n    \"\"\"Optimizes routing decisions based on historical performance data.\"\"\"\n    \n    def __init__(self, cache_size: int = 1000):\n        self.performance_history = {}\n        self.cache_size = cache_size\n        self.learning_rate = 0.05\n        \n        # Baseline thresholds\n        self.complexity_threshold = settings.COMPLEXITY_THRESHOLD\n        self.token_threshold = 800  # Approximate tokens before preferring cloud\n        self.latency_requirement = 2.0  # Seconds\n        \n        # Performance weights\n        self.weights = {\n            \"complexity\": 0.4,\n            \"token_count\": 0.2,\n            \"privacy_score\": 0.3,\n            \"tool_requirement\": 0.1\n        }\n    \n    def update_performance_metrics(self, \n                                  provider: str, \n                                  model: str,\n                                  query_complexity: float, \n                                  token_count: int,\n                                  response_time: float,\n                                  success: bool) -\u003e None:\n        \"\"\"Update performance metrics based on actual results.\"\"\"\n        model_key = f\"{provider}:{model}\"\n        \n        if model_key not in self.performance_history:\n            self.performance_history[model_key] = {\n                \"queries\": 0,\n                \"avg_response_time\": 0,\n                \"success_rate\": 0,\n                \"complexity_performance\": {}  # Maps complexity ranges to success/time\n            }\n        \n        metrics = self.performance_history[model_key]\n        \n        # Update metrics with exponential moving average\n        
metrics[\"queries\"] += 1\n        metrics[\"avg_response_time\"] = (\n            (1 - self.learning_rate) * metrics[\"avg_response_time\"] + \n            self.learning_rate * response_time\n        )\n        \n        # Update success rate\n        old_success_rate = metrics[\"success_rate\"]\n        queries = metrics[\"queries\"]\n        metrics[\"success_rate\"] = (old_success_rate * (queries - 1) + (1 if success else 0)) / queries\n        \n        # Update complexity-specific performance\n        complexity_bin = round(query_complexity * 10) / 10  # Round to nearest 0.1\n        \n        if complexity_bin not in metrics[\"complexity_performance\"]:\n            metrics[\"complexity_performance\"][complexity_bin] = {\n                \"count\": 0,\n                \"avg_time\": 0,\n                \"success_rate\": 0\n            }\n            \n        bin_metrics = metrics[\"complexity_performance\"][complexity_bin]\n        bin_metrics[\"count\"] += 1\n        bin_metrics[\"avg_time\"] = (\n            (bin_metrics[\"count\"] - 1) * bin_metrics[\"avg_time\"] + response_time\n        ) / bin_metrics[\"count\"]\n        \n        bin_metrics[\"success_rate\"] = (\n            (bin_metrics[\"count\"] - 1) * bin_metrics[\"success_rate\"] + (1 if success else 0)\n        ) / bin_metrics[\"count\"]\n        \n        # Prune cache if needed\n        if len(self.performance_history) \u003e self.cache_size:\n            # Remove least used models\n            sorted_models = sorted(\n                self.performance_history.items(),\n                key=lambda x: x[1][\"queries\"]\n            )\n            for i in range(len(self.performance_history) - self.cache_size):\n                if i \u003c len(sorted_models):\n                    del self.performance_history[sorted_models[i][0]]\n    \n    def optimize_thresholds(self) -\u003e None:\n        \"\"\"Periodically optimize routing thresholds based on collected metrics.\"\"\"\n        if not 
self.performance_history:\n            return\n        \n        openai_models = [k for k in self.performance_history if k.startswith(\"openai:\")]\n        ollama_models = [k for k in self.performance_history if k.startswith(\"ollama:\")]\n        \n        if not openai_models or not ollama_models:\n            return  # Need data from both providers\n        \n        # Calculate average performance metrics for each provider\n        openai_avg_time = np.mean([\n            self.performance_history[model][\"avg_response_time\"] \n            for model in openai_models\n        ])\n        ollama_avg_time = np.mean([\n            self.performance_history[model][\"avg_response_time\"] \n            for model in ollama_models\n        ])\n        \n        # Find optimal complexity threshold by analyzing where Ollama begins to struggle\n        complexity_success_rates = {}\n        \n        for model in ollama_models:\n            for complexity, metrics in self.performance_history[model][\"complexity_performance\"].items():\n                if complexity not in complexity_success_rates:\n                    complexity_success_rates[complexity] = []\n                complexity_success_rates[complexity].append(metrics[\"success_rate\"])\n        \n        # Find the complexity level where Ollama success rate drops significantly\n        optimal_threshold = self.complexity_threshold  # Start with current\n        \n        if complexity_success_rates:\n            complexities = sorted(complexity_success_rates.keys())\n            avg_success_rates = [\n                np.mean(complexity_success_rates[c]) for c in complexities\n            ]\n            \n            # Find first major drop in success rate\n            for i in range(1, len(complexities)):\n                if (avg_success_rates[i-1] - avg_success_rates[i]) \u003e 0.15:  # 15% drop\n                    optimal_threshold = complexities[i-1]\n                    break\n            \n            # If 
no clear drop, look for when it falls below 85%\n            if optimal_threshold == self.complexity_threshold:\n                for i, c in enumerate(complexities):\n                    if avg_success_rates[i] \u003c 0.85:\n                        optimal_threshold = c\n                        break\n        \n        # Update thresholds (with dampening to avoid oscillation)\n        self.complexity_threshold = (\n            0.8 * self.complexity_threshold + \n            0.2 * optimal_threshold\n        )\n        \n        # Update latency requirements based on current performance\n        self.latency_requirement = max(1.0, min(ollama_avg_time * 1.2, 5.0))\n        \n        logger.info(f\"Optimized routing thresholds: complexity={self.complexity_threshold:.2f}, latency={self.latency_requirement:.2f}s\")\n    \n    def get_optimal_provider(self, \n                           query_complexity: float,\n                           privacy_score: float,\n                           estimated_tokens: int,\n                           requires_tools: bool) -\u003e str:\n        \"\"\"Get the optimal provider based on current metrics and query characteristics.\"\"\"\n        # Calculate weighted score for routing decision\n        openai_score = 0\n        ollama_score = 0\n        \n        # Complexity factor\n        if query_complexity \u003e self.complexity_threshold:\n            openai_score += self.weights[\"complexity\"]\n        else:\n            ollama_score += self.weights[\"complexity\"]\n        \n        # Token count factor\n        if estimated_tokens \u003e self.token_threshold:\n            openai_score += self.weights[\"token_count\"]\n        else:\n            ollama_score += self.weights[\"token_count\"]\n        \n        # Privacy factor (higher privacy score means more sensitive)\n        if privacy_score \u003e 0.5:\n            ollama_score += self.weights[\"privacy_score\"]\n        else:\n            # Split proportionally\n            
ollama_privacy = self.weights[\"privacy_score\"] * privacy_score * 2\n            openai_privacy = self.weights[\"privacy_score\"] * (1 - privacy_score * 2)\n            ollama_score += ollama_privacy\n            openai_score += openai_privacy\n            \n        # Tool requirements factor\n        if requires_tools:\n            openai_score += self.weights[\"tool_requirement\"]\n        \n        # Return the provider with higher score\n        return \"openai\" if openai_score \u003e ollama_score else \"ollama\"\n"])</script><script>self.__next_f.push([1,"140:[\"$\",\"pre\",\"pre-87\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$30e\"}]}]\n141:[\"$\",\"h3\",\"h3-48\",{\"id\":\"2-response-caching-with-semantic-search\",\"children\":\"2. Response Caching with Semantic Search\"}]\n30f:T1f80,"])</script><script>self.__next_f.push([1,"# app/services/cache_service.py\nimport time\nimport hashlib\nimport json\nfrom typing import Dict, List, Any, Optional, Tuple\nimport numpy as np\nfrom scipy.spatial.distance import cosine\nimport aioredis\n\nfrom app.config import settings\nfrom app.services.embedding_service import EmbeddingService\n\nclass SemanticCache:\n    \"\"\"Intelligent caching system using semantic similarity.\"\"\"\n    \n    def __init__(self, embedding_service: EmbeddingService, ttl: int = 3600):\n        self.embedding_service = embedding_service\n        self.redis = None\n        self.ttl = ttl\n        self.similarity_threshold = 0.92  # Threshold for semantic similarity\n        self.exact_cache_enabled = True\n        self.semantic_cache_enabled = True\n    \n    async def initialize(self):\n        \"\"\"Initialize Redis connection.\"\"\"\n        self.redis = await aioredis.create_redis_pool(settings.REDIS_URL)\n    \n    async def close(self):\n        \"\"\"Close Redis connection.\"\"\"\n        if self.redis:\n  
          self.redis.close()\n            await self.redis.wait_closed()\n    \n    def _get_exact_cache_key(self, messages: List[Dict], provider: str, model: str) -\u003e str:\n        \"\"\"Generate an exact cache key from request parameters.\"\"\"\n        # Normalize the request to ensure consistent keys\n        normalized = {\n            \"messages\": messages,\n            \"provider\": provider,\n            \"model\": model\n        }\n        serialized = json.dumps(normalized, sort_keys=True)\n        return f\"exact:{hashlib.md5(serialized.encode()).hexdigest()}\"\n    \n    async def _get_embedding_key(self, text: str) -\u003e str:\n        \"\"\"Get the embedding key for a text string.\"\"\"\n        return f\"emb:{hashlib.md5(text.encode()).hexdigest()}\"\n    \n    async def _store_embedding(self, text: str, embedding: List[float]) -\u003e None:\n        \"\"\"Store an embedding in Redis.\"\"\"\n        key = await self._get_embedding_key(text)\n        await self.redis.set(key, json.dumps(embedding), expire=self.ttl)\n    \n    async def _get_embedding(self, text: str) -\u003e Optional[List[float]]:\n        \"\"\"Get an embedding from Redis or compute it if not found.\"\"\"\n        key = await self._get_embedding_key(text)\n        cached = await self.redis.get(key)\n        \n        if cached:\n            return json.loads(cached)\n        \n        # Generate new embedding\n        embedding = await self.embedding_service.get_embedding(text)\n        if embedding:\n            await self._store_embedding(text, embedding)\n        \n        return embedding\n    \n    async def _compute_similarity(self, embedding1: List[float], embedding2: List[float]) -\u003e float:\n        \"\"\"Compute cosine similarity between embeddings.\"\"\"\n        return 1 - cosine(embedding1, embedding2)\n    \n    async def get(self, messages: List[Dict], provider: str, model: str) -\u003e Optional[Dict]:\n        \"\"\"Get a cached response if available.\"\"\"\n 
       if not self.redis:\n            return None\n            \n        # Try exact match first\n        if self.exact_cache_enabled:\n            exact_key = self._get_exact_cache_key(messages, provider, model)\n            cached = await self.redis.get(exact_key)\n            if cached:\n                return json.loads(cached)\n        \n        # Try semantic search if enabled\n        if self.semantic_cache_enabled:\n            # Extract query text (last user message)\n            query_text = None\n            for msg in reversed(messages):\n                if msg.get(\"role\") == \"user\" and msg.get(\"content\"):\n                    query_text = msg[\"content\"]\n                    break\n            \n            if not query_text:\n                return None\n            \n            # Get embedding for query\n            query_embedding = await self._get_embedding(query_text)\n            if not query_embedding:\n                return None\n            \n            # Get all semantic cache keys\n            semantic_keys = await self.redis.keys(\"semantic:*\")\n            if not semantic_keys:\n                return None\n            \n            # Find most similar cached query\n            best_match = None\n            best_similarity = 0\n            \n            for key in semantic_keys:\n                # Get metadata\n                meta_key = f\"{key}:meta\"\n                meta_data = await self.redis.get(meta_key)\n                if not meta_data:\n                    continue\n                \n                meta = json.loads(meta_data)\n                cached_embedding = meta.get(\"embedding\")\n                \n                if not cached_embedding:\n                    continue\n                \n                # Check provider/model compatibility\n                if (provider != \"auto\" and meta.get(\"provider\") != provider) or \\\n                   (model and meta.get(\"model\") != model):\n                    
continue\n                \n                # Compute similarity\n                similarity = await self._compute_similarity(query_embedding, cached_embedding)\n                \n                if similarity \u003e self.similarity_threshold and similarity \u003e best_similarity:\n                    best_match = key\n                    best_similarity = similarity\n            \n            if best_match:\n                cached = await self.redis.get(best_match)\n                if cached:\n                    # Record cache hit analytics\n                    await self.redis.incr(\"stats:semantic_cache_hits\")\n                    return json.loads(cached)\n        \n        # Record cache miss\n        await self.redis.incr(\"stats:cache_misses\")\n        return None\n    \n    async def set(self, messages: List[Dict], provider: str, model: str, response: Dict) -\u003e None:\n        \"\"\"Set a response in the cache.\"\"\"\n        if not self.redis:\n            return\n            \n        # Set exact match cache\n        if self.exact_cache_enabled:\n            exact_key = self._get_exact_cache_key(messages, provider, model)\n            await self.redis.set(exact_key, json.dumps(response), expire=self.ttl)\n        \n        # Set semantic cache\n        if self.semantic_cache_enabled:\n            # Extract query text (last user message)\n            query_text = None\n            for msg in reversed(messages):\n                if msg.get(\"role\") == \"user\" and msg.get(\"content\"):\n                    query_text = msg[\"content\"]\n                    break\n            \n            if not query_text:\n                return\n            \n            # Get embedding for query\n            query_embedding = await self._get_embedding(query_text)\n            if not query_embedding:\n                return\n            \n            # Generate semantic key\n            semantic_key = 
f\"semantic:{time.time()}:{hashlib.md5(query_text.encode()).hexdigest()}\"\n            \n            # Store response\n            await self.redis.set(semantic_key, json.dumps(response), expire=self.ttl)\n            \n            # Store metadata (for similarity search)\n            meta_data = {\n                \"query\": query_text,\n                \"embedding\": query_embedding,\n                \"provider\": response.get(\"provider\", provider),\n                \"model\": response.get(\"model\", model),\n                \"timestamp\": time.time()\n            }\n            \n            await self.redis.set(f\"{semantic_key}:meta\", json.dumps(meta_data), expire=self.ttl)\n    \n    async def get_stats(self) -\u003e Dict[str, float]:\n        \"\"\"Get cache statistics.\"\"\"\n        if not self.redis:\n            # Same keys as the populated result below\n            return {\"exact_hits\": 0, \"semantic_hits\": 0, \"total_hits\": 0, \"misses\": 0, \"hit_rate\": 0}\n            \n        exact_hits = int(await self.redis.get(\"stats:exact_cache_hits\") or 0)\n        semantic_hits = int(await self.redis.get(\"stats:semantic_cache_hits\") or 0)\n        misses = int(await self.redis.get(\"stats:cache_misses\") or 0)\n        \n        return {\n            \"exact_hits\": exact_hits,\n            \"semantic_hits\": semantic_hits,\n            \"total_hits\": exact_hits + semantic_hits,\n            \"misses\": misses,\n            \"hit_rate\": (exact_hits + semantic_hits) / (exact_hits + semantic_hits + misses) if (exact_hits + semantic_hits + misses) \u003e 0 else 0\n        }\n"])</script><script>self.__next_f.push([1,"142:[\"$\",\"pre\",\"pre-88\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$30f\"}]}]\n143:[\"$\",\"h3\",\"h3-49\",{\"id\":\"3-parallel-query-processing\",\"children\":\"3. 
Parallel Query Processing\"}]\n310:T26c6,"])</script><script>self.__next_f.push([1,"# app/services/parallel_processor.py\nimport asyncio\nfrom typing import List, Dict, Any, Optional, Tuple\nimport logging\nimport time\n\nfrom app.services.provider_service import ProviderService\nfrom app.config import settings\n\nlogger = logging.getLogger(__name__)\n\nclass ParallelProcessor:\n    \"\"\"Processes complex queries by decomposing and running in parallel.\"\"\"\n    \n    def __init__(self, provider_service: ProviderService):\n        self.provider_service = provider_service\n        # Threshold for when to use parallel processing\n        self.complexity_threshold = 0.8\n        self.parallel_enabled = settings.ENABLE_PARALLEL_PROCESSING\n    \n    async def should_process_in_parallel(self, messages: List[Dict]) -\u003e bool:\n        \"\"\"Determine if a query should be processed in parallel.\"\"\"\n        if not self.parallel_enabled:\n            return False\n            \n        # Get the last user message\n        user_message = None\n        for msg in reversed(messages):\n            if msg.get(\"role\") == \"user\":\n                user_message = msg.get(\"content\", \"\")\n                break\n        \n        if not user_message:\n            return False\n            \n        # Check message length\n        if len(user_message.split()) \u003c 50:\n            return False\n            \n        # Check for complexity indicators\n        complexity_markers = [\n            \"compare\", \"analyze\", \"different perspectives\", \"pros and cons\",\n            \"multiple aspects\", \"detail\", \"comprehensive\", \"multifaceted\"\n        ]\n        \n        marker_count = sum(1 for marker in complexity_markers if marker in user_message.lower())\n        \n        # Check for multiple questions\n        question_count = user_message.count(\"?\")\n        \n        # Calculate complexity score\n        complexity = (marker_count * 0.15) + 
(question_count * 0.2) + (len(user_message.split()) / 500)\n        \n        return complexity \u003e self.complexity_threshold\n    \n    async def decompose_query(self, query: str) -\u003e List[str]:\n        \"\"\"Decompose a complex query into simpler sub-queries.\"\"\"\n        # Use the provider service to generate the decomposition\n        decompose_messages = [\n            {\"role\": \"system\", \"content\": \"\"\"\n            You are a query decomposition specialist. Your job is to break down complex questions into \n            simpler, independent sub-questions that can be answered separately and then combined.\n            \n            Return a JSON object with a single key \"sub_questions\" whose value is an array of strings,\n            where each string is a sub-question.\n            For example: {\"sub_questions\": [\"What are the basics of quantum computing?\", \"How does quantum computing differ from classical computing?\"]}\n            \n            Keep the total number of sub-questions between 2 and 5.\n            \"\"\"},\n            {\"role\": \"user\", \"content\": f\"Decompose this complex query into simpler sub-questions: {query}\"}\n        ]\n        \n        try:\n            response = await self.provider_service.generate_completion(\n                messages=decompose_messages,\n                provider=\"openai\",  # Use OpenAI for decomposition\n                model=\"gpt-3.5-turbo\", # Use a faster model for this task\n                # json_object mode requests a JSON object, so the prompt asks for {\"sub_questions\": [...]} rather than a bare array\n                response_format={\"type\": \"json_object\"}\n            )\n            \n            if response and response.get(\"message\", {}).get(\"content\"):\n                import json\n                result = json.loads(response[\"message\"][\"content\"])\n                if isinstance(result, list) and all(isinstance(item, str) for item in result):\n                    return result\n                elif isinstance(result, dict) and isinstance(result.get(\"sub_questions\"), list):\n                    return [str(q) for q in result[\"sub_questions\"]]\n            \n            # Fallback to simple decomposition\n          
  return [query]\n            \n        except Exception as e:\n            logger.error(f\"Error decomposing query: {str(e)}\")\n            # Fallback to simple decomposition\n            return [query]\n    \n    async def process_sub_query(self, sub_query: str, provider: str, model: str) -\u003e Dict[str, Any]:\n        \"\"\"Process a single sub-query.\"\"\"\n        messages = [{\"role\": \"user\", \"content\": sub_query}]\n        \n        start_time = time.time()\n        response = await self.provider_service.generate_completion(\n            messages=messages,\n            provider=provider,\n            model=model\n        )\n        duration = time.time() - start_time\n        # Treat a missing response as empty so later aggregation never calls .get() on None\n        response = response or {}\n        \n        return {\n            \"query\": sub_query,\n            \"response\": response,\n            \"content\": response.get(\"message\", {}).get(\"content\", \"\"),\n            \"duration\": duration\n        }\n    \n    async def synthesize_responses(self, \n                                 original_query: str, \n                                 sub_results: List[Dict]) -\u003e str:\n        \"\"\"Synthesize the responses from sub-queries into a cohesive answer.\"\"\"\n        # Join the sub-answers outside the f-string: backslash escapes inside an\n        # f-string expression are a SyntaxError before Python 3.12\n        sub_answers = \"\".join(\n            f\"Sub-question: {r['query']}\\nAnswer: {r['content']}\\n\\n\" for r in sub_results\n        )\n        synthesize_prompt = f\"\"\"\n        Original question: {original_query}\n        \n        I've broken this question down into parts and found the following information:\n        \n        {sub_answers}\n        \n        Please synthesize this information into a cohesive, comprehensive answer to the original question.\n        Ensure the response is well-structured and flows naturally as if it were answering the original\n        question directly. 
Maintain a consistent tone throughout.\n        \"\"\"\n        \n        messages = [\n            {\"role\": \"system\", \"content\": \"You are an expert at synthesizing information from multiple sources into cohesive, comprehensive answers.\"},\n            {\"role\": \"user\", \"content\": synthesize_prompt}\n        ]\n        \n        try:\n            response = await self.provider_service.generate_completion(\n                messages=messages,\n                provider=\"openai\",  # Use OpenAI for synthesis\n                model=\"gpt-4\"  # Use a more capable model for synthesis\n            )\n            \n            if response and response.get(\"message\", {}).get(\"content\"):\n                return response[\"message\"][\"content\"]\n            \n            # Fallback\n            return \"\\n\\n\".join([r['content'] for r in sub_results])\n        \n        except Exception as e:\n            logger.error(f\"Error synthesizing responses: {str(e)}\")\n            # Fallback to simple concatenation\n            return \"\\n\\n\".join([f\"Regarding '{r['query']}':\\n{r['content']}\" for r in sub_results])\n    \n    async def process_in_parallel(self, \n                                messages: List[Dict], \n                                provider: str = \"auto\", \n                                model: str = None) -\u003e Dict[str, Any]:\n        \"\"\"Process a complex query by breaking it down and processing in parallel.\"\"\"\n        # Get the last user message\n        user_message = None\n        for msg in reversed(messages):\n            if msg.get(\"role\") == \"user\":\n                user_message = msg.get(\"content\", \"\")\n                break\n        \n        if not user_message:\n            # Fallback to regular processing\n            return await self.provider_service.generate_completion(\n                messages=messages,\n                provider=provider,\n                model=model\n            )\n        \n      
  # Decompose the query\n        sub_queries = await self.decompose_query(user_message)\n        \n        if len(sub_queries) \u003c= 1:\n            # Not complex enough to benefit from parallel processing\n            return await self.provider_service.generate_completion(\n                messages=messages,\n                provider=provider,\n                model=model\n            )\n        \n        # Process sub-queries in parallel\n        tasks = [\n            self.process_sub_query(query, provider, model)\n            for query in sub_queries\n        ]\n        \n        sub_results = await asyncio.gather(*tasks)\n        \n        # Synthesize the results\n        final_content = await self.synthesize_responses(user_message, sub_results)\n        \n        # Calculate aggregated metrics\n        total_duration = sum(result[\"duration\"] for result in sub_results)\n        providers_used = [result[\"response\"].get(\"provider\") for result in sub_results \n                         if result[\"response\"].get(\"provider\")]\n        models_used = [result[\"response\"].get(\"model\") for result in sub_results \n                      if result[\"response\"].get(\"model\")]\n        \n        # Construct a response in the same format as provider_service.generate_completion\n        return {\n            \"id\": f\"parallel_{int(time.time())}\",\n            \"object\": \"chat.completion\",\n            \"created\": int(time.time()),\n            \"model\": \", \".join(set(models_used)) if models_used else model,\n            \"provider\": \", \".join(set(providers_used)) if providers_used else provider,\n            \"usage\": {\n                \"prompt_tokens\": sum(result[\"response\"].get(\"usage\", {}).get(\"prompt_tokens\", 0) \n                                  for result in sub_results),\n                \"completion_tokens\": sum(result[\"response\"].get(\"usage\", {}).get(\"completion_tokens\", 0) \n                                      for 
result in sub_results),\n                \"total_tokens\": sum(result[\"response\"].get(\"usage\", {}).get(\"total_tokens\", 0) \n                                 for result in sub_results)\n            },\n            \"message\": {\n                \"role\": \"assistant\",\n                \"content\": final_content\n            },\n            \"parallel_processing\": {\n                \"sub_queries\": len(sub_queries),\n                \"total_duration\": total_duration,\n                \"max_duration\": max(result[\"duration\"] for result in sub_results),\n                \"processing_efficiency\": 1 - (max(result[\"duration\"] for result in sub_results) / total_duration) \n                                        if total_duration \u003e 0 else 0\n            }\n        }\n"])</script><script>self.__next_f.push([1,"144:[\"$\",\"pre\",\"pre-89\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$310\"}]}]\n145:[\"$\",\"h3\",\"h3-50\",{\"id\":\"4-dynamic-batching-for-high-load-scenarios\",\"children\":\"4. 
Dynamic Batching for High-Load Scenarios\"}]\n311:T1492,"])</script><script>self.__next_f.push([1,"# app/services/batch_processor.py\nimport asyncio\nfrom typing import List, Dict, Any, Optional, Callable, Awaitable\nimport time\nimport logging\nfrom collections import deque\n\nlogger = logging.getLogger(__name__)\n\nclass RequestBatcher:\n    \"\"\"\n    Dynamically batches requests to optimize throughput under high load.\n    \"\"\"\n    \n    def __init__(self, \n                max_batch_size: int = 4,\n                max_wait_time: float = 0.1,\n                processor_fn: Optional[Callable] = None):\n        self.max_batch_size = max_batch_size\n        self.max_wait_time = max_wait_time\n        self.processor_fn = processor_fn\n        self.queue = deque()\n        self.batch_task = None\n        self.active = False\n        self.stats = {\n            \"total_requests\": 0,\n            \"total_batches\": 0,\n            \"avg_batch_size\": 0,\n            \"max_queue_length\": 0\n        }\n    \n    async def start(self):\n        \"\"\"Start the batch processor.\"\"\"\n        if self.active:\n            return\n            \n        self.active = True\n        self.batch_task = asyncio.create_task(self._batch_processor())\n        logger.info(\"Batch processor started\")\n    \n    async def stop(self):\n        \"\"\"Stop the batch processor.\"\"\"\n        if not self.active:\n            return\n            \n        self.active = False\n        if self.batch_task:\n            try:\n                self.batch_task.cancel()\n                await self.batch_task\n            except asyncio.CancelledError:\n                pass\n        \n        logger.info(\"Batch processor stopped\")\n    \n    async def _batch_processor(self):\n        \"\"\"Background task to process batches.\"\"\"\n        while self.active:\n            try:\n                # Process any batches in the queue\n                await self._process_next_batch()\n              
  \n                # Wait a small amount of time before checking again\n                await asyncio.sleep(0.01)\n            except Exception as e:\n                logger.error(f\"Error in batch processor: {str(e)}\")\n                await asyncio.sleep(1)  # Wait longer on error\n    \n    async def _process_next_batch(self):\n        \"\"\"Process the next batch from the queue.\"\"\"\n        if not self.queue:\n            return\n            \n        # Start timing from oldest request\n        oldest_request_time = self.queue[0][2]\n        current_time = time.time()\n        \n        # Process if we have max batch size or max wait time elapsed\n        if len(self.queue) \u003e= self.max_batch_size or \\\n           (current_time - oldest_request_time) \u003e= self.max_wait_time:\n            \n            # Extract batch (up to max_batch_size)\n            batch_size = min(len(self.queue), self.max_batch_size)\n            batch = []\n            \n            for _ in range(batch_size):\n                request, future, _ = self.queue.popleft()\n                batch.append((request, future))\n            \n            # Update stats\n            self.stats[\"total_batches\"] += 1\n            self.stats[\"avg_batch_size\"] = ((self.stats[\"avg_batch_size\"] * (self.stats[\"total_batches\"] - 1)) + batch_size) / self.stats[\"total_batches\"]\n            \n            # Process batch\n            asyncio.create_task(self._process_batch(batch))\n    \n    async def _process_batch(self, batch: List[tuple]):\n        \"\"\"Process a batch of requests.\"\"\"\n        if not self.processor_fn:\n            for _, future in batch:\n                if not future.done():\n                    future.set_exception(ValueError(\"No processor function set\"))\n            return\n        \n        # Extract just the requests for processing\n        requests = [req for req, _ in batch]\n        \n        try:\n            # Process the batch\n            results = 
await self.processor_fn(requests)\n            \n            # Match results to futures\n            if results and len(results) == len(batch):\n                for i, (_, future) in enumerate(batch):\n                    if not future.done():\n                        future.set_result(results[i])\n            else:\n                # Handle mismatch in results (results may be None or empty)\n                logger.error(f\"Batch result count mismatch: {len(results) if results else 0} results for {len(batch)} requests\")\n                for _, future in batch:\n                    if not future.done():\n                        future.set_exception(ValueError(\"Batch processing error: result count mismatch\"))\n                        \n        except Exception as e:\n            logger.error(f\"Error processing batch: {str(e)}\")\n            # Set exception for all futures in batch\n            for _, future in batch:\n                if not future.done():\n                    future.set_exception(e)\n    \n    async def submit(self, request: Any) -\u003e Any:\n        \"\"\"Submit a request for batched processing.\"\"\"\n        self.stats[\"total_requests\"] += 1\n        \n        # Create a future bound to the running event loop for this request\n        future = asyncio.get_running_loop().create_future()\n        \n        # Add to queue with timestamp\n        self.queue.append((request, future, time.time()))\n        \n        # Update max queue length stat\n        queue_length = len(self.queue)\n        if queue_length \u003e self.stats[\"max_queue_length\"]:\n            self.stats[\"max_queue_length\"] = queue_length\n        \n        # Return future\n        return await future\n"])</script><script>self.__next_f.push([1,"146:[\"$\",\"pre\",\"pre-90\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$311\"}]}]\n147:[\"$\",\"h3\",\"h3-51\",{\"id\":\"5-model-specific-prompt-optimization\",\"children\":\"5. 
Model-Specific Prompt Optimization\"}]\n312:T2522,"])</script><script>self.__next_f.push([1,"# app/services/prompt_optimizer.py\nimport logging\nfrom typing import List, Dict, Any, Optional\nimport re\n\nlogger = logging.getLogger(__name__)\n\nclass PromptOptimizer:\n    \"\"\"Optimizes prompts for specific models to improve response quality and reduce token usage.\"\"\"\n    \n    def __init__(self):\n        self.model_specific_templates = {\n            # OpenAI models\n            \"gpt-4\": {\n                \"prefix\": \"\",  # GPT-4 doesn't need special prefixing\n                \"suffix\": \"\",\n                \"instruction_format\": \"{instruction}\"\n            },\n            \"gpt-3.5-turbo\": {\n                \"prefix\": \"\",\n                \"suffix\": \"\",\n                \"instruction_format\": \"{instruction}\"\n            },\n            \n            # Ollama models - they benefit from more explicit formatting\n            \"llama2\": {\n                \"prefix\": \"\",\n                \"suffix\": \"Think step-by-step and be thorough in your response.\",\n                \"instruction_format\": \"{instruction}\"\n            },\n            \"llama2:70b\": {\n                \"prefix\": \"\",\n                \"suffix\": \"\",\n                \"instruction_format\": \"{instruction}\"\n            },\n            \"mistral\": {\n                \"prefix\": \"\",\n                \"suffix\": \"Take a deep breath and work on this step-by-step.\",\n                \"instruction_format\": \"{instruction}\"\n            },\n            \"codellama\": {\n                \"prefix\": \"You are an expert programmer with years of experience. \",\n                \"suffix\": \"Make sure your code is correct and efficient.\",\n                \"instruction_format\": \"Task: {instruction}\"\n            },\n            \"wizard-math\": {\n                \"prefix\": \"You are a mathematics expert. 
\",\n                \"suffix\": \"Show your work step-by-step and explain your reasoning clearly.\",\n                \"instruction_format\": \"Problem: {instruction}\"\n            }\n        }\n        \n        # Default template to use when model not specifically defined\n        self.default_template = {\n            \"prefix\": \"\",\n            \"suffix\": \"\",\n            \"instruction_format\": \"{instruction}\"\n        }\n        \n        # Task-specific optimizations\n        self.task_templates = {\n            \"code_generation\": {\n                \"prefix\": \"You are an expert programmer. \",\n                \"suffix\": \"Ensure your code is correct, efficient, and well-commented.\",\n                \"instruction_format\": \"Programming Task: {instruction}\"\n            },\n            \"creative_writing\": {\n                \"prefix\": \"You are a creative writer with excellent storytelling abilities. \",\n                \"suffix\": \"\",\n                \"instruction_format\": \"Creative Writing Prompt: {instruction}\"\n            },\n            \"reasoning\": {\n                \"prefix\": \"You are a logical thinker with strong reasoning skills. \",\n                \"suffix\": \"Think step-by-step and be precise in your analysis.\",\n                \"instruction_format\": \"Reasoning Task: {instruction}\"\n            },\n            \"math\": {\n                \"prefix\": \"You are a mathematics expert. 
\",\n            \"suffix\": \"Show your work step-by-step with explanations.\",\n            \"instruction_format\": \"Math Problem: {instruction}\"\n            }\n        }\n    \n    def detect_task_type(self, message: str) -\u003e Optional[str]:\n        \"\"\"Detect the type of task from the message content.\"\"\"\n        message_lower = message.lower()\n        \n        # Code detection patterns\n        code_patterns = [\n            r\"write (a|an|the)?\\s?(code|function|program|script|class|method)\",\n            r\"implement (a|an|the)?\\s?(algorithm|function|class|method)\",\n            r\"debug (this|the)?\\s?(code|function|program)\",\n            # Language names must stand alone, so \"go\" no longer matches inside \"good\" or \"algorithm\"\n            r\"(?\u003c!\\w)(js|javascript|python|java|c\\+\\+|go|rust|typescript)(?!\\w)\"\n        ]\n        \n        # Creative writing patterns\n        creative_patterns = [\n            r\"write (a|an|the)?\\s?(story|poem|essay|narrative|scene)\",\n            r\"create (a|an|the)?\\s?(story|character|dialogue|setting)\",\n            r\"describe (a|an|the)?\\s?(scene|character|setting|world)\"\n        ]\n        \n        # Math patterns\n        math_patterns = [\n            r\"calculate\",\n            r\"solve (this|the)?\\s?(equation|problem|expression)\",\n            r\"compute\",\n            r\"what is (the)?\\s?(value|result|answer)\",\n            r\"find (the)?\\s?(derivative|integral|product|sum|limit)\"\n        ]\n        \n        # Reasoning patterns\n        reasoning_patterns = [\n            r\"analyze\",\n            r\"compare (and|\u0026) contrast\",\n            r\"explain (why|how)\",\n            r\"what are (the)?\\s?(pros|cons|advantages|disadvantages)\",\n            r\"evaluate\"\n        ]\n        \n        # Check each pattern set\n        for pattern in code_patterns:\n            if re.search(pattern, message_lower):\n                return \"code_generation\"\n                \n        for pattern in creative_patterns:\n            if re.search(pattern, message_lower):\n             
   return \"creative_writing\"\n                \n        for pattern in math_patterns:\n            if re.search(pattern, message_lower):\n                return \"math\"\n                \n        for pattern in reasoning_patterns:\n            if re.search(pattern, message_lower):\n                return \"reasoning\"\n        \n        return None\n    \n    def optimize_system_prompt(self, original_prompt: str, model: str, task_type: Optional[str] = None) -\u003e str:\n        \"\"\"Optimize the system prompt for the specific model and task.\"\"\"\n        # If no original prompt, return an appropriate default\n        if not original_prompt:\n            return \"You are a helpful assistant. Provide accurate, detailed, and clear responses.\"\n        \n        # Get model-specific template\n        template = self.model_specific_templates.get(model, self.default_template)\n        \n        # If task type is provided, incorporate task-specific optimizations\n        if task_type and task_type in self.task_templates:\n            task_template = self.task_templates[task_type]\n            \n            # Merge templates, with task template taking precedence for non-empty values\n            merged_template = {\n                \"prefix\": task_template[\"prefix\"] if task_template[\"prefix\"] else template[\"prefix\"],\n                \"suffix\": task_template[\"suffix\"] if task_template[\"suffix\"] else template[\"suffix\"],\n                \"instruction_format\": task_template[\"instruction_format\"]\n            }\n            \n            template = merged_template\n        \n        # Apply template\n        optimized_prompt = f\"{template['prefix']}{original_prompt}\"\n        \n        # Add suffix if it doesn't appear to already be present\n        if template[\"suffix\"] and template[\"suffix\"] not in optimized_prompt:\n            optimized_prompt += f\" {template['suffix']}\"\n        \n        return optimized_prompt\n    \n    def 
optimize_user_prompt(self, original_prompt: str, model: str, task_type: Optional[str] = None) -\u003e str:\n        \"\"\"Optimize the user prompt for the specific model and task.\"\"\"\n        if not original_prompt:\n            return original_prompt\n            \n        # Auto-detect task type if not provided\n        if not task_type:\n            task_type = self.detect_task_type(original_prompt)\n        \n        # Get model-specific template\n        template = self.model_specific_templates.get(model, self.default_template)\n        \n        # If task type is provided, incorporate task-specific optimizations\n        if task_type and task_type in self.task_templates:\n            task_template = self.task_templates[task_type]\n            # Use task instruction format if available\n            instruction_format = task_template[\"instruction_format\"]\n        else:\n            instruction_format = template[\"instruction_format\"]\n        \n        # Apply instruction format if the prompt doesn't already look formatted\n        if \"{instruction}\" in instruction_format and not re.match(r\"^(task|problem|prompt|question):\", original_prompt.lower()):\n            formatted_prompt = instruction_format.replace(\"{instruction}\", original_prompt)\n            return formatted_prompt\n        \n        return original_prompt\n    \n    def optimize_messages(self, messages: List[Dict[str, str]], model: str) -\u003e List[Dict[str, str]]:\n        \"\"\"Optimize all messages in a conversation for the specific model.\"\"\"\n        if not messages:\n            return messages\n            \n        # Try to detect task type from the user messages\n        task_type = None\n        for msg in messages:\n            if msg.get(\"role\") == \"user\" and msg.get(\"content\"):\n                detected_task = self.detect_task_type(msg[\"content\"])\n                if detected_task:\n                    task_type = detected_task\n                    break\n      

        optimized = []

        for msg in messages:
            role = msg.get("role", "")
            content = msg.get("content", "")

            if role == "system" and content:
                optimized_content = self.optimize_system_prompt(content, model, task_type)
                optimized.append({"role": role, "content": optimized_content})
            elif role == "user" and content:
                optimized_content = self.optimize_user_prompt(content, model, task_type)
                optimized.append({"role": role, "content": optimized_content})
            else:
                # Keep other messages unchanged
                optimized.append(msg)

        return optimized

Cost Reduction Strategies

1. Token Usage Optimization

# app/services/token_optimizer.py
import logging
import re
from typing import Dict, List

import tiktoken

logger = logging.getLogger(__name__)

class TokenOptimizer:
    """Optimizes token usage to reduce costs."""

    def __init__(self):
        # Load tokenizers once
        try:
            self.gpt3_tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")
            self.gpt4_tokenizer = tiktoken.encoding_for_model("gpt-4")
        except Exception as e:
            logger.warning(f"Could not load tokenizers: {str(e)}. Falling back to approximate counting.")
            self.gpt3_tokenizer = None
            self.gpt4_tokenizer = None

    def count_tokens(self, text: str, model: str = "gpt-3.5-turbo") -> int:
        """Count the number of tokens in a text string for a specific model."""
        if not text:
            return 0

        # Use appropriate tokenizer if available. disallowed_special=() makes tiktoken
        # treat strings like "<|endoftext|>" in user text as plain text instead of raising.
        if model.startswith("gpt-4") and self.gpt4_tokenizer:
            return len(self.gpt4_tokenizer.encode(text, disallowed_special=()))
        elif model.startswith("gpt-3") and self.gpt3_tokenizer:
            return len(self.gpt3_tokenizer.encode(text, disallowed_special=()))

        # Fallback to approximation (~4 characters per token for English)
        return len(text) // 4 + 1

    def count_message_tokens(self, messages: List[Dict[str, str]], model: str = "gpt-3.5-turbo") -> int:
        """Count tokens in a full message array."""
        if not messages:
            return 0

        total = 0

        if model.startswith("gpt-3.5-turbo") or model.startswith("gpt-4"):
            # Follows OpenAI's cookbook formula for chat message token counting;
            # the exact overheads vary slightly between model snapshots.
            # See: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
            for message in messages:
                total += 3  # Per-message framing overhead
                for key, value in message.items():
                    if not isinstance(value, str):
                        continue  # Non-text payloads (e.g. tool calls) are not counted here
                    total += self.count_tokens(value, model)  # Role, name and content are all encoded
                    if key == "name":
                        total += 1  # Extra token when a name is present

            total += 3  # Every reply is primed with the assistant role header

        else:
            # Simple approach for other models (e.g. Ollama): count content only
            for message in messages:
                content = message.get("content", "")
                if content:
                    total += self.count_tokens(content, model)

        return total

    def truncate_messages(self,
                          messages: List[Dict[str, str]],
                          max_tokens: int,
                          model: str = "gpt-3.5-turbo",
                          preserve_system: bool = True,
                          preserve_last_n_exchanges: int = 2) -> List[Dict[str, str]]:
        """Truncate conversation history to fit within token limit."""
        if not messages:
            return messages

        # Clone messages to avoid modifying the original
        messages = [m.copy() for m in messages]

        # Remember each message's original position so the result can be re-sorted.
        # Keyed by id() because list.index() would confuse messages with equal content.
        position = {id(m): i for i, m in enumerate(messages)}

        current_tokens = self.count_message_tokens(messages, model)

        # If already under the limit, return as is
        if current_tokens <= max_tokens:
            return messages

        # Identify system messages
        system_messages = [m for m in messages if m.get("role") == "system"]
        system_tokens = sum(self.count_tokens(m.get("content", ""), model) for m in system_messages)

        # Group the remaining messages into exchanges; each user message starts a new one
        exchanges = []
        current_exchange = []

        for m in messages:
            if m.get("role") == "system":
                continue

            if m.get("role") == "user" and current_exchange:
                exchanges.append(current_exchange)
                current_exchange = []

            current_exchange.append(m)

        # Add any remaining messages
        if current_exchange:
            exchanges.append(current_exchange)

        # Calculate tokens needed for essential parts
        essential_tokens = system_tokens if preserve_system else 0

        # Split into middle and last-N exchanges. Computing the split index explicitly
        # avoids the exchanges[-0:] pitfall, which would return the whole list when N == 0.
        split = max(len(exchanges) - max(preserve_last_n_exchanges, 0), 0)
        last_n_exchanges = exchanges[split:]
        last_n_tokens = sum(
            self.count_tokens(m.get("content", ""), model)
            for exchange in last_n_exchanges
            for m in exchange
        )

        essential_tokens += last_n_tokens

        # If essential parts already exceed the limit, we need more aggressive truncation
        if essential_tokens > max_tokens:
            logger.warning(f"Essential conversation parts exceed token limit: {essential_tokens} > {max_tokens}")

            # Start by keeping system messages if requested
            result = system_messages.copy() if preserve_system else []

            # Add as many of the last exchanges as we can fit, newest first
            remaining_tokens = max_tokens - sum(self.count_tokens(m.get("content", ""), model) for m in result)

            for exchange in reversed(last_n_exchanges):
                exchange_tokens = sum(self.count_tokens(m.get("content", ""), model) for m in exchange)

                if exchange_tokens <= remaining_tokens:
                    result.extend(exchange)
                    remaining_tokens -= exchange_tokens
                else:
                    # If we can't fit the whole exchange, try truncating the assistant response
                    if len(exchange) == 2 and exchange[1].get("role") == "assistant":
                        user_msg = exchange[0]
                        assistant_msg = exchange[1].copy()
                        # The truncated copy takes over the original message's position
                        position[id(assistant_msg)] = position[id(exchange[1])]

                        user_tokens = self.count_tokens(user_msg.get("content", ""), model)

                        if user_tokens < remaining_tokens:
                            # We can include the user message
                            result.append(user_msg)
                            remaining_tokens -= user_tokens

                            # Truncate the assistant message to fit
                            assistant_content = assistant_msg.get("content", "")
                            if assistant_content:
                                # Simple truncation - in a real system, you'd want more intelligent truncation
                                chars_to_keep = int(remaining_tokens * 4)  # Approximate char count
                                truncated_content = assistant_content[:chars_to_keep] + "... [truncated]"
                                assistant_msg["content"] = truncated_content
                                result.append(assistant_msg)

                    break

            # Re-sort the messages to restore the original conversation order
            result.sort(key=lambda m: position[id(m)])
            return result

        # If we get here, we can keep all essential parts and need to drop from the middle
        result = system_messages.copy() if preserve_system else []
        middle_exchanges = exchanges[:split]

        # Calculate how many tokens we can allocate to middle exchanges
        remaining_tokens = max_tokens - essential_tokens

        # Add exchanges from the middle, newest first, until we run out of tokens
        for exchange in reversed(middle_exchanges):
            exchange_tokens = sum(self.count_tokens(m.get("content", ""), model) for m in exchange)

            if exchange_tokens <= remaining_tokens:
                result.extend(exchange)
                remaining_tokens -= exchange_tokens
            else:
                break

        # Add the preserved last exchanges
        for exchange in last_n_exchanges:
            result.extend(exchange)

        # Sort messages to restore the original conversation order
        result.sort(key=lambda m: position[id(m)])

        # Verify the result is within the token limit (the estimates above ignore
        # per-message overhead, so a small overshoot is possible)
        final_tokens = self.count_message_tokens(result, model)
        if final_tokens > max_tokens:
            logger.warning(f"Truncation failed to meet target: {final_tokens} > {max_tokens}")

        return result

    def compress_system_prompt(self, system_prompt: str, max_tokens: int, model: str = "gpt-3.5-turbo") -> str:
        """Compress a system prompt to use fewer tokens while preserving key information."""
        current_tokens = self.count_tokens(system_prompt, model)

        if current_tokens <= max_tokens:
            return system_prompt

        # A production system might call a language model or an external service to
        # compress the prompt; this fallback uses simple rule-based rewriting.

        # 1. Remove filler phrases. "Always" and "Never" are deliberately not removed:
        #    deleting them would weaken or invert instructions such as "Never reveal ...".
        redundant_phrases = [
            "Please note that", "It's important to remember that", "Keep in mind that",
            "I want you to", "I'd like you to", "You should", "Make sure to",
            "Remember to"
        ]

        compressed = system_prompt
        for phrase in redundant_phrases:
            compressed = compressed.replace(phrase, "")

        # 2. Replace verbose constructions with shorter ones
        replacements = {
            "in order to": "to",
            "for the purpose of": "for",
            "due to the fact that": "because",
            "in the event that": "if",
            "on the condition that": "if",
            "with regard to": "about",
            "in relation to": "about"
        }

        for verbose, concise in replacements.items():
            compressed = compressed.replace(verbose, concise)

        # 3. Remove unnecessary whitespace
        compressed = re.sub(r'\s+', ' ', compressed).strip()

        # 4. If still over the limit, truncate with an ellipsis
        compressed_tokens = self.count_tokens(compressed, model)
        if compressed_tokens > max_tokens:
            # Approximation: 4 characters per token
            char_limit = max_tokens * 4
            compressed = compressed[:char_limit] + "..."

        return compressed

    def optimize_messages_for_cost(self,
                                   messages: List[Dict[str, str]],
                                   model: str,
                                   max_tokens: int = 4096) -> List[Dict[str, str]]:
        """Fully optimize messages for cost efficiency."""
        if not messages:
            return messages

        # 1. First, separate system messages for compression
        system_messages = []
        other_messages = []

        for msg in messages:
            if msg.get("role") == "system":
                system_messages.append(msg)
            else:
                other_messages.append(msg)

        # 2. Compress system messages, combining them if there are several
        if len(system_messages) > 1:
            # Combine multiple system messages
            combined_content = " ".join(msg.get("content", "") for msg in system_messages)
            compressed_content = self.compress_system_prompt(combined_content, 1024, model)

            # Replace with a single compressed message
            system_messages = [{"role": "system", "content": compressed_content}]
        elif len(system_messages) == 1 and self.count_tokens(system_messages[0].get("content", ""), model) > 1024:
            # Compress a single long system message into a new dict, so the caller's message is not mutated
            system_messages = [{
                **system_messages[0],
                "content": self.compress_system_prompt(system_messages[0].get("content", ""), 1024, model),
            }]

        # 3. Recombine and truncate the full conversation
        optimized = system_messages + other_messages
        reserved_completion_tokens = max(max_tokens // 4, 1024)  # Reserve 25% or at least 1024 tokens for completion
        max_prompt_tokens = max_tokens - reserved_completion_tokens

        return self.truncate_messages(
            optimized,
            max_prompt_tokens,
            model,
            preserve_system=True,
            preserve_last_n_exchanges=2
        )

2. Model Tier Selection

# app/services/model_tier_service.py
import logging
from typing import Dict, List, Any, Optional, Tuple
import re
import time

from app.config import settings

logger = logging.getLogger(__name__)

class ModelTierService:
    """Selects the appropriate model tier based on task requirements and budget constraints."""

    def __init__(self):
        # Cost per 1000 tokens in USD (approximate; provider prices change over time)
        self.model_costs = {
            # OpenAI models input/output costs
            "gpt-4": {"input": 0.03, "output": 0.06},
            "gpt-4-32k": {"input": 0.06, "output": 0.12},
            "gpt-4-turbo": {"input": 0.01, "output": 0.03},
            "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
            "gpt-3.5-turbo-16k": {"input": 0.003, "output": 0.004},

            # Ollama models (local, so effectively zero API cost)
            "llama2": {"input": 0, "output": 0},
            "mistral": {"input": 0, "output": 0},
            "codellama": {"input": 0, "output": 0}
        }

        # Model capabilities and appropriate use cases
        self.model_capabilities = {
            "gpt-4": ["complex_reasoning", "creative", "code", "math", "general"],
            "gpt-4-turbo": ["complex_reasoning", "creative", "code", "math", "general"],
            "gpt-3.5-turbo": ["simple_reasoning", "general", "summarization"],
            "llama2": ["simple_reasoning", "general", "summarization"],
            "mistral": ["simple_reasoning", "general", "creative"],
            "codellama": ["code"]
        }

        # Default model selections for different task types
        self.task_model_mapping = {
            "complex_reasoning": {
                "high": "gpt-4-turbo",
                "medium": "gpt-4-turbo",
                "low": "gpt-3.5-turbo"
            },
            "simple_reasoning": {
                "high": "gpt-3.5-turbo",
                "medium": "gpt-3.5-turbo",
                "low": "mistral"
            },
            "creative": {
                "high": "gpt-4-turbo",
                "medium": "mistral",
                "low": "mistral"
            },
            "code": {
                "high": "gpt-4-turbo",
                "medium": "codellama",
                "low": "codellama"
            },
            "math": {
                "high": "gpt-4-turbo",
                "medium": "gpt-3.5-turbo",
                "low": "mistral"
            },
            "general": {
                "high": "gpt-3.5-turbo",
                "medium": "mistral",
                "low": "llama2"
            },
            "summarization": {
                "high": "gpt-3.5-turbo",
                "medium": "mistral",
                "low": "llama2"
            }
        }

        # Budget tier thresholds: what fraction of the monthly budget remains?
        self.budget_tiers = {
            "high": 0.6,    # >60% of budget remaining
            "medium": 0.3,  # 30-60% of budget remaining
            "low": 0.0      # <30% of budget remaining
        }

        # Initialize usage tracking (held in memory, so it resets when the process restarts)
        self.monthly_budget = settings.MONTHLY_BUDGET
        self.usage_this_month = 0
        self.month_start_timestamp = self._get_month_start_timestamp()

    def _get_month_start_timestamp(self) -> int:
        """Get timestamp for the start of the current month."""
        import datetime
        now = datetime.datetime.now()
        month_start = datetime.datetime(now.year, now.month, 1)
        return int(month_start.timestamp())

    def detect_task_type(self, query: str) -> str:
        """Detect the type of task from the query."""
        # Keyword heuristics: categories are checked in order and matching is by
        # substring, so short keywords such as "how" also match words like "show".
        query_lower = query.lower()

        # Check for code-related tasks
        code_indicators = [
            "code", "function", "program", "algorithm", "javascript",
            "python", "java", "c++", "typescript", "html", "css"
        ]
        if any(indicator in query_lower for indicator in code_indicators):
            return "code"

        # Check for math problems
        math_indicators = [
            "calculate", "solve", "equation", "math problem", "compute",
            "derivative", "integral", "algebra", "calculus", "arithmetic"
        ]
        if any(indicator in query_lower for indicator in math_indicators):
            return "math"

        # Check for creative tasks
        creative_indicators = [
            "story", "poem", "creative", "imagine", "fiction", "fantasy",
            "character", "novel", "script", "narrative", "write a"
        ]
        if any(indicator in query_lower for indicator in creative_indicators):
            return "creative"

        # Check for complex reasoning
        complex_indicators = [
            "analyze", "critique", "evaluate", "compare and contrast",
            "implications", "consequences", "recommend", "strategy",
            "detailed explanation", "comprehensive", "thorough"
        ]
        if any(indicator in query_lower for indicator in complex_indicators):
            return "complex_reasoning"

        # Check for summarization
        summary_indicators = [
            "summarize", "summary", "tldr", "briefly explain", "short version",
            "key points", "main ideas"
        ]
        if any(indicator in query_lower for indicator in summary_indicators):
            return "summarization"

        # Check for simple explanatory questions
        simple_indicators = [
            "explain", "how", "why", "what", "when", "who", "where",
            "help me understand", "tell me about"
        ]
        if any(indicator in query_lower for indicator in simple_indicators):
            return "simple_reasoning"

        # Fallback to general
        return "general"

    def get_current_budget_tier(self) -> str:
        """Get the current budget tier based on monthly usage."""
        # Check if we're in a new month
        current_month_start = self._get_month_start_timestamp()
        if current_month_start > self.month_start_timestamp:
            # Reset for new month
            self.month_start_timestamp = current_month_start
            self.usage_this_month = 0

        if self.monthly_budget <= 0:
            # No budget constraints
            return "high"

        # Calculate remaining budget percentage
        remaining_percentage = 1 - (self.usage_this_month / self.monthly_budget)

        # Determine tier
        if remaining_percentage > self.budget_tiers["high"]:
            return "high"
        elif remaining_percentage > self.budget_tiers["medium"]:
            return "medium"
        else:
            return "low"

    def record_usage(self, model: str, input_tokens: int, output_tokens: int) -> None:
        """Record token usage for budget tracking."""
        if model not in self.model_costs:
            return

        costs = self.model_costs[model]
        input_cost = (input_tokens / 1000) * costs["input"]
        output_cost = (output_tokens / 1000) * costs["output"]
        total_cost = input_cost + output_cost

        self.usage_this_month += total_cost

        # Log for monitoring
        logger.info(f"Usage recorded: {model}, {input_tokens} input tokens, {output_tokens} output tokens, ${total_cost:.4f}")

    def select_optimal_model(self,
                             query: str,
                             preferred_provider: Optional[str] = None,
                             force_tier: Optional[str] = None) -> Tuple[str, str]:
        """
        Select the optimal model based on the query and budget constraints.
        Returns a tuple of (provider, model)
        """
        # Detect task type
        task_type = self.detect_task_type(query)

        # Get budget tier (unless forced)
        budget_tier = force_tier if force_tier else self.get_current_budget_tier()

        # Get the recommended model for this task and budget tier
        recommended_model = self.task_model_mapping[task_type][budget_tier]

        # Determine provider based on model
        if recommended_model in ["llama2", "mistral", "codellama"]:
            provider = "ollama"
        else:
            provider = "openai"

        # Override provider if specified and compatible
        if preferred_provider:
            if preferred_provider == "ollama" and provider == "openai":
                # Find an Ollama alternative for this task (if none supports it, the OpenAI choice stands)
                for model, capabilities in self.model_capabilities.items():
                    if task_type in capabilities and model in ["llama2", "mistral", "codellama"]:
                        recommended_model = model
                        provider = "ollama"
                        break
            elif preferred_provider == "openai" and provider == "ollama":
                # Find an OpenAI alternative for this task
                for model, capabilities in self.model_capabilities.items():
                    if task_type in capabilities and model not in ["llama2", "mistral", "codellama"]:
                        recommended_model = model
                        provider = "openai"
                        break

        logger.info(f"Selected model for task '{task_type}' (tier: {budget_tier}): {provider}:{recommended_model}")
        return provider, recommended_model

    def estimate_cost(self, model: str, input_tokens: int, expected_output_tokens: int) -> float:
        """Estimate the cost of a request."""
        if model not in self.model_costs:
            return 0.0

        costs = self.model_costs[model]
        input_cost = (input_tokens / 1000) * costs["input"]
        output_cost = (expected_output_tokens / 1000) * costs["output"]

        return input_cost + output_cost

3. Local Model Prioritization for Development

# app/services/dev_mode_service.py
import logging
import os
from typing import Dict, List, Any, Optional
import re

logger = logging.getLogger(__name__)

class DevModeService:
    """
    Service that prioritizes local models during development to reduce costs.
    """

    def __init__(self):
        # Read environment to determine if we're in development mode.
        # Note: APP_ENV defaults to "development" when unset, so production
        # deployments must set it explicitly.
        self.is_dev_mode = os.environ.get("APP_ENV", "development").lower() == "development"
        self.dev_mode_forced = os.environ.get("FORCE_DEV_MODE", "false").lower() == "true"

        # Set up developer-focused settings
        self.allow_openai_for_patterns = [
            r"(complex|sophisticated|advanced)\s+(reasoning|analysis)",
            r"(gpt-4|gpt-3\.5|openai)"  # Explicit requests for OpenAI models
        ]

        self.use_ollama_for_patterns = [
            r"^test\s",  # Queries starting with "test"
            r"^debug\s",  # Debugging queries
            r"^hello\s",  # Simple greetings
            r"^hi\s",
            r"^try\s"
        ]

        # Track usage for reporting
        self.openai_requests = 0
        self.ollama_requests = 0
        self.redirected_requests = 0

    def is_development_environment(self) -> bool:
        """Check if we're running in a development environment."""
        return self.is_dev_mode or self.dev_mode_forced

    def should_use_local_model(self, query: str) -> bool:
        """
        Determine if a query should use local models in development mode.
        In development, we default to local models unless specific patterns are matched.
        """
        if not self.is_development_environment():
            return False

        # Always use local models for specific patterns
        for pattern in self.use_ollama_for_patterns:
            if re.search(pattern, query, re.IGNORECASE):
                return True

        # Allow OpenAI for specific advanced patterns
        for pattern in self.allow_openai_for_patterns:
            if re.search(pattern, query, re.IGNORECASE):
                return False

        # In development, default to local models to save costs
        return True

    def get_dev_routing_decision(self, query: str, default_provider: str) -> str:
        """
        Make a routing decision based on development mode settings.
        Returns: "openai" or "ollama"
        """
        if not self.is_development_environment():
            return default_provider

        should_use_local = self.should_use_local_model(query)

        # Track for reporting
        if should_use_local:
            self.ollama_requests += 1
            if default_provider == "openai":
                self.redirected_requests += 1
        else:
            self.openai_requests += 1

        return "ollama" if should_use_local else "openai"

    def get_usage_report(self) -> Dict[str, Any]:
        """Get a report of usage patterns for monitoring costs."""
        total_requests = self.openai_requests + self.ollama_requests

        if total_requests == 0:
            ollama_percentage = 0
            redirected_percentage = 0
        else:
            ollama_percentage = (self.ollama_requests / total_requests) * 100
            redirected_percentage = (self.redirected_requests / total_requests) * 100

        return {
            "dev_mode_active": self.is_development_environment(),
            "total_requests": total_requests,
            "openai_requests": self.openai_requests,
            "ollama_requests": self.ollama_requests,
            "redirected_to_ollama": self.redirected_requests,
            "ollama_usage_percentage": ollama_percentage,
            # Share of all requests redirected from OpenAI to Ollama: a proxy for
            # savings, not a measured cost figure
            "cost_savings_percentage": redirected_percentage
        }

4. Request Batching and Rate Limiting

# app/services/rate_limiter.py
import time
import asyncio
import logging
from typing import Dict, List, Any, Optional, Callable, Awaitable
from collections import defaultdict
import redis.asyncio as redis

from app.config import settings

logger = logging.getLogger(__name__)

class RateLimiter:
    """
    Rate limiter to control API usage and costs.
    Implements tiered rate limiting based on user roles.
    """

    def __init__(self):
        self.redis = None
        # Handle to the background replenishment task (kept so it is not garbage-collected)
        self._replenishment_task: Optional[asyncio.Task] = None

        # Rate limit tiers (requests per time window)
        self.rate_limit_tiers = {
            "free": {
                "minute": 5,
                "hour": 20,
                "day": 100
            },
            "basic": {
                "minute": 20,
                "hour": 100,
                "day": 1000
            },
            "premium": {
                "minute": 60,
                "hour": 1000,
                "day": 10000
            },
            "enterprise": {
                "minute": 120,
                "hour": 5000,
                "day": 50000
            }
        }

        # Provider-specific rate limits (global)
        self.provider_rate_limits = {
            "openai": {
                "minute": 60,  # Shared across all users
                "tokens_per_minute": 90000  # Token budget per minute
            },
            "ollama": {
                "minute": 100,  # Higher for local models
                "tokens_per_minute": 250000
            }
        }

        # Tracking for available token budgets
        self.token_budgets = {
            "openai": self.provider_rate_limits["openai"]["tokens_per_minute"],
            "ollama": self.provider_rate_limits["ollama"]["tokens_per_minute"]
        }
        self.last_budget_reset = time.time()

    async def initialize(self):
        """Initialize Redis connection."""
        self.redis = await redis.from_url(settings.REDIS_URL)

        # Start the token budget replenishment task. Keep a reference: the event
        # loop only holds weak references to tasks.
        self._replenishment_task = asyncio.create_task(self._token_budget_replenishment())

    async def _token_budget_replenishment(self):
        """Periodically replenish token budgets (token-bucket style refill)."""
        while True:
            try:
                now = time.time()
                elapsed = now - self.last_budget_reset

                # Full reset if a minute or more has passed since the last refill
                if elapsed >= 60:
                    self.token_budgets = {
                        "openai": self.provider_rate_limits["openai"]["tokens_per_minute"],
                        "ollama": self.provider_rate_limits["ollama"]["tokens_per_minute"]
                    }
                    self.last_budget_reset = now

                # Otherwise refill in proportion to the time elapsed since the last refill
                else:
                    openai_replenishment = int((elapsed / 60) * self.provider_rate_limits["openai"]["tokens_per_minute"])
                    ollama_replenishment = int((elapsed / 60) * self.provider_rate_limits["ollama"]["tokens_per_minute"])

                    # Replenish up to max
                    self.token_budgets["openai"] = min(
                        self.token_budgets["openai"] + openai_replenishment,
                        self.provider_rate_limits["openai"]["tokens_per_minute"]
                    )
                    self.token_budgets["ollama"] = min(
                        self.token_budgets["ollama"] + ollama_replenishment,
                        self.provider_rate_limits["ollama"]["tokens_per_minute"]
                    )

                    self.last_budget_reset = now
            except Exception as e:
                logger.error(f"Error in token budget replenishment: {str(e)}")

            # Update every 5 seconds
            await asyncio.sleep(5)

    async def check_rate_limit(self,
                               user_id: str,
                               tier: str = "free",
                               provider: str = "openai") -> Dict[str, Any]:
        """
        Check if a request is within rate limits.
        Returns: {"allowed": bool, "retry_after": Optional[int], "reason": Optional[str]}
        """
        if not self.redis:
            # If Redis is not available, allow the request but log a warning
            logger.warning("Redis not available for rate limiting")
            return {"allowed": True}

        # Get rate limits for this user's tier
        tier_limits = self.rate_limit_tiers.get(tier, self.rate_limit_tiers["free"])

        # Check user-specific rate limits
        for window, limit in tier_limits.items():
            key = f"rate:user:{user_id}:{window}"

            # Get current count
            count = await self.redis.get(key)
            count = int(count) if count else 0

            if count >= limit:
                ttl = await self.redis.ttl(key)
                return {
                    "allowed": False,
                    "retry_after": max(1, ttl),
       \"reason\": f\"Rate limit exceeded for {window}\"\n                }\n        \n        # Check provider-specific rate limits\n        provider_limits = self.provider_rate_limits.get(provider, {})\n        if \"minute\" in provider_limits:\n            provider_key = f\"rate:provider:{provider}:minute\"\n            provider_count = await self.redis.get(provider_key)\n            provider_count = int(provider_count) if provider_count else 0\n            \n            if provider_count \u003e= provider_limits[\"minute\"]:\n                ttl = await self.redis.ttl(provider_key)\n                return {\n                    \"allowed\": False,\n                    \"retry_after\": max(1, ttl),\n                    \"reason\": f\"Global {provider} rate limit exceeded\"\n                }\n        \n        # Check token budget\n        if provider in self.token_budgets and self.token_budgets[provider] \u003c= 0:\n            # Calculate time until next budget refresh\n            time_since_reset = time.time() - self.last_budget_reset\n            time_until_refresh = max(1, int(60 - time_since_reset))\n            \n            return {\n                \"allowed\": False,\n                \"retry_after\": time_until_refresh,\n                \"reason\": f\"{provider} token budget exhausted\"\n            }\n        \n        # All checks passed\n        return {\"allowed\": True}\n    \n    async def increment_counters(self, \n                               user_id: str, \n                               provider: str, \n                               token_count: int = 0) -\u003e None:\n        \"\"\"Increment rate limit counters after a successful request.\"\"\"\n        if not self.redis:\n            return\n        \n        now = int(time.time())\n        \n        # Increment user counters for different windows\n        pipeline = self.redis.pipeline()\n        \n        # Minute window (expires in 60 seconds)\n        minute_key = 
f\"rate:user:{user_id}:minute\"\n        pipeline.incr(minute_key)\n        # nx=True sets the TTL only when the key has none, so each window is fixed\n        # instead of being pushed back by every request (NX requires Redis 7.0+)\n        pipeline.expire(minute_key, 60, nx=True)\n        \n        # Hour window (expires in 3600 seconds)\n        hour_key = f\"rate:user:{user_id}:hour\"\n        pipeline.incr(hour_key)\n        pipeline.expire(hour_key, 3600, nx=True)\n        \n        # Day window (expires in 86400 seconds)\n        day_key = f\"rate:user:{user_id}:day\"\n        pipeline.incr(day_key)\n        pipeline.expire(day_key, 86400, nx=True)\n        \n        # Increment provider counter\n        provider_key = f\"rate:provider:{provider}:minute\"\n        pipeline.incr(provider_key)\n        pipeline.expire(provider_key, 60, nx=True)\n        \n        # Execute all commands\n        await pipeline.execute()\n        \n        # Decrement token budget\n        if provider in self.token_budgets and token_count \u003e 0:\n            self.token_budgets[provider] = max(0, self.token_budgets[provider] - token_count)\n    \n    async def get_user_usage(self, user_id: str) -\u003e Dict[str, Any]:\n        \"\"\"Get current usage statistics for a user.\"\"\"\n        if not self.redis:\n            # Same shape as the Redis-backed result below\n            return {\n                \"minute\": {\"usage\": 0, \"reset_in\": 60},\n                \"hour\": {\"usage\": 0, \"reset_in\": 3600},\n                \"day\": {\"usage\": 0, \"reset_in\": 86400}\n            }\n        \n        pipeline = self.redis.pipeline()\n        \n        # Get counts for all windows\n        pipeline.get(f\"rate:user:{user_id}:minute\")\n        pipeline.get(f\"rate:user:{user_id}:hour\")\n        pipeline.get(f\"rate:user:{user_id}:day\")\n        \n        # Get TTLs (time remaining)\n        pipeline.ttl(f\"rate:user:{user_id}:minute\")\n        pipeline.ttl(f\"rate:user:{user_id}:hour\")\n        pipeline.ttl(f\"rate:user:{user_id}:day\")\n        \n        results = await pipeline.execute()\n        \n        return {\n            \"minute\": {\n                \"usage\": int(results[0]) if results[0] else 0,\n                \"reset_in\": results[3] if results[3] and results[3] \u003e 0 else 
60\n            },\n            \"hour\": {\n                \"usage\": int(results[1]) if results[1] else 0,\n                \"reset_in\": results[4] if results[4] and results[4] \u003e 0 else 3600\n            },\n            \"day\": {\n                \"usage\": int(results[2]) if results[2] else 0,\n                \"reset_in\": results[5] if results[5] and results[5] \u003e 0 else 86400\n            }\n        }\n"])</script>

5. Memory and Context Compression

```python
# app/services/context_compression.py
import logging
import re
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

class ContextCompressor:
    """
    Compresses conversation history to reduce token usage while preserving context.
    """

    # Marks summary messages so a later compression pass can fold them in
    SUMMARY_PREFIX = "Previous conversation summary: "

    def __init__(self):
        self.max_summary_tokens = 300  # Target size for summaries

    def _is_summary(self, message: Dict[str, str]) -> bool:
        """True for summary messages inserted by an earlier compression pass."""
        return (message.get("role") == "system"
                and (message.get("content") or "").startswith(self.SUMMARY_PREFIX))

    async def compress_history(self,
                               messages: List[Dict[str, str]],
                               provider_service: Any) -> List[Dict[str, str]]:
        """
        Compress conversation history by summarizing older exchanges.
        Returns a new message list with compressed history.
        """
        # If fewer than 4 messages (system + maybe 1-2 exchanges), no compression needed
        if len(messages) < 4:
            return messages.copy()

        # Extract the original system prompt(s); earlier summaries count as history
        system_messages = [
            m for m in messages
            if m.get("role") == "system" and not self._is_summary(m)
        ]

        # Find the cut point - we'll preserve the most recent exchanges
        if len(messages) <= 10:
            # For shorter conversations, keep the most recent 3 messages (1-2 exchanges)
            preserve_count = 3
        else:
            # For longer conversations, preserve the most recent 4-6 messages (2-3 exchanges)
            preserve_count = min(6, max(4, len(messages) // 5))
        compress_messages = messages[:-preserve_count]
        preserve_messages = messages[-preserve_count:]

        # System prompts are re-added from system_messages, so drop them from both
        # slices (a system message in the recent tail would otherwise appear twice)
        compress_messages = [m for m in compress_messages if m not in system_messages]
        preserve_messages = [m for m in preserve_messages if m not in system_messages]

        # If nothing to compress, return original
        if not compress_messages:
            return messages.copy()

        # Generate summary of the earlier conversation (including any older summary)
        summary = await self._generate_conversation_summary(compress_messages, provider_service)

        # Create a new message list with the summary + preserved messages
        result = system_messages.copy()  # Start with system message(s)

        # Add summary as a system message
        if summary:
            result.append({
                "role": "system",
                "content": f"{self.SUMMARY_PREFIX}{summary}"
            })

        # Add preserved recent messages
        result.extend(preserve_messages)

        return result

    async def _generate_conversation_summary(self,
                                             messages: List[Dict[str, str]],
                                             provider_service: Any) -> str:
        """Generate a summary of the conversation history."""
        if not messages:
            return ""

        # Format the conversation for summarization
        conversation_text = "\n".join([
            f"{m.get('role', 'unknown')}: {m.get('content', '')}"
            for m in messages if m.get('content')
        ])

        # Prepare the summarization prompt
        summary_prompt = [
            {"role": "system", "content":
                "You are a conversation summarizer. Create a concise summary of the key points "
                "from the conversation that would help maintain context for future responses. "
                "Focus on important information, user preferences, and outstanding questions. "
                "Keep the summary under 200 words."
            },
            {"role": "user", "content": f"Summarize this conversation:\n\n{conversation_text}"}
        ]

        # Get a summary using a smaller/faster model
        try:
            summary_response = await provider_service.generate_completion(
                messages=summary_prompt,
                provider="openai",  # Use OpenAI for reliability
                model="gpt-3.5-turbo",  # Use a smaller model for efficiency
                max_tokens=self.max_summary_tokens
            )

            if summary_response and summary_response.get("message", {}).get("content"):
                return summary_response["message"]["content"]

        except Exception as e:
            logger.error(f"Error generating conversation summary: {str(e)}")

        # Fallback when the call fails or returns no content: keyword-based topics
        topics = self._extract_topics(conversation_text)
        if topics:
            return f"Previous conversation covered: {', '.join(topics)}."

        return "The conversation covered various topics which have been summarized to save space."

    def _extract_topics(self, conversation_text: str) -> List[str]:
        """Simple topic extraction as a fallback mechanism."""
        # Extract potential topic indicators
        topic_phrases = [
            "discussed", "talked about", "mentioned", "referred to",
            "asked about", "inquired about", "wanted to know"
        ]

        topics = []

        for phrase in topic_phrases:
            # Capture text up to the next punctuation mark or line break
            pattern = rf"{phrase} ([^.,:;\n]+)"
            matches = re.findall(pattern, conversation_text, re.IGNORECASE)
            topics.extend(match.strip() for match in matches)

        # Deduplicate while keeping first-seen order (set() would not), then limit
        unique_topics = list(dict.fromkeys(topics))
        return unique_topics[:5]  # Return at most 5 topics

    async def compress_user_query(self,
                                  original_query: str,
                                  provider_service: Any) -> str:
        """
        Compress a long user query to reduce token usage while preserving intent.
        Used for very long inputs.
        """
        # If query is already reasonably sized, return as is
        if len(original_query.split()) < 100:
            return original_query

        # Prepare compression prompt
        compression_prompt = [
            {"role": "system", "content":
                "You are a query optimizer. Your job is to reformulate user queries to be more "
                "concise while preserving the core intent and all critical details. "
                "Remove redundant information and excessive elaboration, but maintain all "
                "specific requirements, constraints, and examples provided."
            },
            {"role": "user", "content": f"Optimize this query to be more concise while preserving all important details:\n\n{original_query}"}
        ]

        # Get a compressed query
        try:
            compression_response = await provider_service.generate_completion(
                messages=compression_prompt,
                provider="openai",
                model="gpt-3.5-turbo",
                # max_tokens is a hard cap, not a target: half the word count
                # (English averages more than one token per word) would truncate
                # the rewrite mid-sentence and silently drop constraints
                max_tokens=len(original_query.split())
            )

            if (compression_response and
                compression_response.get("message", {}).get("content") and
                len(compression_response["message"]["content"]) < len(original_query)):
                return compression_response["message"]["content"]

        except Exception as e:
            logger.error(f"Error compressing user query: {str(e)}")

        # If compression fails or doesn't reduce size, return original
        return original_query
```

Response Accuracy Optimization Strategies

1. Prompt Engineering Templates

```python
# app/services/prompt_templates.py
import textwrap
from typing import List, Optional

class PromptTemplates:
    """
    Provides optimized prompt templates for different use cases to improve response accuracy.
    """

    def __init__(self):
        # Core system prompt templates
        self.system_templates = {
            "general": """
                You are a helpful assistant with diverse knowledge and capabilities.
                Provide accurate, relevant, and concise responses to user queries.
                When you don't know something, admit it rather than making up information.
                Format your responses clearly using markdown when helpful.
            """,

            "coding": """
                You are a coding assistant with expertise in programming languages and software development.
                Provide correct, efficient, and well-documented code examples.
                Explain your code clearly and highlight important concepts.
                Format code blocks using markdown with appropriate syntax highlighting.
                Suggest best practices and consider edge cases in your solutions.
            """,

            "research": """
                You are a research assistant with access to broad knowledge.
                Provide comprehensive, accurate, and nuanced information.
                Consider different perspectives and cite limitations of your knowledge.
                Structure complex information clearly and logically.
                Indicate uncertainty when appropriate rather than speculating.
            """,

            "math": """
                You are a mathematics tutor with expertise in various mathematical domains.
                Provide step-by-step explanations for mathematical problems.
                Use clear notation and formatting for equations using markdown.
                Verify your solutions and check for errors or edge cases.
                When solving problems, explain the underlying concepts and techniques.
            """,

            "creative": """
                You are a creative assistant skilled in writing, storytelling, and idea generation.
                Provide original, engaging, and imaginative content based on user requests.
                Consider tone, style, and audience in your creative work.
                When generating stories or content, maintain internal consistency.
                Respect copyright and avoid plagiarizing existing creative works.
            """
        }

        # Task-specific prompt templates that can be inserted into system prompts
        self.task_templates = {
            "step_by_step": """
                Break down your explanation into clear, logical steps.
                Begin with foundational concepts before advancing to more complex ideas.
                Use numbered or bulleted lists for sequential instructions or key points.
                Provide examples to illustrate abstract concepts.
            """,

            "comparison": """
                Present a balanced and objective comparison.
                Identify clear categories for comparison (features, performance, use cases, etc.).
                Highlight both similarities and differences.
                Consider context and specific use cases in your evaluation.
                Avoid unjustified bias and present evidence for evaluative statements.
            """,

            "factual_accuracy": """
                Prioritize accuracy over comprehensiveness.
                Clearly distinguish between well-established facts, expert consensus, and speculation.
                Acknowledge limitations in your knowledge, especially for time-sensitive information.
                Avoid overgeneralizations and recognize exceptions where relevant.
            """,

            "technical_explanation": """
                Begin with a high-level overview before diving into technical details.
                Define specialized terminology when introduced.
                Use analogies to explain complex concepts when appropriate.
                Balance technical precision with accessibility based on the apparent expertise level of the user.
            """
        }

        # Output format templates
        self.format_templates = {
            "pros_cons": """
                Structure your response with clearly labeled sections for advantages and disadvantages.
                Use bullet points or numbered lists for each point.
                Consider different perspectives or use cases.
                If applicable, provide a balanced conclusion or recommendation.
            """,

            "academic": """
                Structure your response similar to an academic paper with introduction, body, and conclusion.
                Use formal language and precise terminology.
                Acknowledge limitations and alternative viewpoints.
                Refer to theoretical frameworks or methodologies where relevant.
            """,

            "tutorial": """
                Structure your response as a tutorial with clear sections:
                - Introduction explaining what will be covered and prerequisites
                - Step-by-step instructions with examples
                - Common pitfalls or troubleshooting tips
                - Summary of key takeaways
                Use headings and code blocks with appropriate formatting.
            """,

            "eli5": """
                Explain the concept as if to a 10-year-old with no specialized knowledge.
                Use simple language and concrete analogies.
                Break complex ideas into simple components.
                Avoid jargon, or define terms very clearly when they must be used.
            """
        }

    @staticmethod
    def _clean(template: str) -> str:
        """Strip the source-code indentation (not just the outer whitespace) from a template."""
        return textwrap.dedent(template).strip()

    def get_system_prompt(self, category: str, include_tasks: Optional[List[str]] = None) -> str:
        """Get a system prompt template with optional task-specific additions."""
        base_template = self._clean(self.system_templates.get(
            category,
            self.system_templates["general"]
        ))

        if not include_tasks:
            return base_template

        # Add selected task templates (unknown task names are ignored)
        task_additions = []
        for task in include_tasks:
            if task in self.task_templates:
                task_additions.append(self._clean(self.task_templates[task]))

        if task_additions:
            combined = base_template + "\n\n" + "\n\n".join(task_additions)
            return combined

        return base_template

    def enhance_user_prompt(self, original_prompt: str, format_type: Optional[str] = None) -> str:
        """Enhance a user prompt with formatting instructions."""
        if not format_type or format_type not in self.format_templates:
            return original_prompt

        format_instructions = self._clean(self.format_templates[format_type])
        enhanced_prompt = f"{original_prompt}\n\nPlease format your response as follows:\n{format_instructions}"

        return enhanced_prompt

    def detect_format_type(self, prompt: str) -> Optional[str]:
        """Detect what format type might be appropriate based on prompt content."""
        prompt_lower = prompt.lower()

        # Check for format indicators
        if any(phrase in prompt_lower for phrase in ["pros and cons", "advantages and disadvantages", "benefits and drawbacks"]):
            return "pros_cons"

        if any(phrase in prompt_lower for phrase in ["academic", "paper", "research", "literature", "theoretical"]):
            return "academic"

        if any(phrase in prompt_lower for phrase in ["tutorial", "how to", "guide", "step by step", "walkthrough"]):
            return "tutorial"

        if any(phrase in prompt_lower for phrase in ["explain like", "eli5", "simple terms", "layman's terms", "simply explain"]):
            return "eli5"

        return None
```

2. Context-Aware Chain of Thought

```python
# app/services/chain_of_thought.py
import logging
import re
import textwrap

logger = logging.getLogger(__name__)

class ChainOfThoughtService:
    """
    Enhances response accuracy by enabling step-by-step reasoning.
    """

    def __init__(self):
        # Configure when to use chain-of-thought prompting.
        # \b anchors stop short keywords from matching inside other words
        # (e.g. "how" inside "show"); \w* still accepts inflected forms.
        self.cot_triggers = [
            # Keywords indicating complex reasoning is needed
            r"\b(why|how)\b",
            r"\b(explain|analy[sz]|reason|think|consider)\w*",
            # Question patterns that benefit from step-by-step thinking
            r"\bwhat (would|will|could|might) happen if\b",
            r"\bwhat (is|are) the (cause|reason|impact|effect|implication)",
            # Complexity indicators
            r"\b(complex|complicated|difficult|challenging|nuanced)",
            # Multi-step problems
            r"\b(steps|process|procedure|method|approach)"
        ]

        # Task-specific CoT templates
        self.cot_templates = {
            "general": "Let's think through this step-by-step.",

            "math": """
                Let's solve this step-by-step:
                1. First, understand what we're looking for
                2. Identify the relevant information and equations
                3. Work through the solution methodically
                4. Verify the answer makes sense
            """,

            "reasoning": """
                Let's approach this systematically:
                1. Identify the key elements of the problem
                2. Consider relevant principles and constraints
                3. Analyze potential approaches
                4. Evaluate and compare alternatives
                5. Draw a well-reasoned conclusion
            """,

            "decision": """
                Let's analyze this decision carefully:
                1. Clarify the decision to be made
                2. Identify the key criteria and constraints
                3. Consider the available options
                4. Evaluate each option against the criteria
                5. Assess potential risks and trade-offs
                6. Recommend the best course of action with justification
            """,

            "causal": """
                Let's analyze the causal relationships:
                1. Identify the events or phenomena to be explained
                2. Consider potential causes and mechanisms
                3. Evaluate the evidence for each causal link
                4. Consider alternative explanations
                5. Draw conclusions about the most likely causal relationships
            """
        }

        # Internal vs. external CoT modes
        self.cot_modes = {
            "internal": {
                "prefix": "Think through this problem step-by-step before providing your final answer.",
                "format": "standard"  # No special formatting needed
            },
            "external": {
                "prefix": "Show your step-by-step reasoning process explicitly in your response.",
                "format": "markdown"  # Format as markdown
            }
        }

    def should_use_cot(self, query: str) -> bool:
        """Determine if chain-of-thought prompting should be used for this query."""
        query_lower = query.lower()

        # Check for CoT triggers
        for pattern in self.cot_triggers:
            if re.search(pattern, query_lower):
                return True

        # Check for task complexity indicators
        if len(query.split()) > 30:  # Longer queries often benefit from CoT
            return True

        # Check for explicit reasoning requests
        explicit_requests = [
            "step by step", "explain your reasoning", "think through",
            "show your work", "explain how you", "walk me through"
        ]

        if any(request in query_lower for request in explicit_requests):
            return True

        return False

    def detect_task_type(self, query: str) -> str:
        """Detect the type of reasoning task from the query."""
        query_lower = query.lower()

        # Check for mathematical content. A bare number (such as a year) is not
        # enough on its own; the last pattern requires an expression like "12 * 7"
        math_indicators = [
            "calculate", "compute", "solve", "equation", "formula",
            "find the value", "what is the result", r"\d+(\.\d+)?\s*[-+*/^=]\s*\d+"
        ]

        if any(re.search(indicator, query_lower) for indicator in math_indicators):
            return "math"

        # Check for decision-making queries
        decision_indicators = [
            "should i", "which is better", "what's the best", "recommend",
            "decide between", "choose", "options"
        ]

        if any(indicator in query_lower for indicator in decision_indicators):
            return "decision"

        # Check for causal analysis
        causal_indicators = [
            "why did", "what caused", "reason for", "explain why",
            "how does", "what leads to", "effect of", "impact of"
        ]

        if any(indicator in query_lower for indicator in causal_indicators):
            return "causal"

        # Check for general analytical tasks before falling back to "general"
        reasoning_indicators = [
            "explain", "analyze", "evaluate", "critique", "assess",
            "compare", "contrast", "discuss", "review"
        ]

        if any(indicator in query_lower for indicator in reasoning_indicators):
            return "reasoning"

        return "general"

    def enhance_prompt_with_cot(self,
                                query: str,
                                mode: str = "internal",
                                explicit_template: bool = False) -> str:
        """
        Enhance a prompt with chain-of-thought instructions.

        Args:
            query: The original user query
            mode: "internal" (for model thinking) or "external" (for visible reasoning)
            explicit_template: Whether to include the full template or just the instruction
        """
        if not self.should_use_cot(query):
            return query

        # Get CoT mode configuration
        cot_mode = self.cot_modes.get(mode, self.cot_modes["internal"])

        # Detect the task type
        task_type = self.detect_task_type(query)

        # Get the appropriate template
        template = self.cot_templates.get(task_type, self.cot_templates["general"])

        if explicit_template:
            # Add the full template, with its source-code indentation removed
            enhanced = f"{query}\n\n{cot_mode['prefix']}\n\n{textwrap.dedent(template).strip()}"
        else:
            # Just add the basic instruction
            enhanced = f"{query}\n\n{cot_mode['prefix']}"

        return enhanced

    def format_cot_for_response(self, reasoning: str, final_answer: str, mode: str = "external") -> str:
        """
        Format chain-of-thought reasoning and final answer for response.

        Args:
            reasoning: The step-by-step reasoning process
            final_answer: The final answer or conclusion
            mode: "internal" (hidden) or "external" (visible)
        """
        if mode == "internal":
            # For internal mode, just return the final answer
            return final_answer

        # For external mode, format the reasoning and answer
        formatted = f"""
## Reasoning Process

{reasoning}

## Conclusion

{final_answer}
"""
        return formatted.strip()
```

3. Self-Verification and Error Correction

<script>self.__next_f.push([1,"# app/services/verification_service.py\nimport logging\nfrom typing import Dict, List, Any, Optional, Tuple\nimport re\nimport json\n\nlogger = logging.getLogger(__name__)\n\nclass VerificationService:\n    \"\"\"\n    Improves response accuracy through self-verification and error correction.\n    \"\"\"\n    \n    def __init__(self):\n        # Define verification categories\n        self.verification_categories = [\n            \"factual_accuracy\",\n            \"logical_consistency\",\n            \"completeness\",\n            \"code_correctness\",\n            \"calculation_accuracy\",\n            \"bias_detection\"\n        ]\n        \n        # High-risk categories that should always be verified\n        self.high_risk_categories = [\n            \"medical\",\n            \"legal\",\n            \"financial\",\n            \"security\"\n        ]\n        \n        # Verification prompt templates\n        self.verification_templates = {\n            \"general\": \"\"\"\n                Please verify your response for:\n                1. Factual accuracy - Are all stated facts correct?\n                2. Logical consistency - Is the reasoning sound and free of contradictions?\n                3. Completeness - Does the answer address all aspects of the question?\n                4. 
Clarity - Is the response clear and easy to understand?\n                \n                If you find any errors or omissions, please correct them in your response.\n            \"\"\",\n            \n            \"factual\": \"\"\"\n                Critically verify the factual claims in your response:\n                - Are dates, names, and definitions accurate?\n                - Are statistics and measurements correct?\n                - Are attributions to people, organizations, or sources accurate?\n                - Have you distinguished between facts and opinions/interpretations?\n                \n                If you identify any factual errors, please correct them.\n            \"\"\",\n            \n            \"code\": \"\"\"\n                Verify your code for:\n                1. Syntax errors and typos\n                2. Logical correctness - does it perform the intended function?\n                3. Edge cases and error handling\n                4. Efficiency and best practices\n                5. Security vulnerabilities\n                \n                If you find any issues, please provide corrected code.\n            \"\"\",\n            \n            \"math\": \"\"\"\n                Verify your mathematical work by:\n                1. Re-checking each calculation step\n                2. Verifying that formulas are applied correctly\n                3. Confirming unit conversions if applicable\n                4. Testing the solution with sample values if possible\n                5. Checking for arithmetic errors\n                \n                If you find any errors, please recalculate and provide the correct answer.\n            \"\"\",\n            \n            \"bias\": \"\"\"\n                Check your response for potential biases:\n                1. Is the framing balanced and objective?\n                2. Have you considered diverse perspectives?\n                3. 
Are there cultural, geographic, or demographic assumptions?\n                4. Does the language contain implicit value judgments?\n                \n                If you detect bias, please revise for greater objectivity.\n            \"\"\"\n        }\n    \n    def detect_verification_needs(self, query: str) -\u003e List[str]:\n        \"\"\"Detect which verification categories are needed based on the query.\"\"\"\n        query_lower = query.lower()\n        needed_categories = []\n        \n        # Check for high-risk topics\n        high_risk_detected = False\n        for category in self.high_risk_categories:\n            if category in query_lower or f\"related to {category}\" in query_lower:\n                high_risk_detected = True\n                break\n        \n        # For high-risk topics, perform comprehensive verification\n        if high_risk_detected:\n            return [\"factual_accuracy\", \"logical_consistency\", \"completeness\", \"bias_detection\"]\n        \n        # Check for code-related content\n        code_indicators = [\"code\", \"function\", \"program\", \"algorithm\", \"syntax\"]\n        if any(indicator in query_lower for indicator in code_indicators):\n            needed_categories.append(\"code_correctness\")\n        \n        # Check for mathematical content\n        math_indicators = [\"calculate\", \"compute\", \"solve\", \"equation\", \"math problem\"]\n        if any(indicator in query_lower for indicator in math_indicators):\n            needed_categories.append(\"calculation_accuracy\")\n        \n        # Check for factual questions\n        factual_indicators = [\"fact\", \"information about\", \"when did\", \"who is\", \"history of\"]\n        if any(indicator in query_lower for indicator in factual_indicators):\n            needed_categories.append(\"factual_accuracy\")\n        \n        # Check for logical reasoning requirements\n        logic_indicators = [\"why\", \"explain\", \"reason\", \"because\", 
\"therefore\", \"hence\"]\n        if any(indicator in query_lower for indicator in logic_indicators):\n            needed_categories.append(\"logical_consistency\")\n        \n        # For comprehensive questions\n        if len(query.split()) \u003e 30 or \"comprehensive\" in query_lower or \"detailed\" in query_lower:\n            needed_categories.append(\"completeness\")\n        \n        # For sensitive or controversial topics\n        sensitive_indicators = [\"controversy\", \"debate\", \"opinion\", \"perspective\", \"ethical\"]\n        if any(indicator in query_lower for indicator in sensitive_indicators):\n            needed_categories.append(\"bias_detection\")\n        \n        # Default to basic verification if nothing specific detected\n        if not needed_categories:\n            needed_categories = [\"factual_accuracy\", \"logical_consistency\"]\n        \n        return needed_categories\n    \n    def get_verification_prompt(self, categories: List[str]) -\u003e str:\n        \"\"\"Get the appropriate verification prompt based on needed categories.\"\"\"\n        if \"code_correctness\" in categories and len(categories) == 1:\n            return self.verification_templates[\"code\"]\n            \n        if \"calculation_accuracy\" in categories and len(categories) == 1:\n            return self.verification_templates[\"math\"]\n            \n        if \"factual_accuracy\" in categories and \"bias_detection\" not in categories:\n            return self.verification_templates[\"factual\"]\n            \n        if \"bias_detection\" in categories and len(categories) == 1:\n            return self.verification_templates[\"bias\"]\n            \n        # Default to general verification\n        return self.verification_templates[\"general\"]\n    \n    async def verify_response(self, \n                            query: str, \n                            initial_response: str,\n                            provider_service: Any) -\u003e 
Tuple[str, bool]:\n        \"\"\"\n        Verify and potentially correct a response.\n        \n        Returns:\n            Tuple of (verified_response, was_corrected)\n        \"\"\"\n        # Detect verification needs\n        verification_categories = self.detect_verification_needs(query)\n        \n        # If no verification needed, return original\n        if not verification_categories:\n            return initial_response, False\n            \n        # Get verification prompt\n        verification_prompt = self.get_verification_prompt(verification_categories)\n        \n        # Create verification messages\n        verification_messages = [\n            {\"role\": \"system\", \"content\": \n                \"You are a verification assistant. Your job is to verify the accuracy, \"\n                \"consistency, and completeness of responses. Identify any errors or \"\n                \"issues, and provide corrections when necessary.\"\n            },\n            {\"role\": \"user\", \"content\": query},\n            {\"role\": \"assistant\", \"content\": initial_response},\n            {\"role\": \"user\", \"content\": verification_prompt}\n        ]\n        \n        try:\n            verification_response = await provider_service.generate_completion(\n                messages=verification_messages,\n                provider=\"openai\",  # Use OpenAI for verification\n                model=\"gpt-4\"  # Use a more capable model for verification\n            )\n            \n            if verification_response and verification_response.get(\"message\", {}).get(\"content\"):\n                # Check if verification found issues\n                verification_text = verification_response[\"message\"][\"content\"]\n                \n                # Look for indicators of corrections\n                correction_indicators = [\n                    \"correction\", \"error\", \"mistake\", \"incorrect\", \n                    \"needs clarification\", 
\"inaccurate\", \"not quite right\"\n                ]\n                \n                if any(indicator in verification_text.lower() for indicator in correction_indicators):\n                    # Attempt to correct the response\n                    corrected_response = await self._generate_corrected_response(\n                        query, initial_response, verification_text, provider_service\n                    )\n                    return corrected_response, True\n                \n                # If verification found no issues, or was just minor clarifications\n                minor_indicators = [\"minor clarification\", \"additional note\", \"small correction\"]\n                if any(indicator in verification_text.lower() for indicator in minor_indicators):\n                    # Include the clarification in the response\n                    combined = f\"{initial_response}\\n\\n**Note:** {verification_text}\"\n                    return combined, True\n            \n            # If verification failed or found no issues\n            return initial_response, False\n                \n        except Exception as e:\n            logger.error(f\"Error in response verification: {str(e)}\")\n            return initial_response, False\n    \n    async def _generate_corrected_response(self,\n                                        query: str,\n                                        initial_response: str,\n                                        verification_text: str,\n                                        provider_service: Any) -\u003e str:\n        \"\"\"Generate a corrected response based on verification feedback.\"\"\"\n        correction_prompt = [\n            {\"role\": \"system\", \"content\": \n                \"You are a correction assistant. Your job is to provide a revised response \"\n                \"that addresses the issues identified in the verification feedback. 
\"\n                \"Create a complete, standalone corrected response.\"\n            },\n            {\"role\": \"user\", \"content\": f\"Original question:\\n{query}\"},\n            {\"role\": \"assistant\", \"content\": f\"Initial response:\\n{initial_response}\"},\n            {\"role\": \"user\", \"content\": f\"Verification feedback:\\n{verification_text}\\n\\nPlease provide a corrected response.\"}\n        ]\n        \n        try:\n            correction_response = await provider_service.generate_completion(\n                messages=correction_prompt,\n                provider=\"openai\",\n                model=\"gpt-4\"\n            )\n            \n            if correction_response and correction_response.get(\"message\", {}).get(\"content\"):\n                return correction_response[\"message\"][\"content\"]\n                \n        except Exception as e:\n            logger.error(f\"Error generating corrected response: {str(e)}\")\n        \n        # Fallback - append verification notes to original\n        return f\"{initial_response}\\n\\n**Correction Note:** {verification_text}\"\n"])</script><script>self.__next_f.push([1,"15a:[\"$\",\"pre\",\"pre-99\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$31a\"}]}]\n15b:[\"$\",\"h3\",\"h3-60\",{\"id\":\"4-domain-specific-knowledge-integration\",\"children\":\"4. 
Domain-Specific Knowledge Integration\"}]\n31b:T1dc3,"])</script><script>self.__next_f.push([1,"# app/services/domain_knowledge.py\nimport logging\nfrom typing import Dict, List, Any, Optional\nimport json\nimport re\nimport os\nimport yaml\n\nlogger = logging.getLogger(__name__)\n\nclass DomainKnowledgeService:\n    \"\"\"\n    Enhances response accuracy by integrating domain-specific knowledge.\n    \"\"\"\n    \n    def __init__(self, knowledge_dir: str = \"knowledge\"):\n        self.knowledge_dir = knowledge_dir\n        \n        # Domain definitions\n        self.domains = {\n            \"programming\": {\n                \"keywords\": [\"coding\", \"programming\", \"software\", \"development\", \"algorithm\", \"function\"],\n                \"languages\": [\"python\", \"javascript\", \"java\", \"c++\", \"ruby\", \"go\", \"rust\", \"php\"]\n            },\n            \"medicine\": {\n                \"keywords\": [\"medical\", \"health\", \"disease\", \"treatment\", \"diagnosis\", \"symptom\", \"patient\"],\n                \"specialties\": [\"cardiology\", \"neurology\", \"pediatrics\", \"oncology\", \"psychiatry\"]\n            },\n            \"finance\": {\n                \"keywords\": [\"finance\", \"investment\", \"stock\", \"market\", \"trading\", \"portfolio\", \"asset\"],\n                \"topics\": [\"stocks\", \"bonds\", \"cryptocurrency\", \"retirement\", \"taxes\", \"budgeting\"]\n            },\n            \"law\": {\n                \"keywords\": [\"legal\", \"law\", \"regulation\", \"compliance\", \"contract\", \"liability\"],\n                \"areas\": [\"corporate\", \"criminal\", \"civil\", \"constitutional\", \"intellectual property\"]\n            },\n            \"science\": {\n                \"keywords\": [\"science\", \"research\", \"experiment\", \"theory\", \"hypothesis\", \"evidence\"],\n                \"fields\": [\"physics\", \"chemistry\", \"biology\", \"astronomy\", \"geology\", \"ecology\"]\n            }\n        }\n  
      \n        # Load domain knowledge\n        self.domain_knowledge = self._load_domain_knowledge()\n        \n        # Track query-\u003edomain mappings to optimize repeated queries\n        self.domain_cache = {}\n    \n    def _load_domain_knowledge(self) -\u003e Dict[str, Any]:\n        \"\"\"Load domain knowledge from files.\"\"\"\n        knowledge = {}\n        \n        try:\n            # Create knowledge dir if it doesn't exist\n            os.makedirs(self.knowledge_dir, exist_ok=True)\n            \n            # List all domain knowledge files\n            for domain in self.domains.keys():\n                domain_path = os.path.join(self.knowledge_dir, f\"{domain}.yaml\")\n                \n                # Create empty file if it doesn't exist\n                if not os.path.exists(domain_path):\n                    with open(domain_path, 'w') as f:\n                        yaml.dump({\n                            \"domain\": domain,\n                            \"concepts\": {},\n                            \"facts\": [],\n                            \"common_misconceptions\": [],\n                            \"best_practices\": []\n                        }, f)\n                \n                # Load domain knowledge\n                try:\n                    with open(domain_path, 'r') as f:\n                        domain_data = yaml.safe_load(f)\n                        knowledge[domain] = domain_data\n                except Exception as e:\n                    logger.error(f\"Error loading domain knowledge for {domain}: {str(e)}\")\n                    knowledge[domain] = {\n                        \"domain\": domain,\n                        \"concepts\": {},\n                        \"facts\": [],\n                        \"common_misconceptions\": [],\n                        \"best_practices\": []\n                    }\n        except Exception as e:\n            logger.error(f\"Error loading domain knowledge: {str(e)}\")\n        
\n        return knowledge\n    \n    def detect_domains(self, query: str) -\u003e List[str]:\n        \"\"\"Detect relevant domains for a query.\"\"\"\n        # Check cache first\n        cache_key = hashlib.md5(query.encode()).hexdigest()\n        if cache_key in self.domain_cache:\n            return self.domain_cache[cache_key]\n        \n        query_lower = query.lower()\n        relevant_domains = []\n        \n        # Check each domain for relevance\n        for domain, definition in self.domains.items():\n            # Check domain keywords\n            keyword_match = any(keyword in query_lower for keyword in definition[\"keywords\"])\n            \n            # Check specific domain topics\n            topic_match = False\n            for topic_category, topics in definition.items():\n                if topic_category != \"keywords\":\n                    if any(topic in query_lower for topic in topics):\n                        topic_match = True\n                        break\n            \n            if keyword_match or topic_match:\n                relevant_domains.append(domain)\n        \n        # Cache result\n        self.domain_cache[cache_key] = relevant_domains\n        return relevant_domains\n    \n    def get_domain_knowledge(self, domains: List[str]) -\u003e Dict[str, Any]:\n        \"\"\"Get knowledge for the specified domains.\"\"\"\n        combined_knowledge = {\n            \"concepts\": {},\n            \"facts\": [],\n            \"common_misconceptions\": [],\n            \"best_practices\": []\n        }\n        \n        for domain in domains:\n            if domain in self.domain_knowledge:\n                domain_data = self.domain_knowledge[domain]\n                \n                # Merge concepts (dictionary)\n                combined_knowledge[\"concepts\"].update(domain_data.get(\"concepts\", {}))\n                \n                # Extend lists\n                for key in [\"facts\", \"common_misconceptions\", 
\"best_practices\"]:\n                    combined_knowledge[key].extend(domain_data.get(key, []))\n        \n        return combined_knowledge\n    \n    def format_domain_knowledge(self, knowledge: Dict[str, Any]) -\u003e str:\n        \"\"\"Format domain knowledge as a context string.\"\"\"\n        if not knowledge or all(not v for v in knowledge.values()):\n            return \"\"\n        \n        formatted_parts = []\n        \n        # Format concepts\n        if knowledge[\"concepts\"]:\n            concepts_list = []\n            for concept, definition in knowledge[\"concepts\"].items():\n                concepts_list.append(f\"- {concept}: {definition}\")\n            \n            formatted_parts.append(\"Key concepts:\\n\" + \"\\n\".join(concepts_list))\n        \n        # Format facts\n        if knowledge[\"facts\"]:\n            formatted_parts.append(\"Important facts:\\n- \" + \"\\n- \".join(knowledge[\"facts\"]))\n        \n        # Format misconceptions\n        if knowledge[\"common_misconceptions\"]:\n            formatted_parts.append(\"Common misconceptions to avoid:\\n- \" + \"\\n- \".join(knowledge[\"common_misconceptions\"]))\n        \n        # Format best practices\n        if knowledge[\"best_practices\"]:\n            formatted_parts.append(\"Best practices:\\n- \" + \"\\n- \".join(knowledge[\"best_practices\"]))\n        \n        return \"\\n\\n\".join(formatted_parts)\n    \n    def enhance_prompt_with_domain_knowledge(self, query: str, system_prompt: str) -\u003e str:\n        \"\"\"Enhance a system prompt with relevant domain knowledge.\"\"\"\n        # Detect relevant domains\n        domains = self.detect_domains(query)\n        \n        if not domains:\n            return system_prompt\n        \n        # Get domain knowledge\n        knowledge = self.get_domain_knowledge(domains)\n        \n        # Format knowledge as context\n        knowledge_text = self.format_domain_knowledge(knowledge)\n        \n        if not 
knowledge_text:\n            return system_prompt\n        \n        # Add to system prompt\n        enhanced_prompt = f\"{system_prompt}\\n\\nRelevant domain knowledge:\\n{knowledge_text}\"\n        \n        return enhanced_prompt\n"])</script><script>self.__next_f.push([1,"15c:[\"$\",\"pre\",\"pre-100\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$31b\"}]}]\n15d:[\"$\",\"h3\",\"h3-61\",{\"id\":\"5-dynamic-few-shot-learning\",\"children\":\"5. Dynamic Few-Shot Learning\"}]\n31c:T1ac8,"])</script><script>self.__next_f.push([1,"# app/services/few_shot_examples.py\nimport logging\nfrom typing import Dict, List, Any, Optional, Tuple\nimport os\nimport json\nimport random\nimport re\nimport hashlib\n\nlogger = logging.getLogger(__name__)\n\nclass FewShotExampleService:\n    \"\"\"\n    Enhances response accuracy using dynamic few-shot learning with examples.\n    \"\"\"\n    \n    def __init__(self, examples_dir: str = \"examples\"):\n        self.examples_dir = examples_dir\n        \n        # Ensure examples directory exists\n        os.makedirs(examples_dir, exist_ok=True)\n        \n        # Task categories for examples\n        self.task_categories = {\n            \"code_generation\": {\n                \"keywords\": [\"write code\", \"function\", \"implement\", \"program\", \"algorithm\"],\n                \"patterns\": [r\"write a .* function\", r\"implement .* in (python|javascript|java|c\\+\\+)\"]\n            },\n            \"explanation\": {\n                \"keywords\": [\"explain\", \"describe\", \"how does\", \"what is\", \"why is\"],\n                \"patterns\": [r\"explain .* to me\", r\"what is the .* of\", r\"how does .* work\"]\n            },\n            \"classification\": {\n                \"keywords\": [\"classify\", \"categorize\", \"identify\", \"is this\", \"determine\"],\n                \"patterns\": 
[r\"is this .* or .*\", r\"which category\", r\"identify the .*\"]\n            },\n            \"comparison\": {\n                \"keywords\": [\"compare\", \"contrast\", \"difference\", \"similarities\", \"versus\"],\n                \"patterns\": [r\"compare .* and .*\", r\"what is the difference between\", r\".* vs .*\"]\n            },\n            \"summarization\": {\n                \"keywords\": [\"summarize\", \"summary\", \"brief overview\", \"key points\"],\n                \"patterns\": [r\"summarize .*\", r\"provide a summary\", r\"key points of\"]\n            }\n        }\n        \n        # Load examples\n        self.examples = self._load_examples()\n    \n    def _load_examples(self) -\u003e Dict[str, List[Dict[str, str]]]:\n        \"\"\"Load examples from files.\"\"\"\n        examples = {category: [] for category in self.task_categories.keys()}\n        \n        # Load examples for each category\n        for category in self.task_categories.keys():\n            category_file = os.path.join(self.examples_dir, f\"{category}.json\")\n            \n            if os.path.exists(category_file):\n                try:\n                    with open(category_file, 'r') as f:\n                        category_examples = json.load(f)\n                        examples[category] = category_examples\n                except Exception as e:\n                    logger.error(f\"Error loading examples for {category}: {str(e)}\")\n        \n        return examples\n    \n    def detect_task_category(self, query: str) -\u003e Optional[str]:\n        \"\"\"Detect the task category for a query.\"\"\"\n        query_lower = query.lower()\n        \n        # Check each category\n        for category, definition in self.task_categories.items():\n            # Check keywords\n            if any(keyword in query_lower for keyword in definition[\"keywords\"]):\n                return category\n            \n            # Check regex patterns\n            if 
any(re.search(pattern, query_lower) for pattern in definition[\"patterns\"]):\n                return category\n        \n        return None\n    \n    def select_examples(self, \n                      query: str, \n                      category: Optional[str] = None, \n                      num_examples: int = 3) -\u003e List[Dict[str, str]]:\n        \"\"\"Select the most relevant examples for a query.\"\"\"\n        # Detect category if not provided\n        if not category:\n            category = self.detect_task_category(query)\n            \n        if not category or category not in self.examples or not self.examples[category]:\n            return []\n        \n        category_examples = self.examples[category]\n        \n        # If we have few examples, just return all of them (up to num_examples)\n        if len(category_examples) \u003c= num_examples:\n            return category_examples\n        \n        # For simplicity, we're using random selection here\n        # In a production system, this would use semantic similarity or other relevance metrics\n        selected = random.sample(category_examples, min(num_examples, len(category_examples)))\n        \n        return selected\n    \n    def format_examples_for_prompt(self, examples: List[Dict[str, str]]) -\u003e str:\n        \"\"\"Format examples for inclusion in a prompt.\"\"\"\n        if not examples:\n            return \"\"\n        \n        formatted_examples = []\n        \n        for i, example in enumerate(examples, 1):\n            query = example.get(\"query\", \"\")\n            response = example.get(\"response\", \"\")\n            \n            formatted = f\"Example {i}:\\n\\nUser: {query}\\n\\nAssistant: {response}\\n\"\n            formatted_examples.append(formatted)\n        \n        return \"\\n\".join(formatted_examples)\n    \n    def enhance_prompt_with_examples(self, \n                                   query: str, \n                                   
system_prompt: str,\n                                   num_examples: int = 2) -\u003e str:\n        \"\"\"Enhance a system prompt with few-shot examples.\"\"\"\n        # Select relevant examples\n        examples = self.select_examples(query, num_examples=num_examples)\n        \n        if not examples:\n            return system_prompt\n        \n        # Format examples\n        examples_text = self.format_examples_for_prompt(examples)\n        \n        # Add to system prompt\n        enhanced_prompt = f\"{system_prompt}\\n\\nHere are some examples of how to respond to similar queries:\\n\\n{examples_text}\"\n        \n        return enhanced_prompt\n    \n    def add_example(self, category: str, query: str, response: str) -\u003e bool:\n        \"\"\"Add a new example to the examples collection.\"\"\"\n        if category not in self.task_categories:\n            logger.error(f\"Invalid category: {category}\")\n            return False\n        \n        example = {\n            \"query\": query,\n            \"response\": response,\n            \"id\": hashlib.md5(f\"{category}:{query}\".encode()).hexdigest()\n        }\n        \n        # Add to in-memory collection\n        if category not in self.examples:\n            self.examples[category] = []\n        \n        # Check if this example already exists\n        existing_ids = [e.get(\"id\") for e in self.examples[category]]\n        if example[\"id\"] in existing_ids:\n            return False  # Example already exists\n        \n        self.examples[category].append(example)\n        \n        # Save to file\n        try:\n            category_file = os.path.join(self.examples_dir, f\"{category}.json\")\n            with open(category_file, 'w') as f:\n                json.dump(self.examples[category], f, indent=2)\n            return True\n        except Exception as e:\n            logger.error(f\"Error saving example: {str(e)}\")\n            return 
False\n"])</script><script>self.__next_f.push([1,"15e:[\"$\",\"pre\",\"pre-101\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$31c\"}]}]\n15f:[\"$\",\"h2\",\"h2-62\",{\"id\":\"deployment-strategies\",\"children\":\"Deployment Strategies\"}]\n160:[\"$\",\"h3\",\"h3-62\",{\"id\":\"local-development-environment\",\"children\":\"Local Development Environment\"}]\n161:[\"$\",\"h4\",\"h4-22\",{\"id\":\"setup-script-for-local-deployment\",\"children\":\"Setup Script for Local Deployment\"}]\n31d:Tacd,"])</script><script>self.__next_f.push([1,"#!/bin/bash\n# local_setup.sh - Set up local development environment\n\nset -e  # Exit on error\n\n# Check for required tools\necho \"Checking prerequisites...\"\ncommand -v python3 \u003e/dev/null 2\u003e\u00261 || { echo \"Python 3 is required but not installed. Aborting.\"; exit 1; }\ncommand -v pip3 \u003e/dev/null 2\u003e\u00261 || { echo \"pip3 is required but not installed. Aborting.\"; exit 1; }\ncommand -v docker \u003e/dev/null 2\u003e\u00261 || { echo \"Docker is required but not installed. Aborting.\"; exit 1; }\ncommand -v docker-compose \u003e/dev/null 2\u003e\u00261 || { echo \"Docker Compose is required but not installed. Aborting.\"; exit 1; }\n\n# Create virtual environment\necho \"Creating Python virtual environment...\"\npython3 -m venv venv\nsource venv/bin/activate\n\n# Install dependencies\necho \"Installing Python dependencies...\"\npip install --upgrade pip\npip install -r requirements.txt\npip install -r requirements-dev.txt\n\n# Set up environment file\nif [ ! -f .env ]; then\n    echo \"Creating .env file...\"\n    cp .env.example .env\n    \n    # Prompt for OpenAI API key\n    read -p \"Enter your OpenAI API key (leave blank to skip): \" openai_key\n    if [ ! 
-z \"$openai_key\" ]; then\n        sed -i \"s/OPENAI_API_KEY=.*/OPENAI_API_KEY=$openai_key/\" .env\n    fi\n    \n    # Set environment to development\n    sed -i \"s/APP_ENV=.*/APP_ENV=development/\" .env\n    \n    echo \".env file created. Please review and update as needed.\"\nelse\n    echo \".env file already exists. Skipping creation.\"\nfi\n\n# Check if Ollama is installed\nif ! command -v ollama \u003e/dev/null 2\u003e\u00261; then\n    echo \"Ollama not found. Would you like to install it? (y/n)\"\n    read install_ollama\n    \n    if [ \"$install_ollama\" = \"y\" ]; then\n        echo \"Installing Ollama...\"\n        if [[ \"$OSTYPE\" == \"darwin\"* ]]; then\n            # macOS\n            curl -fsSL https://ollama.com/install.sh | sh\n        else\n            # Linux\n            curl -fsSL https://ollama.com/install.sh | sh\n        fi\n    else\n        echo \"Skipping Ollama installation. You will need to install it manually.\"\n    fi\nelse\n    echo \"Ollama already installed.\"\nfi\n\n# Pull required Ollama models\nif command -v ollama \u003e/dev/null 2\u003e\u00261; then\n    echo \"Would you like to pull the recommended Ollama models? (y/n)\"\n    read pull_models\n    \n    if [ \"$pull_models\" = \"y\" ]; then\n        echo \"Pulling Ollama models...\"\n        ollama pull llama2\n        ollama pull mistral\n        ollama pull codellama\n    fi\nfi\n\n# Start Redis for development\necho \"Starting Redis with Docker...\"\ndocker-compose up -d redis\n\n# Initialize database\necho \"Initializing database...\"\npython scripts/init_db.py\n\n# Run tests to verify setup\necho \"Running tests to verify setup...\"\npytest tests/unit\n\necho \"Setup complete! 
You can now start the development server with:\"\necho \"uvicorn app.main:app --reload\"\n"])</script><script>self.__next_f.push([1,"162:[\"$\",\"pre\",\"pre-102\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"$31d\"}]}]\n163:[\"$\",\"h4\",\"h4-23\",{\"id\":\"docker-compose-for-local-services\",\"children\":\"Docker Compose for Local Services\"}]\n31e:T47b,# docker-compose.yml\nversion: '3.8'\n\nservices:\n  app:\n    build:\n      context: .\n      dockerfile: Dockerfile.dev\n    ports:\n      - \"8000:8000\"\n    volumes:\n      - .:/app\n    environment:\n      - PYTHONPATH=/app\n      - REDIS_URL=redis://redis:6379/0\n      - OLLAMA_HOST=http://ollama:11434\n      - APP_ENV=development\n      - FORCE_DEV_MODE=true\n    depends_on:\n      - redis\n      - ollama\n    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload\n\n  redis:\n    image: redis:alpine\n    ports:\n      - \"6379:6379\"\n    volumes:\n      - redis_data:/data\n\n  ollama:\n    image: ollama/ollama:latest\n    ports:\n      - \"11434:11434\"\n    volumes:\n      - ollama_data:/root/.ollama\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              count: all\n              capabilities: [gpu]\n\n  ui:\n    build:\n      context: ./ui\n      dockerfile: Dockerfile.dev\n    ports:\n      - \"3000:3000\"\n    volumes:\n      - ./ui:/app\n      - /app/node_modules\n    environment:\n      - API_URL=http://app:8000\n    depends_on:\n      - app\n    command: npm start\n\nvolumes:\n  redis_data:\n  ollama_data:\n164:[\"$\",\"pre\",\"pre-103\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"$31e\"}]}]\n165:[\"$\",\"h4\",\"h4-24\",{\"id\":\"development-dockerfile\",\"children\":\"Development Dockerfile\"}]\n166:[\"$\",\"pre\",\"pre-104\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-dockerfile\",\"children\":\"# Dockerfile.dev\\nFROM python:3.11-slim\\n\\nWORKDIR /app\\n\\n# Install system dependencies\\nRUN apt-get update \u0026\u0026 apt-get install -y --no-install-recommends \\\\\\n    curl \\\\\\n    gcc \\\\\\n    build-essential \\\\\\n    \u0026\u0026 rm -rf /var/lib/apt/lists/*\\n\\n# Install Python dependencies\\nCOPY requirements.txt requirements-dev.txt ./\\nRUN pip install --no-cache-dir -r requirements.txt -r requirements-dev.txt\\n\\n# Copy application code\\nCOPY . .\\n\\n# Set development environment\\nENV PYTHONUNBUFFERED=1\\nENV PYTHONDONTWRITEBYTECODE=1\\nENV APP_ENV=development\\n\\n# Make scripts executable\\nRUN chmod +x scripts/*.sh\\n\\n# Default command\\nCMD [\\\"uvicorn\\\", \\\"app.main:app\\\", \\\"--host\\\", \\\"0.0.0.0\\\", \\\"--port\\\", \\\"8000\\\", \\\"--reload\\\"]\\n\"}]}]\n167:[\"$\",\"h4\",\"h4-25\",{\"id\":\"configuration-for-local-environment\",\"children\":\"Configuration for Local Environment\"}]\n31f:T48d,# app/config/local.py\n\"\"\"Configuration for local development environment.\"\"\"\n\nimport os\nfrom typing import Dict, Any, List\n\n# API configuration\nAPI_HOST = \"0.0.0.0\"\nAPI_PORT = 8000\nAPI_RELOAD = True\nAPI_DEBUG = True\n\n# OpenAI configuration\nOPENAI_API_KEY = os.environ.get(\"OPENAI_API_KEY\", \"\")\nOPENAI_ORG_ID = os.environ.get(\"OPENAI_ORG_ID\", \"\")\nOPENAI_MODEL = \"gpt-3.5-turbo\"  # Default to cheaper model for development\n\n# Ollama configuration\nOLLAMA_HOST = os.environ.get(\"OLLAMA_HOST\", \"http://localhost:11434\")\nOLLAMA_MODEL = \"llama2\"  # Default local 
model\nENABLE_GPU = True\n\n# App configuration\nLOG_LEVEL = \"DEBUG\"\nENABLE_CORS = True\nCORS_ORIGINS = [\"http://localhost:3000\", \"http://127.0.0.1:3000\"]\n\n# Feature flags\nENABLE_CACHING = True\nENABLE_RATE_LIMITING = False  # Disable rate limiting in local development\nENABLE_PARALLEL_PROCESSING = True\nENABLE_RESPONSE_VERIFICATION = True\n\n# Development-specific settings\nFORCE_DEV_MODE = os.environ.get(\"FORCE_DEV_MODE\", \"false\").lower() == \"true\"\nDEV_OPENAI_QUOTA = 100  # Maximum OpenAI API calls per day in development\n\n# Redis configuration\nREDIS_URL = os.environ.get(\"REDIS_URL\", \"redis://localhost:6379/0\")\n168:[\"$\",\"pre\",\"pre-105\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":"])</script><script>self.__next_f.push([1,"\"language-python\",\"children\":\"$31f\"}]}]\n169:[\"$\",\"h3\",\"h3-63\",{\"id\":\"production-deployment\",\"children\":\"Production Deployment\"}]\n16a:[\"$\",\"h4\",\"h4-26\",{\"id\":\"kubernetes-manifests-for-production\",\"children\":\"Kubernetes Manifests for Production\"}]\n320:Tcdc,"])</script><script>self.__next_f.push([1,"# kubernetes/deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: mcp-api\n  labels:\n    app: mcp-api\nspec:\n  replicas: 3\n  selector:\n    matchLabels:\n      app: mcp-api\n  template:\n    metadata:\n      labels:\n        app: mcp-api\n    spec:\n      containers:\n      - name: api\n        image: ${DOCKER_REGISTRY}/mcp-api:${IMAGE_TAG}\n        imagePullPolicy: Always\n        ports:\n        - containerPort: 8000\n        env:\n        - name: APP_ENV\n          value: \"production\"\n        - name: REDIS_URL\n          valueFrom:\n            secretKeyRef:\n              name: mcp-secrets\n              key: redis_url\n        - name: OPENAI_API_KEY\n          valueFrom:\n            secretKeyRef:\n              name: mcp-secrets\n              key: 
openai_api_key\n        - name: OLLAMA_HOST\n          value: \"http://ollama-service:11434\"\n        - name: MONTHLY_BUDGET\n          value: \"${MONTHLY_BUDGET}\"\n        resources:\n          requests:\n            cpu: 500m\n            memory: 512Mi\n          limits:\n            cpu: 1000m\n            memory: 1Gi\n        readinessProbe:\n          httpGet:\n            path: /api/health\n            port: 8000\n          initialDelaySeconds: 10\n          periodSeconds: 5\n        livenessProbe:\n          httpGet:\n            path: /api/health\n            port: 8000\n          initialDelaySeconds: 20\n          periodSeconds: 15\n---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: ollama\n  labels:\n    app: ollama\nspec:\n  replicas: 1  # Start with a single replica for Ollama\n  selector:\n    matchLabels:\n      app: ollama\n  template:\n    metadata:\n      labels:\n        app: ollama\n    spec:\n      containers:\n      - name: ollama\n        image: ollama/ollama:latest\n        ports:\n        - containerPort: 11434\n        volumeMounts:\n        - mountPath: /root/.ollama\n          name: ollama-data\n        resources:\n          requests:\n            cpu: 1000m\n            memory: 4Gi\n          limits:\n            cpu: 4000m\n            memory: 16Gi\n        # If using GPU\n        env:\n        - name: NVIDIA_VISIBLE_DEVICES\n          value: \"all\"\n        - name: NVIDIA_DRIVER_CAPABILITIES\n          value: \"compute,utility\"\n      volumes:\n      - name: ollama-data\n        persistentVolumeClaim:\n          claimName: ollama-pvc\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: mcp-api-service\nspec:\n  selector:\n    app: mcp-api\n  ports:\n  - port: 80\n    targetPort: 8000\n  type: ClusterIP\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: ollama-service\nspec:\n  selector:\n    app: ollama\n  ports:\n  - port: 11434\n    targetPort: 11434\n  type: ClusterIP\n---\napiVersion: 
networking.k8s.io/v1\nkind: Ingress\nmetadata:\n  name: mcp-ingress\n  annotations:\n    kubernetes.io/ingress.class: \"nginx\"\n    cert-manager.io/cluster-issuer: \"letsencrypt-prod\"\nspec:\n  tls:\n  - hosts:\n    - api.mcpservice.com\n    secretName: mcp-tls\n  rules:\n  - host: api.mcpservice.com\n    http:\n      paths:\n      - path: /\n        pathType: Prefix\n        backend:\n          service:\n            name: mcp-api-service\n            port:\n              number: 80\n---\napiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: ollama-pvc\nspec:\n  accessModes:\n    - ReadWriteOnce\n  resources:\n    requests:\n      storage: 50Gi  # Adjust based on your models\n"])</script><script>self.__next_f.push([1,"16b:[\"$\",\"pre\",\"pre-106\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"$320\"}]}]\n16c:[\"$\",\"h4\",\"h4-27\",{\"id\":\"horizontal-pod-autoscaling-hpa\",\"children\":\"Horizontal Pod Autoscaling (HPA)\"}]\n16d:[\"$\",\"pre\",\"pre-107\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"# kubernetes/hpa.yaml\\napiVersion: autoscaling/v2\\nkind: HorizontalPodAutoscaler\\nmetadata:\\n  name: mcp-api-hpa\\nspec:\\n  scaleTargetRef:\\n    apiVersion: apps/v1\\n    kind: Deployment\\n    name: mcp-api\\n  minReplicas: 3\\n  maxReplicas: 10\\n  metrics:\\n  - type: Resource\\n    resource:\\n      name: cpu\\n      target:\\n        type: Utilization\\n        averageUtilization: 70\\n  - type: Resource\\n    resource:\\n      name: memory\\n      target:\\n        type: Utilization\\n        averageUtilization: 80\\n\"}]}]\n16e:[\"$\",\"h4\",\"h4-28\",{\"id\":\"deployment-script\",\"children\":\"Deployment Script\"}]\n321:T850,"])</script><script>self.__next_f.push([1,"#!/bin/bash\n# deploy.sh 
- Production deployment script\n\nset -e  # Exit on error\n\n# Check required environment variables\nif [ -z \"$DOCKER_REGISTRY\" ] || [ -z \"$IMAGE_TAG\" ] || [ -z \"$K8S_NAMESPACE\" ]; then\n    echo \"Error: Required environment variables not set.\"\n    echo \"Please set DOCKER_REGISTRY, IMAGE_TAG, and K8S_NAMESPACE.\"\n    exit 1\nfi\n\n# Build and push Docker image\necho \"Building and pushing Docker image...\"\ndocker build -t ${DOCKER_REGISTRY}/mcp-api:${IMAGE_TAG} -f Dockerfile.prod .\ndocker push ${DOCKER_REGISTRY}/mcp-api:${IMAGE_TAG}\n\n# Apply Kubernetes configuration\necho \"Applying Kubernetes configuration...\"\n\n# Create namespace if it doesn't exist\nkubectl get namespace ${K8S_NAMESPACE} || kubectl create namespace ${K8S_NAMESPACE}\n\n# Apply secrets\necho \"Applying secrets...\"\nkubectl apply -f kubernetes/secrets.yaml -n ${K8S_NAMESPACE}\n\n# Deploy Redis if needed\necho \"Deploying Redis...\"\nhelm upgrade --install redis bitnami/redis \\\n  --namespace ${K8S_NAMESPACE} \\\n  --set auth.password=${REDIS_PASSWORD} \\\n  --set master.persistence.size=8Gi\n\n# Deploy application\necho \"Deploying application...\"\n# Replace variables in deployment file\nenvsubst \u003c kubernetes/deployment.yaml | kubectl apply -f - -n ${K8S_NAMESPACE}\n\n# Apply HPA\nkubectl apply -f kubernetes/hpa.yaml -n ${K8S_NAMESPACE}\n\n# Verify deployment\necho \"Verifying deployment...\"\nkubectl rollout status deployment/mcp-api -n ${K8S_NAMESPACE}\nkubectl rollout status deployment/ollama -n ${K8S_NAMESPACE}\n\n# Initialize Ollama models if needed\necho \"Would you like to initialize Ollama models? 
(y/n)\"\nread init_models\n\nif [ \"$init_models\" = \"y\" ]; then\n    echo \"Initializing Ollama models...\"\n    # Get pod name\n    OLLAMA_POD=$(kubectl get pods -l app=ollama -n ${K8S_NAMESPACE} -o jsonpath=\"{.items[0].metadata.name}\")\n    \n    # Pull models\n    kubectl exec ${OLLAMA_POD} -n ${K8S_NAMESPACE} -- ollama pull llama2\n    kubectl exec ${OLLAMA_POD} -n ${K8S_NAMESPACE} -- ollama pull mistral\n    kubectl exec ${OLLAMA_POD} -n ${K8S_NAMESPACE} -- ollama pull codellama\nfi\n\necho \"Deployment complete!\"\necho \"API available at: https://api.mcpservice.com\"\n"])</script><script>self.__next_f.push([1,"16f:[\"$\",\"pre\",\"pre-108\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"$321\"}]}]\n170:[\"$\",\"h4\",\"h4-29\",{\"id\":\"production-dockerfile\",\"children\":\"Production Dockerfile\"}]\n171:[\"$\",\"pre\",\"pre-109\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-dockerfile\",\"children\":\"# Dockerfile.prod\\nFROM python:3.11-slim AS builder\\n\\nWORKDIR /app\\n\\n# Install build dependencies\\nRUN apt-get update \u0026\u0026 apt-get install -y --no-install-recommends \\\\\\n    gcc \\\\\\n    build-essential \\\\\\n    \u0026\u0026 rm -rf /var/lib/apt/lists/*\\n\\n# Build wheels for the requirements and all of their transitive dependencies\\nCOPY requirements.txt ./\\nRUN pip wheel --no-cache-dir --wheel-dir /app/wheels -r requirements.txt\\n\\n# Final stage\\nFROM python:3.11-slim\\n\\nWORKDIR /app\\n\\n# curl is required by the ECS container health check; slim images do not include it\\nRUN apt-get update \u0026\u0026 apt-get install -y --no-install-recommends curl \\\\\\n    \u0026\u0026 rm -rf /var/lib/apt/lists/*\\n\\n# Copy wheels from builder stage\\nCOPY --from=builder /app/wheels /wheels\\nRUN pip install --no-cache-dir /wheels/*\\n\\n# Copy application code\\nCOPY app /app/app\\nCOPY scripts /app/scripts\\nCOPY alembic.ini /app/\\n\\n# Create non-root user\\nRUN useradd -m appuser \u0026\u0026 \\\\\\n    chown -R appuser:appuser /app\\nUSER appuser\\n\\n# Set 
production environment\\nENV PYTHONPATH=/app\\nENV APP_ENV=production\\nENV PYTHONUNBUFFERED=1\\n\\n# Expose port\\nEXPOSE 8000\\n\\n# Run using Gunicorn in production\\nCMD [\\\"gunicorn\\\", \\\"-k\\\", \\\"uvicorn.workers.UvicornWorker\\\", \\\"-c\\\", \\\"app/config/gunicorn.py\\\", \\\"app.main:app\\\"]\\n\"}]}]\n172:[\"$\",\"h4\",\"h4-30\",{\"id\":\"gunicorn-configuration-for-production\",\"children\":\"Gunicorn Configuration for Production\"}]\n173:[\"$\",\"pre\",\"pre-110\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"# app/config/gunicorn.py\\n\\\"\\\"\\\"Gunicorn configuration for production deployment.\\\"\\\"\\\"\\n\\nimport multiprocessing\\nimport os\\n\\n# Bind to 0.0.0.0:8000\\nbind = \\\"0.0.0.0:8000\\\"\\n\\n# Worker configuration\\n# cpu_count() reports the node's CPUs, not the container's CPU limit (1 CPU in the\\n# Kubernetes manifest), so WEB_CONCURRENCY can override the worker count\\nworkers = int(os.environ.get(\\\"WEB_CONCURRENCY\\\", multiprocessing.cpu_count() * 2 + 1))\\nworker_class = \\\"uvicorn.workers.UvicornWorker\\\"\\nworker_connections = 1000\\ntimeout = 60\\nkeepalive = 5\\n\\n# Logging\\naccesslog = \\\"-\\\"\\nerrorlog = \\\"-\\\"\\nloglevel = os.environ.get(\\\"LOG_LEVEL\\\", \\\"info\\\").lower()\\n\\n# Security\\nlimit_request_line = 4094\\nlimit_request_fields = 100\\nlimit_request_field_size = 8190\\n\\n# Process naming\\nproc_name = \\\"mcp-api\\\"\\n\"}]}]\n174:[\"$\",\"h3\",\"h3-64\",{\"id\":\"cloud-deployment-aws\",\"children\":\"Cloud Deployment (AWS)\"}]\n175:[\"$\",\"h4\",\"h4-31\",{\"id\":\"aws-cloudformation-template\",\"children\":\"AWS CloudFormation Template\"}]\n322:T2883,"])</script><script>self.__next_f.push([1,"# aws/cloudformation.yaml\nAWSTemplateFormatVersion: '2010-09-09'\nDescription: 'MCP OpenAI-Ollama Hybrid System'\n\nParameters:\n  Environment:\n    Description: Deployment environment\n    Type: String\n    Default: Production\n    AllowedValues:\n      - Development\n      - Staging\n      - Production\n    \n  ECRRepositoryName:\n    Description: ECR Repository name\n  
  Type: String\n    Default: mcp-api\n  \n  VpcId:\n    Description: VPC ID\n    Type: AWS::EC2::VPC::Id\n  \n  SubnetIds:\n    Description: Subnet IDs for the ECS tasks\n    Type: List\u003cAWS::EC2::Subnet::Id\u003e\n  \n  OllamaInstanceType:\n    Description: EC2 instance type for Ollama\n    Type: String\n    Default: g4dn.xlarge\n    AllowedValues:\n      - g4dn.xlarge\n      - g5.xlarge\n      - p3.2xlarge\n      - c5.2xlarge  # CPU-only option\n  \n  ApiInstanceCount:\n    Description: Number of API instances\n    Type: Number\n    Default: 2\n    MinValue: 1\n    MaxValue: 10\n\nResources:\n  # ECR Repository\n  ECRRepository:\n    Type: AWS::ECR::Repository\n    Properties:\n      RepositoryName: !Ref ECRRepositoryName\n      ImageScanningConfiguration:\n        ScanOnPush: true\n      LifecyclePolicy:\n        LifecyclePolicyText: |\n          {\n            \"rules\": [\n              {\n                \"rulePriority\": 1,\n                \"description\": \"Keep only the 10 most recent images\",\n                \"selection\": {\n                  \"tagStatus\": \"any\",\n                  \"countType\": \"imageCountMoreThan\",\n                  \"countNumber\": 10\n                },\n                \"action\": {\n                  \"type\": \"expire\"\n                }\n              }\n            ]\n          }\n\n  # ElastiCache Redis\n  RedisSecurityGroup:\n    Type: AWS::EC2::SecurityGroup\n    Properties:\n      GroupDescription: Security group for Redis cluster\n      VpcId: !Ref VpcId\n      SecurityGroupIngress:\n        - IpProtocol: tcp\n          FromPort: 6379\n          ToPort: 6379\n          SourceSecurityGroupId: !GetAtt APISecurityGroup.GroupId\n\n  RedisSubnetGroup:\n    Type: AWS::ElastiCache::SubnetGroup\n    Properties:\n      Description: Subnet group for Redis\n      SubnetIds: !Ref SubnetIds\n\n  RedisCluster:\n    Type: AWS::ElastiCache::CacheCluster\n    Properties:\n      Engine: redis\n      CacheNodeType: 
cache.t3.medium\n      NumCacheNodes: 1\n      VpcSecurityGroupIds:\n        - !GetAtt RedisSecurityGroup.GroupId\n      CacheSubnetGroupName: !Ref RedisSubnetGroup\n      AutoMinorVersionUpgrade: true\n\n  # Ollama EC2 Instance\n  OllamaSecurityGroup:\n    Type: AWS::EC2::SecurityGroup\n    Properties:\n      GroupDescription: Security group for Ollama EC2 instance\n      VpcId: !Ref VpcId\n      SecurityGroupIngress:\n        - IpProtocol: tcp\n          FromPort: 11434\n          ToPort: 11434\n          SourceSecurityGroupId: !GetAtt APISecurityGroup.GroupId\n        - IpProtocol: tcp\n          FromPort: 22\n          ToPort: 22\n          CidrIp: 0.0.0.0/0  # Restrict this in production\n\n  OllamaInstanceRole:\n    Type: AWS::IAM::Role\n    Properties:\n      AssumeRolePolicyDocument:\n        Version: '2012-10-17'\n        Statement:\n          - Effect: Allow\n            Principal:\n              Service: ec2.amazonaws.com\n            Action: sts:AssumeRole\n      ManagedPolicyArns:\n        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore\n\n  OllamaInstanceProfile:\n    Type: AWS::IAM::InstanceProfile\n    Properties:\n      Roles:\n        - !Ref OllamaInstanceRole\n\n  OllamaEBSVolume:\n    Type: AWS::EC2::Volume\n    Properties:\n      AvailabilityZone: !Select [0, !GetAZs '']\n      Size: 100\n      VolumeType: gp3\n      Encrypted: true\n      Tags:\n        - Key: Name\n          Value: OllamaVolume\n\n  OllamaInstance:\n    Type: AWS::EC2::Instance\n    Properties:\n      InstanceType: !Ref OllamaInstanceType\n      ImageId: ami-0261755bbcb8c4a84  # Amazon Linux 2 AMI - update as needed\n      SecurityGroupIds:\n        - !GetAtt OllamaSecurityGroup.GroupId\n      SubnetId: !Select [0, !Ref SubnetIds]\n      IamInstanceProfile: !Ref OllamaInstanceProfile\n      BlockDeviceMappings:\n        - DeviceName: /dev/xvda\n          Ebs:\n            VolumeSize: 30\n            VolumeType: gp3\n            DeleteOnTermination: true\n      
UserData:\n        Fn::Base64: !Sub |\n          #!/bin/bash\n          # Install Docker\n          amazon-linux-extras install docker -y\n          systemctl start docker\n          systemctl enable docker\n          \n          # Install Ollama\n          curl -fsSL https://ollama.com/install.sh | sh\n          \n          # Run Ollama in Docker\n          docker run -d --name ollama \\\n            -p 11434:11434 \\\n            -v ollama:/root/.ollama \\\n            ollama/ollama\n          \n          # Pull models\n          docker exec ollama ollama pull llama2\n          docker exec ollama ollama pull mistral\n          docker exec ollama ollama pull codellama\n      Tags:\n        - Key: Name\n          Value: !Sub \"${AWS::StackName}-ollama\"\n\n  OllamaVolumeAttachment:\n    Type: AWS::EC2::VolumeAttachment\n    Properties:\n      InstanceId: !Ref OllamaInstance\n      VolumeId: !Ref OllamaEBSVolume\n      Device: /dev/sdf\n\n  # API ECS Cluster\n  ECSCluster:\n    Type: AWS::ECS::Cluster\n    Properties:\n      ClusterName: !Sub \"${AWS::StackName}-cluster\"\n      CapacityProviders:\n        - FARGATE\n      DefaultCapacityProviderStrategy:\n        - CapacityProvider: FARGATE\n          Weight: 1\n\n  APISecurityGroup:\n    Type: AWS::EC2::SecurityGroup\n    Properties:\n      GroupDescription: Security group for API ECS tasks\n      VpcId: !Ref VpcId\n      SecurityGroupIngress:\n        - IpProtocol: tcp\n          FromPort: 8000\n          ToPort: 8000\n          CidrIp: 0.0.0.0/0  # Restrict in production\n\n  # ECS Task Definition\n  ECSTaskExecutionRole:\n    Type: AWS::IAM::Role\n    Properties:\n      AssumeRolePolicyDocument:\n        Version: '2012-10-17'\n        Statement:\n          - Effect: Allow\n            Principal:\n              Service: ecs-tasks.amazonaws.com\n            Action: sts:AssumeRole\n      ManagedPolicyArns:\n        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy\n\n  ECSTaskRole:\n    Type: 
AWS::IAM::Role\n    Properties:\n      AssumeRolePolicyDocument:\n        Version: '2012-10-17'\n        Statement:\n          - Effect: Allow\n            Principal:\n              Service: ecs-tasks.amazonaws.com\n            Action: sts:AssumeRole\n      ManagedPolicyArns:\n        - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\n\n  APITaskDefinition:\n    Type: AWS::ECS::TaskDefinition\n    Properties:\n      Family: !Sub \"${AWS::StackName}-api\"\n      Cpu: '1024'\n      Memory: '2048'\n      NetworkMode: awsvpc\n      RequiresCompatibilities:\n        - FARGATE\n      ExecutionRoleArn: !GetAtt ECSTaskExecutionRole.Arn\n      TaskRoleArn: !GetAtt ECSTaskRole.Arn\n      ContainerDefinitions:\n        - Name: api\n          Image: !Sub \"${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ECRRepositoryName}:latest\"\n          Essential: true\n          PortMappings:\n            - ContainerPort: 8000\n          Environment:\n            - Name: REDIS_URL\n              Value: !Sub \"redis://${RedisCluster.RedisEndpoint.Address}:${RedisCluster.RedisEndpoint.Port}/0\"\n            - Name: OLLAMA_HOST\n              Value: !Sub \"http://${OllamaInstance.PrivateIp}:11434\"\n            - Name: APP_ENV\n              Value: !Ref Environment\n          LogConfiguration:\n            LogDriver: awslogs\n            Options:\n              awslogs-group: !Ref APILogGroup\n              awslogs-region: !Ref AWS::Region\n              awslogs-stream-prefix: api\n          HealthCheck:\n            Command:\n              - CMD-SHELL\n              - curl -f http://localhost:8000/api/health || exit 1\n            Interval: 30\n            Timeout: 5\n            Retries: 3\n\n  APILogGroup:\n    Type: AWS::Logs::LogGroup\n    Properties:\n      LogGroupName: !Sub \"/ecs/${AWS::StackName}-api\"\n      RetentionInDays: 7\n\n  # ECS Service\n  APIService:\n    Type: AWS::ECS::Service\n    Properties:\n      ServiceName: !Sub \"${AWS::StackName}-api\"\n      Cluster: 
!Ref ECSCluster\n      TaskDefinition: !Ref APITaskDefinition\n      DesiredCount: !Ref ApiInstanceCount\n      LaunchType: FARGATE\n      NetworkConfiguration:\n        AwsvpcConfiguration:\n          AssignPublicIp: ENABLED\n          SecurityGroups:\n            - !GetAtt APISecurityGroup.GroupId\n          Subnets: !Ref SubnetIds\n      LoadBalancers:\n        - TargetGroupArn: !Ref ALBTargetGroup\n          ContainerName: api\n          ContainerPort: 8000\n    DependsOn: ALBListener\n\n  # Application Load Balancer\n  ALB:\n    Type: AWS::ElasticLoadBalancingV2::LoadBalancer\n    Properties:\n      Name: !Sub \"${AWS::StackName}-alb\"\n      Type: application\n      Scheme: internet-facing\n      SecurityGroups:\n        - !GetAtt ALBSecurityGroup.GroupId\n      Subnets: !Ref SubnetIds\n      LoadBalancerAttributes:\n        - Key: idle_timeout.timeout_seconds\n          Value: '60'\n\n  ALBSecurityGroup:\n    Type: AWS::EC2::SecurityGroup\n    Properties:\n      GroupDescription: Security group for ALB\n      VpcId: !Ref VpcId\n      SecurityGroupIngress:\n        - IpProtocol: tcp\n          FromPort: 80\n          ToPort: 80\n          CidrIp: 0.0.0.0/0\n        - IpProtocol: tcp\n          FromPort: 443\n          ToPort: 443\n          CidrIp: 0.0.0.0/0\n\n  ALBTargetGroup:\n    Type: AWS::ElasticLoadBalancingV2::TargetGroup\n    Properties:\n      Name: !Sub \"${AWS::StackName}-target-group\"\n      Port: 8000\n      Protocol: HTTP\n      TargetType: ip\n      VpcId: !Ref VpcId\n      HealthCheckPath: /api/health\n      HealthCheckIntervalSeconds: 30\n      HealthCheckTimeoutSeconds: 5\n      HealthyThresholdCount: 3\n      UnhealthyThresholdCount: 3\n\n  ALBListener:\n    Type: AWS::ElasticLoadBalancingV2::Listener\n    Properties:\n      LoadBalancerArn: !Ref ALB\n      Port: 80\n      Protocol: HTTP\n      DefaultActions:\n        - Type: forward\n          TargetGroupArn: !Ref ALBTargetGroup\n\nOutputs:\n  APIEndpoint:\n    Description: URL for 
API\n    Value: !Sub \"http://${ALB.DNSName}\"\n  \n  OllamaEndpoint:\n    Description: Ollama Server Private IP\n    Value: !GetAtt OllamaInstance.PrivateIp\n  \n  ECRRepository:\n    Description: ECR Repository URL\n    Value: !Sub \"${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ECRRepositoryName}\"\n  \n  RedisEndpoint:\n    Description: Redis Endpoint\n    Value: !Sub \"${RedisCluster.RedisEndpoint.Address}:${RedisCluster.RedisEndpoint.Port}\"\n"])</script><script>self.__next_f.push([1,"176:[\"$\",\"pre\",\"pre-111\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"$322\"}]}]\n177:[\"$\",\"h4\",\"h4-32\",{\"id\":\"aws-deployment-script\",\"children\":\"AWS Deployment Script\"}]\n323:Tc4a,"])</script><script>self.__next_f.push([1,"#!/bin/bash\n# aws_deploy.sh - AWS deployment script\n\nset -e  # Exit on error\n\n# Check required AWS CLI\nif ! command -v aws \u0026\u003e /dev/null; then\n    echo \"AWS CLI is required but not installed. 
Aborting.\"\n    exit 1\nfi\n\n# AWS configuration\nAWS_REGION=\"us-east-1\"\nSTACK_NAME=\"mcp-hybrid-system\"\nCFN_TEMPLATE=\"aws/cloudformation.yaml\"\nIMAGE_TAG=$(git rev-parse --short HEAD)\n\n# Check if stack exists\nif aws cloudformation describe-stacks --stack-name $STACK_NAME --region $AWS_REGION \u0026\u003e /dev/null; then\n    STACK_ACTION=\"update\"\nelse\n    STACK_ACTION=\"create\"\nfi\n\n# Deploy CloudFormation stack\nif [ \"$STACK_ACTION\" = \"create\" ]; then\n    echo \"Creating CloudFormation stack...\"\n    aws cloudformation create-stack \\\n        --stack-name $STACK_NAME \\\n        --template-body file://$CFN_TEMPLATE \\\n        --capabilities CAPABILITY_IAM \\\n        --parameters \\\n            ParameterKey=Environment,ParameterValue=Production \\\n            ParameterKey=OllamaInstanceType,ParameterValue=g4dn.xlarge \\\n            ParameterKey=ApiInstanceCount,ParameterValue=2 \\\n        --region $AWS_REGION\n    \n    # Wait for stack creation\n    echo \"Waiting for stack creation to complete...\"\n    aws cloudformation wait stack-create-complete \\\n        --stack-name $STACK_NAME \\\n        --region $AWS_REGION\nelse\n    echo \"Updating CloudFormation stack...\"\n    aws cloudformation update-stack \\\n        --stack-name $STACK_NAME \\\n        --template-body file://$CFN_TEMPLATE \\\n        --capabilities CAPABILITY_IAM \\\n        --parameters \\\n            ParameterKey=Environment,ParameterValue=Production \\\n            ParameterKey=OllamaInstanceType,ParameterValue=g4dn.xlarge \\\n            ParameterKey=ApiInstanceCount,ParameterValue=2 \\\n        --region $AWS_REGION\n    \n    # Wait for stack update\n    echo \"Waiting for stack update to complete...\"\n    aws cloudformation wait stack-update-complete \\\n        --stack-name $STACK_NAME \\\n        --region $AWS_REGION\nfi\n\n# Get stack outputs\necho \"Getting stack outputs...\"\nECR_REPOSITORY=$(aws cloudformation describe-stacks \\\n    --stack-name 
$STACK_NAME \\\n    --query \"Stacks[0].Outputs[?OutputKey=='ECRRepository'].OutputValue\" \\\n    --output text \\\n    --region $AWS_REGION)\n\nAPI_ENDPOINT=$(aws cloudformation describe-stacks \\\n    --stack-name $STACK_NAME \\\n    --query \"Stacks[0].Outputs[?OutputKey=='APIEndpoint'].OutputValue\" \\\n    --output text \\\n    --region $AWS_REGION)\n\n# Build and push Docker image\necho \"Building and pushing Docker image to ECR...\"\n# Login to ECR\naws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY\n\n# Build and push\ndocker build -t $ECR_REPOSITORY:$IMAGE_TAG -t $ECR_REPOSITORY:latest -f Dockerfile.prod .\ndocker push $ECR_REPOSITORY:$IMAGE_TAG\ndocker push $ECR_REPOSITORY:latest\n\n# Update ECS service to force deployment\necho \"Updating ECS service...\"\nECS_CLUSTER=\"${STACK_NAME}-cluster\"\nECS_SERVICE=\"${STACK_NAME}-api\"\n\naws ecs update-service \\\n    --cluster $ECS_CLUSTER \\\n    --service $ECS_SERVICE \\\n    --force-new-deployment \\\n    --region $AWS_REGION\n\necho \"Deployment complete!\"\necho \"API Endpoint: $API_ENDPOINT\"\n"])</script><script>self.__next_f.push([1,"178:[\"$\",\"pre\",\"pre-112\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"$323\"}]}]\n179:[\"$\",\"h1\",\"h1-7\",{\"id\":\"optimization-and-deployment-strategies-for-openai-ollama-hybrid-ai-system-continued\",\"children\":\"Optimization and Deployment Strategies for OpenAI-Ollama Hybrid AI System (Continued)\"}]\n17a:[\"$\",\"h2\",\"h2-63\",{\"id\":\"monitoring-and-observability-configuration\",\"children\":\"Monitoring and Observability Configuration\"}]\n17b:[\"$\",\"h3\",\"h3-65\",{\"id\":\"prometheus-and-grafana-setup-for-metrics\",\"children\":\"Prometheus and Grafana Setup for Metrics\"}]\n324:Td0e,"])</script><script>self.__next_f.push([1,"# 
monitoring/prometheus-config.yaml\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: prometheus-config\ndata:\n  prometheus.yml: |\n    global:\n      scrape_interval: 15s\n      evaluation_interval: 15s\n\n    scrape_configs:\n      - job_name: 'mcp-api'\n        metrics_path: /metrics\n        kubernetes_sd_configs:\n          - role: pod\n        relabel_configs:\n          - source_labels: [__meta_kubernetes_pod_label_app]\n            regex: mcp-api\n            action: keep\n\n      - job_name: 'ollama'\n        metrics_path: /metrics\n        static_configs:\n          - targets: ['ollama-service:11434']\n\n    alerting:\n      alertmanagers:\n        - static_configs:\n            - targets: ['alertmanager:9093']\n---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: prometheus\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: prometheus\n  template:\n    metadata:\n      labels:\n        app: prometheus\n    spec:\n      containers:\n        - name: prometheus\n          image: prom/prometheus:v2.42.0\n          ports:\n            - containerPort: 9090\n          volumeMounts:\n            - name: config-volume\n              mountPath: /etc/prometheus\n            - name: prometheus-data\n              mountPath: /prometheus\n          args:\n            - \"--config.file=/etc/prometheus/prometheus.yml\"\n            - \"--storage.tsdb.path=/prometheus\"\n            - \"--web.console.libraries=/usr/share/prometheus/console_libraries\"\n            - \"--web.console.templates=/usr/share/prometheus/consoles\"\n            - \"--web.enable-lifecycle\"\n      volumes:\n        - name: config-volume\n          configMap:\n            name: prometheus-config\n        - name: prometheus-data\n          persistentVolumeClaim:\n            claimName: prometheus-pvc\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: prometheus-service\nspec:\n  selector:\n    app: prometheus\n  ports:\n    - port: 9090\n      targetPort: 9090\n  
type: ClusterIP\n---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: grafana\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: grafana\n  template:\n    metadata:\n      labels:\n        app: grafana\n    spec:\n      containers:\n        - name: grafana\n          image: grafana/grafana:9.4.7\n          ports:\n            - containerPort: 3000\n          volumeMounts:\n            - name: grafana-data\n              mountPath: /var/lib/grafana\n          env:\n            - name: GF_SECURITY_ADMIN_USER\n              valueFrom:\n                secretKeyRef:\n                  name: grafana-secrets\n                  key: admin_user\n            - name: GF_SECURITY_ADMIN_PASSWORD\n              valueFrom:\n                secretKeyRef:\n                  name: grafana-secrets\n                  key: admin_password\n      volumes:\n        - name: grafana-data\n          persistentVolumeClaim:\n            claimName: grafana-pvc\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: grafana-service\nspec:\n  selector:\n    app: grafana\n  ports:\n    - port: 3000\n      targetPort: 3000\n  type: ClusterIP\n---\napiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: prometheus-pvc\nspec:\n  accessModes:\n    - ReadWriteOnce\n  resources:\n    requests:\n      storage: 10Gi\n---\napiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: grafana-pvc\nspec:\n  accessModes:\n    - ReadWriteOnce\n  resources:\n    requests:\n      storage: 5Gi\n"])</script><script>self.__next_f.push([1,"17c:[\"$\",\"pre\",\"pre-113\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"$324\"}]}]\n17d:[\"$\",\"h3\",\"h3-66\",{\"id\":\"grafana-dashboard-configuration\",\"children\":\"Grafana Dashboard Configuration\"}]\n325:T354c,"])</script><script>self.__next_f.push([1,"{\n  \"annotations\": {\n    \"list\": [\n      {\n
        \"builtIn\": 1,\n        \"datasource\": \"-- Grafana --\",\n        \"enable\": true,\n        \"hide\": true,\n        \"iconColor\": \"rgba(0, 211, 255, 1)\",\n        \"name\": \"Annotations \u0026 Alerts\",\n        \"type\": \"dashboard\"\n      }\n    ]\n  },\n  \"editable\": true,\n  \"gnetId\": null,\n  \"graphTooltip\": 0,\n  \"id\": 1,\n  \"links\": [],\n  \"panels\": [\n    {\n      \"aliasColors\": {},\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {}\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 12,\n        \"x\": 0,\n        \"y\": 0\n      },\n      \"hiddenSeries\": false,\n      \"id\": 2,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.0\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"sum(rate(api_requests_total[5m])) by (provider)\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Requests ({{provider}})\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Request Rate by Provider\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n
      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"format\": \"short\",\n          \"label\": \"Requests/sec\",\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        },\n        {\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {},\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {}\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 12,\n        \"x\": 12,\n        \"y\": 0\n      },\n      \"hiddenSeries\": false,\n      \"id\": 3,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.0\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"histogram_quantile(0.5, sum(rate(api_response_time_seconds_bucket[5m])) by (le, provider))\",\n          \"interval\": \"\",\n          \"legendFormat\": \"50th % ({{provider}})\",\n
          \"refId\": \"A\"\n        },\n        {\n          \"expr\": \"histogram_quantile(0.9, sum(rate(api_response_time_seconds_bucket[5m])) by (le, provider))\",\n          \"interval\": \"\",\n          \"legendFormat\": \"90th % ({{provider}})\",\n          \"refId\": \"B\"\n        },\n        {\n          \"expr\": \"histogram_quantile(0.99, sum(rate(api_response_time_seconds_bucket[5m])) by (le, provider))\",\n          \"interval\": \"\",\n          \"legendFormat\": \"99th % ({{provider}})\",\n          \"refId\": \"C\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Response Time by Provider\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"format\": \"s\",\n          \"label\": \"Response Time\",\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        },\n        {\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 80\n              }\n            ]\n          }\n        },\n        \"overrides\": []\n   
   },\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 8,\n        \"x\": 0,\n        \"y\": 8\n      },\n      \"id\": 4,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"area\",\n        \"justifyMode\": \"auto\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.0\",\n      \"targets\": [\n        {\n          \"expr\": \"sum(api_requests_total{provider=\\\"openai\\\"})\",\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"OpenAI Total Requests\",\n      \"type\": \"stat\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 80\n              }\n            ]\n          }\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 8,\n        \"x\": 8,\n        \"y\": 8\n      },\n      \"id\": 5,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"area\",\n        \"justifyMode\": \"auto\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.0\",\n      \"targets\": [\n        {\n          
\"expr\": \"sum(api_requests_total{provider=\\\"ollama\\\"})\",\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"Ollama Total Requests\",\n      \"type\": \"stat\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 80\n              }\n            ]\n          },\n          \"unit\": \"currencyUSD\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 8,\n        \"x\": 16,\n        \"y\": 8\n      },\n      \"id\": 6,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"area\",\n        \"justifyMode\": \"auto\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"sum\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.0\",\n      \"targets\": [\n        {\n          \"expr\": \"sum(api_openai_cost_total)\",\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"OpenAI Cost\",\n      \"type\": \"stat\"\n    },\n    {\n      \"aliasColors\": {},\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {}\n        },\n        \"overrides\": 
[]\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 12,\n        \"x\": 0,\n        \"y\": 16\n      },\n      \"hiddenSeries\": false,\n      \"id\": 7,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.0\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"rate(api_token_usage_total{type=\\\"prompt\\\"}[5m])\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Prompt ({{provider}})\",\n          \"refId\": \"A\"\n        },\n        {\n          \"expr\": \"rate(api_token_usage_total{type=\\\"completion\\\"}[5m])\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Completion ({{provider}})\",\n          \"refId\": \"B\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Token Usage Rate by Type\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"format\": \"short\",\n          \"label\": \"Tokens/sec\",\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": 
true\n        },\n        {\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {},\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {}\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 12,\n        \"x\": 12,\n        \"y\": 16\n      },\n      \"hiddenSeries\": false,\n      \"id\": 8,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.0\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"rate(api_cache_hits_total[5m])\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Cache Hits\",\n          \"refId\": \"A\"\n        },\n        {\n          \"expr\": \"rate(api_cache_misses_total[5m])\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Cache Misses\",\n          \"refId\": \"B\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Cache Performance\",\n      
\"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"format\": \"short\",\n          \"label\": \"Rate\",\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        },\n        {\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    }\n  ],\n  \"refresh\": \"10s\",\n  \"schemaVersion\": 26,\n  \"style\": \"dark\",\n  \"tags\": [],\n  \"templating\": {\n    \"list\": []\n  },\n  \"time\": {\n    \"from\": \"now-6h\",\n    \"to\": \"now\"\n  },\n  \"timepicker\": {\n    \"refresh_intervals\": [\n      \"5s\",\n      \"10s\",\n      \"30s\",\n      \"1m\",\n      \"5m\",\n      \"15m\",\n      \"30m\",\n      \"1h\",\n      \"2h\",\n      \"1d\"\n    ]\n  },\n  \"timezone\": \"\",\n  \"title\": \"MCP Hybrid System Dashboard\",\n  \"uid\": \"mcp-dashboard\",\n  \"version\": 1\n}\n"])</script><script>self.__next_f.push([1,"17e:[\"$\",\"pre\",\"pre-114\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-json\",\"children\":\"$325\"}]}]\n17f:[\"$\",\"h3\",\"h3-67\",{\"id\":\"implementing-metrics-collection-in-api\",\"children\":\"Implementing Metrics Collection in API\"}]\n326:T9b4,"])</script><script>self.__next_f.push([1,"# app/middleware/metrics.py\nfrom fastapi import Request\nimport time\nfrom prometheus_client import Counter, Histogram, Gauge\nimport logging\n\n# Initialize metrics\nREQUEST_COUNT = 
Counter(\n    'api_requests_total', \n    'Total count of API requests',\n    ['method', 'endpoint', 'provider', 'model', 'status']\n)\n\nRESPONSE_TIME = Histogram(\n    'api_response_time_seconds',\n    'Response time in seconds',\n    ['method', 'endpoint', 'provider']\n)\n\nTOKEN_USAGE = Counter(\n    'api_token_usage_total',\n    'Total token usage',\n    ['provider', 'model', 'type']  # type: prompt or completion\n)\n\nOPENAI_COST = Counter(\n    'api_openai_cost_total',\n    'Total OpenAI API cost in USD',\n    ['model']\n)\n\nACTIVE_REQUESTS = Gauge(\n    'api_active_requests',\n    'Number of active requests',\n    ['method']\n)\n\nCACHE_HITS = Counter(\n    'api_cache_hits_total',\n    'Total cache hits',\n    ['cache_type']  # exact or semantic\n)\n\nCACHE_MISSES = Counter(\n    'api_cache_misses_total',\n    'Total cache misses',\n    []\n)\n\nlogger = logging.getLogger(__name__)\n\nasync def metrics_middleware(request: Request, call_next):\n    \"\"\"Middleware to collect metrics for API requests.\"\"\"\n    # Track active requests\n    ACTIVE_REQUESTS.labels(method=request.method).inc()\n    \n    # Start timing\n    start_time = time.time()\n    \n    # Default status code\n    status_code = 500\n    provider = \"unknown\"\n    model = \"unknown\"\n    \n    try:\n        # Process the request\n        response = await call_next(request)\n        status_code = response.status_code\n        \n        # Try to get provider and model from response headers if available\n        provider = response.headers.get(\"X-Provider\", \"unknown\")\n        model = response.headers.get(\"X-Model\", \"unknown\")\n        \n        return response\n    except Exception as e:\n        logger.exception(\"Unhandled exception in request\")\n        raise\n    finally:\n        # Calculate response time\n        response_time = time.time() - start_time\n        \n        # Record metrics\n        REQUEST_COUNT.labels(\n            method=request.method,\n            
endpoint=request.url.path,\n            provider=provider,\n            model=model,\n            status=status_code\n        ).inc()\n        \n        RESPONSE_TIME.labels(\n            method=request.method,\n            endpoint=request.url.path,\n            provider=provider\n        ).observe(response_time)\n        \n        # Decrement active requests\n        ACTIVE_REQUESTS.labels(method=request.method).dec()\n"])</script><script>self.__next_f.push([1,"180:[\"$\",\"pre\",\"pre-115\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$326\"}]}]\n181:[\"$\",\"h2\",\"h2-64\",{\"id\":\"scaling-strategies\",\"children\":\"Scaling Strategies\"}]\n182:[\"$\",\"h3\",\"h3-68\",{\"id\":\"optimizing-ollama-scaling-for-high-loads\",\"children\":\"Optimizing Ollama Scaling for High Loads\"}]\n327:T188b,"])</script><script>self.__next_f.push([1,"# app/services/ollama_scaling.py\nimport logging\nimport asyncio\nimport time\nfrom typing import Dict, List, Any, Optional\nimport random\nimport httpx\n\nlogger = logging.getLogger(__name__)\n\nclass OllamaScalingService:\n    \"\"\"\n    Manages load balancing and scaling for multiple Ollama instances.\n    \"\"\"\n    \n    def __init__(self):\n        self.ollama_instances = []\n        self.instance_status = {}\n        self.model_availability = {}\n        self.health_check_interval = 60  # seconds\n        self.enable_scaling = False\n        self.min_instances = 1\n        self.max_instances = 5\n        self.health_check_task = None\n    \n    async def initialize(self, instances: List[str]):\n        \"\"\"Initialize the service with a list of Ollama instances.\"\"\"\n        self.ollama_instances = instances\n        self.instance_status = {instance: False for instance in instances}\n        self.model_availability = {instance: [] for instance in instances}\n        \n        # Start health 
checking\n        self.health_check_task = asyncio.create_task(self._health_check_loop())\n        \n        # Perform initial health check\n        await self._check_all_instances()\n        \n        logger.info(f\"Initialized Ollama scaling with {len(instances)} instances\")\n    \n    async def shutdown(self):\n        \"\"\"Shutdown the service.\"\"\"\n        if self.health_check_task:\n            self.health_check_task.cancel()\n            try:\n                await self.health_check_task\n            except asyncio.CancelledError:\n                pass\n    \n    async def _health_check_loop(self):\n        \"\"\"Periodically check health of all instances.\"\"\"\n        while True:\n            try:\n                await self._check_all_instances()\n                await asyncio.sleep(self.health_check_interval)\n            except asyncio.CancelledError:\n                break\n            except Exception as e:\n                logger.error(f\"Error in health check loop: {str(e)}\")\n                await asyncio.sleep(5)  # Shorter retry on error\n    \n    async def _check_all_instances(self):\n        \"\"\"Check health and model availability for all instances.\"\"\"\n        tasks = []\n        for instance in self.ollama_instances:\n            tasks.append(self._check_instance(instance))\n        \n        # Run all checks in parallel\n        await asyncio.gather(*tasks, return_exceptions=True)\n        \n        # Log status\n        healthy_count = sum(1 for status in self.instance_status.values() if status)\n        logger.debug(f\"Ollama health check: {healthy_count}/{len(self.ollama_instances)} instances healthy\")\n    \n    async def _check_instance(self, instance: str):\n        \"\"\"Check health and model availability for a single instance.\"\"\"\n        try:\n            async with httpx.AsyncClient(timeout=5.0) as client:\n                response = await client.get(f\"{instance}/api/version\")\n                \n                
if response.status_code == 200:\n                    # Instance is healthy\n                    self.instance_status[instance] = True\n                    \n                    # Check available models\n                    models_response = await client.get(f\"{instance}/api/tags\")\n                    if models_response.status_code == 200:\n                        data = models_response.json()\n                        models = [model[\"name\"] for model in data.get(\"models\", [])]\n                        self.model_availability[instance] = models\n                else:\n                    self.instance_status[instance] = False\n        except Exception as e:\n            logger.warning(f\"Health check failed for {instance}: {str(e)}\")\n            self.instance_status[instance] = False\n    \n    def get_instance_for_model(self, model: str) -\u003e Optional[str]:\n        \"\"\"Get the best instance for a specific model.\"\"\"\n        # Filter to healthy instances that have the model\n        candidates = [\n            instance for instance, status in self.instance_status.items()\n            if status and model in self.model_availability.get(instance, [])\n        ]\n        \n        if not candidates:\n            return None\n        \n        # Use random selection for basic load balancing\n        # A more sophisticated version would track load, response times, etc.\n        return random.choice(candidates)\n    \n    def get_healthy_instance(self) -\u003e Optional[str]:\n        \"\"\"Get any healthy instance.\"\"\"\n        candidates = [\n            instance for instance, status in self.instance_status.items()\n            if status\n        ]\n        \n        if not candidates:\n            return None\n            \n        return random.choice(candidates)\n    \n    async def ensure_model_availability(self, model: str) -\u003e bool:\n        \"\"\"\n        Ensure at least one instance has the required model.\n        Returns True if model is 
available or successfully pulled.\n        \"\"\"\n        # Check if any instance already has this model\n        for instance, models in self.model_availability.items():\n            if self.instance_status.get(instance, False) and model in models:\n                return True\n        \n        # Try to pull the model on a healthy instance\n        instance = self.get_healthy_instance()\n        if not instance:\n            logger.error(f\"No healthy Ollama instances available to pull model {model}\")\n            return False\n        \n        # Try to pull the model\n        try:\n            async with httpx.AsyncClient(timeout=300.0) as client:  # Longer timeout for model pull\n                response = await client.post(\n                    f\"{instance}/api/pull\",\n                    json={\"name\": model}\n                )\n                \n                if response.status_code == 200:\n                    logger.info(f\"Successfully pulled model {model} on {instance}\")\n                    # Update model availability\n                    if instance in self.model_availability:\n                        self.model_availability[instance].append(model)\n                    return True\n                else:\n                    logger.error(f\"Failed to pull model {model} on {instance}: {response.text}\")\n                    return False\n        except Exception as e:\n            logger.error(f\"Error pulling model {model} on {instance}: {str(e)}\")\n            return False\n"])</script><script>self.__next_f.push([1,"183:[\"$\",\"pre\",\"pre-116\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$327\"}]}]\n184:[\"$\",\"h3\",\"h3-69\",{\"id\":\"autoscaling-configuration-for-cloud-deployments\",\"children\":\"Autoscaling Configuration for Cloud Deployments\"}]\n328:T47c,# kubernetes/autoscaler-config.yaml\napiVersion: 
autoscaling.k8s.io/v1\nkind: VerticalPodAutoscaler\nmetadata:\n  name: mcp-api-vpa\nspec:\n  targetRef:\n    apiVersion: \"apps/v1\"\n    kind: Deployment\n    name: mcp-api\n  updatePolicy:\n    updateMode: \"Auto\"\n  resourcePolicy:\n    containerPolicies:\n      - containerName: '*'\n        minAllowed:\n          cpu: 250m\n          memory: 256Mi\n        maxAllowed:\n          cpu: 2000m\n          memory: 4Gi\n        controlledResources: [\"cpu\", \"memory\"]\n---\napiVersion: keda.sh/v1alpha1\nkind: ScaledObject\nmetadata:\n  name: mcp-api-scaler\nspec:\n  scaleTargetRef:\n    name: mcp-api\n  minReplicaCount: 2\n  maxReplicaCount: 20\n  pollingInterval: 15\n  cooldownPeriod: 300\n  triggers:\n  - type: prometheus\n    metadata:\n      serverAddress: http://prometheus-service:9090\n      metricName: api_active_requests\n      threshold: '10'\n      query: sum(api_active_requests)\n  - type: prometheus\n    metadata:\n      serverAddress: http://prometheus-service:9090\n      metricName: api_response_time_p90\n      threshold: '2.0'\n      query: histogram_quantile(0.9, sum(rate(api_response_time_seconds_bucket[2m])) by (le))\n185:[\"$\",\"pre\",\"pre-117\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"$328\"}]}]\n186:[\"$\",\"h2\",\"h2-65\",{\"id\":\"cost-optimization---monthly-budget-tracking\",\"children\":\"Cost Optimization - Monthly Budget Tracking\"}]\n329:T1efa,"])</script><script>self.__next_f.push([1,"# app/services/budget_service.py\nimport logging\nimport time\nfrom datetime import datetime, timedelta\nimport aioredis\nimport json\nfrom typing import Dict, Any, Optional\n\nlogger = logging.getLogger(__name__)\n\nclass BudgetService:\n    \"\"\"\n    Manages API budget tracking and quota enforcement.\n    \"\"\"\n    \n    def __init__(self, redis_url: str):\n        self.redis = None\n        self.redis_url = 
redis_url\n        self.monthly_budget = 0.0\n        self.daily_budget = 0.0\n        self.alert_threshold = 0.8  # Alert at 80% of budget\n        self.budget_lock_key = \"budget:lock\"\n        self.last_reset_check = 0\n    \n    async def initialize(self, monthly_budget: float = 0.0):\n        \"\"\"Initialize the budget service.\"\"\"\n        self.redis = await aioredis.create_redis_pool(self.redis_url)\n        self.monthly_budget = monthly_budget\n        self.daily_budget = monthly_budget / 30 if monthly_budget \u003e 0 else 0\n        \n        # Initialize monthly budget in Redis if not already set\n        if not await self.redis.exists(\"budget:monthly:total\"):\n            await self.redis.set(\"budget:monthly:total\", str(monthly_budget))\n        \n        # Initialize current usage if not already set\n        if not await self.redis.exists(\"budget:monthly:used\"):\n            await self.redis.set(\"budget:monthly:used\", \"0.0\")\n        \n        # Set the reset day (1st of month)\n        if not await self.redis.exists(\"budget:reset_day\"):\n            await self.redis.set(\"budget:reset_day\", \"1\")\n        \n        # Check if we need to reset the budget\n        await self._check_budget_reset()\n        \n        logger.info(f\"Budget service initialized with monthly budget: ${monthly_budget:.2f}\")\n    \n    async def close(self):\n        \"\"\"Close the Redis connection.\"\"\"\n        if self.redis:\n            self.redis.close()\n            await self.redis.wait_closed()\n    \n    async def _check_budget_reset(self):\n        \"\"\"Check if the budget needs to be reset (new month).\"\"\"\n        now = time.time()\n        # Only check once per hour to avoid excessive checks\n        if now - self.last_reset_check \u003c 3600:\n            return\n            \n        self.last_reset_check = now\n        \n        # Try to acquire lock to avoid multiple resets\n        lock = await self.redis.set(\n
            self.budget_lock_key, \"1\", \n            expire=60, exist=\"SET_IF_NOT_EXIST\"\n        )\n        \n        if not lock:\n            return  # Another process is handling reset\n        \n        try:\n            # Get the reset day (default to 1st of month)\n            reset_day = int(await self.redis.get(\"budget:reset_day\") or \"1\")\n            \n            # Get last reset timestamp\n            last_reset = float(await self.redis.get(\"budget:last_reset\") or \"0\")\n            \n            # Check if we're in a new month since last reset\n            last_reset_date = datetime.fromtimestamp(last_reset)\n            now_date = datetime.now()\n            \n            # If it's a new month and we've passed the reset day\n            if (now_date.year \u003e last_reset_date.year or \n                (now_date.year == last_reset_date.year and now_date.month \u003e last_reset_date.month)) and \\\n                now_date.day \u003e= reset_day:\n                \n                # Archive previous month's usage for reporting before the reset\n                prev_month = last_reset_date.strftime(\"%Y-%m\")\n                prev_usage = await self.redis.get(\"budget:monthly:used\") or \"0.0\"\n                await self.redis.set(f\"budget:archive:{prev_month}\", prev_usage)\n                \n                # Reset monthly usage\n                await self.redis.set(\"budget:monthly:used\", \"0.0\")\n                \n                # Update last reset timestamp\n                await self.redis.set(\"budget:last_reset\", str(now))\n                \n                # Log the reset\n                logger.info(\"Monthly budget reset performed\")\n        finally:\n            # Release the lock acquired above\n            await self.redis.delete(self.budget_lock_key)\n    \n    async def record_usage(self, cost: float, provider: str, model: str):\n        \"\"\"Record API usage cost.\"\"\"\n        if cost \u003c= 0:\n            return\n            \n        # 
Only track costs for OpenAI\n        if provider != \"openai\":\n            return\n        \n        # Check if we need to reset first\n        await self._check_budget_reset()\n        \n        # Update monthly usage\n        await self.redis.incrbyfloat(\"budget:monthly:used\", cost)\n        \n        # Update model-specific usage\n        await self.redis.incrbyfloat(f\"budget:model:{model}\", cost)\n        \n        # Update daily usage\n        today = datetime.now().strftime(\"%Y-%m-%d\")\n        await self.redis.incrbyfloat(f\"budget:daily:{today}\", cost)\n        \n        # Log high-cost operations\n        if cost \u003e 0.1:  # Log individual requests that cost more than 10 cents\n            logger.info(f\"High-cost API request: ${cost:.4f} for {provider}:{model}\")\n            \n        # Check if we've exceeded the alert threshold\n        usage = float(await self.redis.get(\"budget:monthly:used\") or \"0\")\n        budget = float(await self.redis.get(\"budget:monthly:total\") or \"0\")\n        \n        if budget \u003e 0 and usage \u003e= budget * self.alert_threshold:\n            # Check if we've already alerted for this threshold\n            alerted = await self.redis.get(f\"budget:alerted:{int(self.alert_threshold * 100)}\")\n            \n            if not alerted:\n                percentage = (usage / budget) * 100\n                logger.warning(f\"Budget alert: Used ${usage:.2f} of ${budget:.2f} ({percentage:.1f}%)\")\n                \n                # Mark as alerted for this threshold\n                await self.redis.set(\n                    f\"budget:alerted:{int(self.alert_threshold * 100)}\", \"1\",\n                    expire=86400  # Expire after 1 day\n                )\n    \n    async def check_budget_available(self, estimated_cost: float) -\u003e bool:\n        \"\"\"\n        Check if there's enough budget for an estimated operation.\n        Returns True if operation is allowed, False if it would exceed 
budget.\n        \"\"\"\n        if estimated_cost \u003c= 0:\n            return True\n            \n        if self.monthly_budget \u003c= 0:\n            return True  # No budget constraints\n        \n        # Get current usage\n        usage = float(await self.redis.get(\"budget:monthly:used\") or \"0\")\n        budget = float(await self.redis.get(\"budget:monthly:total\") or \"0\")\n        \n        # Check if operation would exceed budget\n        return (usage + estimated_cost) \u003c= budget\n    \n    async def get_usage_stats(self) -\u003e Dict[str, Any]:\n        \"\"\"Get current budget usage statistics.\"\"\"\n        usage = float(await self.redis.get(\"budget:monthly:used\") or \"0\")\n        budget = float(await self.redis.get(\"budget:monthly:total\") or \"0\")\n        \n        # Get daily usage for the last 30 days\n        daily_usage = {}\n        today = datetime.now()\n        \n        for i in range(30):\n            date = (today - timedelta(days=i)).strftime(\"%Y-%m-%d\")\n            day_usage = float(await self.redis.get(f\"budget:daily:{date}\") or \"0\")\n            daily_usage[date] = day_usage\n        \n        # Get usage by model\n        model_keys = await self.redis.keys(\"budget:model:*\")\n        model_usage = {}\n        \n        for key in model_keys:\n            model = key.decode('utf-8').replace(\"budget:model:\", \"\")\n            model_cost = float(await self.redis.get(key) or \"0\")\n            model_usage[model] = model_cost\n        \n        # Calculate percentage used\n        percentage_used = (usage / budget) * 100 if budget \u003e 0 else 0\n        \n        return {\n            \"current_usage\": usage,\n            \"monthly_budget\": budget,\n            \"percentage_used\": percentage_used,\n            \"daily_usage\": daily_usage,\n            \"model_usage\": model_usage,\n            \"remaining_budget\": budget - usage if budget \u003e 0 else 0\n        
}\n"])</script><script>self.__next_f.push([1,"187:[\"$\",\"pre\",\"pre-118\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$329\"}]}]\n188:[\"$\",\"h2\",\"h2-66\",{\"id\":\"conclusion-5\",\"children\":\"Conclusion\"}]\n189:[\"$\",\"p\",\"p-57\",{\"children\":\"The optimization and deployment strategies outlined in this document provide a comprehensive framework for implementing an efficient, cost-effective, and highly accurate hybrid AI system that leverages both OpenAI's cloud capabilities and Ollama's local inference.\"}]\n18a:[\"$\",\"p\",\"p-58\",{\"children\":\"Key aspects of this implementation include:\"}]\n"])</script><script>self.__next_f.push([1,"18b:[\"$\",\"ol\",\"ol-18\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Performance Optimization\"}],\":\"]}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Query routing optimization based on complexity analysis\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Semantic response caching for frequent queries\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"Parallel processing for complex queries\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":\"Dynamic batching for high-load scenarios\"}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":\"Model-specific prompt optimization\"}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Cost Reduction\"}],\":\"]}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Intelligent token usage optimization\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Tiered model selection based on task 
requirements\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"Local model prioritization for development\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":\"Request batching and rate limiting\"}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":\"Memory and context compression\"}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Response Accuracy\"}],\":\"]}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Advanced prompt templating for different scenarios\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Chain-of-thought reasoning for complex queries\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"Self-verification and error correction\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":\"Domain-specific knowledge integration\"}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":\"Dynamic few-shot learning with examples\"}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":[[\"$\",\"strong\",\"strong-0\",{\"children\":\"Deployment Options\"}],\":\"]}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Local development environment with Docker Compose\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Production Kubernetes deployment with autoscaling\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"AWS cloud deployment with CloudFormation\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":\"Comprehensive monitoring with Prometheus and Grafana\"}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":\"Budget tracking and cost optimization\"}],\"\\n\"]}],\"\\n\"]}],\"\\n\"]}]\n"])</script><script>self.__next_f.push([1,"18c:[\"$\",\"p\",\"p-59\",{\"children\":\"These strategies work in concert to create a system that intelligently balances the tradeoffs between performance, cost, and accuracy, adapting to specific requirements 
and constraints in different deployment scenarios.\"}]\n18d:[\"$\",\"p\",\"p-60\",{\"children\":\"By implementing this hybrid approach, organizations can significantly reduce API costs while maintaining high-quality responses, with the added benefits of enhanced privacy for sensitive data and reduced dependency on external services. The local inference capabilities also provide resilience against API outages and rate limiting, helping to maintain service availability.\"}]\n18e:[\"$\",\"h1\",\"h1-8\",{\"id\":\"mcp-modern-computational-paradigm-system\",\"children\":\"MCP (Modern Computational Paradigm) System\"}]\n18f:[\"$\",\"h2\",\"h2-67\",{\"id\":\"comprehensive-documentation\",\"children\":\"Comprehensive Documentation\"}]\n190:[\"$\",\"p\",\"p-61\",{\"children\":\"This documentation provides a complete guide to understanding, installing, configuring, and using the MCP system, a hybrid architecture that integrates OpenAI's API capabilities with Ollama's local inference to create an optimized, cost-effective AI solution.\"}]\n191:[\"$\",\"hr\",\"hr-0\",{}]\n192:[\"$\",\"h1\",\"h1-9\",{\"id\":\"table-of-contents\",\"children\":\"Table of Contents\"}]\n"])</script><script>self.__next_f.push([1,"193:[\"$\",\"ol\",\"ol-19\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#introduction\",\"children\":\"Introduction\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#system-architecture\",\"children\":\"System Architecture\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#installation-guide\",\"children\":\"Installation Guide\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#prerequisites\",\"children\":\"Prerequisites\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#local-development-setup\",\"children\":\"Local Development 
Setup\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#docker-deployment\",\"children\":\"Docker Deployment\"}]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#kubernetes-deployment\",\"children\":\"Kubernetes Deployment\"}]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#aws-deployment\",\"children\":\"AWS Deployment\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#configuration\",\"children\":\"Configuration\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#environment-variables\",\"children\":\"Environment Variables\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#advanced-configuration\",\"children\":\"Advanced Configuration\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#model-selection\",\"children\":\"Model Selection\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#api-reference\",\"children\":\"API Reference\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#authentication\",\"children\":\"Authentication\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#chat-endpoints\",\"children\":\"Chat Endpoints\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#agent-endpoints\",\"children\":\"Agent Endpoints\"}]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#model-management-endpoints\",\"children\":\"Model Management Endpoints\"}]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#system-endpoints\",\"children\":\"System 
Endpoints\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-5\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#usage-examples\",\"children\":\"Usage Examples\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#basic-chat-interaction\",\"children\":\"Basic Chat Interaction\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#working-with-agents\",\"children\":\"Working with Agents\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#customizing-model-selection\",\"children\":\"Customizing Model Selection\"}]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#tool-integration\",\"children\":\"Tool Integration\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-6\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#performance-optimization\",\"children\":\"Performance Optimization\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#caching-strategies\",\"children\":\"Caching Strategies\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#query-optimization\",\"children\":\"Query Optimization\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#parallel-processing\",\"children\":\"Parallel Processing\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-7\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#cost-optimization\",\"children\":\"Cost Optimization\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#budget-management\",\"children\":\"Budget Management\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#provider-selection\",\"children\":\"Provider 
Selection\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#token-optimization\",\"children\":\"Token Optimization\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-8\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#monitoring-and-observability\",\"children\":\"Monitoring and Observability\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#metrics-overview\",\"children\":\"Metrics Overview\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#grafana-dashboard\",\"children\":\"Grafana Dashboard\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#alerting\",\"children\":\"Alerting\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-9\",{\"children\":[[\"$\",\"a\",\"a-0\",{\"href\":\"#troubleshooting\",\"children\":\"Troubleshooting\"}],\"\\n\",[\"$\",\"ul\",\"ul-0\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#common-issues\",\"children\":\"Common Issues\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#diagnostics\",\"children\":\"Diagnostics\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"#log-management\",\"children\":\"Log Management\"}]}],\"\\n\"]}],\"\\n\"]}],\"\\n\",\"$L32a\",\"\\n\",\"$L32b\",\"\\n\"]}]\n"])</script><script>self.__next_f.push([1,"194:[\"$\",\"hr\",\"hr-1\",{}]\n195:[\"$\",\"h1\",\"h1-10\",{\"id\":\"readmemd\",\"children\":\"README.md\"}]\n32c:T54e,# MCP - Modern Computational Paradigm\n\n![MCP Status](https://img.shields.io/badge/status-stable-green)\n![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)\n![License MIT](https://img.shields.io/badge/license-MIT-green.svg)\n\nMCP is a hybrid AI system that intelligently integrates OpenAI's cloud capabilities with Ollama's local inference. 
This architecture optimizes for cost, performance, and privacy while maintaining response quality.\n\n## Key Features\n\n- **Intelligent Query Routing**: Automatically selects between OpenAI and Ollama based on query complexity, privacy requirements, and performance needs\n- **Advanced Agent Framework**: Configurable AI agents with specialized capabilities\n- **Cost Optimization**: Reduces API costs by up to 70% through local model usage, caching, and token optimization\n- **Privacy Control**: Keeps sensitive information local when appropriate\n- **Performance Optimization**: Parallel processing, response caching, and dynamic batching for high throughput\n- **Comprehensive Monitoring**: Built-in metrics and observability\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.11+\n- Docker and Docker Compose (for containerized deployment)\n- Ollama (for local model inference)\n- OpenAI API key\n\n### Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/yourusername/mcp.git\n   cd mcp\n196:[\"$\",\"pre\",\"pre-119\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-markdown\",\"children\":\"$32c\"}]}]\n"])</script><script>self.__next_f.push([1,"197:[\"$\",\"ol\",\"ol-20\",{\"start\":2,\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":\"Create and activate a virtual environment:\"}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"python -m venv venv\\nsource venv/bin/activate  # On Windows: venv\\\\Scripts\\\\activate\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":\"Install dependencies:\"}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border 
border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"pip install -r requirements.txt\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":\"Set up environment variables:\"}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"cp .env.example .env\\n# Edit .env with your configuration\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":\"Start Ollama (if not already running):\"}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"ollama serve\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":[\"\\n\",[\"$\",\"p\",\"p-0\",{\"children\":\"Start the application:\"}],\"\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"uvicorn app.main:app --reload\\n\"}]}],\"\\n\"]}],\"\\n\"]}]\n"])</script><script>self.__next_f.push([1,"198:[\"$\",\"p\",\"p-62\",{\"children\":[\"The API will be available at \",[\"$\",\"a\",\"a-0\",{\"href\":\"http://localhost:8000\",\"children\":\"http://localhost:8000\"}],\".\"]}]\n199:[\"$\",\"h3\",\"h3-70\",{\"id\":\"docker-deployment\",\"children\":\"Docker Deployment\"}]\n19a:[\"$\",\"p\",\"p-63\",{\"children\":\"For containerized deployment:\"}]\n19b:[\"$\",\"pre\",\"pre-120\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"docker-compose up 
-d\\n\"}]}]\n19c:[\"$\",\"h2\",\"h2-68\",{\"id\":\"documentation\",\"children\":\"Documentation\"}]\n19d:[\"$\",\"p\",\"p-64\",{\"children\":\"For complete documentation, see:\"}]\n19e:[\"$\",\"ul\",\"ul-9\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"docs/installation.md\",\"children\":\"Installation Guide\"}]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"docs/api-reference.md\",\"children\":\"API Reference\"}]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"docs/configuration.md\",\"children\":\"Configuration Guide\"}]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"docs/troubleshooting.md\",\"children\":\"Troubleshooting\"}]}],\"\\n\"]}]\n19f:[\"$\",\"h2\",\"h2-69\",{\"id\":\"architecture\",\"children\":\"Architecture\"}]\n1a0:[\"$\",\"p\",\"p-65\",{\"children\":\"MCP uses a sophisticated routing architecture to determine the optimal inference provider for each request:\"}]\n"])</script><script>self.__next_f.push([1,"1a1:[\"$\",\"pre\",\"pre-121\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐\\n│                 │     │                  │     │             │\\n│  Client Request │────▶│ Routing Decision │────▶│ OpenAI API  │\\n│                 │     │                  │     │             │\\n└─────────────────┘     └──────────────────┘     └─────────────┘\\n                                │\\n                                │\\n                                ▼\\n                        ┌─────────────┐\\n                        │             │\\n                        │  Ollama API │\\n                        │             │\\n        
                └─────────────┘\\n\"}],\"position\":{\"start\":{\"line\":14755,\"column\":1,\"offset\":543936},\"end\":{\"line\":14769,\"column\":4,\"offset\":544570}}},\"children\":\"┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐\\n│                 │     │                  │     │             │\\n│  Client Request │────▶│ Routing Decision │────▶│ OpenAI API  │\\n│                 │     │                  │     │             │\\n└─────────────────┘     └──────────────────┘     └─────────────┘\\n                                │\\n                                │\\n                                ▼\\n                        ┌─────────────┐\\n                        │             │\\n                        │  Ollama API │\\n                        │             │\\n                        └─────────────┘\\n\"}]}]\n"])</script><script>self.__next_f.push([1,"1a2:[\"$\",\"h2\",\"h2-70\",{\"id\":\"license\",\"children\":\"License\"}]\n1a3:[\"$\",\"p\",\"p-66\",{\"children\":[\"MIT License - see \",[\"$\",\"a\",\"a-0\",{\"href\":\"LICENSE\",\"children\":\"LICENSE\"}],\" for details.\"]}]\n1a4:[\"$\",\"h2\",\"h2-71\",{\"id\":\"contributing\",\"children\":\"Contributing\"}]\n1a5:[\"$\",\"p\",\"p-67\",{\"children\":[\"Contributions are welcome! 
Please see \",[\"$\",\"a\",\"a-0\",{\"href\":\"CONTRIBUTING.md\",\"children\":\"CONTRIBUTING.md\"}],\" for details.\"]}]\n32d:T451,\n---\n\n# Installation Guide\n\n## Prerequisites\n\nBefore installing the MCP system, ensure your environment meets the following requirements:\n\n### System Requirements\n\n- **Operating System**: Linux (recommended), macOS, or Windows\n- **CPU**: 4+ cores recommended\n- **RAM**: Minimum 8GB, 16GB+ recommended\n- **Disk Space**: 10GB minimum for installation, 50GB+ recommended for model storage\n- **GPU**: Optional but recommended for Ollama (NVIDIA with CUDA support)\n\n### Software Requirements\n\n- **Python**: Version 3.11 or higher\n- **Docker**: Version 20.10 or higher (for containerized deployment)\n- **Docker Compose**: Version 2.0 or higher\n- **Kubernetes**: Version 1.21+ (for Kubernetes deployment)\n- **Ollama**: Latest version (for local model inference)\n- **Redis**: Version 6.0+ (for caching and rate limiting)\n\n### Required API Keys\n\n- **OpenAI API Key**: Register at [OpenAI Platform](https://platform.openai.com/)\n\n## Local Development Setup\n\nFollow these steps to set up a local development environment:\n\n### 1. 
Clone the Repository\n\n```bash\ngit clone https://github.com/yourusername/mcp.git\ncd mcp\n32e:T451,\n---\n\n# Installation Guide\n\n## Prerequisites\n\nBefore installing the MCP system, ensure your environment meets the following requirements:\n\n### System Requirements\n\n- **Operating System**: Linux (recommended), macOS, or Windows\n- **CPU**: 4+ cores recommended\n- **RAM**: Minimum 8GB, 16GB+ recommended\n- **Disk Space**: 10GB minimum for installation, 50GB+ recommended for model storage\n- **GPU**: Optional but recommended for Ollama (NVIDIA with CUDA support)\n\n### Software Requirements\n\n- **Python**: Version 3.11 or higher\n- **Docker**: Version 20.10 or higher (for containerized deployment)\n- **Docker Compose**: Version 2.0 or higher\n- **Kubernetes**: Version 1.21+ (for Kubernetes deployment)\n- **Ollama**: Latest version (for local model inference)\n- **Redis**: Version 6.0+ (for caching and rate limiting)\n\n### Required API Keys\n\n- **OpenAI API Key**: Register at [OpenAI Platform](https://platform.openai.com/)\n\n## Local Development Setup\n\nFollow these steps to set up a local development environment:\n\n### 1. Clone the Repository\n\n```bash\ngit clone https://github.com/yourusername/mcp.git\ncd mcp\n1a6:[\"$\",\"pre\",\"pre-122\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"$32d\"}],\"position\":{\"start\":{\"line\":14778,\"column\":1,\"offset\":544738},\"end\":{\"line\":14818,\"column\":4,\"offset\":545850}}},\"children\":\"$32e\"}]}]\n1a7:[\"$\",\"h3\",\"h3-71\",{\"id\":\"2-set-up-virtual-environment\",\"children\":\"2. 
Set Up Virtual Environment\"}]\n1a8:[\"$\",\"pre\",\"pre-123\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Create virtual environment\\npython -m venv venv\\n\\n# Activate virtual environment\\n# On Linux/macOS:\\nsource venv/bin/activate\\n# On Windows:\\nvenv\\\\Scripts\\\\activate\\n\"}]}]\n1a9:[\"$\",\"h3\",\"h3-72\",{\"id\":\"3-install-dependencies\",\"children\":\"3. Install Dependencies\"}]\n1aa:[\"$\",\"pre\",\"pre-124\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"pip install --upgrade pip\\npip install -r requirements.txt\\npip install -r requirements-dev.txt  # For development tools\\n\"}]}]\n1ab:[\"$\",\"h3\",\"h3-73\",{\"id\":\"4-install-and-configure-ollama\",\"children\":\"4. Install and Configure Ollama\"}]\n1ac:[\"$\",\"pre\",\"pre-125\",{\"className\":\"bg-secondary border border-border roun"])</script><script>self.__next_f.push([1,"ded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# macOS (using Homebrew)\\nbrew install ollama\\n\\n# Linux\\ncurl -fsSL https://ollama.com/install.sh | sh\\n\\n# Start Ollama service\\nollama serve\\n\"}]}]\n1ad:[\"$\",\"h3\",\"h3-74\",{\"id\":\"5-pull-required-models\",\"children\":\"5. Pull Required Models\"}]\n1ae:[\"$\",\"pre\",\"pre-126\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Pull basic models\\nollama pull llama2\\nollama pull mistral\\nollama pull codellama\\n\"}]}]\n1af:[\"$\",\"h3\",\"h3-75\",{\"id\":\"6-set-up-environment-variables\",\"children\":\"6. 
Set Up Environment Variables\"}]\n1b0:[\"$\",\"pre\",\"pre-127\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Copy the example environment file\\ncp .env.example .env\\n\\n# Edit the file with your configuration\\n# At minimum, set OPENAI_API_KEY\\nnano .env\\n\"}]}]\n1b1:[\"$\",\"h3\",\"h3-76\",{\"id\":\"7-initialize-local-services\",\"children\":\"7. Initialize Local Services\"}]\n1b2:[\"$\",\"pre\",\"pre-128\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Start Redis using Docker\\ndocker-compose up -d redis\\n\\n# Initialize database (if applicable)\\npython scripts/init_db.py\\n\"}]}]\n1b3:[\"$\",\"h3\",\"h3-77\",{\"id\":\"8-start-development-server\",\"children\":\"8. Start Development Server\"}]\n1b4:[\"$\",\"pre\",\"pre-129\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Start with auto-reload for development\\nuvicorn app.main:app --reload --port 8000\\n\"}]}]\n1b5:[\"$\",\"h3\",\"h3-78\",{\"id\":\"9-verify-installation\",\"children\":\"9. 
Verify Installation\"}]\n1b6:[\"$\",\"p\",\"p-68\",{\"children\":\"Open your browser and navigate to:\"}]\n1b7:[\"$\",\"ul\",\"ul-10\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"API documentation: \",[\"$\",\"a\",\"a-0\",{\"href\":\"http://localhost:8000/docs\",\"children\":\"http://localhost:8000/docs\"}]]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"Health check: \",[\"$\",\"a\",\"a-0\",{\"href\":\"http://localhost:8000/api/health\",\"children\":\"http://localhost:8000/api/health\"}]]}],\"\\n\"]}]\n1b8:[\"$\",\"h2\",\"h2-72\",{\"id\":\"docker-deployment-1\",\"children\":\"Docker Deployment\"}]\n1b9:[\"$\",\"p\",\"p-69\",{\"children\":\"For a containerized deployment using Docker Compose:\"}]\n1ba:[\"$\",\"h3\",\"h3-79\",{\"id\":\"1-ensure-docker-and-docker-compose-are-installed\",\"children\":\"1. Ensure Docker and Docker Compose are Installed\"}]\n1bb:[\"$\",\"pre\",\"pre-130\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Verify installation\\ndocker --version\\ndocker-compose --version\\n\"}]}]\n1bc:[\"$\",\"h3\",\"h3-80\",{\"id\":\"2-configure-environment-variables\",\"children\":\"2. Configure Environment Variables\"}]\n1bd:[\"$\",\"pre\",\"pre-131\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Copy and edit environment variables\\ncp .env.example .env\\nnano .env\\n\"}]}]\n1be:[\"$\",\"h3\",\"h3-81\",{\"id\":\"3-start-services-with-docker-compose\",\"children\":\"3. 
3. Start Services with Docker Compose

```bash
# Build and start all services in the background
docker-compose up -d

# Follow the logs
docker-compose logs -f
```

The application will be available at http://localhost:8000.

4. Stopping the Services

```bash
docker-compose down
```

Kubernetes Deployment

For production deployment on Kubernetes:

1. Prerequisites

A Kubernetes cluster
kubectl configured to access the cluster
Helm (optional, for the Redis deployment)

2. Set Up Namespace and Secrets

```bash
# Create the namespace
kubectl create namespace mcp

# Create the secrets
kubectl create secret generic mcp-secrets \
  --from-literal=openai-api-key=YOUR_OPENAI_API_KEY \
  --from-literal=redis-password=YOUR_REDIS_PASSWORD \
  -n mcp
```

3. Deploy Redis (if needed)

```bash
# Using Helm
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install redis bitnami/redis \
  --namespace mcp \
  --set auth.password=YOUR_REDIS_PASSWORD \
  --set master.persistence.size=8Gi
```

4. Deploy MCP Components

```bash
# Apply the Kubernetes manifests
kubectl apply -f kubernetes/deployment.yaml -n mcp
kubectl apply -f kubernetes/service.yaml -n mcp
kubectl apply -f kubernetes/ingress.yaml -n mcp
```
5. Set Up Autoscaling (Optional)

```bash
kubectl apply -f kubernetes/hpa.yaml -n mcp
```

6. Check Deployment Status

```bash
kubectl get pods -n mcp
kubectl get services -n mcp
kubectl get ingress -n mcp
```

AWS Deployment

For deployment on AWS:

1. Prerequisites

AWS CLI configured
Appropriate IAM permissions

2. CloudFormation Deployment

```bash
# Deploy using the CloudFormation template
aws cloudformation create-stack \
  --stack-name mcp-hybrid-system \
  --template-body file://aws/cloudformation.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=Environment,ParameterValue=Production \
    ParameterKey=OllamaInstanceType,ParameterValue=g4dn.xlarge

# Optionally, block until stack creation finishes
aws cloudformation wait stack-create-complete --stack-name mcp-hybrid-system

# Check deployment status
aws cloudformation describe-stacks --stack-name mcp-hybrid-system
```

3. Deploy API Image to ECR

```bash
# Log in to ECR (the region must match the registry URL)
aws ecr get-login-password --region YOUR_REGION | docker login --username AWS --password-stdin YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com

# Build and push the image
docker build -t YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-api:latest -f Dockerfile.prod .
docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/mcp-api:latest
```
4. Update ECS Service

```bash
# Force a new deployment so the service picks up the updated image
aws ecs update-service --cluster mcp-hybrid-system-cluster --service mcp-hybrid-system-api --force-new-deployment
```

API Reference

Authentication

The MCP API uses API key authentication. Include your API key in every request, using one of the following methods.

Bearer Token Authentication

```
Authorization: Bearer YOUR_API_KEY
```

Query Parameter

```
?api_key=YOUR_API_KEY
```

Use the header where possible. URLs, including their query strings, are commonly written to server and proxy logs.
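Here is a minimal Python sketch of an authenticated call that uses only the standard library. BASE_URL, build_request, and call_api are illustrative names and are not part of the MCP codebase. The base URL assumes the local deployment described above, and the key is sent in the Authorization header.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumed local deployment; replace with your own host.
BASE_URL = "http://localhost:8000"


def build_request(path: str, api_key: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build a request authenticated with the Authorization: Bearer header."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # urllib issues a GET when data is None and a POST otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call_api(path: str, api_key: str, payload: Optional[Dict[str, Any]] = None, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, api_key, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, call_api("/api/v1/health", "YOUR_API_KEY") performs an authenticated GET. Passing a payload turns the call into a JSON POST.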
Chat Endpoints

Create Chat Completion

Generates a completion for a given conversation.

Endpoint: POST /api/v1/chat/completions

Request Body:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"}
  ],
  "model": "auto",
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false,
  "routing_preferences": {
    "force_provider": null,
    "privacy_level": "standard",
    "latency_preference": "balanced"
  },
  "tools": []
}
```

Parameters:

| Name | Type | Description |
|---|---|---|
| messages | array | Array of message objects representing the conversation history |
| model | string | The model to use, or "auto" for automatic selection |
| temperature | number | Controls randomness (0 to 1) |
| max_tokens | integer | Maximum number of tokens in the response |
| stream | boolean | Whether to stream the response |
| routing_preferences | object | Preferences for provider selection |
| tools | array | List of tools the assistant can use |

Response:

```json
{
  "id": "resp_abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "provider": "openai",
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 325,
    "total_tokens": 381
  },
  "message": {
    "role": "assistant",
    "content": "Hello! I'm an AI assistant...",
    "tool_calls": []
  },
  "routing_metrics": {
    "complexity_score": 0.78,
    "privacy_impact": "low",
    "decision_factors": ["complexity", "tool_requirements"]
  }
}
```
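The sketch below builds a request body for POST /api/v1/chat/completions from the documented fields and extracts the reply from a response. chat_payload and describe_reply are hypothetical helpers. The defaults mirror the example request above, and the temperature check follows the 0 to 1 range given in the parameter table. Sending the body still requires an authenticated POST, which this block leaves out.

```python
from typing import Any, Dict, List, Optional


def chat_payload(
    messages: List[Dict[str, str]],
    model: str = "auto",
    temperature: float = 0.7,
    max_tokens: int = 1024,
    privacy_level: str = "standard",
    latency_preference: str = "balanced",
    force_provider: Optional[str] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble a non-streaming chat completion request body."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    return {
        "messages": messages,
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": False,
        "routing_preferences": {
            "force_provider": force_provider,
            "privacy_level": privacy_level,
            "latency_preference": latency_preference,
        },
        "tools": tools or [],
    }


def describe_reply(response: Dict[str, Any]) -> str:
    """Summarize which provider and model answered, followed by the reply text."""
    message = response["message"]
    return f"[{response['provider']}/{response['model']}] {message['content']}"
```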
Stream Chat Completion

Streams a completion for a conversation.

Endpoint: POST /api/v1/chat/streaming

Request Body: Same as /api/v1/chat/completions, except that stream must be true.

Response: A server-sent events (SSE) stream of partial completions.

Hybrid Chat

Routes the request to OpenAI or Ollama based on the characteristics of the query.

Endpoint: POST /api/v1/chat/hybrid

Request Body:

```json
{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "mode": "auto",
  "options": {
    "prioritize_privacy": false,
    "prioritize_speed": false
  }
}
```

Response: Same format as /api/v1/chat/completions.

Agent Endpoints

Run Agent

Starts an agent run with the supplied configuration. The response reports the run as in_progress and provides a polling_url for checking its status.

Endpoint: POST /api/v1/agents/run

Request Body:

```json
{
  "agent_config": {
    "instructions": "You are a research assistant...",
    "model": "gpt-4o",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "search_knowledge_base",
          "description": "Search for information",
          "parameters": {
            "type": "object",
            "properties": {
              "query": {
                "type": "string"
              }
            },
            "required": ["query"]
          }
        }
      }
    ]
  },
  "messages": [
    {"role": "user", "content": "Find information about renewable energy"}
  ],
  "metadata": {
    "session_id": "user_session_123"
  }
}
```

Response:

```json
{
  "run_id": "run_abc123",
  "status": "in_progress",
  "created_at": 1677858242,
  "estimated_completion_time": 1677858260,
  "polling_url": "/api/v1/agents/status/run_abc123"
}
```
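POST /api/v1/agents/run returns while the run is still in_progress, so a client has to poll polling_url until the status changes. Here is a minimal sketch. It assumes in_progress is the only non-final status; the examples show only in_progress and completed. wait_for_run is a hypothetical helper. fetch stands for any function that performs an authenticated GET on the relative polling_url and returns the decoded JSON.

```python
import time
from typing import Any, Callable, Dict

JSON = Dict[str, Any]


def wait_for_run(
    run: JSON,
    fetch: Callable[[str], JSON],
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> JSON:
    """Poll the run's polling_url until its status is no longer in_progress."""
    deadline = clock() + timeout
    status = run
    # Assumption: every status other than "in_progress" is final.
    while status.get("status") == "in_progress":
        if clock() >= deadline:
            raise TimeoutError(f"run {run['run_id']} still in progress after {timeout} seconds")
        sleep(interval)
        status = fetch(run["polling_url"])
    return status
```

sleep and clock are injectable, so the loop can be tested without real waiting. The estimated_completion_time field in the run response could also be used to choose the first delay.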
Get Agent Status

Checks the status of an agent run. Once the run has completed, the response also contains its result.

Endpoint: GET /api/v1/agents/status/{run_id}

Response:

```json
{
  "run_id": "run_abc123",
  "status": "completed",
  "result": {
    "output": "Renewable energy comes from sources that are...",
    "tool_calls": []
  },
  "created_at": 1677858242,
  "completed_at": 1677858260
}
```

List Available Agents

Lists all available agent configurations.

Endpoint: GET /api/v1/agents

Response:

```json
{
  "agents": [
    {
      "id": "research",
      "name": "Research Assistant",
      "description": "Specialized in finding and synthesizing information"
    },
    {
      "id": "coding",
      "name": "Code Assistant",
      "description": "Helps with programming tasks"
    }
  ]
}
```

Model Management Endpoints

List Models

Lists all available models.

Endpoint: GET /api/v1/models

Response:

```json
{
  "openai_models": [
    {
      "id": "gpt-4o",
      "name": "GPT-4o",
      "capabilities": ["general", "code", "reasoning"],
      "context_window": 128000
    },
    {
      "id": "gpt-3.5-turbo",
      "name": "GPT-3.5 Turbo",
      "capabilities": ["general"],
      "context_window": 16000
    }
  ],
  "ollama_models": [
    {
      "id": "llama2",
      "name": "Llama 2",
      "capabilities": ["general"],
      "context_window": 4096
    },
    {
      "id": "mistral",
      "name": "Mistral",
      "capabilities": ["general", "reasoning"],
      "context_window": 8192
    }
  ]
}
```

Get Model Details

Returns detailed information about a specific model.

Endpoint: GET /api/v1/models/{model_id}

Response:

```json
{
  "id": "mistral",
  "name": "Mistral",
  "provider": "ollama",
  "capabilities": ["general", "reasoning"],
  "context_window": 8192,
  "recommended_usage": "General purpose tasks with reasoning requirements",
  "performance_characteristics": {
    "average_response_time": 2.4,
    "tokens_per_second": 45
  }
}
```

Pull Ollama Model

Pulls (downloads) a new model into Ollama.

Endpoint: POST /api/v1/models/ollama/pull

Request Body:

```json
{
  "model": "wizard-math"
}
```

Response:

```json
{
  "status": "pulling",
  "model": "wizard-math",
  "estimated_time": 120
}
```

System Endpoints

Health Check

Checks system health.

Endpoint: GET /api/v1/health

Response:

```json
{
  "status": "ok",
  "version": "1.0.0",
  "providers": {
    "openai": "connected",
    "ollama": "connected"
  },
  "uptime": 3600
}
```
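A deployment script or readiness probe might treat the service as healthy only when status is "ok" and every provider reports "connected". The helpers below are a hypothetical sketch of that check, applied to the response shape shown above. Other provider states are not documented here, so the sketch counts any state other than "connected" as unhealthy.

```python
from typing import Any, Dict, List


def unhealthy_providers(health: Dict[str, Any]) -> List[str]:
    """Names of providers whose state is anything other than "connected"."""
    providers = health.get("providers", {})
    return sorted(name for name, state in providers.items() if state != "connected")


def is_healthy(health: Dict[str, Any]) -> bool:
    """True when the overall status is "ok" and all providers are connected."""
    return health.get("status") == "ok" and not unhealthy_providers(health)
```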
System Configuration

Returns the current system configuration.

Endpoint: GET /api/v1/config

Response:

```json
{
  "routing": {
    "complexity_threshold": 0.65,
    "privacy_sensitive_patterns": ["password", "secret", "key"],
    "default_provider": "auto"
  },
  "caching": {
    "enabled": true,
    "ttl": 3600
  },
  "optimization": {
    "token_optimization": true,
    "parallel_processing": true
  },
  "monitoring": {
    "metrics_collection": true,
    "log_level": "info"
  }
}
```
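This hypothetical sketch shows how complexity_threshold and privacy_sensitive_patterns could drive provider selection. It is not the system's actual router and makes three assumptions. The complexity score is taken as an input, as in routing_metrics.complexity_score. Pattern matching is a plain case-insensitive substring test. A privacy match takes precedence, so sensitive text stays on the local Ollama instance.

```python
from typing import Any, Dict


def choose_provider(text: str, complexity_score: float, routing: Dict[str, Any]) -> str:
    """Pick "ollama" or "openai" from the routing section of the config (illustrative only)."""
    lowered = text.lower()
    # Substring matching is crude: "key" also matches "keyboard" or "monkey".
    if any(pattern.lower() in lowered for pattern in routing["privacy_sensitive_patterns"]):
        return "ollama"
    # Scores at or above the threshold go to the cloud model.
    if complexity_score >= routing["complexity_threshold"]:
        return "openai"
    return "ollama"
```

With the configuration above, a prompt scored 0.78 exceeds the 0.65 threshold and goes to OpenAI. This matches the provider and complexity_score in the chat completion response example. Raising the threshold to 0.7 through POST /api/v1/config would not change that particular decision.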
Update Configuration

Updates the system configuration. In the example below, the request body contains only the fields being changed, and the response lists the fields that were updated.

Endpoint: POST /api/v1/config

Request Body:

```json
{
  "routing": {
    "complexity_threshold": 0.7
  },
  "caching": {
    "ttl": 7200
  }
}
```

Response:

```json
{
  "status": "updated",
  "updated_fields": ["routing.complexity_threshold", "caching.ttl"]
}
```

System Metrics

Returns system performance metrics.

Endpoint: GET /api/v1/metrics

Response:

```json
{
  "requests": {
    "total": 15420,
    "last_minute": 42,
    "last_hour": 1254
  },
  "routing": {
    "openai_requests": 6210,
    "ollama_requests": 9210,
    "auto_routing_accuracy": 0.94
  },
  "performance": {
    "average_response_time": 2.3,
    "p95_response_time": 6.1,
    "cache_hit_rate": 0.37
  },
  "cost": {
    "total_openai_cost": 135.42,
    "estimated_savings": 98.67,
    "cost_per_request": 0.0088
  }
}
```

In this example, openai_requests and ollama_requests sum to the total of 15,420 requests. The cost_per_request value is consistent with total_openai_cost divided by that total (135.42 / 15420 ≈ 0.0088), which means it is averaged over all requests, including those served by Ollama.

Configuration

Environment Variables

The MCP system can be configured using the following environment variables:

Core Configuration

| Variable | Description | Default Value |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | (Required) |
| OPENAI_ORG_ID | OpenAI organization ID | (Optional) |
| OPENAI_MODEL | Default OpenAI model | gpt-4o |
| OLLAMA_HOST | Ollama host
URL\"}],[\"$\",\"td\",\"td-2\",{\"children\":[\"$\",\"a\",\"a-0\",{\"href\":\"http://localhost:11434\",\"children\":\"http://localhost:11434\"}]}]]}],[\"$\",\"tr\",\"tr-4\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"OLLAMA_MODEL\",\"position\":{\"start\":{\"line\":15476,\"column\":3,\"offset\":558031},\"end\":{\"line\":15476,\"column\":17,\"offset\":558045}}}],\"position\":{\"start\":{\"line\":15476,\"column\":3,\"offset\":558031},\"end\":{\"line\":15476,\"column\":17,\"offset\":558045}}},\"children\":\"OLLAMA_MODEL\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Default Ollama model\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"llama2\"}]]}],[\"$\",\"tr\",\"tr-5\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"APP_ENV\",\"position\":{\"start\":{\"line\":15477,\"column\":3,\"offset\":558082},\"end\":{\"line\":15477,\"column\":12,\"offset\":558091}}}],\"position\":{\"start\":{\"line\":15477,\"column\":3,\"offset\":558082},\"end\":{\"line\":15477,\"column\":12,\"offset\":558091}}},\"children\":\"APP_ENV\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Environment (development, staging, 
production)\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"development\"}]]}],[\"$\",\"tr\",\"tr-6\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"LOG_LEVEL\",\"position\":{\"start\":{\"line\":15478,\"column\":3,\"offset\":558159},\"end\":{\"line\":15478,\"column\":14,\"offset\":558170}}}],\"position\":{\"start\":{\"line\":15478,\"column\":3,\"offset\":558159},\"end\":{\"line\":15478,\"column\":14,\"offset\":558170}}},\"children\":\"LOG_LEVEL\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Logging level\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"INFO\"}]]}],[\"$\",\"tr\",\"tr-7\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"PORT\",\"position\":{\"start\":{\"line\":15479,\"column\":3,\"offset\":558198},\"end\":{\"line\":15479,\"column\":9,\"offset\":558204}}}],\"position\":{\"start\":{\"line\":15479,\"column\":3,\"offset\":558198},\"end\":{\"line\":15479,\"column\":9,\"offset\":558204}}},\"children\":\"PORT\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"API server port\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"8000\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"239:[\"$\",\"h3\",\"h3-109\",{\"id\":\"redis-configuration\",\"children\":\"Redis Configuration\"}]\n"])</script><script>self.__next_f.push([1,"23a:[\"$\",\"table\",\"table-2\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Variable\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"Description\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Default 
Value\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"REDIS_URL\",\"position\":{\"start\":{\"line\":15485,\"column\":3,\"offset\":558346},\"end\":{\"line\":15485,\"column\":14,\"offset\":558357}}}],\"position\":{\"start\":{\"line\":15485,\"column\":3,\"offset\":558346},\"end\":{\"line\":15485,\"column\":14,\"offset\":558357}}},\"children\":\"REDIS_URL\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Redis connection URL\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"redis://localhost:6379/0\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"REDIS_PASSWORD\",\"position\":{\"start\":{\"line\":15486,\"column\":3,\"offset\":558412},\"end\":{\"line\":15486,\"column\":19,\"offset\":558428}}}],\"position\":{\"start\":{\"line\":15486,\"column\":3,\"offset\":558412},\"end\":{\"line\":15486,\"column\":19,\"offset\":558428}}},\"children\":\"REDIS_PASSWORD\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Redis 
password\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"(Optional)\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_CACHING\",\"position\":{\"start\":{\"line\":15487,\"column\":3,\"offset\":558463},\"end\":{\"line\":15487,\"column\":19,\"offset\":558479}}}],\"position\":{\"start\":{\"line\":15487,\"column\":3,\"offset\":558463},\"end\":{\"line\":15487,\"column\":19,\"offset\":558479}}},\"children\":\"ENABLE_CACHING\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Enable response caching\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"true\"}]]}],[\"$\",\"tr\",\"tr-3\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"CACHE_TTL\",\"position\":{\"start\":{\"line\":15488,\"column\":3,\"offset\":558517},\"end\":{\"line\":15488,\"column\":14,\"offset\":558528}}}],\"position\":{\"start\":{\"line\":15488,\"column\":3,\"offset\":558517},\"end\":{\"line\":15488,\"column\":14,\"offset\":558528}}},\"children\":\"CACHE_TTL\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Cache TTL in seconds\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"3600\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"23b:[\"$\",\"h3\",\"h3-110\",{\"id\":\"routing-configuration\",\"children\":\"Routing Configuration\"}]\n"])</script><script>self.__next_f.push([1,"23c:[\"$\",\"table\",\"table-3\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Variable\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"Description\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Default 
Value\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"COMPLEXITY_THRESHOLD\",\"position\":{\"start\":{\"line\":15494,\"column\":3,\"offset\":558677},\"end\":{\"line\":15494,\"column\":25,\"offset\":558699}}}],\"position\":{\"start\":{\"line\":15494,\"column\":3,\"offset\":558677},\"end\":{\"line\":15494,\"column\":25,\"offset\":558699}}},\"children\":\"COMPLEXITY_THRESHOLD\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Threshold for routing to OpenAI\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"0.65\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"PRIVACY_SENSITIVE_TOKENS\",\"position\":{\"start\":{\"line\":15495,\"column\":3,\"offset\":558745},\"end\":{\"line\":15495,\"column\":29,\"offset\":558771}}}],\"position\":{\"start\":{\"line\":15495,\"column\":3,\"offset\":558745},\"end\":{\"line\":15495,\"column\":29,\"offset\":558771}}},\"children\":\"PRIVACY_SENSITIVE_TOKENS\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Comma-separated list of privacy-sensitive 
tokens\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"password,secret,key\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"DEFAULT_PROVIDER\",\"position\":{\"start\":{\"line\":15496,\"column\":3,\"offset\":558849},\"end\":{\"line\":15496,\"column\":21,\"offset\":558867}}}],\"position\":{\"start\":{\"line\":15496,\"column\":3,\"offset\":558849},\"end\":{\"line\":15496,\"column\":21,\"offset\":558867}}},\"children\":\"DEFAULT_PROVIDER\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Default provider if not specified\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"auto\"}]]}],[\"$\",\"tr\",\"tr-3\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"FORCE_OLLAMA\",\"position\":{\"start\":{\"line\":15497,\"column\":3,\"offset\":558915},\"end\":{\"line\":15497,\"column\":17,\"offset\":558929}}}],\"position\":{\"start\":{\"line\":15497,\"column\":3,\"offset\":558915},\"end\":{\"line\":15497,\"column\":17,\"offset\":558929}}},\"children\":\"FORCE_OLLAMA\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Force using Ollama for all 
requests\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"false\"}]]}],[\"$\",\"tr\",\"tr-4\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"FORCE_OPENAI\",\"position\":{\"start\":{\"line\":15498,\"column\":3,\"offset\":558980},\"end\":{\"line\":15498,\"column\":17,\"offset\":558994}}}],\"position\":{\"start\":{\"line\":15498,\"column\":3,\"offset\":558980},\"end\":{\"line\":15498,\"column\":17,\"offset\":558994}}},\"children\":\"FORCE_OPENAI\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Force using OpenAI for all requests\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"false\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"23d:[\"$\",\"h3\",\"h3-111\",{\"id\":\"performance-configuration\",\"children\":\"Performance Configuration\"}]\n"])</script><script>self.__next_f.push([1,"23e:[\"$\",\"table\",\"table-4\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Variable\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"Description\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Default Value\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_PARALLEL_PROCESSING\",\"position\":{\"start\":{\"line\":15504,\"column\":3,\"offset\":559163},\"end\":{\"line\":15504,\"column\":31,\"offset\":559191}}}],\"position\":{\"start\":{\"line\":15504,\"column\":3,\"offset\":559163},\"end\":{\"line\":15504,\"column\":31,\"offset\":559191}}},\"children\":\"ENABLE_PARALLEL_PROCESSING\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Enable parallel processing for complex 
queries\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"true\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"MAX_PARALLEL_REQUESTS\",\"position\":{\"start\":{\"line\":15505,\"column\":3,\"offset\":559252},\"end\":{\"line\":15505,\"column\":26,\"offset\":559275}}}],\"position\":{\"start\":{\"line\":15505,\"column\":3,\"offset\":559252},\"end\":{\"line\":15505,\"column\":26,\"offset\":559275}}},\"children\":\"MAX_PARALLEL_REQUESTS\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Maximum number of parallel requests\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"4\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_BATCHING\",\"position\":{\"start\":{\"line\":15506,\"column\":3,\"offset\":559322},\"end\":{\"line\":15506,\"column\":20,\"offset\":559339}}}],\"position\":{\"start\":{\"line\":15506,\"column\":3,\"offset\":559322},\"end\":{\"line\":15506,\"column\":20,\"offset\":559339}}},\"children\":\"ENABLE_BATCHING\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Enable request 
batching\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"true\"}]]}],[\"$\",\"tr\",\"tr-3\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"MAX_BATCH_SIZE\",\"position\":{\"start\":{\"line\":15507,\"column\":3,\"offset\":559377},\"end\":{\"line\":15507,\"column\":19,\"offset\":559393}}}],\"position\":{\"start\":{\"line\":15507,\"column\":3,\"offset\":559377},\"end\":{\"line\":15507,\"column\":19,\"offset\":559393}}},\"children\":\"MAX_BATCH_SIZE\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Maximum batch size\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"5\"}]]}],[\"$\",\"tr\",\"tr-4\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"REQUEST_TIMEOUT\",\"position\":{\"start\":{\"line\":15508,\"column\":3,\"offset\":559423},\"end\":{\"line\":15508,\"column\":20,\"offset\":559440}}}],\"position\":{\"start\":{\"line\":15508,\"column\":3,\"offset\":559423},\"end\":{\"line\":15508,\"column\":20,\"offset\":559440}}},\"children\":\"REQUEST_TIMEOUT\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Request timeout in seconds\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"120\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"23f:[\"$\",\"h3\",\"h3-112\",{\"id\":\"cost-optimization\",\"children\":\"Cost Optimization\"}]\n"])</script><script>self.__next_f.push([1,"240:[\"$\",\"table\",\"table-5\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Variable\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"Description\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Default 
Value\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"MONTHLY_BUDGET\",\"position\":{\"start\":{\"line\":15514,\"column\":3,\"offset\":559590},\"end\":{\"line\":15514,\"column\":19,\"offset\":559606}}}],\"position\":{\"start\":{\"line\":15514,\"column\":3,\"offset\":559590},\"end\":{\"line\":15514,\"column\":19,\"offset\":559606}}},\"children\":\"MONTHLY_BUDGET\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Monthly budget cap for OpenAI usage (USD)\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"0 (no limit)\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_TOKEN_OPTIMIZATION\",\"position\":{\"start\":{\"line\":15515,\"column\":3,\"offset\":559670},\"end\":{\"line\":15515,\"column\":30,\"offset\":559697}}}],\"position\":{\"start\":{\"line\":15515,\"column\":3,\"offset\":559670},\"end\":{\"line\":15515,\"column\":30,\"offset\":559697}}},\"children\":\"ENABLE_TOKEN_OPTIMIZATION\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Enable token usage 
optimization\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"true\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"TOKEN_BUDGET\",\"position\":{\"start\":{\"line\":15516,\"column\":3,\"offset\":559743},\"end\":{\"line\":15516,\"column\":17,\"offset\":559757}}}],\"position\":{\"start\":{\"line\":15516,\"column\":3,\"offset\":559743},\"end\":{\"line\":15516,\"column\":17,\"offset\":559757}}},\"children\":\"TOKEN_BUDGET\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Token budget per request\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"0 (no limit)\"}]]}],[\"$\",\"tr\",\"tr-3\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"DEV_MODE_TOKEN_LIMIT\",\"position\":{\"start\":{\"line\":15517,\"column\":3,\"offset\":559804},\"end\":{\"line\":15517,\"column\":25,\"offset\":559826}}}],\"position\":{\"start\":{\"line\":15517,\"column\":3,\"offset\":559804},\"end\":{\"line\":15517,\"column\":25,\"offset\":559826}}},\"children\":\"DEV_MODE_TOKEN_LIMIT\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Token limit in development mode\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"1000\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"241:[\"$\",\"h3\",\"h3-113\",{\"id\":\"monitoring\",\"children\":\"Monitoring\"}]\n"])</script><script>self.__next_f.push([1,"242:[\"$\",\"table\",\"table-6\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Variable\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"Description\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Default 
Value\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_METRICS\",\"position\":{\"start\":{\"line\":15523,\"column\":3,\"offset\":559975},\"end\":{\"line\":15523,\"column\":19,\"offset\":559991}}}],\"position\":{\"start\":{\"line\":15523,\"column\":3,\"offset\":559975},\"end\":{\"line\":15523,\"column\":19,\"offset\":559991}}},\"children\":\"ENABLE_METRICS\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Enable metrics collection\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"true\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"METRICS_PORT\",\"position\":{\"start\":{\"line\":15524,\"column\":3,\"offset\":560031},\"end\":{\"line\":15524,\"column\":17,\"offset\":560045}}}],\"position\":{\"start\":{\"line\":15524,\"column\":3,\"offset\":560031},\"end\":{\"line\":15524,\"column\":17,\"offset\":560045}}},\"children\":\"METRICS_PORT\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Prometheus metrics 
port\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"9090\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_TRACING\",\"position\":{\"start\":{\"line\":15525,\"column\":3,\"offset\":560083},\"end\":{\"line\":15525,\"column\":19,\"offset\":560099}}}],\"position\":{\"start\":{\"line\":15525,\"column\":3,\"offset\":560083},\"end\":{\"line\":15525,\"column\":19,\"offset\":560099}}},\"children\":\"ENABLE_TRACING\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Enable distributed tracing\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"false\"}]]}],[\"$\",\"tr\",\"tr-3\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"SENTRY_DSN\",\"position\":{\"start\":{\"line\":15526,\"column\":3,\"offset\":560141},\"end\":{\"line\":15526,\"column\":15,\"offset\":560153}}}],\"position\":{\"start\":{\"line\":15526,\"column\":3,\"offset\":560141},\"end\":{\"line\":15526,\"column\":15,\"offset\":560153}}},\"children\":\"SENTRY_DSN\"}]}],[\"$\",\"td\",\"td-1\",{\"children\":\"Sentry DSN for error tracking\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"(Optional)\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"243:[\"$\",\"h2\",\"h2-81\",{\"id\":\"advanced-configuration\",\"children\":\"Advanced Configuration\"}]\n244:[\"$\",\"h3\",\"h3-114\",{\"id\":\"configuration-file\",\"children\":\"Configuration File\"}]\n245:[\"$\",\"p\",\"p-121\",{\"children\":[\"For more advanced configuration, create a YAML configuration file at 
\",[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"config/config.yaml\",\"position\":{\"start\":{\"line\":15532,\"column\":70,\"offset\":560322},\"end\":{\"line\":15532,\"column\":90,\"offset\":560342}}}],\"position\":{\"start\":{\"line\":15532,\"column\":70,\"offset\":560322},\"end\":{\"line\":15532,\"column\":90,\"offset\":560342}}},\"children\":\"config/config.yaml\"}],\":\"]}]\n32f:T49f,routing:\n  # Complexity assessment weights\n  complexity_weights:\n    length: 0.3\n    specialized_terms: 0.4\n    sentence_structure: 0.3\n  \n  # Ollama model routing\n  ollama_routing:\n    code_generation: \"codellama\"\n    mathematical: \"wizard-math\"\n    creative: \"dolphin-mistral\"\n    general: \"mistral\"\n    \n  # OpenAI model routing\n  openai_routing:\n    complex_reasoning: \"gpt-4o\"\n    general: \"gpt-3.5-turbo\"\n\ncaching:\n  # Semantic caching configuration\n  semantic:\n    enabled: true\n    similarity_threshold: 0.92\n    max_cached_items: 1000\n    \n  # Exact match caching\n  exact:\n    enabled: true\n    max_cached_items: 500\n\noptimization:\n  # Chain of thought settings\n  chain_of_thought:\n    enabled: true\n    task_types: [\"reasoning\", \"math\", \"decision\"]\n    \n  # Response verification\n  verification:\n    enabled: true\n    high_risk_categories: [\"financial\", \"legal\", \"medical\"]\n\nmonitoring:\n  # Logging configuration\n  logging:\n    format: \"json\"\n    include_request_body: false\n    mask_sensitive_data: true\n    \n  # Alert thresholds\n  alerts:\n    high_latency_threshold: 5.0  # seconds\n    error_rate_threshold: 0.05   # 5%\n    budget_warning_threshold: 0.8  # 80% of budget\n246:[\"$\",\"pre\",\"pre-160\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"$32f\"}]}]\n247:[\"$\",\"h3\",\"h3-115\",{\"id\":\"custom-provider-configuration\",\"children\":\"Custom Provider Configuration\"}]\n248:[\"$\",\"p\",\"p-122\",{\"children\":[\"To configure additional inference providers, add a \",[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"providers.yaml\",\"position\":{\"start\":{\"line\":15593,\"column\":52,\"offset\":561627},\"end\":{\"line\":15593,\"column\":68,\"offset\":561643}}}],\"position\":{\"start\":{\"line\":15593,\"column\":52,\"offset\":561627},\"end\":{\"line\":15593,\"column\":68,\"offset\":561643}}},\"children\":\"providers.yaml\"}],\" file:\"]}]\n249:[\"$\",\"pre\",\"pre-161\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-yaml\",\"children\":\"providers:\\n  - name: azure-openai\\n    type: openai-compatible\\n    base_url: https://your-deployment.openai.azure.com\\n    api_key_env: AZURE_OPENAI_API_KEY\\n    models:\\n      - id: gpt-4\\n        deployment_id: your-gpt4-deployment\\n      - id: gpt-35-turbo\\n        deployment_id: your-gpt35-deployment\\n        \\n  - name: local-inference\\n    type: ollama-compatible\\n    base_url: http://localhost:8080\\n    models:\\n      - id: local-model\\n        capabilities: [\\\"general\\\"]\\n\"}]}]\n24a:[\"$\",\"h2\",\"h2-82\",{\"id\":\"model-selection\",\"children\":\"Model Selection\"}]\n24b:[\"$\",\"h3\",\"h3-116\",{\"id\":\"model-tiers\",\"children\":\"Model Tiers\"}]\n24c:[\"$\",\"p\",\"p-123\",{\"children\":\"MCP uses a tiered approach to model 
selection:\"}]\n24d:[\"$\",\"table\",\"table-7\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Tier\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"OpenAI Models\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Ollama Models\"}],[\"$\",\"th\",\"th-3\",{\"children\":\"Use Cases\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"High\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-4o, gpt-4\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"llama2:70b, "])</script><script>self.__next_f.push([1,"codellama:34b\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"Complex reasoning, creative tasks, code generation\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"Medium\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-3.5-turbo\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"mistral, codellama\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"General purpose, standard code tasks\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"Low\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-3.5-turbo\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"llama2, phi\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"Simple queries, development testing\"}]]}]]}]]}]\n24e:[\"$\",\"h3\",\"h3-117\",{\"id\":\"task-specific-model-mapping\",\"children\":\"Task-Specific Model Mapping\"}]\n24f:[\"$\",\"p\",\"p-124\",{\"children\":\"MCP maps specific task types to appropriate models:\"}]\n"])</script><script>self.__next_f.push([1,"250:[\"$\",\"table\",\"table-8\",{\"children\":[[\"$\",\"thead\",\"thead-0\",{\"children\":[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"th\",\"th-0\",{\"children\":\"Task Type\"}],[\"$\",\"th\",\"th-1\",{\"children\":\"High Tier\"}],[\"$\",\"th\",\"th-2\",{\"children\":\"Medium Tier\"}],[\"$\",\"th\",\"th-3\",{\"children\":\"Low 
Tier\"}]]}]}],[\"$\",\"tbody\",\"tbody-0\",{\"children\":[[\"$\",\"tr\",\"tr-0\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"Code Generation\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-4o\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"codellama\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"codellama\"}]]}],[\"$\",\"tr\",\"tr-1\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"Creative Writing\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-4o\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"mistral\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"mistral\"}]]}],[\"$\",\"tr\",\"tr-2\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"Mathematical\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-4o\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"gpt-3.5-turbo\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"wizard-math\"}]]}],[\"$\",\"tr\",\"tr-3\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"General Knowledge\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-3.5-turbo\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"mistral\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"llama2\"}]]}],[\"$\",\"tr\",\"tr-4\",{\"children\":[[\"$\",\"td\",\"td-0\",{\"children\":\"Summarization\"}],[\"$\",\"td\",\"td-1\",{\"children\":\"gpt-3.5-turbo\"}],[\"$\",\"td\",\"td-2\",{\"children\":\"mistral\"}],[\"$\",\"td\",\"td-3\",{\"children\":\"llama2\"}]]}]]}]]}]\n"])</script><script>self.__next_f.push([1,"251:[\"$\",\"p\",\"p-125\",{\"children\":\"To override the automatic model selection, specify the model explicitly in your request, prefixed with its provider name. For example, to force OpenAI GPT-4o:\"}]\n252:[\"$\",\"pre\",\"pre-162\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-json\",\"children\":\"{\\n  \\\"model\\\": \\\"openai:gpt-4o\\\"\\n}\\n\"}]}]\n253:[\"$\",\"p\",\"p-126\",{\"children\":\"Or, to force Ollama Mistral:\"}]\n254:[\"$\",\"pre\",\"pre-163\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-json\",\"children\":\"{\\n  \\\"model\\\": \\\"ollama:mistral\\\"\\n}\\n\"}]}]\n255:[\"$\",\"hr\",\"hr-4\",{}]\n256:[\"$\",\"h1\",\"h1-13\",{\"id\":\"usage-examples\",\"children\":\"Usage Examples\"}]\n257:[\"$\",\"h2\",\"h2-83\",{\"id\":\"basic-chat-interaction\",\"children\":\"Basic Chat Interaction\"}]\n258:[\"$\",\"h3\",\"h3-118\",{\"id\":\"python-example\",\"children\":\"Python Example\"}]\n330:T51d,import requests\nimport json\n\nAPI_URL = \"http://localhost:8000/api/v1\"\nAPI_KEY = \"your_api_key_here\"\n\nheaders = {\n    \"Content-Type\": \"application/json\",\n    \"Authorization\": f\"Bearer {API_KEY}\"\n}\n\n# Basic chat completion\ndef chat(message, history=None):\n    history = history or []\n    history.append({\"role\": \"user\", \"content\": message})\n    \n    response = requests.post(\n        f\"{API_URL}/chat/completions\",\n        headers=headers,\n        json={\n            \"messages\": history,\n            \"model\": \"auto\",  # Let the system decide\n            \"temperature\": 0.7\n        }\n    )\n    \n    if response.status_code == 200:\n        result = response.json()\n        assistant_message = result[\"message\"][\"content\"]\n        history.append({\"role\": \"assistant\", \"content\": assistant_message})\n        \n        print(f\"Model used: {result['model']} via {result['provider']}\")\n        return assistant_message, history\n    else:\n        print(f\"Error: {response.status_code}\")\n        print(response.text)\n        return None, history\n\n# Example conversation\nhistory = []\nresponse, history = chat(\"Hello! 
What can you tell me about artificial intelligence?\", history)\nprint(f\"Assistant: {response}\\n\")\n\nresponse, history = chat(\"What are some practical applications?\", history)\nprint(f\"Assistant: {response}\")\n259:[\"$\",\"pre\",\"pre-164\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$330\"}]}]\n25a:[\"$\",\"h3\",\"h3-119\",{\"id\":\"curl-example\",\"children\":\"cURL Example\"}]\n25b:[\"$\",\"pre\",\"pre-165\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Simple completion\\ncurl -X POST http://localhost:8000/api/v1/chat/completions \\\\\\n  -H \\\"Content-Type: application/json\\\" \\\\\\n  -H \\\"Authorization: Bearer your_api_key_here\\\" \\\\\\n  -d '{\\n    \\\"messages\\\": [\\n      {\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"Explain how photosynthesis works\\\"}\\n    ],\\n    \\\"model\\\": \\\"auto\\\",\\n    \\\"temperature\\\": 0.7\\n  }'\\n\\n# Streaming response\\ncurl -X POST http://localhost:8000/api/v1/chat/streaming \\\\\\n  -H \\\"Content-Type: application/json\\\" \\\\\\n  -H \\\"Authorization: Bearer your_api_key_here\\\" \\\\\\n  -d '{\\n    \\\"messages\\\": [\\n      {\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"Write a short poem about robots\\\"}\\n    ],\\n    \\\"model\\\": \\\"auto\\\",\\n    \\\"stream\\\": true\\n  }'\\n\"}]}]\n25c:[\"$\",\"h2\",\"h2-84\",{\"id\":\"working-with-agents\",\"children\":\"Working with Agents\"}]\n25d:[\"$\",\"h3\",\"h3-120\",{\"id\":\"python-example-1\",\"children\":\"Python Example\"}]\n331:Ta15,"])</script><script>self.__next_f.push([1,"import requests\nimport json\nimport time\n\nAPI_URL = \"http://localhost:8000/api/v1\"\nAPI_KEY = \"your_api_key_here\"\n\nheaders = {\n    \"Content-Type\": \"application/json\",\n    
\"Authorization\": f\"Bearer {API_KEY}\"\n}\n\n# Run an agent with tools\ndef run_research_agent(query):\n    # Define agent configuration with tools\n    agent_config = {\n        \"instructions\": \"You are a research assistant specialized in finding information.\",\n        \"model\": \"gpt-4o\",\n        \"tools\": [\n            {\n                \"type\": \"function\",\n                \"function\": {\n                    \"name\": \"search_web\",\n                    \"description\": \"Search the web for information\",\n                    \"parameters\": {\n                        \"type\": \"object\",\n                        \"properties\": {\n                            \"query\": {\n                                \"type\": \"string\",\n                                \"description\": \"Search query\"\n                            },\n                            \"num_results\": {\n                                \"type\": \"integer\",\n                                \"description\": \"Number of results to return\"\n                            }\n                        },\n                        \"required\": [\"query\"]\n                    }\n                }\n            }\n        ]\n    }\n    \n    # Run the agent\n    response = requests.post(\n        f\"{API_URL}/agents/run\",\n        headers=headers,\n        json={\n            \"agent_config\": agent_config,\n            \"messages\": [\n                {\"role\": \"user\", \"content\": query}\n            ]\n        }\n    )\n    \n    if response.status_code != 200:\n        print(f\"Error: {response.status_code}\")\n        print(response.text)\n        return None\n    \n    result = response.json()\n    run_id = result[\"run_id\"]\n    \n    # Poll for completion\n    while True:\n        status_response = requests.get(\n            f\"{API_URL}/agents/status/{run_id}\",\n            headers=headers\n        )\n        \n        if status_response.status_code != 200:\n            
print(f\"Error checking status: {status_response.status_code}\")\n            return None\n        \n        status_data = status_response.json()\n        \n        if status_data[\"status\"] == \"completed\":\n            return status_data[\"result\"][\"output\"]\n        elif status_data[\"status\"] == \"failed\":\n            print(f\"Agent run failed: {status_data.get('error')}\")\n            return None\n        \n        time.sleep(1)  # Poll every second\n\n# Example usage\nresult = run_research_agent(\"What are the latest advancements in fusion energy?\")\nprint(result)\n"])</script><script>self.__next_f.push([1,"25e:[\"$\",\"pre\",\"pre-166\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$331\"}]}]\n25f:[\"$\",\"h3\",\"h3-121\",{\"id\":\"curl-example-1\",\"children\":\"cURL Example\"}]\n332:T495,# Run an agent\ncurl -X POST http://localhost:8000/api/v1/agents/run \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer your_api_key_here\" \\\n  -d '{\n    \"agent_config\": {\n      \"instructions\": \"You are a coding assistant.\",\n      \"model\": \"gpt-4o\",\n      \"tools\": [\n        {\n          \"type\": \"function\",\n          \"function\": {\n            \"name\": \"generate_code\",\n            \"description\": \"Generate code in a specific language\",\n            \"parameters\": {\n              \"type\": \"object\",\n              \"properties\": {\n                \"language\": {\n                  \"type\": \"string\",\n                  \"description\": \"Programming language\"\n                },\n                \"task\": {\n                  \"type\": \"string\",\n                  \"description\": \"Task description\"\n                }\n              },\n              \"required\": [\"language\", \"task\"]\n            }\n          }\n        }\n      ]\n    },\n    \"messages\": [\n   
   {\"role\": \"user\", \"content\": \"Write a Python function to detect palindromes\"}\n    ]\n  }'\n\n# Check status\ncurl -X GET http://localhost:8000/api/v1/agents/status/run_abc123 \\\n  -H \"Authorization: Bearer your_api_key_here\"\n260:[\"$\",\"pre\",\"pre-167\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"$332\"}]}]\n261:[\"$\",\"h2\",\"h2-85\",{\"id\":\"customizing-model-selection\",\"children\":\"Customizing Model Selection\"}]\n262:[\"$\",\"h3\",\"h3-122\",{\"id\":\"python-example-2\",\"children\":\"Python Example\"}]\n333:T6b8,"])</script><script>self.__next_f.push([1,"import requests\n\nAPI_URL = \"http://localhost:8000/api/v1\"\nAPI_KEY = \"your_api_key_here\"\n\nheaders = {\n    \"Content-Type\": \"application/json\",\n    \"Authorization\": f\"Bearer {API_KEY}\"\n}\n\n# Custom routing preferences\ndef custom_routing_chat(message, routing_preferences):\n    response = requests.post(\n        f\"{API_URL}/chat/completions\",\n        headers=headers,\n        json={\n            \"messages\": [\n                {\"role\": \"user\", \"content\": message}\n            ],\n            \"routing_preferences\": routing_preferences\n        }\n    )\n    \n    if response.status_code == 200:\n        result = response.json()\n        print(f\"Provider: {result['provider']}, Model: {result['model']}\")\n        return result[\"message\"][\"content\"]\n    else:\n        print(f\"Error: {response.status_code}\")\n        print(response.text)\n        return None\n\n# Examples with different routing preferences\nresponse = custom_routing_chat(\n    \"What is the capital of France?\",\n    {\n        \"force_provider\": \"ollama\",  # Force Ollama\n        \"privacy_level\": \"standard\",\n        \"latency_preference\": \"balanced\"\n    }\n)\nprint(f\"Response: {response}\\n\")\n\nresponse = custom_routing_chat(\n    \"Analyze the 
philosophical implications of artificial general intelligence.\",\n    {\n        \"force_provider\": \"openai\",  # Force OpenAI\n        \"privacy_level\": \"standard\",\n        \"latency_preference\": \"quality\"  # Prefer quality over speed\n    }\n)\nprint(f\"Response: {response}\\n\")\n\nresponse = custom_routing_chat(\n    \"What is my personal password?\",\n    {\n        \"force_provider\": None,  # Auto-select\n        \"privacy_level\": \"high\",  # Privacy-sensitive query\n        \"latency_preference\": \"balanced\"\n    }\n)\nprint(f\"Response: {response}\")\n"])</script><script>self.__next_f.push([1,"263:[\"$\",\"pre\",\"pre-168\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$333\"}]}]\n264:[\"$\",\"h3\",\"h3-123\",{\"id\":\"curl-example-2\",\"children\":\"cURL Example\"}]\n265:[\"$\",\"pre\",\"pre-169\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# Force Ollama for this request\\ncurl -X POST http://localhost:8000/api/v1/chat/completions \\\\\\n  -H \\\"Content-Type: application/json\\\" \\\\\\n  -H \\\"Authorization: Bearer your_api_key_here\\\" \\\\\\n  -d '{\\n    \\\"messages\\\": [\\n      {\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"What is the capital of Sweden?\\\"}\\n    ],\\n    \\\"routing_preferences\\\": {\\n      \\\"force_provider\\\": \\\"ollama\\\",\\n      \\\"privacy_level\\\": \\\"standard\\\",\\n      \\\"latency_preference\\\": \\\"speed\\\"\\n    }\\n  }'\\n\\n# Force specific model\\ncurl -X POST http://localhost:8000/api/v1/chat/completions \\\\\\n  -H \\\"Content-Type: application/json\\\" \\\\\\n  -H \\\"Authorization: Bearer your_api_key_here\\\" \\\\\\n  -d '{\\n    \\\"messages\\\": [\\n      {\\\"role\\\": \\\"user\\\", \\\"content\\\": \\\"Write Python code 
to implement merge sort\\\"}\\n    ],\\n    \\\"model\\\": \\\"ollama:codellama\\\"\\n  }'\\n\"}]}]\n266:[\"$\",\"h2\",\"h2-86\",{\"id\":\"tool-integration\",\"children\":\"Tool Integration\"}]\n267:[\"$\",\"h3\",\"h3-124\",{\"id\":\"python-example-3\",\"children\":\"Python Example\"}]\n334:T1166,"])</script><script>self.__next_f.push([1,"import json\nimport requests\n\nAPI_URL = \"http://localhost:8000/api/v1\"\nAPI_KEY = \"your_api_key_here\"\n\nheaders = {\n    \"Content-Type\": \"application/json\",\n    \"Authorization\": f\"Bearer {API_KEY}\"\n}\n\n# Chat with tool integration\ndef chat_with_tools(message, tools):\n    response = requests.post(\n        f\"{API_URL}/chat/completions\",\n        headers=headers,\n        json={\n            \"messages\": [\n                {\"role\": \"user\", \"content\": message}\n            ],\n            \"tools\": tools\n        }\n    )\n    \n    if response.status_code != 200:\n        print(f\"Error: {response.status_code}\")\n        print(response.text)\n        return None\n    \n    result = response.json()\n    \n    # Check if the model wants to call a tool\n    if \"tool_calls\" in result[\"message\"] and result[\"message\"][\"tool_calls\"]:\n        tool_calls = result[\"message\"][\"tool_calls\"]\n        print(f\"Tool calls requested: {len(tool_calls)}\")\n        \n        # Process each tool call\n        for tool_call in tool_calls:\n            # In a real implementation, you would execute the actual tool here\n            # For this example, we'll just simulate it\n            function_name = tool_call[\"function\"][\"name\"]\n            arguments = json.loads(tool_call[\"function\"][\"arguments\"])\n            \n            print(f\"Executing tool: {function_name}\")\n            print(f\"Arguments: {arguments}\")\n            \n            # Simulate tool execution\n            if function_name == \"get_weather\":\n                tool_result = f\"Weather in {arguments['location']}: Sunny, 22°C\"\n            
elif function_name == \"search_database\":\n                tool_result = f\"Database results for {arguments['query']}: 3 records found\"\n            else:\n                tool_result = \"Unknown tool\"\n            \n            # Send the tool result back\n            response = requests.post(\n                f\"{API_URL}/chat/completions\",\n                headers=headers,\n                json={\n                    \"messages\": [\n                        {\"role\": \"user\", \"content\": message},\n                        {\n                            \"role\": \"assistant\",\n                            \"content\": result[\"message\"][\"content\"],\n                            \"tool_calls\": result[\"message\"][\"tool_calls\"]\n                        },\n                        {\n                            \"role\": \"tool\",\n                            \"tool_call_id\": tool_call[\"id\"],\n                            \"content\": tool_result\n                        }\n                    ]\n                }\n            )\n            \n            if response.status_code == 200:\n                final_result = response.json()\n                return final_result[\"message\"][\"content\"]\n            else:\n                print(f\"Error in tool response: {response.status_code}\")\n                return None\n    \n    # If no tool calls, return the direct response\n    return result[\"message\"][\"content\"]\n\n# Define available tools\ntools = [\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"get_weather\",\n            \"description\": \"Get current weather in a location\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"location\": {\n                        \"type\": \"string\",\n                        \"description\": \"City name\"\n                    },\n                    \"unit\": {\n                        
\"type\": \"string\",\n                        \"enum\": [\"celsius\", \"fahrenheit\"],\n                        \"description\": \"Temperature unit\"\n                    }\n                },\n                \"required\": [\"location\"]\n            }\n        }\n    },\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"search_database\",\n            \"description\": \"Search a database for information\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"query\": {\n                        \"type\": \"string\",\n                        \"description\": \"Search query\"\n                    },\n                    \"limit\": {\n                        \"type\": \"integer\",\n                        \"description\": \"Maximum number of results\"\n                    }\n                },\n                \"required\": [\"query\"]\n            }\n        }\n    }\n]\n\n# Example usage\nresponse = chat_with_tools(\"What's the weather like in Paris?\", tools)\nprint(f\"Final response: {response}\")\n"])</script><script>self.__next_f.push([1,"268:[\"$\",\"pre\",\"pre-170\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-python\",\"children\":\"$334\"}]}]\n269:[\"$\",\"hr\",\"hr-5\",{}]\n26a:[\"$\",\"h1\",\"h1-14\",{\"id\":\"troubleshooting\",\"children\":\"Troubleshooting\"}]\n26b:[\"$\",\"h2\",\"h2-87\",{\"id\":\"common-issues\",\"children\":\"Common Issues\"}]\n26c:[\"$\",\"h3\",\"h3-125\",{\"id\":\"installation-issues\",\"children\":\"Installation Issues\"}]\n26d:[\"$\",\"h4\",\"h4-33\",{\"id\":\"ollama-installation-fails\",\"children\":\"Ollama Installation 
Fails\"}]\n26e:[\"$\",\"p\",\"p-127\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n26f:[\"$\",\"ul\",\"ul-13\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Error messages during Ollama installation\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ollama serve\",\"position\":{\"start\":{\"line\":16130,\"column\":3,\"offset\":576416},\"end\":{\"line\":16130,\"column\":17,\"offset\":576430}}}],\"position\":{\"start\":{\"line\":16130,\"column\":3,\"offset\":576416},\"end\":{\"line\":16130,\"column\":17,\"offset\":576430}}},\"children\":\"ollama serve\"}],\" command not found\"]}],\"\\n\"]}]\n270:[\"$\",\"p\",\"p-128\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n271:[\"$\",\"ol\",\"ol-21\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Check system requirements (minimum 8GB RAM recommended)\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"For Linux, ensure you have the required dependencies:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"sudo apt-get update\\nsudo apt-get install -y ca-certificates curl\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"Try the manual installation from \",[\"$\",\"a\",\"a-0\",{\"href\":\"https://ollama.ai/download\",\"children\":\"ollama.ai\"}]]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"Check if Ollama is running:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"ps aux | grep 
ollama\\n\"}]}],\"\\n\"]}],\"\\n\"]}]\n272:[\"$\",\"h4\",\"h4-34\",{\"id\":\"python-dependency-errors\",\"children\":\"Python Dependency Errors\"}]\n273:[\"$\",\"p\",\"p-129\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n274:[\"$\",\"ul\",\"ul-14\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"pip install\",\"position\":{\"start\":{\"line\":16148,\"column\":3,\"offset\":576876},\"end\":{\"line\":16148,\"column\":16,\"offset\":576889}}}],\"position\":{\"start\":{\"line\":16148,\"column\":3,\"offset\":576876},\"end\":{\"line\":16148,\"column\":16,\"offset\":576889}}},\"children\":\"pip install\"}],\" fails with compatibility errors\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Import errors when starting the application\"}],\"\\n\"]}]\n275:[\"$\",\"p\",\"p-130\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n276:[\"$\",\"ol\",\"ol-22\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"Ensure you're using Python 3.11 or higher:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"python --version\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"Try creating a fresh virtual environment:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"rm -rf venv\\npython -m venv venv\\nsource venv/bin/activate\\npip install --upgrade pip\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"Install dependencies one by one to identify problematic 
packages:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"pip install -r requireme"])</script><script>self.__next_f.push([1,"nts.txt --no-deps\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"Check for conflicts with pip:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"pip check\\n\"}]}],\"\\n\"]}],\"\\n\"]}]\n277:[\"$\",\"h3\",\"h3-126\",{\"id\":\"api-connection-issues\",\"children\":\"API Connection Issues\"}]\n278:[\"$\",\"h4\",\"h4-35\",{\"id\":\"openai-api-key-invalid\",\"children\":\"OpenAI API Key Invalid\"}]\n279:[\"$\",\"p\",\"p-131\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n27a:[\"$\",\"ul\",\"ul-15\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Error messages about authentication\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"\\\"Invalid API key\\\" errors\"}],\"\\n\"]}]\n27b:[\"$\",\"p\",\"p-132\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n27c:[\"$\",\"ol\",\"ol-23\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Verify your API key is correct and active in the OpenAI dashboard\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"Check if the key is properly set in your 
\",[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\".env\",\"position\":{\"start\":{\"line\":16182,\"column\":45,\"offset\":577705},\"end\":{\"line\":16182,\"column\":51,\"offset\":577711}}}],\"position\":{\"start\":{\"line\":16182,\"column\":45,\"offset\":577705},\"end\":{\"line\":16182,\"column\":51,\"offset\":577711}}},\"children\":\".env\"}],\" file or environment variables\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":\"Ensure there are no spaces or unexpected characters in the key\"}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"Test the key with a simple OpenAI API request:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"curl https://api.openai.com/v1/models \\\\\\n  -H \\\"Authorization: Bearer YOUR_API_KEY\\\"\\n\"}]}],\"\\n\"]}],\"\\n\"]}]\n27d:[\"$\",\"h4\",\"h4-36\",{\"id\":\"ollama-connection-failed\",\"children\":\"Ollama Connection Failed\"}]\n27e:[\"$\",\"p\",\"p-133\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n27f:[\"$\",\"ul\",\"ul-16\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"\\\"Connection refused\\\" errors when connecting to Ollama\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"API requests to Ollama timeout\"}],\"\\n\"]}]\n280:[\"$\",\"p\",\"p-134\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n"])</script><script>self.__next_f.push([1,"281:[\"$\",\"ol\",\"ol-24\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"Verify Ollama is running:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"ollama list  
# Should show available models\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"If not running, start the Ollama service:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"ollama serve\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"Check if the Ollama port is accessible:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"curl http://localhost:11434/api/tags\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"Verify your \",[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"OLLAMA_HOST\",\"position\":{\"start\":{\"line\":16209,\"column\":16,\"offset\":578413},\"end\":{\"line\":16209,\"column\":29,\"offset\":578426}}}],\"position\":{\"start\":{\"line\":16209,\"column\":16,\"offset\":578413},\"end\":{\"line\":16209,\"column\":29,\"offset\":578426}}},\"children\":\"OLLAMA_HOST\"}],\" setting in the configuration\"]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":\"If using Docker, ensure proper network configuration between containers\"}],\"\\n\"]}]\n"])</script><script>self.__next_f.push([1,"282:[\"$\",\"h3\",\"h3-127\",{\"id\":\"performance-issues\",\"children\":\"Performance Issues\"}]\n283:[\"$\",\"h4\",\"h4-37\",{\"id\":\"high-latency-with-ollama\",\"children\":\"High Latency with Ollama\"}]\n284:[\"$\",\"p\",\"p-135\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n285:[\"$\",\"ul\",\"ul-17\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Very slow responses from Ollama models\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Timeouts during 
inference\"}],\"\\n\"]}]\n286:[\"$\",\"p\",\"p-136\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n287:[\"$\",\"ol\",\"ol-25\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"Check if you have GPU support enabled:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"nvidia-smi  # Should show GPU usage\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":[\"Try a smaller model:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"ollama pull tinyllama\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"Adjust model parameters in your request:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-json\",\"children\":\"{\\n  \\\"model\\\": \\\"ollama:llama2\\\",\\n  \\\"max_tokens\\\": 512,\\n  \\\"temperature\\\": 0.7\\n}\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"Check system resource usage:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"htop\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-4\",{\"children\":\"Increase the timeout in your configuration\"}],\"\\n\"]}]\n288:[\"$\",\"h4\",\"h4-38\",{\"id\":\"memory-usage-too-high\",\"children\":\"Memory Usage Too High\"}]\n289:[\"$\",\"p\",\"p-137\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n28a:[\"$\",\"ul\",\"ul-18\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"Out of memory 
errors\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"System becomes unresponsive\"}],\"\\n\"]}]\n28b:[\"$\",\"p\",\"p-138\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n"])</script><script>self.__next_f.push([1,"28c:[\"$\",\"ol\",\"ol-26\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"Use smaller models (e.g., \",[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"mistral:7b\",\"position\":{\"start\":{\"line\":16250,\"column\":30,\"offset\":579266},\"end\":{\"line\":16250,\"column\":42,\"offset\":579278}}}],\"position\":{\"start\":{\"line\":16250,\"column\":30,\"offset\":579266},\"end\":{\"line\":16250,\"column\":42,\"offset\":579278}}},\"children\":\"mistral:7b\"}],\" instead of larger variants)\"]}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"Reduce batch sizes in configuration\"}],\"\\n\",[\"$\",\"li\",\"li-2\",{\"children\":[\"Implement memory limits:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 overflow-x-auto\",\"children\":[\"$\",\"$L2d2\",\"code-0\",{\"className\":\"language-bash\",\"children\":\"# In docker-compose.yml\\nservices:\\n  ollama:\\n    deploy:\\n      resources:\\n        limits:\\n          memory: 12G\\n\"}]}],\"\\n\"]}],\"\\n\",[\"$\",\"li\",\"li-3\",{\"children\":[\"Enable context window optimization:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
overflow-x-auto\",\"children\":[\"$\",\"code\",\"code-0\",{\"className\":\"$undefined\",\"node\":{\"type\":\"element\",\"tagName\":\"code\",\"properties\":{},\"children\":[{\"type\":\"text\",\"value\":\"ENABLE_TOKEN_OPTIMIZATION=true\\n\"}],\"position\":{\"start\":{\"line\":16263,\"column\":4,\"offset\":579566},\"end\":{\"line\":16265,\"column\":7,\"offset\":579610}}},\"children\":\"ENABLE_TOKEN_OPTIMIZATION=true\\n\"}]}],\"\\n\"]}],\"\\n\"]}]\n"])</script><script>self.__next_f.push([1,"28d:[\"$\",\"h3\",\"h3-128\",{\"id\":\"routing-and-model-issues\",\"children\":\"Routing and Model Issues\"}]\n28e:[\"$\",\"h4\",\"h4-39\",{\"id\":\"all-requests-going-to-one-provider\",\"children\":\"All Requests Going to One Provider\"}]\n28f:[\"$\",\"p\",\"p-139\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Symptoms:\"}]}]\n290:[\"$\",\"ul\",\"ul-19\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":\"All requests route to OpenAI despite configuration\"}],\"\\n\",[\"$\",\"li\",\"li-1\",{\"children\":\"All requests route to Ollama regardless of complexity\"}],\"\\n\"]}]\n291:[\"$\",\"p\",\"p-140\",{\"children\":[\"$\",\"strong\",\"strong-0\",{\"children\":\"Possible Solutions:\"}]}]\n"])</script><script>self.__next_f.push([1,"292:[\"$\",\"ol\",\"ol-27\",{\"children\":[\"\\n\",[\"$\",\"li\",\"li-0\",{\"children\":[\"Check for environment variables forcing a provider:\\n\",[\"$\",\"pre\",\"pre-0\",{\"className\":\"bg-secondary border border-border rounded-lg p-4 
```
FORCE_OLLAMA=false
FORCE_OPENAI=false
```

2. Verify the complexity threshold setting:

```
COMPLEXITY_THRESHOLD=0.65
```

3. Review the routing preferences sent with requests. `force_provider` should be null unless you want to override routing for that request:

```json
{
  "routing_preferences": {
    "force_provider": null
  }
}
```

4. Check the logs for routing decisions.

Model Not Found

Symptoms:

"Model not found" errors
Models are available but not being used

Possible Solutions:

1. List the available models:

```bash
ollama list
```

2. Pull the missing model (mistral is shown as an example):

```bash
ollama pull mistral
```

3. Verify that model names, including any tag, exactly match what you are requesting.
4. Check the model mapping in your configuration.

Diagnostics

Log Analysis

MCP logs contain valuable diagnostic information. Use the following commands to analyze them:

```bash
# View API logs
docker-compose logs -f app

# View Ollama logs
docker-compose logs -f ollama

# Search for errors
docker-compose logs | grep -i error

# Check routing decisions
docker-compose logs app | grep "Routing decision"
```

Health Check

Use the health check endpoint to verify system status:

```bash
curl http://localhost:8000/api/v1/health

# For more detailed health information
curl http://localhost:8000/api/v1/health/details
```

Debug Mode

Enable debug logging for more detailed information:

```bash
# Set the environment variable
export LOG_LEVEL=DEBUG

# Or set it in the .env file
LOG_LEVEL=DEBUG
```

Performance Testing

Use the built-in benchmark script to test system performance:

```bash
python scripts/benchmark.py --provider both --queries 10 --complexity mixed
```

Log Management

Log Levels

MCP uses the following log levels:

ERROR: Critical errors that require immediate attention
WARNING: Non-critical issues that might indicate problems
INFO: General operational information
DEBUG: Detailed information for debugging purposes

Log Formats

Logs can be formatted as text or JSON:

```bash
# Set JSON logging
export LOG_FORMAT=json

# Set text logging (default)
export LOG_FORMAT=text
```

External Log Management

For production environments, consider forwarding logs to an external system:
```bash
# Using Fluentd
docker-compose -f docker-compose.yml -f docker-compose.logging.yml up -d
```

Alternatively, configure a logging driver in Docker Compose:

```yaml
# In docker-compose.yml
services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

Contributing

Contributions to the MCP system are welcome! Please follow these guidelines:

Getting Started

1. Fork the Repository
Fork the repository on GitHub and clone your fork locally:

```bash
git clone https://github.com/YOUR-USERNAME/mcp.git
cd mcp
```

2. Set Up Development Environment
Follow the installation instructions in the Installation Guide section.

3. Create a Branch
Create a branch for your feature or bug fix:

```bash
git checkout -b feature/your-feature-name
# or
git checkout -b fix/your-bugfix-name
```

Development Guidelines

Code Style

Follow PEP 8 style guidelines for Python code
Use type hints for all function definitions
Format code with Black
Verify style with flake8
Check type hints with mypy

```bash
# Install development tools
pip install black flake8 mypy

# Format code
black app tests

# Check style
flake8 app tests

# Run type checking
mypy app
```

Testing

Write unit tests for all new functionality
Ensure existing tests pass before submitting a PR
Maintain or improve code coverage

```bash
# Run all tests
pytest

# Run tests with coverage (requires the pytest-cov plugin)
pytest --cov=app tests/

# Run only unit tests
pytest tests/unit/

# Run only integration tests
pytest tests/integration/
```

Documentation

Update documentation for any new features or changes
Document all public APIs with docstrings
Keep the README and guides up to date

Submitting Changes

1. Commit Your Changes
Make focused commits with meaningful commit messages:

```bash
git add .
git commit -m "Add feature: detailed description of changes"
```

2. Pull Latest Changes
Rebase your branch on the latest main:
```bash
# Requires a remote named "upstream" that points at the original repository
# (git remote add upstream <url>)
git checkout main
git pull upstream main
git checkout your-branch
git rebase main
```

3. Push to Your Fork

```bash
git push origin your-branch
# If this branch was already pushed before rebasing, the push will be rejected;
# use: git push --force-with-lease origin your-branch
```

4. Create a Pull Request
Open a pull request from your fork to the main repository:

Provide a clear title and description
Reference any related issues
Describe the testing you performed
Include screenshots for UI changes

Code of Conduct

Be respectful and inclusive in all interactions
Provide constructive feedback
Focus on the issues, not the people
Welcome contributors of all backgrounds and experience levels

License

By contributing to this project, you agree that your contributions will be licensed under the project's MIT License.

License

MIT License

```
Copyright (c) 2023 MCP Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

Third-Party Licenses

This project incorporates several third-party open-source libraries, each with its own license:

FastAPI: MIT License
Pydantic: MIT License
Uvicorn: BSD 3-Clause License
OpenAI Python: MIT License
Redis-py: MIT License
Prometheus Client: Apache License 2.0
Ollama: MIT License

Full license texts are included in the LICENSE-3RD-PARTY file in the repository.

Usage Restrictions

While the MCP system itself is open source, use of the OpenAI API is subject to OpenAI's terms of service and usage policies. Please ensure your use of the API complies with these terms.
text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Rate Limiting Architecture\"}]\n354:[\"$\",\"a\",\"tiered-rate-limiting\",{\"href\":\"#tiered-rate-limiting\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Tiered Rate Limiting\"}]\n355:[\"$\",\"a\",\"dynamic-rate-adjustment\",{\"href\":\"#dynamic-rate-adjustment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Dynamic Rate Adjustment\"}]\n356:[\"$\",\"a\",\"rate-limit-response\",{\"href\":\"#rate-limit-response\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Rate Limit Response\"}]\n357:[\"$\",\"a\",\"implementation-strategy\",{\"href\":\"#implementation-strategy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Implementation Strategy\"}]\n358:[\"$\",\"a\",\"provider-abstraction-layer\",{\"href\":\"#provider-abstraction-layer\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Provider Abstraction Layer\"}]\n359:[\"$\",\"a\",\"pseudocode-for-the-provider-abstraction-layer\",{\"href\":\"#pseudocode-for-the-provider-abstraction-layer\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Pseudocode for the Provider Abstraction Layer\"}]\n35a:[\"$\",\"a\",\"intelligent-routing-decision-engine\",{\"href\":\"#intelligent-routing-decision-engine\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Intelligent Routing Decision Engine\"}]\n35b:[\"$\",\"a\",\"pseudocode-for-routing-logic\",{\"href\":\"#pseudocode-for-routing-logic\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors 
\",\"children\":\"Pseudocode for Routing Logic\"}]\n35c:[\"$\",\"a\",\"authentication-implementation\",{\"href\":\"#authentication-implementation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"chi"])</script><script>self.__next_f.push([1,"ldren\":\"Authentication Implementation\"}]\n35d:[\"$\",\"a\",\"middleware-for-api-key-authentication\",{\"href\":\"#middleware-for-api-key-authentication\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Middleware for API Key Authentication\"}]\n35e:[\"$\",\"a\",\"rate-limiting-implementation\",{\"href\":\"#rate-limiting-implementation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Rate Limiting Implementation\"}]\n35f:[\"$\",\"a\",\"rate-limiter-implementation\",{\"href\":\"#rate-limiter-implementation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Rate Limiter Implementation\"}]\n360:[\"$\",\"a\",\"operational-considerations\",{\"href\":\"#operational-considerations\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Operational Considerations\"}]\n361:[\"$\",\"a\",\"conclusion\",{\"href\":\"#conclusion\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Conclusion\"}]\n362:[\"$\",\"a\",\"autonomous-agent-architecture-python-implementations-for-mcp-integration\",{\"href\":\"#autonomous-agent-architecture-python-implementations-for-mcp-integration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Autonomous Agent Architecture: Python Implementations for MCP 
Integration\"}]\n363:[\"$\",\"a\",\"theoretical-framework-for-agent-design\",{\"href\":\"#theoretical-framework-for-agent-design\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Theoretical Framework for Agent Design\"}]\n364:[\"$\",\"a\",\"core-agent-infrastructure\",{\"href\":\"#core-agent-infrastructure\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Core Agent Infrastructure\"}]\n365:[\"$\",\"a\",\"base-agent-class\",{\"href\":\"#base-agent-class\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Base Agent Class\"}]\n366:[\"$\",\"a\",\"appagentsbaseagentpy\",{\"href\":\"#appagentsbaseagentpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/base_agent.py\"}]\n367:[\"$\",\"a\",\"specialized-agent-implementations\",{\"href\":\"#specialized-agent-implementations\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Specialized Agent Implementations\"}]\n368:[\"$\",\"a\",\"research-agent-with-knowledge-retrieval\",{\"href\":\"#research-agent-with-knowledge-retrieval\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Research Agent with Knowledge Retrieval\"}]\n369:[\"$\",\"a\",\"appagentsresearchagentpy\",{\"href\":\"#appagentsresearchagentpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/research_agent.py\"}]\n36a:[\"$\",\"a\",\"conversational-flow-manager-agent\",{\"href\":\"#conversational-flow-manager-agent\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Conversational Flow Manager 
Agent\"}]\n36b:[\"$\",\"a\",\"appagentsconversationmanagerpy\",{\"href\":\"#appagentsconversationmanagerpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/conversation_manager.py\"}]\n36c:[\"$\",\"a\",\"memory-enhanced-contextual-agent\",{\"href\":\"#memory-enhanced-contextual-agent\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Memory-Enhanced Contextual Agent\"}]\n36d:[\"$\",\"a\",\"appagentscontextualagentpy\",{\"href\":\"#appagentscontextualagentpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/contextual_agent.py\"}]\n36e:[\"$\",\"a\",\"advanced-tool-integration\",{\"href\":\"#advanced-tool-integration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transi"])</script><script>self.__next_f.push([1,"tion-colors pl-0\",\"children\":\"Advanced Tool Integration\"}]\n36f:[\"$\",\"a\",\"collaborative-task-management-agent\",{\"href\":\"#collaborative-task-management-agent\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Collaborative Task Management Agent\"}]\n370:[\"$\",\"a\",\"appagentstaskagentpy\",{\"href\":\"#appagentstaskagentpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/task_agent.py\"}]\n371:[\"$\",\"a\",\"agent-factory-and-orchestration\",{\"href\":\"#agent-factory-and-orchestration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Agent Factory and Orchestration\"}]\n372:[\"$\",\"a\",\"appagentsagentfactorypy\",{\"href\":\"#appagentsagentfactorypy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors 
\",\"children\":\"app/agents/agent_factory.py\"}]\n373:[\"$\",\"a\",\"metaframework-for-agent-composition\",{\"href\":\"#metaframework-for-agent-composition\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Metaframework for Agent Composition\"}]\n374:[\"$\",\"a\",\"appagentsmetaagentpy\",{\"href\":\"#appagentsmetaagentpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/meta_agent.py\"}]\n375:[\"$\",\"a\",\"sample-agent-usage-implementation\",{\"href\":\"#sample-agent-usage-implementation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Sample Agent Usage Implementation\"}]\n376:[\"$\",\"a\",\"appmainpy\",{\"href\":\"#appmainpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/main.py\"}]\n377:[\"$\",\"a\",\"configure-logging\",{\"href\":\"#configure-logging\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Configure logging\"}]\n378:[\"$\",\"a\",\"initialize-services\",{\"href\":\"#initialize-services\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Initialize services\"}]\n379:[\"$\",\"a\",\"initialize-agent-factory\",{\"href\":\"#initialize-agent-factory\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Initialize agent factory\"}]\n37a:[\"$\",\"a\",\"agent-session-storage\",{\"href\":\"#agent-session-storage\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Agent session storage\"}]\n37b:[\"$\",\"a\",\"define-requestresponse-models\",{\"href\":\"#define-requestresponse-models\",\"className\":\"block text-sm text-muted-foreground 
hover:text-foreground transition-colors \",\"children\":\"Define request/response models\"}]\n37c:[\"$\",\"a\",\"auth-dependency\",{\"href\":\"#auth-dependency\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Auth dependency\"}]\n37d:[\"$\",\"a\",\"routes\",{\"href\":\"#routes\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Routes\"}]\n37e:[\"$\",\"a\",\"startup-event\",{\"href\":\"#startup-event\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Startup event\"}]\n37f:[\"$\",\"a\",\"shutdown-event\",{\"href\":\"#shutdown-event\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Shutdown event\"}]\n380:[\"$\",\"a\",\"conclusion\",{\"href\":\"#conclusion\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Conclusion\"}]\n381:[\"$\",\"a\",\"hybrid-intelligence-architecture-integrating-ollama-with-openais-agent-sdk\",{\"href\":\"#hybrid-intelligence-architecture-integrating-ollama-with-openais-agent-sdk\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Hybrid Intelligence Architecture: Integrating Ollama with OpenAI's Agent SDK\"}]\n382:[\"$\",\"a\",\"theoretical-framework-for-hybrid-model-inference\",{\"href\":\"#theoretical-frame"])</script><script>self.__next_f.push([1,"work-for-hybrid-model-inference\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Theoretical Framework for Hybrid Model Inference\"}]\n383:[\"$\",\"a\",\"ollama-integration-architecture\",{\"href\":\"#ollama-integration-architecture\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Ollama Integration 
Architecture\"}]\n384:[\"$\",\"a\",\"core-integration-components\",{\"href\":\"#core-integration-components\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Core Integration Components\"}]\n385:[\"$\",\"a\",\"appservicesollamaservicepy\",{\"href\":\"#appservicesollamaservicepy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/services/ollama_service.py\"}]\n386:[\"$\",\"a\",\"provider-selection-service\",{\"href\":\"#provider-selection-service\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Provider Selection Service\"}]\n387:[\"$\",\"a\",\"appservicesproviderservicepy\",{\"href\":\"#appservicesproviderservicepy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/services/provider_service.py\"}]\n388:[\"$\",\"a\",\"configuration-settings\",{\"href\":\"#configuration-settings\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Configuration Settings\"}]\n389:[\"$\",\"a\",\"appconfigpy\",{\"href\":\"#appconfigpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/config.py\"}]\n38a:[\"$\",\"a\",\"load-environment-variables-from-env-file\",{\"href\":\"#load-environment-variables-from-env-file\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Load environment variables from .env file\"}]\n38b:[\"$\",\"a\",\"model-selection-and-configuration\",{\"href\":\"#model-selection-and-configuration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Model Selection and 
Configuration\"}]\n38c:[\"$\",\"a\",\"appmodelsmodelcatalogpy\",{\"href\":\"#appmodelsmodelcatalogpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/models/model_catalog.py\"}]\n38d:[\"$\",\"a\",\"ollama-model-catalog\",{\"href\":\"#ollama-model-catalog\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Ollama model catalog\"}]\n38e:[\"$\",\"a\",\"openai-ollama-model-mapping-for-fallback-scenarios\",{\"href\":\"#openai-ollama-model-mapping-for-fallback-scenarios\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"OpenAI -\u003e Ollama model mapping for fallback scenarios\"}]\n38f:[\"$\",\"a\",\"use-case-to-model-recommendations\",{\"href\":\"#use-case-to-model-recommendations\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Use case to model recommendations\"}]\n390:[\"$\",\"a\",\"agent-adapter-for-model-selection\",{\"href\":\"#agent-adapter-for-model-selection\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Agent Adapter for Model Selection\"}]\n391:[\"$\",\"a\",\"appagentsadaptiveagentpy\",{\"href\":\"#appagentsadaptiveagentpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/agents/adaptive_agent.py\"}]\n392:[\"$\",\"a\",\"agent-controller-with-model-selection\",{\"href\":\"#agent-controller-with-model-selection\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Agent Controller with Model Selection\"}]\n393:[\"$\",\"a\",\"appcontrollersagentcontrollerpy\",{\"href\":\"#appcontrollersagentcontrollerpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors 
\",\"children\":\"app/controllers/agent_controller.py\"}]\n394:[\"$\",\"a\",\"agent-sessions-storage\",{\"href\":\"#agent-sessions-storage"])</script><script>self.__next_f.push([1,"\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Agent sessions storage\"}]\n395:[\"$\",\"a\",\"get-agent-factory-instance\",{\"href\":\"#get-agent-factory-instance\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Get agent factory instance\"}]\n396:[\"$\",\"a\",\"dockerfile-for-local-deployment\",{\"href\":\"#dockerfile-for-local-deployment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Dockerfile for Local Deployment\"}]\n397:[\"$\",\"a\",\"dockerfile\",{\"href\":\"#dockerfile\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Dockerfile\"}]\n398:[\"$\",\"a\",\"install-system-dependencies\",{\"href\":\"#install-system-dependencies\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Install system dependencies\"}]\n399:[\"$\",\"a\",\"copy-requirements\",{\"href\":\"#copy-requirements\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Copy requirements\"}]\n39a:[\"$\",\"a\",\"copy-application-code\",{\"href\":\"#copy-application-code\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Copy application code\"}]\n39b:[\"$\",\"a\",\"set-up-environment\",{\"href\":\"#set-up-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Set up environment\"}]\n39c:[\"$\",\"a\",\"default-command\",{\"href\":\"#default-command\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground 
transition-colors \",\"children\":\"Default command\"}]\n39d:[\"$\",\"a\",\"docker-compose-for-development\",{\"href\":\"#docker-compose-for-development\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Docker Compose for Development\"}]\n39e:[\"$\",\"a\",\"docker-composeyml\",{\"href\":\"#docker-composeyml\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"docker-compose.yml\"}]\n39f:[\"$\",\"a\",\"model-preload-script\",{\"href\":\"#model-preload-script\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Model Preload Script\"}]\n3a0:[\"$\",\"a\",\"scriptspreloadmodelspy\",{\"href\":\"#scriptspreloadmodelspy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"scripts/preload_models.py\"}]\n3a1:[\"$\",\"a\",\"implementation-guide\",{\"href\":\"#implementation-guide\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Implementation Guide\"}]\n3a2:[\"$\",\"a\",\"setting-up-ollama\",{\"href\":\"#setting-up-ollama\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Setting up Ollama\"}]\n3a3:[\"$\",\"a\",\"application-configuration\",{\"href\":\"#application-configuration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Application Configuration\"}]\n3a4:[\"$\",\"a\",\"model-selection-criteria\",{\"href\":\"#model-selection-criteria\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Model Selection Criteria\"}]\n3a5:[\"$\",\"a\",\"ollama-model-selection\",{\"href\":\"#ollama-model-selection\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground 
transition-colors pl-4\",\"children\":\"Ollama Model Selection\"}]\n3a6:[\"$\",\"a\",\"performance-optimization\",{\"href\":\"#performance-optimization\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Performance Optimization\"}]\n3a7:[\"$\",\"a\",\"fallback-mechanisms\",{\"href\":\"#fallback-mechanisms\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Fallback Mechanisms\"}]\n3a8:[\"$\",\"a\",\"conclusion\",{\"href\":\"#conclusion\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0"])</script><script>self.__next_f.push([1,"\",\"children\":\"Conclusion\"}]\n3a9:[\"$\",\"a\",\"comprehensive-testing-strategy-for-openai-ollama-hybrid-agent-system\",{\"href\":\"#comprehensive-testing-strategy-for-openai-ollama-hybrid-agent-system\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Comprehensive Testing Strategy for OpenAI-Ollama Hybrid Agent System\"}]\n3aa:[\"$\",\"a\",\"theoretical-framework-for-validation-methodology\",{\"href\":\"#theoretical-framework-for-validation-methodology\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Theoretical Framework for Validation Methodology\"}]\n3ab:[\"$\",\"a\",\"strategic-testing-layers\",{\"href\":\"#strategic-testing-layers\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Strategic Testing Layers\"}]\n3ac:[\"$\",\"a\",\"1-unit-testing-framework\",{\"href\":\"#1-unit-testing-framework\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"1. 
Unit Testing Framework\"}]\n3ad:[\"$\",\"a\",\"testsunittestproviderservicepy\",{\"href\":\"#testsunittestproviderservicepy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/unit/test_provider_service.py\"}]\n3ae:[\"$\",\"a\",\"testsunittestmodelselectionpy\",{\"href\":\"#testsunittestmodelselectionpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/unit/test_model_selection.py\"}]\n3af:[\"$\",\"a\",\"testsunittestollamaservicepy\",{\"href\":\"#testsunittestollamaservicepy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/unit/test_ollama_service.py\"}]\n3b0:[\"$\",\"a\",\"testsunittesttoolintegrationpy\",{\"href\":\"#testsunittesttoolintegrationpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/unit/test_tool_integration.py\"}]\n3b1:[\"$\",\"a\",\"2-integration-testing-framework\",{\"href\":\"#2-integration-testing-framework\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"2. 
Integration Testing Framework\"}]\n3b2:[\"$\",\"a\",\"testsintegrationtestapiendpointspy\",{\"href\":\"#testsintegrationtestapiendpointspy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/integration/test_api_endpoints.py\"}]\n3b3:[\"$\",\"a\",\"testsintegrationtestagentflowspy\",{\"href\":\"#testsintegrationtestagentflowspy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/integration/test_agent_flows.py\"}]\n3b4:[\"$\",\"a\",\"testsintegrationtestcrossproviderpy\",{\"href\":\"#testsintegrationtestcrossproviderpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/integration/test_cross_provider.py\"}]\n3b5:[\"$\",\"a\",\"3-performance-testing-framework\",{\"href\":\"#3-performance-testing-framework\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"3. 
Performance Testing Framework\"}]\n3b6:[\"$\",\"a\",\"testsperformancetestlatencypy\",{\"href\":\"#testsperformancetestlatencypy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/performance/test_latency.py\"}]\n3b7:[\"$\",\"a\",\"skip-tests-if-its-ci-environment\",{\"href\":\"#skip-tests-if-its-ci-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Skip tests if it's CI environment\"}]\n3b8:[\"$\",\"a\",\"testsperformancetestmemoryusagepy\",{\"href\":\"#testsperformancetestmemoryusagepy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/performance/test_memory_usage.py\"}]\n3b9:[\"$\",\"a\",\"skip-tests-if-its-ci-environment\",{\"href\":\"#skip-tests-if-its-ci-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Skip tests if it's CI environme"])</script><script>self.__next_f.push([1,"nt\"}]\n3ba:[\"$\",\"a\",\"testsperformancetestresponsequalitypy\",{\"href\":\"#testsperformancetestresponsequalitypy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/performance/test_response_quality.py\"}]\n3bb:[\"$\",\"a\",\"skip-tests-if-its-ci-environment\",{\"href\":\"#skip-tests-if-its-ci-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Skip tests if it's CI environment\"}]\n3bc:[\"$\",\"a\",\"4-reliability-testing-framework\",{\"href\":\"#4-reliability-testing-framework\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"4. 
Reliability Testing Framework\"}]\n3bd:[\"$\",\"a\",\"testsreliabilitytesterrorhandlingpy\",{\"href\":\"#testsreliabilitytesterrorhandlingpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/reliability/test_error_handling.py\"}]\n3be:[\"$\",\"a\",\"testsreliabilitytestloadpy\",{\"href\":\"#testsreliabilitytestloadpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/reliability/test_load.py\"}]\n3bf:[\"$\",\"a\",\"skip-tests-if-its-ci-environment\",{\"href\":\"#skip-tests-if-its-ci-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Skip tests if it's CI environment\"}]\n3c0:[\"$\",\"a\",\"testsreliabilityteststabilitypy\",{\"href\":\"#testsreliabilityteststabilitypy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"tests/reliability/test_stability.py\"}]\n3c1:[\"$\",\"a\",\"skip-tests-if-its-ci-environment\",{\"href\":\"#skip-tests-if-its-ci-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Skip tests if it's CI environment\"}]\n3c2:[\"$\",\"a\",\"automation-framework\",{\"href\":\"#automation-framework\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Automation Framework\"}]\n3c3:[\"$\",\"a\",\"test-orchestration-script\",{\"href\":\"#test-orchestration-script\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Test Orchestration Script\"}]\n3c4:[\"$\",\"a\",\"scriptsruntestspy\",{\"href\":\"#scriptsruntestspy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors 
Configuration\"}]\n468:[\"$\",\"a\",\"implementing-metrics-collection-in-api\",{\"href\":\"#implementing-metrics-collection-in-api\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Implementing Metrics Collection in API\"}]\n469:[\"$\",\"a\",\"appmiddlewaremetricspy\",{\"href\":\"#appmiddlewaremetricspy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/middleware/metrics.py\"}]\n46a:[\"$\",\"a\",\"initialize-metrics\",{\"href\":\"#initialize-metrics\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Initialize metrics\"}]\n46b:[\"$\",\"a\",\"scaling-strategies\",{\"href\":\"#scaling-strategies\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Scaling Strategies\"}]\n46c:[\"$\",\"a\",\"optimizing-ollama-scaling-for-high-loads\",{\"href\":\"#optimizing-ollama-scaling-for-high-loads\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Optimizing Ollama Scaling for High Loads\"}]\n46d:[\"$\",\"a\",\"appservicesollamascalingpy\",{\"href\":\"#appservicesollamascalingpy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/services/ollama_scaling.py\"}]\n46e:[\"$\",\"a\",\"autoscaling-configuration-for-cloud-deployments\",{\"href\":\"#autoscaling-configuration-for-cloud-deployments\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Autoscaling Configuration for Cloud Deployments\"}]\n46f:[\"$\",\"a\",\"kubernetesautoscaler-configyaml\",{\"href\":\"#kubernetesautoscaler-configyaml\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors 
\",\"children\":\"kubernetes/autoscaler-config.yaml\"}]\n470:[\"$\",\"a\",\"cost-optimization-monthly-budget-tracking\",{\"href\":\"#cost-optimization-monthly-budget-tracking\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Cost Optimization - Monthly Budget Tracking\"}]\n471:[\"$\",\"a\",\"appservicesbudgetservicepy\",{\"href\":\"#appservicesbudgetservicepy\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"app/services/budget_service.py\"}]\n472:[\"$\",\"a\",\"conclusion\",{\"href\":\"#conclusion\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Conclusion\"}]\n473:[\"$\",\"a\",\"mcp-modern-computational-paradigm-system\",{\"href\":\"#mcp-modern-computational-paradigm-system\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"MCP (Modern Computational Paradigm) System\"}]\n474:[\"$\",\"a\",\"comprehensive-documentation\",{\"href\":\"#comprehensive-documentation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Comprehensive Documentation\"}]\n475:[\"$\",\"a\",\"table-of-contents\",{\"href\":\"#table-of-contents\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Table of Contents\"}]\n476:[\"$\",\"a\",\"readmemd\",{\"href\":\"#readmemd\",\"className\":\"block t"])</script><script>self.__next_f.push([1,"ext-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"README.md\"}]\n477:[\"$\",\"a\",\"mcp-modern-computational-paradigm\",{\"href\":\"#mcp-modern-computational-paradigm\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"MCP - Modern Computational 
Paradigm\"}]\n478:[\"$\",\"a\",\"key-features\",{\"href\":\"#key-features\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Key Features\"}]\n479:[\"$\",\"a\",\"quick-start\",{\"href\":\"#quick-start\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Quick Start\"}]\n47a:[\"$\",\"a\",\"prerequisites\",{\"href\":\"#prerequisites\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Prerequisites\"}]\n47b:[\"$\",\"a\",\"installation\",{\"href\":\"#installation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Installation\"}]\n47c:[\"$\",\"a\",\"docker-deployment\",{\"href\":\"#docker-deployment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Docker Deployment\"}]\n47d:[\"$\",\"a\",\"documentation\",{\"href\":\"#documentation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Documentation\"}]\n47e:[\"$\",\"a\",\"architecture\",{\"href\":\"#architecture\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Architecture\"}]\n47f:[\"$\",\"a\",\"license\",{\"href\":\"#license\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"License\"}]\n480:[\"$\",\"a\",\"contributing\",{\"href\":\"#contributing\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Contributing\"}]\n481:[\"$\",\"a\",\"installation-guide\",{\"href\":\"#installation-guide\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Installation 
Guide\"}]\n482:[\"$\",\"a\",\"prerequisites\",{\"href\":\"#prerequisites\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Prerequisites\"}]\n483:[\"$\",\"a\",\"system-requirements\",{\"href\":\"#system-requirements\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"System Requirements\"}]\n484:[\"$\",\"a\",\"software-requirements\",{\"href\":\"#software-requirements\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Software Requirements\"}]\n485:[\"$\",\"a\",\"required-api-keys\",{\"href\":\"#required-api-keys\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Required API Keys\"}]\n486:[\"$\",\"a\",\"local-development-setup\",{\"href\":\"#local-development-setup\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Local Development Setup\"}]\n487:[\"$\",\"a\",\"1-clone-the-repository\",{\"href\":\"#1-clone-the-repository\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"1. Clone the Repository\"}]\n488:[\"$\",\"a\",\"2-set-up-virtual-environment\",{\"href\":\"#2-set-up-virtual-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"2. 
Set Up Virtual Environment\"}]\n489:[\"$\",\"a\",\"create-virtual-environment\",{\"href\":\"#create-virtual-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Create virtual environment\"}]\n48a:[\"$\",\"a\",\"activate-virtual-environment\",{\"href\":\"#activate-virtual-environment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Activate virtual environment\"}]\n48b:[\"$\",\"a\",\"on-linuxmacos\",{\"href\":\"#on-linuxmacos\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"On L"])</script><script>self.__next_f.push([1,"inux/macOS:\"}]\n48c:[\"$\",\"a\",\"on-windows\",{\"href\":\"#on-windows\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"On Windows:\"}]\n48d:[\"$\",\"a\",\"3-install-dependencies\",{\"href\":\"#3-install-dependencies\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"3. Install Dependencies\"}]\n48e:[\"$\",\"a\",\"4-install-and-configure-ollama\",{\"href\":\"#4-install-and-configure-ollama\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"4. 
Install and Configure Ollama\"}]\n48f:[\"$\",\"a\",\"macos-using-homebrew\",{\"href\":\"#macos-using-homebrew\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"macOS (using Homebrew)\"}]\n490:[\"$\",\"a\",\"linux\",{\"href\":\"#linux\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Linux\"}]\n491:[\"$\",\"a\",\"start-ollama-service\",{\"href\":\"#start-ollama-service\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Start Ollama service\"}]\n492:[\"$\",\"a\",\"5-pull-required-models\",{\"href\":\"#5-pull-required-models\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"5. Pull Required Models\"}]\n493:[\"$\",\"a\",\"pull-basic-models\",{\"href\":\"#pull-basic-models\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Pull basic models\"}]\n494:[\"$\",\"a\",\"6-set-up-environment-variables\",{\"href\":\"#6-set-up-environment-variables\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"6. 
Set Up Environment Variables\"}]\n495:[\"$\",\"a\",\"copy-the-example-environment-file\",{\"href\":\"#copy-the-example-environment-file\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Copy the example environment file\"}]\n496:[\"$\",\"a\",\"edit-the-file-with-your-configuration\",{\"href\":\"#edit-the-file-with-your-configuration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Edit the file with your configuration\"}]\n497:[\"$\",\"a\",\"at-minimum-set-openaiapikey\",{\"href\":\"#at-minimum-set-openaiapikey\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"At minimum, set OPENAI_API_KEY\"}]\n498:[\"$\",\"a\",\"7-initialize-local-services\",{\"href\":\"#7-initialize-local-services\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"7. Initialize Local Services\"}]\n499:[\"$\",\"a\",\"start-redis-using-docker\",{\"href\":\"#start-redis-using-docker\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Start Redis using Docker\"}]\n49a:[\"$\",\"a\",\"initialize-database-if-applicable\",{\"href\":\"#initialize-database-if-applicable\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Initialize database (if applicable)\"}]\n49b:[\"$\",\"a\",\"8-start-development-server\",{\"href\":\"#8-start-development-server\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"8. 
Start Development Server\"}]\n49c:[\"$\",\"a\",\"start-with-auto-reload-for-development\",{\"href\":\"#start-with-auto-reload-for-development\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Start with auto-reload for development\"}]\n49d:[\"$\",\"a\",\"9-verify-installation\",{\"href\":\"#9-verify-installation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"9. Verify Installation\"}]\n49e:[\"$\",\"a\",\"docker-deployment\",{\"href\":\"#docker-deployment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Docker Deployment\"}]\n49f:[\"$\",\"a\",\"1-ensure-docker-and-docker-compose-are-installed\",{\"href\":\"#1-ensure-docker-an"])</script><script>self.__next_f.push([1,"d-docker-compose-are-installed\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"1. Ensure Docker and Docker Compose are Installed\"}]\n4a0:[\"$\",\"a\",\"verify-installation\",{\"href\":\"#verify-installation\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Verify installation\"}]\n4a1:[\"$\",\"a\",\"2-configure-environment-variables\",{\"href\":\"#2-configure-environment-variables\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"2. 
Configure Environment Variables\"}]\n4a2:[\"$\",\"a\",\"copy-and-edit-environment-variables\",{\"href\":\"#copy-and-edit-environment-variables\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Copy and edit environment variables\"}]\n4a3:[\"$\",\"a\",\"3-start-services-with-docker-compose\",{\"href\":\"#3-start-services-with-docker-compose\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"3. Start Services with Docker Compose\"}]\n4a4:[\"$\",\"a\",\"build-and-start-all-services\",{\"href\":\"#build-and-start-all-services\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Build and start all services\"}]\n4a5:[\"$\",\"a\",\"view-logs\",{\"href\":\"#view-logs\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"View logs\"}]\n4a6:[\"$\",\"a\",\"4-stopping-the-services\",{\"href\":\"#4-stopping-the-services\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"4. Stopping the Services\"}]\n4a7:[\"$\",\"a\",\"kubernetes-deployment\",{\"href\":\"#kubernetes-deployment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Kubernetes Deployment\"}]\n4a8:[\"$\",\"a\",\"1-prerequisites\",{\"href\":\"#1-prerequisites\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"1. Prerequisites\"}]\n4a9:[\"$\",\"a\",\"2-set-up-namespace-and-secrets\",{\"href\":\"#2-set-up-namespace-and-secrets\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"2. 
Set Up Namespace and Secrets\"}]\n4aa:[\"$\",\"a\",\"create-namespace\",{\"href\":\"#create-namespace\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Create namespace\"}]\n4ab:[\"$\",\"a\",\"create-secrets\",{\"href\":\"#create-secrets\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Create secrets\"}]\n4ac:[\"$\",\"a\",\"3-deploy-redis-if-needed\",{\"href\":\"#3-deploy-redis-if-needed\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"3. Deploy Redis (if needed)\"}]\n4ad:[\"$\",\"a\",\"using-helm\",{\"href\":\"#using-helm\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Using Helm\"}]\n4ae:[\"$\",\"a\",\"4-deploy-mcp-components\",{\"href\":\"#4-deploy-mcp-components\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"4. Deploy MCP Components\"}]\n4af:[\"$\",\"a\",\"apply-kubernetes-manifests\",{\"href\":\"#apply-kubernetes-manifests\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Apply Kubernetes manifests\"}]\n4b0:[\"$\",\"a\",\"5-set-up-autoscaling-optional\",{\"href\":\"#5-set-up-autoscaling-optional\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"5. Set Up Autoscaling (Optional)\"}]\n4b1:[\"$\",\"a\",\"6-check-deployment-status\",{\"href\":\"#6-check-deployment-status\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"6. 
Check Deployment Status\"}]\n4b2:[\"$\",\"a\",\"aws-deployment\",{\"href\":\"#aws-deployment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"AWS Deploy"])</script><script>self.__next_f.push([1,"ment\"}]\n4b3:[\"$\",\"a\",\"1-prerequisites\",{\"href\":\"#1-prerequisites\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"1. Prerequisites\"}]\n4b4:[\"$\",\"a\",\"2-cloudformation-deployment\",{\"href\":\"#2-cloudformation-deployment\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"2. CloudFormation Deployment\"}]\n4b5:[\"$\",\"a\",\"deploy-using-cloudformation-template\",{\"href\":\"#deploy-using-cloudformation-template\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Deploy using CloudFormation template\"}]\n4b6:[\"$\",\"a\",\"check-deployment-status\",{\"href\":\"#check-deployment-status\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Check deployment status\"}]\n4b7:[\"$\",\"a\",\"3-deploy-api-image-to-ecr\",{\"href\":\"#3-deploy-api-image-to-ecr\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"3. 
Deploy API Image to ECR\"}]\n4b8:[\"$\",\"a\",\"log-in-to-ecr\",{\"href\":\"#log-in-to-ecr\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Log in to ECR\"}]\n4b9:[\"$\",\"a\",\"build-and-push-image\",{\"href\":\"#build-and-push-image\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Build and push image\"}]\n4ba:[\"$\",\"a\",\"4-update-ecs-service\",{\"href\":\"#4-update-ecs-service\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"4. Update ECS Service\"}]\n4bb:[\"$\",\"a\",\"force-new-deployment-to-use-the-updated-image\",{\"href\":\"#force-new-deployment-to-use-the-updated-image\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Force new deployment to use the updated image\"}]\n4bc:[\"$\",\"a\",\"api-reference\",{\"href\":\"#api-reference\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"API Reference\"}]\n4bd:[\"$\",\"a\",\"authentication\",{\"href\":\"#authentication\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Authentication\"}]\n4be:[\"$\",\"a\",\"bearer-token-authentication\",{\"href\":\"#bearer-token-authentication\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Bearer Token Authentication\"}]\n4bf:[\"$\",\"a\",\"query-parameter\",{\"href\":\"#query-parameter\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Query Parameter\"}]\n4c0:[\"$\",\"a\",\"chat-endpoints\",{\"href\":\"#chat-endpoints\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Chat 
Endpoints\"}]\n4c1:[\"$\",\"a\",\"create-chat-completion\",{\"href\":\"#create-chat-completion\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Create Chat Completion\"}]\n4c2:[\"$\",\"a\",\"stream-chat-completion\",{\"href\":\"#stream-chat-completion\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Stream Chat Completion\"}]\n4c3:[\"$\",\"a\",\"hybrid-chat\",{\"href\":\"#hybrid-chat\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Hybrid Chat\"}]\n4c4:[\"$\",\"a\",\"agent-endpoints\",{\"href\":\"#agent-endpoints\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Agent Endpoints\"}]\n4c5:[\"$\",\"a\",\"run-agent\",{\"href\":\"#run-agent\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Run Agent\"}]\n4c6:[\"$\",\"a\",\"get-agent-status\",{\"href\":\"#get-agent-status\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Get Agent Status\"}]\n4c7:[\"$\",\"a\",\"list-available-agents\",{\"href\":\"#list-available-agents\",\"className\":\"block text-sm text-muted-foreground hov"])</script><script>self.__next_f.push([1,"er:text-foreground transition-colors pl-4\",\"children\":\"List Available Agents\"}]\n4c8:[\"$\",\"a\",\"model-management-endpoints\",{\"href\":\"#model-management-endpoints\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Model Management Endpoints\"}]\n4c9:[\"$\",\"a\",\"list-models\",{\"href\":\"#list-models\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"List 
Models\"}]\n4ca:[\"$\",\"a\",\"get-model-details\",{\"href\":\"#get-model-details\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Get Model Details\"}]\n4cb:[\"$\",\"a\",\"pull-ollama-model\",{\"href\":\"#pull-ollama-model\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Pull Ollama Model\"}]\n4cc:[\"$\",\"a\",\"system-endpoints\",{\"href\":\"#system-endpoints\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"System Endpoints\"}]\n4cd:[\"$\",\"a\",\"health-check\",{\"href\":\"#health-check\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Health Check\"}]\n4ce:[\"$\",\"a\",\"system-configuration\",{\"href\":\"#system-configuration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"System Configuration\"}]\n4cf:[\"$\",\"a\",\"update-configuration\",{\"href\":\"#update-configuration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"Update Configuration\"}]\n4d0:[\"$\",\"a\",\"system-metrics\",{\"href\":\"#system-metrics\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-4\",\"children\":\"System Metrics\"}]\n4d1:[\"$\",\"a\",\"configuration\",{\"href\":\"#configuration\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors \",\"children\":\"Configuration\"}]\n4d2:[\"$\",\"a\",\"environment-variables\",{\"href\":\"#environment-variables\",\"className\":\"block text-sm text-muted-foreground hover:text-foreground transition-colors pl-0\",\"children\":\"Environment Variables\"}]\n4d3:[\"$\",\"a\",\"core-configuration\",{\"href\":\"#core-configuration\",\"className\":\"block text-sm 
OpenAI Agents SDK & Ollama Integration: Complete Architecture Guide, by Daniel Kliewer