Simulacra01: Complete Guide to Building Local AI Agents with OpenAI Agents SDK and Ollama Integration

Comprehensive Guide to Simulacra01
This guide provides detailed documentation on how to use, customize, and extend Simulacra01, a framework that integrates the OpenAI Agents SDK with Ollama for local AI agent capabilities.
Table of Contents
- Introduction
- Understanding the Architecture
- Installation & Setup
- Using Document Analysis Agent
- Working with the Command-Line Interface
- Creating Custom Agents
- Advanced Customization
- Debugging and Troubleshooting
- Performance Optimization
- Contributing and Development
Introduction
Simulacra01 is a powerful framework that brings together the structured agent capabilities of OpenAI's Agents SDK with the privacy and cost benefits of local LLM inference through Ollama. This integration enables you to build sophisticated AI agents that run entirely on your local infrastructure.
Key Benefits
- Complete Data Privacy: All processing happens locally, with no data sent to external services
- Cost Efficiency: No per-token API costs associated with cloud-based LLM services
- Customizability: Full control over model selection, fine-tuning, and behavior
- Network Independence: Agents function without requiring internet access
- Reduced Latency: Eliminate network roundtrips for faster responses
Core Components
- OpenAI Agents SDK: Provides the structured framework for building AI agents
- Ollama: Enables local running of various open-source LLMs
- Adapter Layer: Connects the two technologies seamlessly
- Specialized Agents: Pre-built agents for document analysis and other tasks
- Command-Line Interface: Interactive way to engage with agents
Understanding the Architecture
Simulacra01 employs a layered architecture designed for flexibility and extensibility:
Ollama Layer
The base layer provides LLM inference capabilities:
- Handles model loading and management
- Processes raw prompts into completions
- Manages system resources for inference
- Provides API endpoints that mimic OpenAI's structure
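Because these endpoints mirror OpenAI's API, the official openai Python package can talk to a local Ollama server directly. A minimal sketch, assuming Ollama is running on its default port (11434) and the mistral model has been pulled:

```python
# Point the standard OpenAI client at Ollama's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",  # required by the client library, ignored by Ollama
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```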
Adapter Layer
The bridge between Ollama and the OpenAI Agents SDK:
- OllamaClient: Routes requests to Ollama's API endpoints
- AgentAdapter: Makes OpenAI's Agent class compatible with the Ollama backend
- ResponseFormatter: Ensures responses match expected formats
- ToolCallProcessor: Handles function/tool calls with local models
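For intuition, here is an illustrative sketch of the kind of translation this layer performs: an OpenAI-style chat call routed to Ollama's native /api/chat endpoint. The function is hypothetical; the real OllamaClient and AgentAdapter implementations may differ.

```python
import requests

def ollama_chat(model: str, messages: list[dict]) -> str:
    """Send OpenAI-style messages to Ollama's native chat endpoint."""
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # Ollama's native API returns {"message": {"role": ..., "content": ...}, ...}
    return resp.json()["message"]["content"]

print(ollama_chat("mistral", [{"role": "user", "content": "Hi"}]))
```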
Agents SDK Layer
Provides the agent framework and abstractions:
- Agent lifecycle management
- Tool definition and integration
- Conversation handling
- Response processing
Application Layer
Implements specialized agents and interfaces:
- Document Analysis Agent
- Command-Line Interface
- Document Memory system
- Other specialized agent types
Installation & Setup
System Requirements
- Python 3.9 or higher
- 8GB+ RAM recommended (model dependent)
- 2GB+ free disk space for model storage
Step 1: Install Ollama
For macOS and Linux:
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```
For Windows, download from Ollama's website.
Verify installation:
```bash
ollama --version
```
Step 2: Download Required Models
```bash
# Pull the Mistral model (recommended starting model)
ollama pull mistral

# Optional: Pull additional models
ollama pull llama3
ollama pull mixtral
```
Verify model installation:
```bash
ollama list
```
Step 3: Clone and Install Simulacra01
```bash
git clone https://github.com/kliewerdaniel/simulacra01.git
cd simulacra01
pip install -e .
```
Step 4: Install Dependencies
```bash
pip install -r requirements.txt
```
Step 5: Verify Installation
Run the basic test script:
```python
from ollama_client import OllamaClient

client = OllamaClient()
response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)
```
You should see a response from the model.
Using Document Analysis Agent
The Document Analysis Agent is a powerful tool for extracting information from documents, answering questions about content, and managing a document repository.
Basic Usage
Run the document agent:
```bash
python main.py
```
This will start an interactive session with the agent.
Available Commands
- exit: Exit the agent
- help: Show help information
- list: List documents in memory
Example Interactions
Analyze a webpage:
You: Please analyze the article at https://en.wikipedia.org/wiki/Artificial_intelligence and tell me when AI was first developed.
Extract specific information:
You: Extract all the dates mentioned in the last document.
Search for content:
You: Find information about neural networks in the document.
Tool Functionality
The Document Analysis Agent includes several specialized tools:
fetch_document
Retrieves document content from a URL:
```python
fetch_document(url="https://example.com/article")
```
This tool:
- Checks if the document is already in memory
- If not, fetches it from the URL
- Stores it in document memory for future use
- Returns the document content
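A hedged sketch of that fetch-then-cache flow, reusing the DocumentMemory API shown later in this guide (the lookup-by-URL step and the "id" key are assumptions; the actual tool may be implemented differently):

```python
import requests
from document_memory import DocumentMemory

memory = DocumentMemory()

def fetch_document(url: str) -> dict:
    """Fetch a document, using document memory as a cache."""
    # Assumption: list_documents() entries expose "url" and an "id"
    # usable with get_document().
    cached = next((d for d in memory.list_documents() if d["url"] == url), None)
    if cached is not None:
        return {"content": memory.get_document(cached["id"])["content"], "cached": True}
    # Not in memory: fetch it and store it for future use
    content = requests.get(url, timeout=30).text
    memory.store_document(url=url, content=content, metadata={})
    return {"content": content, "cached": False}
```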
extract_info
Extracts specific types of information from text:
```python
extract_info(text="document content", info_type="dates")
```
Common info types:
- dates: Extracts dates and timestamps
- names: Extracts person names
- organizations: Extracts organization names
- key points: Extracts main ideas or arguments
- statistics: Extracts numerical data and statistics
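As an illustration, the "dates" type could be implemented with simple regular expressions; this is a sketch only, and the real tool may delegate extraction to the model or an NLP library instead:

```python
import re

# Two illustrative patterns: ISO dates and "Month day, year" forms.
DATE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",  # e.g. 2025-03-13
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2},? \d{4}\b",
]

def extract_dates(text: str) -> list[str]:
    found: list[str] = []
    for pattern in DATE_PATTERNS:
        found.extend(re.findall(pattern, text))
    return found

print(extract_dates("Published on 2025-03-13, updated March 14, 2025."))
```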
search_document
Searches document content for relevant information:
```python
search_document(text="document content", query="neural networks")
```
This uses semantic search to find the most relevant paragraphs for the query.
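One way such semantic search could work: embed the query and each paragraph, then rank paragraphs by cosine similarity. The sketch below uses Ollama's embeddings endpoint and assumes an embedding model such as nomic-embed-text has been pulled; the real implementation may differ.

```python
import math
import requests

def embed(text: str) -> list[float]:
    """Get an embedding vector from a local Ollama embedding model."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search_document(text: str, query: str, top_k: int = 3) -> list[str]:
    """Return the paragraphs most similar to the query."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    q_vec = embed(query)
    ranked = sorted(paragraphs, key=lambda p: cosine(embed(p), q_vec), reverse=True)
    return ranked[:top_k]
```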
Document Memory
The Document Memory system provides persistent storage for documents:
```python
from document_memory import DocumentMemory

# Initialize memory
memory = DocumentMemory()

# Store a document
doc_id = memory.store_document(
    url="https://example.com/article",
    content="Document text goes here...",
    metadata={"author": "John Doe", "date": "2025-03-13"},
)

# Retrieve a document
doc = memory.get_document(doc_id)
print(doc["content"])

# List all documents
docs = memory.list_documents()
for doc in docs:
    print(f"URL: {doc['url']}")
```
Document memory is stored on disk and persists between sessions.
Working with the Command-Line Interface
The Simulacra01 CLI provides a comprehensive interface for interacting with various agent types.
Starting the CLI
```bash
# Start with interactive menu
python cli.py

# Start directly with a specific agent
python cli.py chat --agent document
python cli.py chat --agent research
```
Global Commands
These commands work across all agent types:
- exit: End the current session
- help: Show available commands
- clear: Clear the conversation history
- save [filename]: Save the current conversation
- load <filename>: Load a saved conversation
- list: List saved conversations
- tools: List available tools
Agent-Specific Commands
Document Agent
- list docs: List stored documents
- analyze <url>: Analyze a document at URL
Research Agent
- search <topic>: Research a topic
- synthesize: Summarize research findings
- save research <filename>: Save research data
Task Agent
- add task <title>: Add a new task
- list tasks: Show all tasks
- update task <id>: Update task status
Configuration
Configure the CLI using:
```bash
python cli.py config
```
This allows you to customize:
- OpenAI and Ollama settings
- Model preferences
- Agent-specific parameters
- System prompts
Configuration is stored in ~/.simulacra/config.json.
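For scripted setups you can also read the file directly; a small sketch (the key names shown are illustrative, so inspect your own config.json for the actual structure):

```python
import json
from pathlib import Path

config_path = Path.home() / ".simulacra" / "config.json"
config = json.loads(config_path.read_text())

# "model" is an assumed key; defaults to mistral if absent.
print(config.get("model", "mistral"))
```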
Creating Custom Agents
Simulacra01 makes it easy to create custom agents tailored to specific use cases.
Basic Agent Creation
```python
from agents import Agent, function_tool
from ollama_client import OllamaClient
from pydantic import BaseModel, Field

# Define the client
client = OllamaClient(model_name="mistral")

# Define tool schemas
class AddInput(BaseModel):
    a: int = Field(..., description="First number")
    b: int = Field(..., description="Second number")

class AddOutput(BaseModel):
    result: int = Field(..., description="Sum of the two numbers")

# Define the tool function
@function_tool
def add(a: int, b: int) -> dict:
    """Adds two numbers together."""
    return {"result": a + b}

# Create the agent
agent = Agent(
    name="MathAgent",
    instructions="You are a math assistant that helps users with calculations.",
    tools=[add],
    model=client,
)

# Use the agent
response = agent.run("What is 5 + 7?")
print(response.message)
```
Tool Development Best Practices
- Clear Function Signatures: Make parameter names intuitive
- Comprehensive Docstrings: Explain what the tool does
- Error Handling: Gracefully handle exceptions
- Type Annotations: Use proper type hints
- Schema Definitions: Use Pydantic for input/output validation
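A compact sketch that applies all five practices to one small tool (illustrative only; it follows the same pattern as the MathAgent example above):

```python
from agents import function_tool
from pydantic import BaseModel, Field

# Schema definitions with Pydantic for validation
class WordCountInput(BaseModel):
    text: str = Field(..., description="Text to analyze")

class WordCountOutput(BaseModel):
    words: int = Field(..., description="Number of words found")

@function_tool
def count_words(text: str) -> dict:  # clear signature, typed parameters
    """Counts the words in a piece of text.

    Returns {"words": <count>}, or {"error": <message>} on failure.
    """
    try:
        return {"words": len(text.split())}
    except Exception as e:  # graceful error handling
        return {"error": str(e)}
```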
Complex Agent Example
Here's a more complex example of a custom agent:
```python
from agents import Agent, function_tool
from ollama_client import OllamaClient
from pydantic import BaseModel, Field
import requests

class WeatherInput(BaseModel):
    location: str = Field(..., description="City or location name")

class WeatherOutput(BaseModel):
    temperature: float = Field(..., description="Current temperature in Celsius")
    conditions: str = Field(..., description="Weather conditions")
    humidity: float = Field(..., description="Humidity percentage")

class ForecastInput(BaseModel):
    location: str = Field(..., description="City or location name")
    days: int = Field(3, description="Number of days to forecast")

class ForecastOutput(BaseModel):
    forecast: list = Field(..., description="Daily forecast data")

@function_tool
def get_weather(location: str) -> dict:
    """Gets the current weather for a location."""
    try:
        # Example implementation (would use actual weather API)
        response = requests.get(f"https://weather-api.example.com/current?q={location}")
        data = response.json()
        return {
            "temperature": data["temp_c"],
            "conditions": data["condition"]["text"],
            "humidity": data["humidity"],
        }
    except Exception as e:
        return {"error": str(e)}

@function_tool
def get_forecast(location: str, days: int = 3) -> dict:
    """Gets a weather forecast for a location."""
    try:
        # Example implementation (would use actual weather API)
        response = requests.get(
            f"https://weather-api.example.com/forecast?q={location}&days={days}"
        )
        data = response.json()
        forecast = []
        for day in data["forecast"]["forecastday"]:
            forecast.append({
                "date": day["date"],
                "max_temp": day["day"]["maxtemp_c"],
                "min_temp": day["day"]["mintemp_c"],
                "condition": day["day"]["condition"]["text"],
            })
        return {"forecast": forecast}
    except Exception as e:
        return {"error": str(e)}

def create_weather_agent():
    """Creates a weather information agent."""
    client = OllamaClient(model_name="mistral")
    agent = Agent(
        name="WeatherAgent",
        instructions="""
        You are a Weather Assistant that provides accurate weather information.

        When a user asks about the weather:
        1. Use get_weather to fetch current conditions
        2. Use get_forecast for multi-day forecasts

        Always specify temperature units (Celsius) in your responses.
        For forecasts, present the information in a clear, day-by-day format.
        If a user doesn't specify a location, ask them for clarification.
        """,
        tools=[get_weather, get_forecast],
        model=client,
    )
    return agent

# Usage
agent = create_weather_agent()
response = agent.run("What's the weather like in London?")
print(response.message)
```
Advanced Customization
Using Different Models
Ollama supports numerous open-source models. To use a different model:
- Pull the model:

```bash
ollama pull llama3
```

- Specify the model when creating the client:

```python
client = OllamaClient(model_name="llama3")
```
Model Recommendations
- mistral: Good balance of performance and speed
- llama3: High quality, larger context window
- mixtral: Strong multi-specialty model
- gemma: Efficient for simpler tasks
- phi3: Compact Microsoft model with strong capabilities for its size
Custom System Prompts
The system prompt (instructions) is crucial for agent behavior:
```python
agent = Agent(
    name="CustomAgent",
    instructions="""
    You are a specialized assistant that helps with [specific domain].

    Follow these guidelines:
    1. Always begin by [specific action]
    2. For complex queries, use [specific approach]
    3. When uncertain, [specific strategy]

    When using tools:
    - Use [tool1] for [specific scenario]
    - Use [tool2] when [specific condition]

    Response format:
    - Start with a [specific element]
    - Include [specific component]
    - Format using [specific style]

    Additional instructions:
    [any other specific behavioral guidance]
    """,
    tools=tools,
    model=client,
)
```
Caching Implementation
Implement caching to improve performance and reduce redundant model calls:
```python
import hashlib

from caching_service import CachingService
from ollama_client import OllamaClient

# Create a caching layer
cache = CachingService(cache_dir="./cache")

# Create a cached client
class CachedOllamaClient(OllamaClient):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cache = cache

    def chat_completion(self, messages, **kwargs):
        cache_key = self._generate_cache_key(messages, kwargs)
        cached_result = self.cache.get(cache_key)
        if cached_result:
            return cached_result
        result = super().chat_completion(messages, **kwargs)
        self.cache.store(cache_key, result)
        return result

    def _generate_cache_key(self, messages, kwargs):
        # Create a deterministic key from messages and relevant kwargs.
        # hashlib is used instead of the built-in hash(), which is not
        # stable across interpreter runs and would break a persistent cache.
        key_components = [
            self.model_name,
            str(messages),
            str({k: v for k, v in kwargs.items() if k in ["temperature", "max_tokens"]}),
        ]
        return hashlib.sha256("".join(key_components).encode()).hexdigest()

# Use the cached client
client = CachedOllamaClient(model_name="mistral")
```
Custom Tool Categories
Organize tools into categories for more structured agents:
```python
from agents import Agent, function_tool
from ollama_client import OllamaClient

# Document tools
@function_tool(category="document")
def fetch_document(url: str) -> dict:
    """Fetches a document from URL."""
    # Implementation

@function_tool(category="document")
def analyze_document(text: str) -> dict:
    """Analyzes document content."""
    # Implementation

# Search tools
@function_tool(category="search")
def web_search(query: str) -> dict:
    """Searches the web for information."""
    # Implementation

@function_tool(category="search")
def image_search(query: str) -> dict:
    """Searches for images."""
    # Implementation

# Create an agent with tool categories
client = OllamaClient(model_name="mixtral")

agent = Agent(
    name="ResearchAssistant",
    instructions="""
    You are a Research Assistant with access to different tool categories:

    DOCUMENT TOOLS:
    - Use fetch_document to retrieve document content
    - Use analyze_document to extract insights from documents

    SEARCH TOOLS:
    - Use web_search to find information online
    - Use image_search to find relevant images

    Choose the appropriate tool category based on the user's request.
    """,
    tools=[fetch_document, analyze_document, web_search, image_search],
    model=client,
)
```
Custom Response Formatting
Implement custom response formatting for specialized outputs:
```python
import json
from datetime import datetime

class CustomAgentResponse:
    def __init__(self, result):
        self.message = self._format_message(result.final_output)
        self.conversation_id = getattr(result, 'conversation_id', None)
        self.tool_calls = self._extract_tool_calls(result)
        self.formatted_output = self._generate_formatted_output()

    def _format_message(self, message):
        # Format the message (e.g., add markdown, structure sections)
        return message

    def _extract_tool_calls(self, result):
        # Extract tool call information
        tool_calls = []
        # Extraction logic
        return tool_calls

    def _generate_formatted_output(self):
        # Create a custom formatted output (e.g., HTML, JSON)
        return {
            "message": self.message,
            "tools_used": [t.name for t in self.tool_calls],
            "formatted_at": datetime.now().isoformat(),
        }

    def to_json(self):
        """Convert the response to JSON."""
        return json.dumps(self.formatted_output)

    def to_html(self):
        """Convert the response to HTML."""
        html = f"<p>{self.message}</p>"
        if self.tool_calls:
            html += "<ul>"
            for tool in self.tool_calls:
                html += f"<li>{tool.name}</li>"
            html += "</ul>"
        return html
```
Debugging and Troubleshooting
Enabling Debug Logging
```python
import logging
import sys

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("simulacra01.log"),
        logging.StreamHandler(sys.stdout),
    ],
)

# Create module-specific loggers
ollama_logger = logging.getLogger("ollama_client")
agent_logger = logging.getLogger("agent")
tools_logger = logging.getLogger("tools")

# Set specific log levels if needed
ollama_logger.setLevel(logging.DEBUG)
```
Common Issues and Solutions
Model Not Found
Problem: Error: model 'xyz' not found
Solution:
- Check available models:

```bash
ollama list
```

- Pull the missing model:

```bash
ollama pull xyz
```

- Verify model spelling in your code
Memory Issues
Problem: Out of memory errors when running large models
Solution:
- Use a smaller model (mistral instead of mixtral)
- Reduce batch size or context length
- Add system swap space
- Close other memory-intensive applications
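For example, context length can be reduced per request through Ollama's native API. A minimal sketch using the num_ctx option (a smaller context window lowers memory use, at the cost of how much text the model can see):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "mistral",
        "messages": [{"role": "user", "content": "Summarize this..."}],
        "options": {"num_ctx": 2048},  # smaller context window -> less RAM
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```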
API Compatibility
Problem: OpenAI SDK methods not working with Ollama
Solution:
- Check the adapter implementation for the specific method
- Add wrapper methods to OllamaClient class
- Consult Ollama API documentation for endpoint limitations
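For instance, a missing capability can often be patched in with a thin subclass. The method below is hypothetical, shown only to illustrate the wrapper pattern; it routes an embeddings call to Ollama's endpoint:

```python
import requests
from ollama_client import OllamaClient

class ExtendedOllamaClient(OllamaClient):
    """Adds a wrapper the base adapter may lack (method name hypothetical)."""

    def create_embedding(self, text: str, model: str = "nomic-embed-text") -> list[float]:
        # Route an OpenAI-style embeddings call to Ollama's endpoint.
        resp = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": model, "prompt": text},
        )
        resp.raise_for_status()
        return resp.json()["embedding"]
```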
Tool Call Issues
Problem: Model not using tools or using them incorrectly
Solution:
- Simplify tool definitions and make their purpose more explicit
- Check that tool schemas are properly defined
- Add more specific instructions about tool usage in the agent prompt
- Try a more capable model (llama3 or mixtral)
Inspecting Raw Responses
To inspect raw model responses:
```python
import json

from ollama_client import OllamaClient

client = OllamaClient(model_name="mistral")

# Get raw response
response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.7,
)

# Print full response object
print(json.dumps(response.model_dump(), indent=2))
```
Performance Optimization
Model Selection Strategies
Choose the right model for the task:
| Task Type | Recommended Model | Notes |
|---|---|---|
| Simple Q&A | gemma | Fastest, lower resource usage |
| General assistant | mistral | Good balance of quality/speed |
| Complex reasoning | llama3 | Higher quality, more resources |
| Specialized domains | mixtral | Multi-specialty, highest resources |
Prompt Optimization
Optimize prompts for better efficiency:
- Be Specific: Clear, concise instructions reduce token usage
- Provide Examples: Few-shot examples improve response quality
- Structured Output: Request specific formats to reduce parsing needs
- Limit Context: Only include relevant information
- Use Separators: Clearly delineate sections with markers
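A small example of the separator tip: clearly delimited sections help smaller local models keep role, data, and task apart. The section names are illustrative:

```python
prompt = """### ROLE
You are a contract analyst.

### DOCUMENT
{document_text}

### TASK
List every termination clause as a numbered list. Output nothing else.
"""
```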
Chunking Strategies
For large documents, implement effective chunking:
```python
import re

def chunk_document(text, chunk_size=2000, overlap=200):
    """Split document into overlapping chunks."""
    chunks = []
    # Simple character-based chunking
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        chunks.append(chunk)
    return chunks

def chunk_document_by_section(text, chunk_size=2000):
    """Split document by natural section boundaries."""
    # Find section boundaries (e.g., headers, paragraph breaks)
    sections = re.split(r'(?:\n\s*){2,}|(?:#{1,6}\s+[^\n]+\n)', text)
    chunks = []
    current_chunk = ""
    for section in sections:
        # If adding this section would exceed chunk size, save current chunk
        if len(current_chunk) + len(section) > chunk_size and current_chunk:
            chunks.append(current_chunk)
            current_chunk = section
        else:
            current_chunk += section
    # Add the final chunk if not empty
    if current_chunk:
        chunks.append(current_chunk)
    return chunks
```
Caching Optimization
Implement multi-level caching for better performance:
```python
# DiskCache is assumed to be provided elsewhere (e.g., the project's
# caching_service module).
class MultiLevelCache:
    def __init__(self):
        self.memory_cache = {}  # Fast, in-memory cache
        self.disk_cache = DiskCache("./cache")  # Persistent disk cache

    def get(self, key):
        # First check memory cache (fastest)
        if key in self.memory_cache:
            return self.memory_cache[key]
        # Then check disk cache
        disk_result = self.disk_cache.get(key)
        if disk_result:
            # Also store in memory for future fast access
            self.memory_cache[key] = disk_result
            return disk_result
        return None

    def store(self, key, value):
        # Store in both caches
        self.memory_cache[key] = value
        self.disk_cache.store(key, value)

    def clear_memory_cache(self):
        """Clear only memory cache (e.g., to free memory)."""
        self.memory_cache = {}
```
Parallel Processing
Implement parallel processing for multiple operations:
```python
import concurrent.futures

def process_document_parallel(document, queries):
    """Process multiple queries against a document in parallel."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Create a future for each query
        future_to_query = {
            executor.submit(search_document, document, query): query
            for query in queries
        }
        # Process results as they complete
        for future in concurrent.futures.as_completed(future_to_query):
            query = future_to_query[future]
            try:
                results[query] = future.result()
            except Exception as e:
                results[query] = {"error": str(e)}
    return results
```
Contributing and Development
Setting Up Development Environment
```bash
# Clone the repository
git clone https://github.com/yourusername/simulacra01.git
cd simulacra01

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```
Running Tests
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_ollama_client.py

# Run with coverage
coverage run -m pytest
coverage report
coverage html  # Generate HTML report
```
Code Style and Linting
```bash
# Check code style
ruff check .

# Auto-fix issues
ruff check --fix .

# Run type checking
mypy .
```
Documentation Generation
```bash
# Generate API documentation
python scripts/generate_docs.py

# Build documentation website
mkdocs build

# Serve documentation locally
mkdocs serve
```
Branch Strategy
- main: Stable release branch
- develop: Main development branch
- feature/*: For new features
- bugfix/*: For bug fixes
- release/*: For release preparation
Commit Guidelines
Use conventional commits format:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
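For example:

```
feat: add multi-level caching to OllamaClient
fix: handle empty tool-call arguments gracefully
docs: expand the troubleshooting guide
```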
Pull Request Process
1. Create a new branch from develop
2. Make your changes
3. Run tests and linting
4. Submit a pull request to develop
5. Ensure CI passes
6. Request review from maintainers
7. Address review feedback
This comprehensive guide covers all aspects of using, customizing, and extending Simulacra01. For additional information, refer to the specific documentation files in the docs/ directory.