Simulacra01: Complete Guide to Building Local AI Agents with OpenAI Agents SDK and Ollama Integration
A comprehensive guide to Simulacra01, a framework that integrates the OpenAI Agents SDK with Ollama for building locally-hosted AI agents with document analysis, custom agent creation, and advanced customization capabilities.
Daniel Kliewer
Author, Sovereign AI


Comprehensive Guide to Simulacra01
This guide provides detailed documentation on how to use, customize, and extend Simulacra01, a framework that integrates the OpenAI Agents SDK with Ollama for local AI agent capabilities.
Table of Contents
- Introduction
- Understanding the Architecture
- Installation & Setup
- Using Document Analysis Agent
- Working with the Command-Line Interface
- Creating Custom Agents
- Advanced Customization
- Debugging and Troubleshooting
- Performance Optimization
- Contributing and Development
Introduction
Simulacra01 is a powerful framework that brings together the structured agent capabilities of OpenAI's Agents SDK with the privacy and cost benefits of local LLM inference through Ollama. This integration enables you to build sophisticated AI agents that run entirely on your local infrastructure.
Key Benefits
- Complete Data Privacy: All processing happens locally, with no data sent to external services
- Cost Efficiency: No per-token API costs associated with cloud-based LLM services
- Customizability: Full control over model selection, fine-tuning, and behavior
- Network Independence: Agents function without requiring internet access
- Reduced Latency: No network round trips to a remote API, though overall response time depends on your hardware and model size
Core Components
- OpenAI Agents SDK: Provides the structured framework for building AI agents
- Ollama: Enables local running of various open-source LLMs
- Adapter Layer: Connects the two technologies seamlessly
- Specialized Agents: Pre-built agents for document analysis and other tasks
- Command-Line Interface: Interactive way to engage with agents
Understanding the Architecture
Simulacra01 employs a layered architecture designed for flexibility and extensibility:
Ollama Layer
The base layer provides LLM inference capabilities:
- Handles model loading and management
- Processes raw prompts into completions
- Manages system resources for inference
- Exposes a REST API, including OpenAI-compatible endpoints
Adapter Layer
The bridge between Ollama and the OpenAI Agents SDK:
- OllamaClient: Routes requests to Ollama's API endpoints
- AgentAdapter: Makes OpenAI's Agent class compatible with the Ollama backend
- ResponseFormatter: Ensures responses match expected formats
- ToolCallProcessor: Handles function/tool calls with local models
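To make the ResponseFormatter role concrete, here is a minimal sketch, not Simulacra01's actual code, that maps a non-streaming response from Ollama's native /api/chat endpoint onto the OpenAI chat-completion shape that Agents SDK code expects. The helper name ollama_chat_to_openai is hypothetical; the input fields (message, done, done_reason, prompt_eval_count, eval_count) follow Ollama's REST API. Tool calls, which the ToolCallProcessor handles, are left out.

```python
import time
import uuid


def ollama_chat_to_openai(ollama_resp: dict) -> dict:
    """Map a non-streaming Ollama /api/chat response onto the OpenAI
    chat.completion shape (hypothetical helper, for illustration only)."""
    message = ollama_resp.get("message") or {}
    prompt_tokens = ollama_resp.get("prompt_eval_count", 0)
    completion_tokens = ollama_resp.get("eval_count", 0)
    # Prefer Ollama's own done_reason ("stop", "length", ...) when it is present
    finish_reason = ollama_resp.get("done_reason") or ("stop" if ollama_resp.get("done") else None)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": ollama_resp.get("model", ""),
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": message.get("role", "assistant"),
                    "content": message.get("content", ""),
                },
                "finish_reason": finish_reason,
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

When a client talks to Ollama's OpenAI-compatible endpoint directly, Ollama performs this translation itself, so a shim like this matters mainly when calling the native API.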
Agents SDK Layer
Provides the agent framework and abstractions:
- Agent lifecycle management
- Tool definition and integration
- Conversation handling
- Response processing
Application Layer
Implements specialized agents and interfaces:
- Document Analysis Agent
- Command-Line Interface
- Document Memory system
- Other specialized agent types
Installation & Setup
System Requirements
- Python 3.9 or higher
- 8GB+ RAM recommended (model dependent)
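- Several GB of free disk space for model storage (the default mistral model alone is roughly 4 GB; larger models such as mixtral need considerably more)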
Step 1: Install Ollama
For Linux:
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```
For macOS and Windows, download the installer from Ollama's website.
Verify installation:
```bash
ollama --version
```
Step 2: Download Required Models
```bash
# Pull the Mistral model (recommended starting model)
ollama pull mistral

# Optional: Pull additional models
ollama pull llama3
ollama pull mixtral
```
Verify model installation:
```bash
ollama list
```
Step 3: Clone and Install Simulacra01
```bash
git clone https://github.com/kliewerdaniel/simulacra01.git
cd simulacra01
pip install -e .
```
Step 4: Install Dependencies
```bash
pip install -r requirements.txt
```
Step 5: Verify Installation
If the Ollama server is not already running, start it with ollama serve. Then run this one-line smoke test:
```bash
python -c "from ollama_client import OllamaClient; client = OllamaClient(); response = client.chat.completions.create(model='mistral', messages=[{'role': 'user', 'content': 'Hello, world!'}]); print(response.choices[0].message.content)"
```
You should see a response from the model.
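A quick way to confirm that the server is up and a model has been pulled, without going through OllamaClient, is to query Ollama's GET /api/tags endpoint, which lists locally available models. This is a stdlib-only sketch; the function names are hypothetical and the default address assumes Ollama's standard port 11434.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Return the JSON from Ollama's GET /api/tags, which lists locally pulled models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


def has_model(tags: dict, name: str) -> bool:
    """True if `name` appears in a /api/tags result; a bare name matches its :latest tag."""
    wanted = name if ":" in name else f"{name}:latest"
    return any(model.get("name") == wanted for model in tags.get("models", []))
```

For example, has_model(fetch_tags(), "mistral") should return True once ollama pull mistral has finished.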
Using Document Analysis Agent
The Document Analysis Agent is a powerful tool for extracting information from documents, answering questions about content, and managing a document repository.
Basic Usage
Run the document agent:
```bash
python main.py
```
This will start an interactive session with the agent.
Available Commands
- exit: Exit the agent
- help: Show help information
- list: List documents in memory
Example Interactions
Analyze a webpage:
You: Please analyze the article at https://en.wikipedia.org/wiki/Artificial_intelligence and tell me when AI was first developed.
Extract specific information:
You: Extract all the dates mentioned in the last document.
Search for content:
You: Find information about neural networks in the document.
Tool Functionality
The Document Analysis Agent includes several specialized tools:
fetch_document
Retrieves document content from a URL:
```python
fetch_document(url="https://example.com/article")
```
This tool:
- Checks if the document is already in memory
- If not, fetches it from the URL
- Stores it in document memory for future use
- Returns the document content
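The four steps above follow a cache-aside pattern. Below is a minimal sketch of that pattern, assuming a plain dict in place of Document Memory and an injected fetch callable in place of the real HTTP request. The name fetch_document_cached and its signature are hypothetical and differ from the tool's url-only interface.

```python
from typing import Callable, Dict


def fetch_document_cached(url: str, memory: Dict[str, str], fetch: Callable[[str], str]) -> str:
    """Cache-aside lookup mirroring the fetch_document steps (hypothetical helper)."""
    # 1. Check whether the document is already in memory
    if url in memory:
        return memory[url]
    # 2. If not, fetch it from the URL
    content = fetch(url)
    # 3. Store it for future use
    memory[url] = content
    # 4. Return the document content
    return content
```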
extract_info
Extracts specific types of information from text:
```python
extract_info(text="document content", info_type="dates")
```
Common info types:
- dates: Extracts dates and timestamps
- names: Extracts person names
- organizations: Extracts organization names
- key points: Extracts main ideas or arguments
- statistics: Extracts numerical data and statistics
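This guide does not show how each info type is implemented, and the tool may well ask the model to do the extraction. For the dates type, a lightweight regex pass can serve as a fast pre-filter or fallback. The following is a sketch under that assumption; it recognizes only three common formats, and the names are hypothetical.

```python
import re
from typing import List

MONTHS = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"

# Three common formats; real documents will contain many more
DATE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",             # ISO 8601 date, e.g. 2025-03-13
    rf"\b{MONTHS} \d{{1,2}}, \d{{4}}\b",  # e.g. March 13, 2025
    rf"\b\d{{1,2}} {MONTHS} \d{{4}}\b",   # e.g. 13 March 2025
]


def extract_dates(text: str) -> List[str]:
    """Return date strings found in text, in order of appearance."""
    found = []
    for pattern in DATE_PATTERNS:
        found.extend((m.start(), m.group()) for m in re.finditer(pattern, text))
    return [date for _, date in sorted(found)]
```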
search_document
Searches document content for relevant information:
```python
search_document(text="document content", query="neural networks")
```
This uses semantic search to find the most relevant paragraphs for the query.
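Semantic search ranks paragraphs by how similar their embeddings are to the query's embedding. The sketch below shows that ranking step using cosine similarity. The embedding function is pluggable, and the default bow_embed is a toy lexical stand-in based on word counts, so out of the box it does keyword matching. Swapping in a real embedding model, for example one served by Ollama, would make the matching semantic. All names here are hypothetical.

```python
import math
import re
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bow_embed(texts: List[str]) -> List[List[float]]:
    """Toy lexical stand-in for an embedding model: word counts over a shared vocabulary."""
    tokenized = [re.findall(r"\w+", t.lower()) for t in texts]
    vocab = {w: i for i, w in enumerate(sorted({w for toks in tokenized for w in toks}))}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for w in toks:
            vec[vocab[w]] += 1.0
        vectors.append(vec)
    return vectors


def search_paragraphs(
    text: str,
    query: str,
    embed: Callable[[List[str]], List[List[float]]] = bow_embed,
    top_k: int = 3,
) -> List[str]:
    """Return the top_k paragraphs whose embeddings are most similar to the query's."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    # Embed paragraphs and query in one batch so they share a vector space
    *para_vecs, query_vec = embed(paragraphs + [query])
    ranked = sorted(
        zip(paragraphs, para_vecs),
        key=lambda pair: cosine(pair[1], query_vec),
        reverse=True,
    )
    return [p for p, _ in ranked[:top_k]]
```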
Document Memory
The Document Memory system provides persistent storage for documents:
```python
from document_memory import DocumentMemory

# Initialize memory
memory = DocumentMemory()

# Store a document
doc_id = memory.store_document(
    url="https://example.com/article",
    content="Document text goes here...",
    metadata={"author": "John Doe", "date": "2025-03-13"}
)

# Retrieve a document
doc = memory.get_document(doc_id)
print(doc["content"])

# List all documents
docs = memory.list_documents()
for doc in docs:
    print(f"URL: {doc['url']}")
```
Document memory is stored on disk and persists between sessions.
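This guide does not document DocumentMemory's on-disk format. As a sketch under stated assumptions, a store that exposes the same three methods could persist one JSON file per document, which is enough to survive restarts. The class name JsonDocumentMemory, the storage directory, and the URL-hash document IDs are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Optional


class JsonDocumentMemory:
    """Hypothetical stand-in for DocumentMemory: one JSON file per document."""

    def __init__(self, storage_dir: str = "./document_memory"):
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, doc_id: str) -> Path:
        return self.storage_dir / f"{doc_id}.json"

    def store_document(self, url: str, content: str, metadata: Optional[dict] = None) -> str:
        # A stable ID derived from the URL means storing the same URL again overwrites it
        doc_id = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
        record = {"id": doc_id, "url": url, "content": content, "metadata": metadata or {}}
        self._path(doc_id).write_text(json.dumps(record), encoding="utf-8")
        return doc_id

    def get_document(self, doc_id: str) -> Optional[dict]:
        path = self._path(doc_id)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def list_documents(self) -> List[dict]:
        return [json.loads(p.read_text(encoding="utf-8")) for p in sorted(self.storage_dir.glob("*.json"))]
```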
Working with the Command-Line Interface
The Simulacra01 CLI provides a comprehensive interface for interacting with various agent types.
Starting the CLI
```bash
# Start with interactive menu
python cli.py

# Start directly with a specific agent
python cli.py chat --agent document
python cli.py chat --agent research
```
Global Commands
These commands work across all agent types:
- exit: End the current session
- help: Show available commands
- clear: Clear the conversation history
- save [filename]: Save the current conversation
- load <filename>: Load a saved conversation
- list: List saved conversations
- tools: List available tools
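One way a CLI can tell global commands apart from ordinary chat input is to match the first word and then check the argument count. The parser below is a hypothetical sketch, not cli.py's actual logic. Its arity table mirrors the bracket notation above ([filename] optional, <filename> required), and it does not support filenames containing spaces.

```python
from typing import List, Optional, Tuple

# Hypothetical table of global commands: name -> (min_args, max_args)
GLOBAL_COMMANDS = {
    "exit": (0, 0),
    "help": (0, 0),
    "clear": (0, 0),
    "save": (0, 1),  # save [filename]
    "load": (1, 1),  # load <filename>
    "list": (0, 0),
    "tools": (0, 0),
}


def parse_command(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (command, args) if the line is a global command, else None.

    A line only counts as a command when its word count fits the command's
    arity, so "list docs" falls through to the agent-specific handler and
    "help me with this" is treated as ordinary chat.
    """
    parts = line.split()
    if not parts:
        return None
    name, args = parts[0].lower(), parts[1:]
    arity = GLOBAL_COMMANDS.get(name)
    if arity is None or not arity[0] <= len(args) <= arity[1]:
        return None
    return name, args
```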
Agent-Specific Commands
Document Agent
- list docs: List stored documents
- analyze <url>: Analyze a document at URL
Research Agent
- search <topic>: Research a topic
- synthesize: Summarize research findings
- save research <filename>: Save research data
Task Agent
- add task <title>: Add a new task
- list tasks: Show all tasks
- update task <id>: Update task status
Configuration
Configure the CLI using:
```bash
python cli.py config
```
This allows you to customize:
- OpenAI and Ollama settings
- Model preferences
- Agent-specific parameters
- System prompts
Configuration is stored in ~/.simulacra/config.json.
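This guide does not list the keys inside config.json, so the defaults below are hypothetical placeholders. The sketch shows a common loading pattern: read the user's file if it exists and deep-merge it over built-in defaults, so that a partial config still yields every setting.

```python
import copy
import json
from pathlib import Path
from typing import Optional

# Hypothetical defaults; the real keys in Simulacra01's config.json may differ
DEFAULT_CONFIG = {
    "ollama": {"base_url": "http://localhost:11434", "model": "mistral"},
    "agents": {"document": {"temperature": 0.2}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path: Optional[Path] = None) -> dict:
    """Load ~/.simulacra/config.json (or `path`), falling back to defaults for missing keys."""
    path = path or Path.home() / ".simulacra" / "config.json"
    if not path.exists():
        return copy.deepcopy(DEFAULT_CONFIG)
    with path.open(encoding="utf-8") as f:
        return deep_merge(DEFAULT_CONFIG, json.load(f))
```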
Creating Custom Agents
Simulacra01 makes it easy to create custom agents tailored to specific use cases.
Basic Agent Creation
```python
from agents import Agent, function_tool
from ollama_client import OllamaClient
from pydantic import BaseModel, Field

# Define the client
client = OllamaClient(model_name="mistral")

# Define tool schemas
class AddInput(BaseModel):
    a: int = Field(..., description="First number")
    b: int = Field(..., description="Second number")

class AddOutput(BaseModel):
    result: int = Field(..., description="Sum of the two numbers")

# Define the tool function
@function_tool
def add(a: int, b: int) -> dict:
    """Adds two numbers together."""
    return {"result": a + b}

# Create the agent
agent = Agent(
    name="MathAgent",
    instructions="You are a math assistant that helps users with calculations.",
    tools=[add],
    model=client,
)

# Use the agent
response = agent.run("What is 5 + 7?")
print(response.message)
```
Tool Development Best Practices
- Clear Function Signatures: Make parameter names intuitive
- Comprehensive Docstrings: Explain what the tool does
- Error Handling: Gracefully handle exceptions
- Type Annotations: Use proper type hints
- Schema Definitions: Use Pydantic for input/output validation
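Applied together, these practices might look like the following tool. It is written as a plain function so it runs without the SDK; to expose it to an agent you would decorate it with @function_tool, and the Agents SDK can then build the tool schema from its type hints and docstring. The word_frequencies tool is a hypothetical example, and it validates inputs with simple checks rather than Pydantic so that it stays stdlib-only.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 5) -> dict:
    """Count the most frequent words in a piece of text.

    Args:
        text: The text to analyze.
        top_n: How many of the most common words to return (1-50).

    Returns:
        {"total_words": int, "top_words": [[word, count], ...]} on success,
        or {"error": str} if the input is invalid.
    """
    # Validate inputs and report problems as data instead of raising
    if not text.strip():
        return {"error": "text is empty"}
    if not 1 <= top_n <= 50:
        return {"error": "top_n must be between 1 and 50"}
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "total_words": len(words),
        "top_words": [[word, count] for word, count in counts.most_common(top_n)],
    }
```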
Complex Agent Example
Here's a more complex example of a custom agent:
```python
from agents import Agent, function_tool
from ollama_client import OllamaClient
from pydantic import BaseModel, Field
import requests

class WeatherInput(BaseModel):
    location: str = Field(..., description="City or location name")

class WeatherOutput(BaseModel):
    temperature: float = Field(..., description="Current temperature in Celsius")
    conditions: str = Field(..., description="Weather conditions")
    humidity: float = Field(..., description="Humidity percentage")

class ForecastInput(BaseModel):
    location: str = Field(..., description="City or location name")
    days: int = Field(3, description="Number of days to forecast")

class ForecastOutput(BaseModel):
    forecast: list = Field(..., description="Daily forecast data")

@function_tool
def get_weather(location: str) -> dict:
    """Gets the current weather for a location."""
    try:
        # Example implementation (would use actual weather API).
        # params= URL-encodes the location; timeout= keeps a slow API from hanging the agent.
        response = requests.get(
            "https://weather-api.example.com/current",
            params={"q": location},
            timeout=10,
        )
        response.raise_for_status()
        data = response.json()
        # Validate the result against the output schema before returning it
        return WeatherOutput(
            temperature=data["temp_c"],
            conditions=data["condition"]["text"],
            humidity=data["humidity"],
        ).model_dump()
    except (requests.RequestException, KeyError, ValueError) as e:
        return {"error": str(e)}

@function_tool
def get_forecast(location: str, days: int = 3) -> dict:
    """Gets a weather forecast for a location."""
    try:
        # Example implementation (would use actual weather API)
        response = requests.get(
            "https://weather-api.example.com/forecast",
            params={"q": location, "days": days},
            timeout=10,
        )
        response.raise_for_status()
        data = response.json()
        forecast = [
            {
                "date": day["date"],
                "max_temp": day["day"]["maxtemp_c"],
                "min_temp": day["day"]["mintemp_c"],
                "condition": day["day"]["condition"]["text"],
            }
            for day in data["forecast"]["forecastday"]
        ]
        return ForecastOutput(forecast=forecast).model_dump()
    except (requests.RequestException, KeyError, ValueError) as e:
        return {"error": str(e)}

def create_weather_agent():
    """Creates a weather information agent."""
    client = OllamaClient(model_name="mistral")

    agent = Agent(
        name="WeatherAgent",
        instructions="""
        You are a Weather Assistant that provides accurate weather information.

        When a user asks about the weather:
        1. Use get_weather to fetch current conditions
        2. Use get_forecast for multi-day forecasts

        Always specify temperature units (Celsius) in your responses.
        For forecasts, present the information in a clear, day-by-day format.
        If a user doesn't specify a location, ask them for clarification.
        """,
        tools=[get_weather, get_forecast],
        model=client,
    )

    return agent

# Usage
agent = create_weather_agent()
response = agent.run("What's the weather like in London?")
print(response.message)
```
Advanced Customization
Using Different Models
Ollama supports numerous open-source models. To use a different model:
- Pull the model:
```bash
ollama pull llama3
```
- Specify the model when creating the client:
```python
client = OllamaClient(model_name="llama3")
```
Model Recommendations
- mistral: Good balance of performance and speed
- llama3: High-quality general-purpose model
- mixtral: Mixture-of-experts model; strong across many tasks but needs far more memory
- gemma: Efficient for simpler tasks
- phi3: Compact Microsoft model with strong capabilities for its size
Custom System Prompts
The system prompt (instructions) is crucial for agent behavior:
```python
# tools and client are assumed to be defined as in the earlier examples
agent = Agent(
    name="CustomAgent",
    instructions="""
    You are a specialized assistant that helps with [specific domain].

    Follow these guidelines:
    1. Always begin by [specific action]
    2. For complex queries, use [specific approach]
    3. When uncertain, [specific strategy]

    When using tools:
    - Use [tool1] for [specific scenario]
    - Use [tool2] when [specific condition]

    Response format:
    - Start with a [specific element]
    - Include [specific component]
    - Format using [specific style]

    Additional instructions:
    [any other specific behavioral guidance]
    """,
    tools=tools,
    model=client,
)
```
Caching Implementation
Implement caching to improve performance and reduce redundant model calls:
```python
import hashlib
import json

from caching_service import CachingService
from ollama_client import OllamaClient

# Create a caching layer
cache = CachingService(cache_dir="./cache")

# Create a cached client (most useful with deterministic settings such as temperature=0)
class CachedOllamaClient(OllamaClient):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cache = cache

    def chat_completion(self, messages, **kwargs):
        cache_key = self._generate_cache_key(messages, kwargs)
        cached_result = self.cache.get(cache_key)

        # Compare against None so falsy-but-valid results are still served from cache
        if cached_result is not None:
            return cached_result

        result = super().chat_completion(messages, **kwargs)
        self.cache.store(cache_key, result)

        return result

    def _generate_cache_key(self, messages, kwargs):
        # Build a deterministic key from the model, messages, and relevant kwargs.
        # Python's built-in hash() is salted per process for strings, so it would
        # give different keys in each session and defeat an on-disk cache.
        key_material = json.dumps(
            {
                "model": self.model_name,
                "messages": messages,
                "options": {k: v for k, v in kwargs.items() if k in ["temperature", "max_tokens"]},
            },
            sort_keys=True,
            default=str,
        )
        return hashlib.sha256(key_material.encode("utf-8")).hexdigest()

# Use the cached client
client = CachedOllamaClient(model_name="mistral")
```
Custom Tool Categories
Organize tools into categories for more structured agents:
```python
from agents import Agent, function_tool
from ollama_client import OllamaClient

# function_tool has no category argument, so categories are expressed by
# grouping tools into lists and describing each group in the instructions.

# Document tools
@function_tool
def fetch_document(url: str) -> dict:
    """Fetches a document from URL."""
    raise NotImplementedError  # Implementation goes here

@function_tool
def analyze_document(text: str) -> dict:
    """Analyzes document content."""
    raise NotImplementedError  # Implementation goes here

# Search tools
@function_tool
def web_search(query: str) -> dict:
    """Searches the web for information."""
    raise NotImplementedError  # Implementation goes here

@function_tool
def image_search(query: str) -> dict:
    """Searches for images."""
    raise NotImplementedError  # Implementation goes here

DOCUMENT_TOOLS = [fetch_document, analyze_document]
SEARCH_TOOLS = [web_search, image_search]

# Create an agent with tool categories
client = OllamaClient(model_name="mixtral")

agent = Agent(
    name="ResearchAssistant",
    instructions="""
    You are a Research Assistant with access to different tool categories:

    DOCUMENT TOOLS:
    - Use fetch_document to retrieve document content
    - Use analyze_document to extract insights from documents

    SEARCH TOOLS:
    - Use web_search to find information online
    - Use image_search to find relevant images

    Choose the appropriate tool category based on the user's request.
    """,
    tools=DOCUMENT_TOOLS + SEARCH_TOOLS,
    model=client,
)
```
Custom Response Formatting
Implement custom response formatting for specialized outputs:
```python
import html
import json
from datetime import datetime

class CustomAgentResponse:
    def __init__(self, result):
        self.message = self._format_message(result.final_output)
        self.conversation_id = getattr(result, 'conversation_id', None)
        self.tool_calls = self._extract_tool_calls(result)
        self.formatted_output = self._generate_formatted_output()

    def _format_message(self, message):
        # Format the message (e.g., add markdown, structure sections)
        return message

    def _extract_tool_calls(self, result):
        # Extract tool call information
        tool_calls = []
        # Extraction logic
        return tool_calls

    def _generate_formatted_output(self):
        # Create a custom formatted output (e.g., HTML, JSON)
        return {
            "message": self.message,
            "tools_used": [t.name for t in self.tool_calls],
            "formatted_at": datetime.now().isoformat()
        }

    def to_json(self):
        """Convert the response to JSON."""
        return json.dumps(self.formatted_output, default=str)

    def to_html(self):
        """Convert the response to HTML, escaping text so model output cannot inject markup."""
        parts = [f"<div class='agent-response'><p>{html.escape(str(self.message))}</p>"]
        if self.tool_calls:
            parts.append("<ul class='tools-used'>")
            for tool in self.tool_calls:
                parts.append(f"<li>{html.escape(tool.name)}</li>")
            parts.append("</ul>")
        parts.append("</div>")
        return "".join(parts)
```
Debugging and Troubleshooting
Enabling Debug Logging
```python
import logging
import sys

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("simulacra01.log"),
        logging.StreamHandler(sys.stdout)
    ]
)

# Create module-specific loggers
ollama_logger = logging.getLogger("ollama_client")
agent_logger = logging.getLogger("agent")
tools_logger = logging.getLogger("tools")

# Set specific log levels if needed
ollama_logger.setLevel(logging.DEBUG)
```
Common Issues and Solutions
Model Not Found
Problem: Error: model 'xyz' not found
Solution:
- Check available models: ollama list
- Pull the missing model: ollama pull xyz
- Verify model spelling in your code
Memory Issues
Problem: Out of memory errors when running large models
Solution:
- Use a smaller model (mistral instead of mixtral)
- Reduce batch size or context length
- Add system swap space
- Close other memory-intensive applications
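To reduce context length, you can pass per-request options, including num_ctx (the context window size), to Ollama's native /api/chat endpoint. A smaller window shrinks the memory reserved for the context (the KV cache). The following is a stdlib sketch; the helper names are hypothetical, and the default of 2048 is only an example value.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_payload(model: str, messages: List[Dict[str, str]], num_ctx: int = 2048) -> dict:
    """Request body for Ollama's POST /api/chat with an explicit context window."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,  # ask for a single JSON response instead of a stream
        # Smaller num_ctx values reserve less memory for the context (KV cache)
        "options": {"num_ctx": num_ctx},
    }


def post_chat(payload: dict, base_url: str = "http://localhost:11434") -> dict:
    """Send the payload to a running Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```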
API Compatibility
Problem: OpenAI SDK methods not working with Ollama
Solution:
- Check the adapter implementation for the specific method
- Add wrapper methods to OllamaClient class
- Consult Ollama API documentation for endpoint limitations
Tool Call Issues
Problem: Model not using tools or using them incorrectly
Solution:
- Simplify tool definitions and make their purpose more explicit
- Check that tool schemas are properly defined
- Add more specific instructions about tool usage in the agent prompt
- Try a more capable model (llama3 or mixtral)
Inspecting Raw Responses
To inspect raw model responses:
```python
import json

from ollama_client import OllamaClient

client = OllamaClient(model_name="mistral")

# Get raw response
response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.7,
)

# Print full response object
print(json.dumps(response.model_dump(), indent=2))
```
Performance Optimization
Model Selection Strategies
Choose the right model for the task:
| Task Type | Recommended Model | Notes |
|-----------|-------------------|-------|
| Simple Q&A | gemma | Fastest, lower resource usage |
| General assistant | mistral | Good balance of quality/speed |
| Complex reasoning | llama3 | Higher quality, more resources |
| Specialized domains | mixtral | Multi-specialty, highest resources |
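The table can be encoded as a small lookup that also checks which models are actually pulled locally. The task-type keys and the fallback to mistral are assumptions for illustration; tags such as ":latest" are stripped before comparing.

```python
from typing import Iterable

# Task-type keys are hypothetical labels for the rows of the table
TASK_MODELS = {
    "simple_qa": "gemma",
    "general": "mistral",
    "complex_reasoning": "llama3",
    "specialized": "mixtral",
}


def pick_model(task_type: str, available: Iterable[str], fallback: str = "mistral") -> str:
    """Return the table's recommendation if it is pulled locally, else the fallback."""
    # Normalize names like "mistral:latest" to "mistral"
    pulled = {name.split(":")[0] for name in available}
    preferred = TASK_MODELS.get(task_type, fallback)
    for candidate in (preferred, fallback):
        if candidate in pulled:
            return candidate
    raise LookupError(f"neither {preferred!r} nor {fallback!r} is pulled; try: ollama pull {preferred}")
```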
Prompt Optimization
Optimize prompts for better efficiency:
- Be Specific: Clear, concise instructions reduce token usage
- Provide Examples: Few-shot examples improve response quality
- Structured Output: Request specific formats to reduce parsing needs
- Limit Context: Only include relevant information
- Use Separators: Clearly delineate sections with markers
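Several of these guidelines can be applied mechanically when assembling a prompt. The hypothetical helper below uses ### separators, optional few-shot examples, and a character budget for the context. A character count is only a crude proxy for tokens.

```python
from typing import Sequence, Tuple


def build_prompt(
    instructions: str,
    context: str,
    question: str,
    examples: Sequence[Tuple[str, str]] = (),
    max_context_chars: int = 4000,
) -> str:
    """Assemble a prompt with clearly separated sections (hypothetical helper)."""
    sections = ["### Instructions", instructions.strip()]
    # Few-shot examples show the model the expected answer format
    for i, (q, a) in enumerate(examples, start=1):
        sections += [f"### Example {i}", f"Q: {q}\nA: {a}"]
    # Limit context to a fixed character budget so the prompt stays small
    sections += ["### Context", context.strip()[:max_context_chars]]
    sections += ["### Question", question.strip()]
    return "\n\n".join(sections)
```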
Chunking Strategies
For large documents, implement effective chunking:
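```python
import re

def chunk_document(text, chunk_size=2000, overlap=200):
    """Split document into overlapping fixed-size character chunks."""
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would produce a zero or negative step
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    chunks = []

    # Simple character-based chunking: each chunk starts chunk_size - overlap characters after the previous one
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        chunks.append(chunk)

    return chunks

def chunk_document_by_section(text, chunk_size=2000):
    """Split document by natural section boundaries."""
    # Split at blank lines and just before Markdown headers; the lookahead keeps
    # each header attached to the section it introduces instead of discarding it
    sections = re.split(r"\n\s*\n|(?=^#{1,6}\s)", text, flags=re.MULTILINE)

    chunks = []
    current_chunk = ""

    for section in sections:
        section = section.strip()
        if not section:
            continue
        candidate = f"{current_chunk}\n\n{section}" if current_chunk else section
        if len(candidate) <= chunk_size:
            current_chunk = candidate
            continue
        # Adding this section would exceed chunk size: save the current chunk
        if current_chunk:
            chunks.append(current_chunk)
        if len(section) > chunk_size:
            # A single oversized section falls back to fixed-size splitting
            chunks.extend(chunk_document(section, chunk_size, overlap=0))
            current_chunk = ""
        else:
            current_chunk = section

    # Add the final chunk if not empty
    if current_chunk:
        chunks.append(current_chunk)

    return chunks
```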
Caching Optimization
Implement multi-level caching for better performance:
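```python
import hashlib
import json
from pathlib import Path

class DiskCache:
    """Minimal stand-in disk cache: one JSON file per key (values must be JSON-serializable)."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string maps to a safe, stable filename
        return self.cache_dir / f"{hashlib.sha256(str(key).encode('utf-8')).hexdigest()}.json"

    def get(self, key):
        path = self._path(key)
        return json.loads(path.read_text(encoding="utf-8")) if path.exists() else None

    def store(self, key, value):
        self._path(key).write_text(json.dumps(value), encoding="utf-8")

class MultiLevelCache:
    def __init__(self, cache_dir="./cache"):
        self.memory_cache = {}  # Fast, in-memory cache
        self.disk_cache = DiskCache(cache_dir)  # Persistent disk cache

    def get(self, key):
        # First check memory cache (fastest)
        if key in self.memory_cache:
            return self.memory_cache[key]

        # Then check disk cache
        disk_result = self.disk_cache.get(key)
        if disk_result is not None:
            # Also store in memory for future fast access
            self.memory_cache[key] = disk_result
        return disk_result

    def store(self, key, value):
        # Store in both caches
        self.memory_cache[key] = value
        self.disk_cache.store(key, value)

    def clear_memory_cache(self):
        """Clear only memory cache (e.g., to free memory)."""
        self.memory_cache = {}
```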
Parallel Processing
Implement parallel processing for multiple operations:
```python
import concurrent.futures

def process_document_parallel(document, queries, search_fn, max_workers=5):
    """Process multiple queries against a document in parallel.

    search_fn is a plain callable taking (document, query); pass an ordinary
    function here rather than a @function_tool-decorated tool object.
    """
    results = {}

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Create a future for each query
        future_to_query = {
            executor.submit(search_fn, document, query): query
            for query in queries
        }

        # Process results as they complete
        for future in concurrent.futures.as_completed(future_to_query):
            query = future_to_query[future]
            try:
                results[query] = future.result()
            except Exception as e:
                results[query] = {"error": str(e)}

    return results
```
Contributing and Development
Setting Up Development Environment
```bash
# Clone the repository
git clone https://github.com/yourusername/simulacra01.git
cd simulacra01

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```
Running Tests
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_ollama_client.py

# Run with coverage
coverage run -m pytest
coverage report
coverage html  # Generate HTML report
```
Code Style and Linting
```bash
# Check code style
ruff check .

# Auto-fix issues
ruff check --fix .

# Run type checking
mypy .
```
Documentation Generation
```bash
# Generate API documentation
python scripts/generate_docs.py

# Build documentation website
mkdocs build

# Serve documentation locally
mkdocs serve
```
Branch Strategy
- main: Stable release branch
- develop: Main development branch
- feature/*: For new features
- bugfix/*: For bug fixes
- release/*: For release preparation
Commit Guidelines
Use conventional commits format:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
Pull Request Process
- Create a new branch from develop
- Make your changes
- Run tests and linting
- Submit a pull request to develop
- Ensure CI passes
- Request review from maintainers
- Address review feedback
This comprehensive guide covers all aspects of using, customizing, and extending Simulacra01. For additional information, refer to the specific documentation files in the docs/ directory.
