
Simulacra01: Complete Guide to Building Local AI Agents with OpenAI Agents SDK and Ollama Integration

A comprehensive guide to Simulacra01, a framework that integrates the OpenAI Agents SDK with Ollama for building locally hosted AI agents with document analysis, custom agent creation, and advanced customization capabilities.


Daniel Kliewer

Author, Sovereign AI


From the Book

This is from Sovereign AI: Building Local-First Intelligent Systems.


Comprehensive Guide to Simulacra01

This guide provides detailed documentation on how to use, customize, and extend Simulacra01, a framework that integrates the OpenAI Agents SDK with Ollama for local AI agent capabilities.

Table of Contents

  1. Introduction
  2. Understanding the Architecture
  3. Installation & Setup
  4. Using Document Analysis Agent
  5. Working with the Command-Line Interface
  6. Creating Custom Agents
  7. Advanced Customization
  8. Debugging and Troubleshooting
  9. Performance Optimization
  10. Contributing and Development

Introduction

Simulacra01 is a powerful framework that brings together the structured agent capabilities of OpenAI's Agents SDK with the privacy and cost benefits of local LLM inference through Ollama. This integration enables you to build sophisticated AI agents that run entirely on your local infrastructure.

Key Benefits

  • Complete Data Privacy: All processing happens locally, with no data sent to external services
  • Cost Efficiency: No per-token API costs associated with cloud-based LLM services
  • Customizability: Full control over model selection, fine-tuning, and behavior
  • Network Independence: Inference runs without internet access (tools that fetch URLs still need a connection)
  • Reduced Latency: No network round trips to a remote API; actual response time depends on your local hardware

Core Components

  • OpenAI Agents SDK: Provides the structured framework for building AI agents
  • Ollama: Runs a range of open-source LLMs locally
  • Adapter Layer: Connects the two technologies seamlessly
  • Specialized Agents: Pre-built agents for document analysis and other tasks
  • Command-Line Interface: Interactive way to engage with agents

Understanding the Architecture

Simulacra01 employs a layered architecture designed for flexibility and extensibility:

Ollama Layer

The base layer provides LLM inference capabilities:

  • Handles model loading and management
  • Processes raw prompts into completions
  • Manages system resources for inference
  • Provides API endpoints that mimic OpenAI's structure
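
Because the Ollama layer exposes an OpenAI-style chat endpoint, a request can be assembled with nothing but the standard library. The sketch below assumes a default local install listening on http://localhost:11434 and the /v1/chat/completions path; the helper name build_chat_request is illustrative and is not part of Simulacra01.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # assumed default Ollama address


def build_chat_request(model, messages, base_url=OLLAMA_BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("mistral", [{"role": "user", "content": "Hello"}])
print(request.full_url)

# Sending the request needs a running Ollama server:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect before any model is loaded.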

Adapter Layer

The bridge between Ollama and the OpenAI Agents SDK:

  • OllamaClient: Routes requests to Ollama's API endpoints
  • AgentAdapter: Makes OpenAI's Agent class compatible with the Ollama backend
  • ResponseFormatter: Ensures responses match expected formats
  • ToolCallProcessor: Handles function/tool calls with local models
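
To make the ResponseFormatter role concrete, here is a minimal sketch of reshaping a native Ollama chat response into an OpenAI-style chat completion dictionary. The input field names (message, done, prompt_eval_count, eval_count) follow Ollama's native chat API as I understand it; the function name to_openai_format is hypothetical, and Simulacra01's own formatter may work differently.

```python
import time
import uuid


def to_openai_format(ollama_response):
    """Convert a native Ollama chat response dict into an OpenAI-style dict."""
    message = ollama_response.get("message", {})
    prompt_tokens = ollama_response.get("prompt_eval_count", 0)
    completion_tokens = ollama_response.get("eval_count", 0)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": ollama_response.get("model", "unknown"),
        "choices": [{
            "index": 0,
            "message": {
                "role": message.get("role", "assistant"),
                "content": message.get("content", ""),
            },
            # "done" marks a finished generation in the native response
            "finish_reason": "stop" if ollama_response.get("done") else None,
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


sample = {
    "model": "mistral",
    "message": {"role": "assistant", "content": "4"},
    "done": True,
    "prompt_eval_count": 12,
    "eval_count": 3,
}
converted = to_openai_format(sample)
print(converted["choices"][0]["message"]["content"])
```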

Agents SDK Layer

Provides the agent framework and abstractions:

  • Agent lifecycle management
  • Tool definition and integration
  • Conversation handling
  • Response processing

Application Layer

Implements specialized agents and interfaces:

  • Document Analysis Agent
  • Command-Line Interface
  • Document Memory system
  • Other specialized agent types

Installation & Setup

System Requirements

  • Python 3.9 or higher
  • 8GB+ RAM recommended (model dependent)
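  • Free disk space for model storage: plan for several GB per model (the default mistral download is roughly 4 GB, and mixtral is far larger)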

Step 1: Install Ollama

For macOS and Linux:

bash
curl -fsSL https://ollama.ai/install.sh | sh

For Windows, download from Ollama's website.

Verify installation:

bash
ollama --version

Step 2: Download Required Models

bash
# Pull the Mistral model (recommended starting model)
ollama pull mistral

# Optional: Pull additional models
ollama pull llama3
ollama pull mixtral

Verify model installation:

bash
ollama list

Step 3: Clone and Install Simulacra01

bash
git clone https://github.com/kliewerdaniel/simulacra01.git
cd simulacra01
pip install -e .

Step 4: Install Dependencies

bash
pip install -r requirements.txt

Step 5: Verify Installation

Run a quick one-line check (Ollama must be running and the mistral model pulled):

bash
python -c "from ollama_client import OllamaClient; client = OllamaClient(); response = client.chat.completions.create(model='mistral', messages=[{'role': 'user', 'content': 'Hello, world!'}]); print(response.choices[0].message.content)"

You should see a response from the model.

Using Document Analysis Agent

The Document Analysis Agent is a powerful tool for extracting information from documents, answering questions about content, and managing a document repository.

Basic Usage

Run the document agent:

bash
python main.py

This will start an interactive session with the agent.

Available Commands

  • exit: Exit the agent
  • help: Show help information
  • list: List documents in memory

Example Interactions

Analyze a webpage:

You: Please analyze the article at https://en.wikipedia.org/wiki/Artificial_intelligence and tell me when AI was first developed.

Extract specific information:

You: Extract all the dates mentioned in the last document.

Search for content:

You: Find information about neural networks in the document.

Tool Functionality

The Document Analysis Agent includes several specialized tools:

fetch_document

Retrieves document content from a URL:

python
fetch_document(url="https://example.com/article")

This tool:

  • Checks if the document is already in memory
  • If not, fetches it from the URL
  • Stores it in document memory for future use
  • Returns the document content
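
The four steps above amount to a cache-then-fetch pattern. The following sketch shows that control flow with a plain dictionary standing in for document memory and an injectable fetcher function so it runs offline; fetch_with_memory and fake_fetcher are hypothetical names, not the tool's actual implementation.

```python
def fetch_with_memory(url, memory, fetcher):
    """Return cached content for url, fetching and storing it on a miss."""
    if url in memory:          # 1. already in memory?
        return memory[url]
    content = fetcher(url)     # 2. fetch from the URL
    memory[url] = content      # 3. store for future use
    return content             # 4. return the content


calls = []


def fake_fetcher(url):
    """Stand-in for a real HTTP fetch so the example needs no network."""
    calls.append(url)
    return f"content of {url}"


memory = {}
first = fetch_with_memory("https://example.com/article", memory, fake_fetcher)
second = fetch_with_memory("https://example.com/article", memory, fake_fetcher)
print(len(calls))  # the second call is served from memory
```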

extract_info

Extracts specific types of information from text:

python
extract_info(text="document content", info_type="dates")

Common info types:

  • dates: Extracts dates and timestamps
  • names: Extracts person names
  • organizations: Extracts organization names
  • key points: Extracts main ideas or arguments
  • statistics: Extracts numerical data and statistics

search_document

Searches document content for relevant information:

python
search_document(text="document content", query="neural networks")

This uses semantic search to find the most relevant paragraphs for the query.
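
To illustrate the idea of ranking paragraphs against a query, here is a small sketch that swaps in bag-of-words cosine similarity for real embeddings. It is a deliberately crude stand-in: true semantic search would compare embedding vectors, and the function names below are illustrative rather than taken from Simulacra01.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Lowercase word counts; a crude substitute for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_paragraphs(text, query, top_k=3):
    """Return up to top_k paragraphs with a non-zero score, best first."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    query_vec = _vectorize(query)
    scored = [(_cosine(_vectorize(p), query_vec), p) for p in paragraphs]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]


doc = (
    "Neural networks learn from data.\n\n"
    "The weather was mild.\n\n"
    "Deep neural networks stack many layers."
)
results = rank_paragraphs(doc, "neural networks")
print(results)
```

Word overlap misses synonyms and paraphrases, which is exactly the gap an embedding model closes.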

Document Memory

The Document Memory system provides persistent storage for documents:

python
from document_memory import DocumentMemory

# Initialize memory
memory = DocumentMemory()

# Store a document
doc_id = memory.store_document(
    url="https://example.com/article",
    content="Document text goes here...",
    metadata={"author": "John Doe", "date": "2025-03-13"}
)

# Retrieve a document
doc = memory.get_document(doc_id)
print(doc["content"])

# List all documents
docs = memory.list_documents()
for doc in docs:
    print(f"URL: {doc['url']}")

Document memory is stored on disk and persists between sessions.
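
As a rough picture of what disk persistence involves, the sketch below keeps every document in a single JSON file and reloads it on startup. JsonDocumentStore is a hypothetical class written for this illustration; the storage format actually used by DocumentMemory is not described here and may differ.

```python
import json
import tempfile
import uuid
from pathlib import Path


class JsonDocumentStore:
    """Tiny disk-backed document store; one JSON file holds every document."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload earlier documents if the file already exists
        self.docs = json.loads(self.path.read_text()) if self.path.exists() else {}

    def store_document(self, url, content, metadata=None):
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = {"url": url, "content": content, "metadata": metadata or {}}
        self.path.write_text(json.dumps(self.docs, indent=2))  # persist immediately
        return doc_id

    def get_document(self, doc_id):
        return self.docs.get(doc_id)


with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "documents.json"
    doc_id = JsonDocumentStore(store_path).store_document(
        "https://example.com/article", "Document text"
    )
    # A fresh instance simulates starting a new session
    reloaded = JsonDocumentStore(store_path).get_document(doc_id)

print(reloaded["content"])
```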

Working with the Command-Line Interface

The Simulacra01 CLI provides a comprehensive interface for interacting with various agent types.

Starting the CLI

bash
# Start with interactive menu
python cli.py

# Start directly with a specific agent
python cli.py chat --agent document
python cli.py chat --agent research
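
For readers building a similar interface, the invocations above could be parsed along these lines with argparse. This is a sketch, not the actual contents of cli.py; the agent choices and the config subcommand are taken from this guide, and build_parser is an illustrative name.

```python
import argparse


def build_parser():
    """Parser for `cli.py [chat --agent NAME | config]` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subcommands = parser.add_subparsers(dest="command")
    chat = subcommands.add_parser("chat", help="start a chat session")
    chat.add_argument(
        "--agent",
        choices=["document", "research", "task"],
        default="document",
    )
    subcommands.add_parser("config", help="edit configuration")
    return parser


args = build_parser().parse_args(["chat", "--agent", "research"])
print(args.command, args.agent)

# With no subcommand, command is None and the CLI can show its interactive menu
no_args = build_parser().parse_args([])
print(no_args.command)
```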

Global Commands

These commands work across all agent types:

  • exit: End the current session
  • help: Show available commands
  • clear: Clear the conversation history
  • save [filename]: Save the current conversation
  • load <filename>: Load a saved conversation
  • list: List saved conversations
  • tools: List available tools

Agent-Specific Commands

Document Agent

  • list docs: List stored documents
  • analyze <url>: Analyze a document at URL

Research Agent

  • search <topic>: Research a topic
  • synthesize: Summarize research findings
  • save research <filename>: Save research data

Task Agent

  • add task <title>: Add a new task
  • list tasks: Show all tasks
  • update task <id>: Update task status

Configuration

Configure the CLI using:

bash
python cli.py config

This allows you to customize:

  • OpenAI and Ollama settings
  • Model preferences
  • Agent-specific parameters
  • System prompts

Configuration is stored in ~/.simulacra/config.json.
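
A common way to consume such a file is to merge it over built-in defaults so a missing or partial config still works. The sketch below assumes two illustrative keys (model and ollama_base_url); the real schema of config.json is not documented here, so treat the key names as placeholders.

```python
import json
import tempfile
from pathlib import Path

# Assumed keys, for illustration only
DEFAULT_CONFIG = {"model": "mistral", "ollama_base_url": "http://localhost:11434"}


def load_config(path="~/.simulacra/config.json"):
    """Merge the JSON config file over the defaults; a missing file is not an error."""
    config = dict(DEFAULT_CONFIG)
    path = Path(path).expanduser()
    if path.exists():
        config.update(json.loads(path.read_text()))
    return config


with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps({"model": "llama3"}))
    config = load_config(config_file)

print(config["model"], config["ollama_base_url"])
```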

Creating Custom Agents

Simulacra01 makes it easy to create custom agents tailored to specific use cases.

Basic Agent Creation

python
from agents import Agent, function_tool
from ollama_client import OllamaClient
from pydantic import BaseModel, Field

# Define the client
client = OllamaClient(model_name="mistral")

# Define tool schemas (documentation only in this example: function_tool
# derives the tool schema from the function signature and docstring)
class AddInput(BaseModel):
    a: int = Field(..., description="First number")
    b: int = Field(..., description="Second number")

class AddOutput(BaseModel):
    result: int = Field(..., description="Sum of the two numbers")

# Define the tool function
@function_tool
def add(a: int, b: int) -> dict:
    """Adds two numbers together."""
    return {"result": a + b}

# Create the agent
agent = Agent(
    name="MathAgent",
    instructions="You are a math assistant that helps users with calculations.",
    tools=[add],
    model=client,
)

# Use the agent
# Note: agent.run() and response.message follow this guide's usage. The stock
# Agents SDK instead runs agents with Runner.run_sync(agent, ...) and exposes
# the answer as result.final_output.
response = agent.run("What is 5 + 7?")
print(response.message)

Tool Development Best Practices

  1. Clear Function Signatures: Make parameter names intuitive
  2. Comprehensive Docstrings: Explain what the tool does
  3. Error Handling: Gracefully handle exceptions
  4. Type Annotations: Use proper type hints
  5. Schema Definitions: Use Pydantic for input/output validation
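
A small example helps show several of these practices at once. The function below is a hypothetical tool, left undecorated so it runs without the SDK installed; in an agent it would be wrapped with @function_tool like the add tool earlier. It uses intuitive parameter names, type hints, a docstring, and structured error handling.

```python
def safe_divide(numerator: float, denominator: float) -> dict:
    """Divide numerator by denominator.

    Returns {"result": value} on success or {"error": message} on failure,
    so the agent always receives a structured answer instead of a traceback.
    """
    try:
        return {"result": float(numerator) / float(denominator)}
    except ZeroDivisionError:
        return {"error": "denominator must not be zero"}
    except (TypeError, ValueError) as exc:
        return {"error": f"invalid input: {exc}"}


print(safe_divide(10, 4))
print(safe_divide(1, 0))
```

Returning an error dictionary rather than raising gives the model something it can read and react to, for example by asking the user for a different value.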

Complex Agent Example

Here's a more complex example of a custom agent:

python
from agents import Agent, function_tool
from ollama_client import OllamaClient
from pydantic import BaseModel, Field
import requests

class WeatherInput(BaseModel):
    location: str = Field(..., description="City or location name")

class WeatherOutput(BaseModel):
    temperature: float = Field(..., description="Current temperature in Celsius")
    conditions: str = Field(..., description="Weather conditions")
    humidity: float = Field(..., description="Humidity percentage")

class ForecastInput(BaseModel):
    location: str = Field(..., description="City or location name")
    days: int = Field(3, description="Number of days to forecast")

class ForecastOutput(BaseModel):
    forecast: list = Field(..., description="Daily forecast data")

@function_tool
def get_weather(location: str) -> dict:
    """Gets the current weather for a location."""
    try:
        # Example implementation (would use actual weather API)
        response = requests.get(
            "https://weather-api.example.com/current",
            params={"q": location},  # let requests URL-encode the location
            timeout=10,  # avoid hanging the agent on a slow API
        )
        response.raise_for_status()
        data = response.json()
        return {
            "temperature": data["temp_c"],
            "conditions": data["condition"]["text"],
            "humidity": data["humidity"]
        }
    except Exception as e:
        # Return the error as data so the model can report it
        return {"error": str(e)}

@function_tool
def get_forecast(location: str, days: int = 3) -> dict:
    """Gets a weather forecast for a location."""
    try:
        # Example implementation (would use actual weather API)
        response = requests.get(
            "https://weather-api.example.com/forecast",
            params={"q": location, "days": days},
            timeout=10,
        )
        response.raise_for_status()
        data = response.json()
        forecast = []
        for day in data["forecast"]["forecastday"]:
            forecast.append({
                "date": day["date"],
                "max_temp": day["day"]["maxtemp_c"],
                "min_temp": day["day"]["mintemp_c"],
                "condition": day["day"]["condition"]["text"]
            })
        return {"forecast": forecast}
    except Exception as e:
        return {"error": str(e)}

def create_weather_agent():
    """Creates a weather information agent."""
    client = OllamaClient(model_name="mistral")

    agent = Agent(
        name="WeatherAgent",
        instructions="""
        You are a Weather Assistant that provides accurate weather information.

        When a user asks about the weather:
        1. Use get_weather to fetch current conditions
        2. Use get_forecast for multi-day forecasts

        Always specify temperature units (Celsius) in your responses.
        For forecasts, present the information in a clear, day-by-day format.
        If a user doesn't specify a location, ask them for clarification.
        """,
        tools=[get_weather, get_forecast],
        model=client,
    )

    return agent

# Usage
agent = create_weather_agent()
response = agent.run("What's the weather like in London?")
print(response.message)

Advanced Customization

Using Different Models

Ollama supports numerous open-source models. To use a different model:

  1. Pull the model:
bash
ollama pull llama3
  2. Specify the model when creating the client:
python
client = OllamaClient(model_name="llama3")

Model Recommendations

  • mistral: Good balance of performance and speed
  • llama3: High quality, larger context window
  • mixtral: Strong multi-specialty model
  • gemma: Efficient for simpler tasks
  • phi3: Compact Microsoft model with strong capabilities for its size

Custom System Prompts

The system prompt (instructions) is crucial for agent behavior:

python
agent = Agent(
    name="CustomAgent",
    instructions="""
    You are a specialized assistant that helps with [specific domain].

    Follow these guidelines:
    1. Always begin by [specific action]
    2. For complex queries, use [specific approach]
    3. When uncertain, [specific strategy]

    When using tools:
    - Use [tool1] for [specific scenario]
    - Use [tool2] when [specific condition]

    Response format:
    - Start with a [specific element]
    - Include [specific component]
    - Format using [specific style]

    Additional instructions:
    [any other specific behavioral guidance]
    """,
    tools=tools,
    model=client,
)

Caching Implementation

Implement caching to improve performance and reduce redundant model calls:

python
import hashlib
import json

from caching_service import CachingService
from ollama_client import OllamaClient

# Create a caching layer
cache = CachingService(cache_dir="./cache")

# Create a cached client
class CachedOllamaClient(OllamaClient):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cache = cache

    def chat_completion(self, messages, **kwargs):
        cache_key = self._generate_cache_key(messages, kwargs)
        cached_result = self.cache.get(cache_key)

        # Assumes cache.get returns None on a miss; comparing against None
        # lets empty-but-valid results count as hits
        if cached_result is not None:
            return cached_result

        result = super().chat_completion(messages, **kwargs)
        self.cache.store(cache_key, result)

        return result

    def _generate_cache_key(self, messages, kwargs):
        # Create a deterministic key from messages and relevant kwargs.
        # hashlib is used because the built-in hash() of a string changes
        # between Python processes, which would defeat an on-disk cache.
        key_components = {
            "model": self.model_name,
            "messages": messages,
            "params": {k: v for k, v in kwargs.items() if k in ("temperature", "max_tokens")},
        }
        serialized = json.dumps(key_components, sort_keys=True, default=str)
        return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# Use the cached client
client = CachedOllamaClient(model_name="mistral")

Custom Tool Categories

Organize tools into categories for more structured agents:

python
from agents import Agent, function_tool
from ollama_client import OllamaClient

# Note: as far as I know, the stock function_tool decorator has no category
# argument, so categories are tracked here with a plain dictionary instead.

# Document tools
@function_tool
def fetch_document(url: str) -> dict:
    """Fetches a document from URL."""
    # Implementation

@function_tool
def analyze_document(text: str) -> dict:
    """Analyzes document content."""
    # Implementation

# Search tools
@function_tool
def web_search(query: str) -> dict:
    """Searches the web for information."""
    # Implementation

@function_tool
def image_search(query: str) -> dict:
    """Searches for images."""
    # Implementation

# Group the tools by category
TOOL_CATEGORIES = {
    "document": [fetch_document, analyze_document],
    "search": [web_search, image_search],
}

# Create an agent with tool categories
client = OllamaClient(model_name="mixtral")

agent = Agent(
    name="ResearchAssistant",
    instructions="""
    You are a Research Assistant with access to different tool categories:

    DOCUMENT TOOLS:
    - Use fetch_document to retrieve document content
    - Use analyze_document to extract insights from documents

    SEARCH TOOLS:
    - Use web_search to find information online
    - Use image_search to find relevant images

    Choose the appropriate tool category based on the user's request.
    """,
    tools=TOOL_CATEGORIES["document"] + TOOL_CATEGORIES["search"],
    model=client,
)

Custom Response Formatting

Implement custom response formatting for specialized outputs:

python
import html
import json
from datetime import datetime

class CustomAgentResponse:
    def __init__(self, result):
        self.message = self._format_message(result.final_output)
        self.conversation_id = getattr(result, 'conversation_id', None)
        self.tool_calls = self._extract_tool_calls(result)
        self.formatted_output = self._generate_formatted_output()

    def _format_message(self, message):
        # Format the message (e.g., add markdown, structure sections)
        return message

    def _extract_tool_calls(self, result):
        # Extract tool call information
        tool_calls = []
        # Extraction logic
        return tool_calls

    def _generate_formatted_output(self):
        # Create a custom formatted output (e.g., HTML, JSON)
        return {
            "message": self.message,
            "tools_used": [t.name for t in self.tool_calls],
            "formatted_at": datetime.now().isoformat()
        }

    def to_json(self):
        """Convert the response to JSON."""
        return json.dumps(self.formatted_output)

    def to_html(self):
        """Convert the response to HTML, escaping model output."""
        markup = f"<div class='agent-response'><p>{html.escape(str(self.message))}</p>"
        if self.tool_calls:
            markup += "<ul class='tools-used'>"
            for tool in self.tool_calls:
                markup += f"<li>{html.escape(str(tool.name))}</li>"
            markup += "</ul>"
        markup += "</div>"
        return markup

Debugging and Troubleshooting

Enabling Debug Logging

python
import logging
import sys

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("simulacra01.log"),
        logging.StreamHandler(sys.stdout)
    ]
)

# Create module-specific loggers
ollama_logger = logging.getLogger("ollama_client")
agent_logger = logging.getLogger("agent")
tools_logger = logging.getLogger("tools")

# Set specific log levels if needed
ollama_logger.setLevel(logging.DEBUG)

Common Issues and Solutions

Model Not Found

Problem: Error: model 'xyz' not found

Solution:

  1. Check available models: ollama list
  2. Pull the missing model: ollama pull xyz
  3. Verify model spelling in your code
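
The same check can be done programmatically. Ollama's HTTP API lists installed models at /api/tags, returning a JSON object with a models list whose entries carry a name such as mistral:latest. The sketch below works on an already parsed response so it runs offline; model_available is a hypothetical helper, and the sample data is made up.

```python
def model_available(tags_response, model_name):
    """Check a parsed /api/tags response for model_name.

    If model_name has no ':tag' suffix, any installed tag of that model matches.
    """
    installed = [m.get("name", "") for m in tags_response.get("models", [])]
    if ":" in model_name:
        return model_name in installed
    return any(name.split(":", 1)[0] == model_name for name in installed)


# Made-up sample of what GET http://localhost:11434/api/tags might return
sample_tags = {"models": [{"name": "mistral:latest"}, {"name": "llama3:8b"}]}

print(model_available(sample_tags, "mistral"))
print(model_available(sample_tags, "mixtral"))
```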

Memory Issues

Problem: Out of memory errors when running large models

Solution:

  1. Use a smaller model (mistral instead of mixtral)
  2. Reduce batch size or context length
  3. Add system swap space
  4. Close other memory-intensive applications
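
For the context-length suggestion, Ollama's native chat API accepts an options object, and its num_ctx field sets the context window size. The sketch below only builds such a payload; whether OllamaClient forwards these options is an assumption you would need to check, and build_low_memory_payload is an illustrative name.

```python
def build_low_memory_payload(model, messages, num_ctx=2048):
    """Build a native /api/chat payload with a reduced context window."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        # A smaller context window means a smaller KV cache in memory
        "options": {"num_ctx": num_ctx},
    }


payload = build_low_memory_payload("mistral", [{"role": "user", "content": "Hi"}])
print(payload["options"])
```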

API Compatibility

Problem: OpenAI SDK methods not working with Ollama

Solution:

  1. Check the adapter implementation for the specific method
  2. Add wrapper methods to OllamaClient class
  3. Consult Ollama API documentation for endpoint limitations

Tool Call Issues

Problem: Model not using tools or using them incorrectly

Solution:

  1. Simplify tool definitions and make their purpose more explicit
  2. Check that tool schemas are properly defined
  3. Add more specific instructions about tool usage in the agent prompt
  4. Try a more capable model (llama3 or mixtral)

Inspecting Raw Responses

To inspect raw model responses:

python
import json

from ollama_client import OllamaClient

client = OllamaClient(model_name="mistral")

# Get raw response
response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.7,
)

# Print full response object
print(json.dumps(response.model_dump(), indent=2))

Performance Optimization

Model Selection Strategies

Choose the right model for the task:

| Task Type | Recommended Model | Notes |
|-----------|-------------------|-------|
| Simple Q&A | gemma | Fastest, lower resource usage |
| General assistant | mistral | Good balance of quality/speed |
| Complex reasoning | llama3 | Higher quality, more resources |
| Specialized domains | mixtral | Multi-specialty, highest resources |

Prompt Optimization

Optimize prompts for better efficiency:

  1. Be Specific: Clear, concise instructions reduce token usage
  2. Provide Examples: Few-shot examples improve response quality
  3. Structured Output: Request specific formats to reduce parsing needs
  4. Limit Context: Only include relevant information
  5. Use Separators: Clearly delineate sections with markers
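
Several of these points can be combined in a small prompt builder. The sketch below is one possible layout, assuming "###" section markers and a question/answer format for few-shot examples; neither convention comes from Simulacra01, and build_prompt is an illustrative name.

```python
def build_prompt(instruction, context, examples=(), output_format="JSON"):
    """Assemble a prompt with explicit section separators and optional few-shot examples."""
    parts = [f"### INSTRUCTION\n{instruction.strip()}"]
    for question, answer in examples:
        parts.append(f"### EXAMPLE\nQ: {question}\nA: {answer}")
    # Only the relevant context is passed in, keeping the prompt short
    parts.append(f"### CONTEXT\n{context.strip()}")
    parts.append(f"### OUTPUT FORMAT\nRespond only with {output_format}.")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Extract every date mentioned in the context.",
    "The treaty was signed on 4 July 1921.",
    examples=[("Founded in 1999.", '["1999"]')],
)
print(prompt)
```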

Chunking Strategies

For large documents, implement effective chunking:

python
import re

def chunk_document(text, chunk_size=2000, overlap=200):
    """Split document into overlapping character-based chunks."""
    # A non-positive step would make range() fail or loop incorrectly
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def chunk_document_by_section(text, chunk_size=2000):
    """Split document on blank lines and pack the sections into chunks."""
    # Blank lines mark section boundaries; headers stay attached to the text
    # (splitting on the header pattern itself would discard the headers)
    sections = [s for s in re.split(r'\n\s*\n', text) if s.strip()]

    chunks = []
    current_chunk = ""

    for section in sections:
        # If adding this section would exceed chunk size, save current chunk
        if current_chunk and len(current_chunk) + len(section) + 2 > chunk_size:
            chunks.append(current_chunk)
            current_chunk = section
        elif current_chunk:
            # Restore the paragraph break removed by the split
            current_chunk += "\n\n" + section
        else:
            current_chunk = section

    # Add the final chunk if not empty
    if current_chunk:
        chunks.append(current_chunk)

    return chunks

Caching Optimization

Implement multi-level caching for better performance:

python
class MultiLevelCache:
    def __init__(self):
        self.memory_cache = {}  # Fast, in-memory cache
        # DiskCache is assumed to be a project-provided class exposing
        # get(key) (returning None on a miss) and store(key, value)
        self.disk_cache = DiskCache("./cache")  # Persistent disk cache

    def get(self, key):
        # First check memory cache (fastest)
        if key in self.memory_cache:
            return self.memory_cache[key]

        # Then check disk cache
        disk_result = self.disk_cache.get(key)
        if disk_result is not None:
            # Also store in memory for future fast access
            self.memory_cache[key] = disk_result
            return disk_result

        return None

    def store(self, key, value):
        # Store in both caches
        self.memory_cache[key] = value
        self.disk_cache.store(key, value)

    def clear_memory_cache(self):
        """Clear only memory cache (e.g., to free memory)."""
        self.memory_cache = {}

Parallel Processing

Implement parallel processing for multiple operations:

python
import concurrent.futures

def process_document_parallel(document, queries):
    """Process multiple queries against a document in parallel."""
    results = {}

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Create a future for each query
        # (search_document is assumed to be a plain callable taking text and query)
        future_to_query = {
            executor.submit(search_document, document, query): query
            for query in queries
        }

        # Process results as they complete
        for future in concurrent.futures.as_completed(future_to_query):
            query = future_to_query[future]
            try:
                results[query] = future.result()
            except Exception as e:
                results[query] = {"error": str(e)}

    return results

Contributing and Development

Setting Up Development Environment

bash
# Clone the repository
git clone https://github.com/yourusername/simulacra01.git
cd simulacra01

# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install in development mode with dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_ollama_client.py

# Run with coverage
coverage run -m pytest
coverage report
coverage html # Generate HTML report

Code Style and Linting

bash
# Check code style
ruff check .

# Auto-fix issues
ruff check --fix .

# Run type checking
mypy .

Documentation Generation

bash
# Generate API documentation
python scripts/generate_docs.py

# Build documentation website
mkdocs build

# Serve documentation locally
mkdocs serve

Branch Strategy

  • main: Stable release branch
  • develop: Main development branch
  • feature/*: For new features
  • bugfix/*: For bug fixes
  • release/*: For release preparation

Commit Guidelines

Use conventional commits format:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks
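
A commit subject can be checked against these prefixes with a short regular expression, for example from a pre-commit hook. The sketch below also accepts an optional scope in parentheses, which is part of the wider conventional commits convention rather than something this guide specifies; is_conventional is an illustrative name.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional "(scope)", then ": " and a non-empty description
COMMIT_PATTERN = re.compile(
    r"^(" + "|".join(COMMIT_TYPES) + r")(\([\w-]+\))?: \S.*"
)


def is_conventional(subject):
    """Return True when a commit subject starts with an allowed type prefix."""
    return COMMIT_PATTERN.match(subject) is not None


print(is_conventional("feat: add research agent"))
print(is_conventional("fix(cli): handle empty input"))
print(is_conventional("Added stuff"))
```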

Pull Request Process

  1. Create a new branch from develop
  2. Make your changes
  3. Run tests and linting
  4. Submit a pull request to develop
  5. Ensure CI passes
  6. Request review from maintainers
  7. Address review feedback

This comprehensive guide covers all aspects of using, customizing, and extending Simulacra01. For additional information, refer to the specific documentation files in the docs/ directory.
