Complete Guide: Integrating OpenAI Agents SDK with Ollama for Local AI Agent Development

This comprehensive guide demonstrates how to integrate the official OpenAI Agents SDK with Ollama to create AI agents that run entirely on local infrastructure. By the end, you'll understand both the theoretical foundations and practical implementation of locally-hosted AI agents.
Table of Contents
- Introduction
- Understanding the Components
- Setting Up Your Environment
- Integrating Ollama with OpenAI Agents SDK
- Building a Document Analysis Agent
- Adding Document Memory
- Putting It All Together
- Troubleshooting
- Conclusion
Introduction
The OpenAI Agents SDK is a powerful framework for building agent-based AI systems that can solve complex tasks through planning and tool use. By integrating it with Ollama, we can run these agents locally, improving privacy, reducing latency, and eliminating API costs.
Understanding the Components
What is the OpenAI Agents SDK?
The OpenAI Agents SDK (agents) is a framework that simplifies the development of AI agents. It provides:
- A structured approach for defining agent behaviors
- Built-in support for tool usage and planning
- Session management for multi-turn conversations
- Memory and state persistence
At its core, this SDK formalizes the agent pattern that emerged from the broader LLM community, giving developers a standard way to implement agents that can plan, reason, and execute complex tasks.
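To make the pattern concrete, the SDK's basic usage looks roughly like this (a minimal sketch based on the SDK's documented quickstart; by default it expects an OPENAI_API_KEY, which we'll redirect to Ollama later in this guide):

```python
from agents import Agent, Runner

# Define an agent by its behavior rather than by raw API calls.
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant. Answer concisely.",
)

# Runner drives the agent loop until a final answer is produced.
result = Runner.run_sync(agent, "Write one sentence about local LLMs.")
print(result.final_output)
```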
What is Ollama?
Ollama is an open-source framework for running large language models (LLMs) locally. Key features include:
- Easy installation and model management
- Compatible API endpoints that mimic OpenAI's API structure (demonstrated in the sketch after this list)
- Support for many open-source models (Llama, Mistral, etc.)
- Custom model creation and fine-tuning
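The second point is the key to this guide: because Ollama exposes an OpenAI-compatible endpoint at /v1, the standard openai Python client can talk to it directly. A minimal sketch, assuming Ollama is running on its default port (11434) with the mistral model pulled:

```python
from openai import OpenAI

# Point the standard OpenAI client at Ollama's local,
# OpenAI-compatible endpoint.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```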
Why Integrate Them?
Integration provides several benefits:
- Data Privacy: All data stays on your local machine
- Cost Efficiency: No pay-per-token API costs
- Customization: Fine-tune models for specific use cases
- Network Independence: Agents function without internet access
- Reduced Latency: Eliminate network roundtrips
Setting Up Your Environment
Step 1: Install Ollama
First, install Ollama following the instructions for your operating system:
For macOS and Linux:
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```
For Windows:
Download the installer from Ollama's website.
Step 2: Download a Model
Pull a capable model that will power your agent. For this guide, we'll use Mistral:
```bash
ollama pull mistral
```
Verify that Ollama is working by running:
```bash
ollama run mistral "Hello, are you running correctly?"
```
You should see a response generated by the model.
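You can run the same check from Python, which is how our tools will reach Ollama later. A small sketch using Ollama's model-listing endpoint, /api/tags (this assumes the requests package, which we install in Step 4):

```python
import requests

# Quick health check: Ollama lists installed models at /api/tags.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models)
```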
Step 3: Install the OpenAI Agents SDK
Clone the repository and install the package:
```bash
git clone https://github.com/openai/openai-agents-python.git
cd openai-agents-python
pip install -e .
```
This installs the package in development mode, allowing you to modify the code if needed.
Step 4: Set Up Required Dependencies
Install additional dependencies:
```bash
pip install requests python-dotenv pydantic
```
Integrating Ollama with OpenAI Agents SDK
The OpenAI Agents SDK uses the OpenAI Python client underneath. We need to create a custom client that directs requests to Ollama instead of OpenAI's servers.
Step 1: Create a Custom Client
Create a file named ollama_client.py:
```python
import os

from openai import OpenAI


class OllamaClient(OpenAI):
    """Custom OpenAI client that routes requests to Ollama."""

    def __init__(self, model_name="mistral", **kwargs):
        # Configure to use Ollama's endpoint
        kwargs["base_url"] = "http://localhost:11434/v1"
        # Ollama doesn't require an API key but the client expects one
        kwargs["api_key"] = "ollama-placeholder-key"
        super().__init__(**kwargs)
        self.model_name = model_name
        # Check if the model exists
        print(f"Using Ollama model: {model_name}")

    def create_completion(self, *args, **kwargs):
        # Override model name if not explicitly provided
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return super().create_completion(*args, **kwargs)

    def create_chat_completion(self, *args, **kwargs):
        # Override model name if not explicitly provided
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return super().create_chat_completion(*args, **kwargs)

    # These methods are needed for compatibility with the agents library
    def completion(self, prompt, **kwargs):
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return self.completions.create(prompt=prompt, **kwargs)

    def chat_completion(self, messages, **kwargs):
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return self.chat.completions.create(messages=messages, **kwargs)
```
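A quick smoke test shows how the client is meant to be used (it assumes ollama serve is running and mistral has been pulled):

```python
from ollama_client import OllamaClient

client = OllamaClient(model_name="mistral")
# chat_completion is the convenience wrapper we defined above.
response = client.chat_completion(
    [{"role": "user", "content": "Reply with one word: ready?"}]
)
print(response.choices[0].message.content)
```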
Step 2: Create an Adapter for OpenAI Agents SDK
Now we'll create an adapter that makes the OpenAI Agents SDK compatible with our Ollama client. Create a file named agent_adapter.py:
```python
import json
import logging

from openai.types.chat import ChatCompletion, ChatCompletionMessage

import agents.agent as agent_module
from agents.agent import Agent
from agents.run import Runner, RunConfig
from agents.models import _openai_shared

from ollama_client import OllamaClient

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

# Set a placeholder OpenAI API key to avoid initialization errors
_openai_shared.set_default_openai_key("placeholder-key")

# Store the original init for the Agent class
original_init = Agent.__init__


def patched_init(self, *args, **kwargs):
    """Replace the model with OllamaClient if not provided."""
    if "model" not in kwargs:
        kwargs["model"] = OllamaClient(model_name="mistral")
    original_init(self, *args, **kwargs)


# Apply the patched init
Agent.__init__ = patched_init


# Class for a structured tool call
class ToolCall:
    def __init__(self, name, inputs=None):
        self.name = name
        self.inputs = inputs or {}


# Define a response class that matches what main.py expects
class AgentResponse:
    def __init__(self, result):
        # Extract the message from the final output
        if hasattr(result, "final_output"):
            if isinstance(result.final_output, str):
                self.message = result.final_output
            else:
                self.message = str(result.final_output)
        else:
            self.message = "I'm sorry, I couldn't process that request."

        # Get conversation ID if available
        self.conversation_id = getattr(result, "conversation_id", None)

        # Initialize tool_calls
        self.tool_calls = []

        # Extract tool calls from raw_responses
        if hasattr(result, "raw_responses"):
            for response in result.raw_responses:
                try:
                    if hasattr(response, "output") and hasattr(response.output, "tool_calls"):
                        for tool_call in response.output.tool_calls:
                            # Handle the case where tool_call is a dict
                            if isinstance(tool_call, dict):
                                name = tool_call.get("name", "unknown_tool")
                                inputs = tool_call.get("inputs", {})
                                self.tool_calls.append(ToolCall(name, inputs))
                            else:
                                # Assume it's already an object with name and inputs attributes
                                self.tool_calls.append(tool_call)
                except Exception as e:
                    logger.error(f"Error extracting tool calls: {str(e)}")


# Add a run method to the Agent class
def run(self, message, conversation_id=None):
    """Run the agent with the given message.

    Args:
        message: The user message to process
        conversation_id: Optional conversation ID for continuity

    Returns:
        A response object with message, conversation_id, and tool_calls attributes
    """
    try:
        # Create a direct prompt for the model
        prompt = f"""
{self.instructions}

User query: {message}
"""
        # Get a response directly from the model (OllamaClient)
        response = self.model.chat.completions.create(
            model="mistral",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )

        # Extract the text response
        response_text = response.choices[0].message.content

        # Create a minimal result object with just the response text
        class MinimalResult:
            def __init__(self, text, conv_id):
                self.final_output = text
                self.conversation_id = conv_id
                self.raw_responses = []

        result = MinimalResult(response_text, conversation_id)

        # Return a response object
        return AgentResponse(result)
    except Exception as e:
        import traceback

        error_traceback = traceback.format_exc()
        logger.error(f"Error running agent: {str(e)}\n{error_traceback}")

        # Create a basic response with the error message
        response = AgentResponse(None)
        response.message = f"An error occurred: {str(e)}"
        return response


# Make sure the run method is applied to the Agent class
Agent.run = run

# Debugging statement - log when the adapter is loaded
print("Agent adapter loaded, Agent class patched with run method.")
```
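With the adapter in place, usage is just a side-effect import followed by normal Agent construction, as in this minimal sketch:

```python
# Importing the adapter is enough: it patches Agent at import time.
import agent_adapter  # noqa: F401 (side-effect import)
from agents import Agent

agent = Agent(name="Demo", instructions="Answer briefly.")
response = agent.run("What is 2 + 2?")  # method added by the adapter
print(response.message)
```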
Building a Document Analysis Agent
Let's build a practical agent that analyzes documents, extracts key information, and answers questions about the content.
Step 1: Create Document Memory
First, let's create a simple document memory system to store and retrieve analyzed documents. Create a file named document_memory.py:
```python
import os
import json
import hashlib
from typing import Dict, List, Optional


class DocumentMemory:
    """Simple document storage system for the agent."""

    def __init__(self, storage_dir: str = "./document_memory"):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)
        self.index_file = os.path.join(storage_dir, "index.json")
        self.document_index = self._load_index()

    def _load_index(self) -> Dict:
        """Load document index from disk."""
        if os.path.exists(self.index_file):
            with open(self.index_file, 'r') as f:
                return json.load(f)
        return {"documents": {}}

    def _save_index(self):
        """Save document index to disk."""
        with open(self.index_file, 'w') as f:
            json.dump(self.document_index, f, indent=2)

    def _generate_doc_id(self, url: str) -> str:
        """Generate a unique ID for a document based on its URL."""
        return hashlib.md5(url.encode()).hexdigest()

    def store_document(self, url: str, content: str, metadata: Optional[Dict] = None) -> str:
        """Store a document and return its ID."""
        doc_id = self._generate_doc_id(url)
        doc_path = os.path.join(self.storage_dir, f"{doc_id}.txt")

        # Store document content
        with open(doc_path, 'w') as f:
            f.write(content)

        # Update index
        self.document_index["documents"][doc_id] = {
            "url": url,
            "path": doc_path,
            "metadata": metadata or {}
        }
        self._save_index()
        return doc_id

    def get_document(self, doc_id: str) -> Optional[Dict]:
        """Retrieve a document by ID."""
        if doc_id not in self.document_index["documents"]:
            return None

        doc_info = self.document_index["documents"][doc_id]
        try:
            with open(doc_info["path"], 'r') as f:
                content = f.read()
            return {
                "id": doc_id,
                "url": doc_info["url"],
                "content": content,
                "metadata": doc_info["metadata"]
            }
        except Exception as e:
            print(f"Error retrieving document {doc_id}: {e}")
            return None

    def get_document_by_url(self, url: str) -> Optional[Dict]:
        """Find and retrieve a document by URL."""
        doc_id = self._generate_doc_id(url)
        return self.get_document(doc_id)

    def list_documents(self) -> List[Dict]:
        """List all stored documents."""
        return [
            {"id": doc_id, "url": info["url"], "metadata": info["metadata"]}
            for doc_id, info in self.document_index["documents"].items()
        ]
```
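A quick round trip illustrates the intended usage (the URL and content here are placeholders, not real documents):

```python
from document_memory import DocumentMemory

memory = DocumentMemory(storage_dir="./document_memory")

# Store a document, then read it back by URL.
doc_id = memory.store_document(
    "https://example.com/report",
    "Quarterly revenue grew 12 percent...",
    metadata={"source": "example"},
)
doc = memory.get_document_by_url("https://example.com/report")
print(doc["content"][:40], doc["metadata"])

# Enumerate everything stored so far.
for entry in memory.list_documents():
    print(entry["id"], entry["url"])
```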
Step 2: Define the Agent's Tools
Create a file named document_agent.py to implement the document analysis agent with its tools:
```python
import re
import json
from datetime import datetime
from typing import List, Dict, Any, Optional

import requests
from pydantic import BaseModel, Field

# Import the Agent directly from the openai-agents package
from agents import Agent, function_tool

from ollama_client import OllamaClient
from document_memory import DocumentMemory

# Import the agent adapter to add the run method to the Agent class
import agent_adapter

# Initialize document memory
document_memory = DocumentMemory()


# Define the tool schemas
class FetchDocumentInput(BaseModel):
    url: str = Field(..., description="URL of the document to fetch")


class FetchDocumentOutput(BaseModel):
    content: str = Field(..., description="Content of the document")


class ExtractInfoInput(BaseModel):
    text: str = Field(..., description="Text to extract information from")
    info_type: str = Field(
        ...,
        description="Type of information to extract (e.g., 'dates', 'names', 'key points')"
    )


class ExtractInfoOutput(BaseModel):
    information: List[str] = Field(..., description="List of extracted information")


class SearchDocumentInput(BaseModel):
    text: str = Field(..., description="Document text to search within")
    query: str = Field(..., description="Query to search for")


class SearchDocumentOutput(BaseModel):
    results: List[str] = Field(..., description="List of matching paragraphs or sentences")


# Implement tool functions
@function_tool
def fetch_document(url: str) -> Dict[str, Any]:
    """Fetches a document from a URL and returns its content.

    Checks document memory first before making a network request."""
    # Check if document already exists in memory
    cached_doc = document_memory.get_document_by_url(url)
    if cached_doc:
        print(f"Retrieved document from memory: {url}")
        return {"content": cached_doc["content"]}

    # If not in memory, fetch from URL
    try:
        print(f"Fetching document from URL: {url}")
        response = requests.get(url)
        response.raise_for_status()
        content = re.sub(r"<[^>]+>", "", response.text)  # Remove HTML tags

        # Store in document memory
        document_memory.store_document(url, content, {"fetched_at": str(datetime.now())})
        return {"content": content}
    except Exception as e:
        return {"content": f"Error fetching document: {str(e)}"}


@function_tool
def extract_info(text: str, info_type: str) -> Dict[str, Any]:
    """Extracts specified type of information from text using Ollama."""
    client = OllamaClient(model_name="mistral")

    # Limit text length to prevent context overflow
    prompt = f"""
Extract all {info_type} from the following text.
Return ONLY a JSON array with the items.

TEXT: {text[:2000]}

JSON ARRAY OF {info_type.upper()}:
"""
    try:
        response = client.chat.completions.create(
            model="mistral",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,  # Lower temperature for more deterministic output
        )
        result_text = response.choices[0].message.content
        print(f"Extract info response: {result_text[:100]}...")

        # Try to find a JSON array in the response
        try:
            match = re.search(r"\[.*\]", result_text, re.DOTALL)
            if match:
                information = json.loads(match.group(0))
            else:
                # If no JSON array is found, try to parse the entire response as JSON
                try:
                    information = json.loads(result_text)
                    if not isinstance(information, list):
                        information = [result_text.strip()]
                except Exception:
                    information = [result_text.strip()]
        except json.JSONDecodeError:
            # Split by commas or newlines if JSON parsing fails
            information = []
            for line in result_text.split('\n'):
                line = line.strip()
                if line and not line.startswith('```') and not line.endswith('```'):
                    information.append(line)
            if not information:
                information = [item.strip() for item in result_text.split(",")]
    except Exception as e:
        print(f"Error in extract_info: {str(e)}")
        information = [f"Error extracting information: {str(e)}"]

    return {"information": information}


@function_tool
def search_document(text: str, query: str) -> Dict[str, Any]:
    """Searches for relevant content in the document."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    client = OllamaClient(model_name="mistral")

    # Limit to the first 15 paragraphs to stay within context limits.
    # Literal braces in the output-format example are escaped ({{ }}) for the f-string.
    prompt = f"""
You need to find paragraphs in a document that answer or relate to the query: "{query}"
Rate each paragraph's relevance to the query on a scale of 0-10.
Return the 3 most relevant paragraphs with their ratings as JSON.

Document sections:
{json.dumps(paragraphs[:15])}

Output format: [{{"rating": 8, "text": "paragraph text"}}, ...]
"""
    try:
        response = client.chat.completions.create(
            model="mistral",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,  # Lower temperature for more deterministic output
        )
        result_text = response.choices[0].message.content
        print(f"Search document response: {result_text[:100]}...")

        # Try to find a JSON array in the response
        try:
            match = re.search(r"\[.*\]", result_text, re.DOTALL)
            if match:
                parsed = json.loads(match.group(0))
                results = [item["text"] for item in parsed if "text" in item]
            else:
                # Try to parse the entire response as JSON
                try:
                    parsed = json.loads(result_text)
                    if isinstance(parsed, list):
                        results = [item.get("text", str(item)) for item in parsed]
                    else:
                        results = [str(parsed)]
                except Exception:
                    # If JSON parsing fails, extract quoted text
                    results = re.findall(r'"([^"]+)"', result_text)
                    if not results:
                        results = [result_text]
        except json.JSONDecodeError:
            # If JSON parsing fails completely
            results = [result_text]
    except Exception as e:
        print(f"Error in search_document: {str(e)}")
        results = [f"Error searching document: {str(e)}"]

    return {"results": results}


# Define additional tools for document memory management
class ListDocumentsOutput(BaseModel):
    documents: List[Dict] = Field(..., description="List of stored documents")


class GetDocumentInput(BaseModel):
    url: str = Field(..., description="URL of the document to retrieve")


class GetDocumentOutput(BaseModel):
    content: str = Field(..., description="Content of the retrieved document")
    metadata: Dict = Field(..., description="Metadata of the document")


@function_tool
def list_documents() -> Dict[str, Any]:
    """Lists all stored documents in memory."""
    documents = document_memory.list_documents()
    return {"documents": documents}


@function_tool
def get_document(url: str) -> Dict[str, Any]:
    """Retrieves a document from memory by URL."""
    doc = document_memory.get_document_by_url(url)
    if not doc:
        return {"content": "Document not found", "metadata": {}}
    return {"content": doc["content"], "metadata": doc["metadata"]}


# Create a Document Analysis Agent
def create_document_agent():
    """Creates and returns an AI agent for document analysis."""
    client = OllamaClient(model_name="mistral")

    # Collect all the tools decorated with function_tool
    tools = [
        fetch_document,
        extract_info,
        search_document,
        list_documents,
        get_document,
    ]

    agent = Agent(
        name="DocumentAnalysisAgent",
        instructions=(
            "You are a Document Analysis Assistant that helps users extract valuable information from documents.\n\n"
            "When given a task:\n"
            "1. If you need to analyze a document, first use fetch_document to get its content.\n"
            "2. Use extract_info to identify specific information in the document.\n"
            "3. Use search_document to find answers to specific questions.\n"
            "4. Summarize your findings in a clear, organized manner.\n\n"
            "You can manage documents with:\n"
            "- list_documents to see all stored documents\n"
            "- get_document to retrieve a previously fetched document\n\n"
            "Always be thorough and accurate in your analysis. If the document content is too large, "
            "focus on the most relevant sections for the user's query."
        ),
        tools=tools,
        model=client,
    )
    return agent
```
Putting It All Together
Let's create a main.py file that will tie everything together and provide a command-line interface for interacting with our document analysis agent:
```python
from document_agent import create_document_agent, document_memory
from ollama_client import OllamaClient


def print_banner():
    """Print a welcome banner for the Document Analysis Agent."""
    print("\n" + "=" * 60)
    print("📄 Document Analysis Agent 📄".center(60))
    print("=" * 60)
    print("\nThis agent can analyze documents, extract information, and search for content.")
    print("It also has document memory to store and retrieve documents between sessions.")

    # Check for existing documents
    docs = document_memory.list_documents()
    if docs:
        print(f"\n🗂️ {len(docs)} documents already in memory:")
        for i, doc in enumerate(docs, 1):
            print(f"  {i}. {doc['url']}")

    print("\nCommands:")
    print("  'exit' - Quit the program")
    print("  'list' - Show stored documents")
    print("  'help' - Show this help message")
    print("=" * 60 + "\n")


def main():
    print("Initializing Document Analysis Agent...")
    agent = create_document_agent()
    print_banner()

    # Debug: Test agent with a simple query
    try:
        print("\nDEBUG: Testing agent with 'what is war'")
        print("Processing...")
        test_response = agent.run(message="what is war")
        print(f"\nAgent (test): {test_response.message}")

        # If tools were used, show info about tool usage
        if test_response.tool_calls:
            print("\n🛠️ Tools Used (test):")
            for tool in test_response.tool_calls:
                # Display more info about each tool call
                inputs = getattr(tool, 'inputs', {})
                inputs_str = ', '.join(f"{k}='{v}'" for k, v in inputs.items()) if inputs else ""
                print(f"  • {tool.name}({inputs_str})")
    except Exception as e:
        import traceback
        print(f"\nDEBUG ERROR: {str(e)}")
        traceback.print_exc()

    # Start a conversation session
    conversation_id = None
    while True:
        try:
            user_input = input("\nYou: ")

            if user_input.lower() == 'exit':
                break

            if user_input.lower() == 'help':
                print_banner()
                continue

            if user_input.lower() == 'list':
                docs = document_memory.list_documents()
                if not docs:
                    print("\nNo documents in memory yet.")
                else:
                    print(f"\n📄 Documents in memory ({len(docs)}):")
                    for i, doc in enumerate(docs, 1):
                        metadata = doc.get('metadata', {})
                        fetched_at = metadata.get('fetched_at', 'unknown time')
                        print(f"  {i}. {doc['url']} (fetched: {fetched_at})")
                continue

            # Get agent response
            print("\nProcessing...")
            response = agent.run(
                message=user_input,
                conversation_id=conversation_id,
            )

            # Store the conversation ID for continuity
            conversation_id = response.conversation_id

            # Print the response
            print(f"\nAgent: {response.message}")

            # If tools were used, show info about tool usage
            if response.tool_calls:
                print("\n🛠️ Tools Used:")
                for tool in response.tool_calls:
                    # Display more info about each tool call
                    inputs = getattr(tool, 'inputs', {})
                    inputs_str = ', '.join(f"{k}='{v}'" for k, v in inputs.items()) if inputs else ""
                    print(f"  • {tool.name}({inputs_str})")
        except KeyboardInterrupt:
            print("\nExiting...")
            break
        except Exception as e:
            import traceback
            print(f"\nERROR: {str(e)}")
            traceback.print_exc()


if __name__ == "__main__":
    main()
```
Example Conversation
Here's an example of how a conversation with the document analysis agent might look:
```
============================================================
             📄 Document Analysis Agent 📄
============================================================

This agent can analyze documents, extract information, and search for content.
It also has document memory to store and retrieve documents between sessions.

Commands:
  'exit' - Quit the program
  'list' - Show stored documents
  'help' - Show this help message
============================================================

You: Please analyze the article at https://en.wikipedia.org/wiki/Artificial_intelligence and tell me when AI was first developed as a concept.

Processing...

Agent: Based on my analysis of the Wikipedia article on Artificial Intelligence, the concept of AI has historical roots that go back centuries, but the formal academic field of AI was founded in 1956.

According to the article, the workshop at Dartmouth College in the summer of 1956 is widely considered the founding event of artificial intelligence as a field. This was when John McCarthy coined the term "artificial intelligence" as the topic of the Dartmouth Conference.

However, the concept of artificial beings with intelligence appeared much earlier in history:
- Ancient Greek myths mentioned mechanical men designed to mimic human functions
- Philosophers like Aristotle described mechanical reasoning in syllogisms
- In the 13th century, Ramon Llull developed logical machines
- In the 17th century, René Descartes, Thomas Hobbes, and Gottfried Leibniz explored mechanical reasoning
- In the 19th century, Charles Babbage and Ada Lovelace contemplated programmable mechanical calculating devices

The article indicates that the modern field of AI research officially began at that 1956 workshop organized by John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester.

🛠️ Tools Used:
  • fetch_document(url='https://en.wikipedia.org/wiki/Artificial_intelligence')
  • search_document(query='when was AI first developed concept history')
  • extract_info(info_type='key dates in AI history')
```
Troubleshooting
Here are some common issues you might encounter and how to fix them:
1. Model Issues
Problem: The model generates poor responses, hallucinates, or fails to use tools properly.
Solution:
- Try a more capable model like llama3 or mixtral
- Check if your prompts are clear and well-formatted
- Reduce the complexity of your tools
- Add more explicit instructions in the agent's system prompt
You can pull a more capable model with:
```bash
ollama pull llama3
```
Then update your client:
```python
client = OllamaClient(model_name="llama3")
```
2. Context Length Issues
Problem: The model returns incomplete responses or fails when processing long documents.
Solution:
- Implement chunking for document text (we've already limited input to 2,000 characters in our tools; see the sketch after this list)
- Use models with larger context windows if available (like Llama 3 or Mixtral)
- Break down complex tasks into smaller subtasks
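For the chunking suggestion above, a simple character-based splitter is enough to start with. The helper below is a hypothetical sketch, not part of the SDK or our agent code; it could replace the hard 2,000-character truncation in extract_info:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into overlapping chunks that fit the model's context.

    Hypothetical helper: the overlap keeps sentences that straddle a
    boundary visible in both neighboring chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

# Usage idea: extract from each chunk, then merge the results.
# for chunk in chunk_text(long_document):
#     ...
```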
3. API Compatibility Issues
Problem: Some OpenAI client functions aren't supported by Ollama.
Solution:
- Our adapted client handles the most common method differences
- If you encounter unsupported features, add similar wrapper methods to the OllamaClient class
- Check Ollama's API documentation for compatible endpoints
Conclusion
In this guide, we've explored how to integrate the OpenAI Agents SDK with Ollama to create a powerful document analysis agent that runs entirely on local infrastructure. This approach combines the best of both worlds: the structured agent framework from OpenAI with the privacy and cost benefits of local inference through Ollama.
Key takeaways:
- Architecture: We've created a layered architecture with:
  - Ollama providing the LLM inference capability
  - A custom client adapter connecting Ollama to the OpenAI interface
  - The OpenAI Agents SDK providing the agent framework
  - Custom tools for document analysis and memory
- Implementation: We've built a complete document analysis agent with:
  - Document fetching and parsing
  - Information extraction
  - Document search
  - Persistent document storage
- Benefits:
  - Complete data privacy
  - No ongoing API costs
  - Customizable to specific use cases
  - Works offline
- Limitations and Mitigations:
  - Model quality limitations (mitigated by using more capable models)
  - Context length constraints (mitigated with our chunking approach)
  - API compatibility gaps (mitigated with our custom client)
This integration demonstrates how organizations can leverage the power of advanced AI agent frameworks while maintaining control over their data and infrastructure. The result is a flexible, extensible system that can be adapted to many different use cases beyond document analysis.
By building on this foundation, you can create specialized agents for various domains while keeping all processing local and secure.