Mastering Open Deep Research: Complete Smolagents Setup Guide with GAIA Benchmark Performance and Production-Ready Agent Workflows

Comprehensive tutorial for setting up and optimizing Open Deep Research with Smolagents framework, featuring GAIA benchmark testing, multi-model support, and production-grade autonomous research agents.

Daniel Kliewer

Author, Sovereign AI

Tags: Smolagents, Open-Deep-Research, Hugging Face, GAIA Benchmark, AI Agents, CodeAgent, Web Search, Tool Integration, LLM, Autonomous AI

This is from Sovereign AI: Building Local-First Intelligent Systems.


Step-by-Step Guide to Running Open Deep Research with smolagents

This guide walks you through setting up and running Open Deep Research, an open-source agent inspired by OpenAI's Deep Research and built on Hugging Face's smolagents library. Follow these steps to reproduce agentic workflows for complex tasks such as those in the GAIA benchmark.


Prerequisites

  • Python 3.10+ installed (recent smolagents releases require 3.10 or newer)
  • Git installed
  • Hugging Face account (optional, but needed to access some models)
  • Basic familiarity with CLI tools
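
The prerequisites above can be checked from a short script before going further. This is a minimal sketch using only the standard library. The function name `check_prerequisites` is hypothetical, and the default minimum of 3.10 is an assumption based on what recent smolagents releases require, so adjust it to match the version you install.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10)):
    """Return a dict describing whether each prerequisite is satisfied."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python": sys.version_info[:2] >= tuple(min_version),
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it with the same interpreter you plan to use for the virtual environment, since the version check applies to whichever Python executes the script.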

Step 1: Set Up a Virtual Environment

Create an isolated Python environment to avoid dependency conflicts:

bash
python3 -m venv venv        # Create the virtual environment
source venv/bin/activate    # Activate it (Linux/macOS)
# On Windows: venv\Scripts\activate
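
If you are unsure whether activation worked, the interpreter itself can tell you. The sketch below assumes the environment was created with the standard `venv` module, as in the commands above; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

When the environment is active, the printed interpreter path should sit inside the `venv` directory you created.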

Step 2: Install Dependencies

  1. Upgrade pip:

    bash
    pip install --upgrade pip
  2. Clone the repository:

    bash
    git clone https://github.com/huggingface/smolagents.git
    cd smolagents/examples/open_deep_research
  3. Install the requirements:

    bash
    pip install -r requirements.txt
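
After the install finishes, it can help to confirm that the packages are visible to the active interpreter. The following is a small sketch, not part of the repository: `missing_modules` is a hypothetical helper, and the list of names to check is an assumption you should extend to match `requirements.txt`.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    # find_spec locates a module without importing it; None means "not found".
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "smolagents" is the library this guide installs; add other names as needed.
    missing = missing_modules(["smolagents"])
    print("missing:", missing if missing else "none")
```

A non-empty result usually means the install ran against a different interpreter than the one in the virtual environment.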

Step 3: Configure the Agent

Key Components:

  • Model: Use Qwen/Qwen2.5-Coder-32B-Instruct (default) or choose from supported models.
  • Tools: Built-in tools include web_search, translation, and file/text inspection.
  • Imports: Authorize additional Python libraries (e.g., pandas, numpy) that the agent's generated code is allowed to import.

Step 4: Run the Agent via CLI

Use the smolagent command to execute tasks:

bash
smolagent "{PROMPT}" \
  --model-type "HfApiModel" \
  --model-id "Qwen/Qwen2.5-Coder-32B-Instruct" \
  --imports "pandas numpy" \
  --tools "web_search translation"
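
When the same command is launched repeatedly from a script, assembling the argument list in Python avoids shell-quoting mistakes in long prompts. The sketch below only uses the flags shown in the command above. The helper name `build_smolagent_argv` and its defaults are assumptions for illustration, and it mirrors the example's convention of passing several names as one space-separated string.

```python
import shlex


def build_smolagent_argv(prompt,
                         model_type="HfApiModel",
                         model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
                         imports=(),
                         tools=()):
    """Build the argument list for the smolagent command shown above."""
    argv = ["smolagent", prompt,
            "--model-type", model_type,
            "--model-id", model_id]
    # Optional flags are added only when values are supplied.
    if imports:
        argv += ["--imports", " ".join(imports)]
    if tools:
        argv += ["--tools", " ".join(tools)]
    return argv


if __name__ == "__main__":
    argv = build_smolagent_argv(
        "Summarize recent GAIA benchmark results",
        imports=["pandas", "numpy"],
        tools=["web_search", "translation"],
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run`, which skips the shell entirely and leaves quotes inside the prompt untouched.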

Example: GAIA-Style Task

bash
smolagent "Which fruits in the 2008 painting 'Embroidery from Uzbekistan' were served on the October 1949 breakfast menu of the ocean liner later used in 'The Last Voyage'? List them clockwise from 12 o'clock." \
  --tools "web_search text_inspector"

Model Options

Customize the LLM backend:

| Model Type          | Example Command                                                                |
|---------------------|--------------------------------------------------------------------------------|
| Hugging Face API    | --model-type "HfApiModel" --model-id "deepseek-ai/DeepSeek-R1"                 |
| LiteLLM (100+ LLMs) | --model-type "LiteLLMModel" --model-id "anthropic/claude-3-5-sonnet-latest"    |
| Local Transformers  | --model-type "TransformersModel" --model-id "Qwen/Qwen2.5-Coder-32B-Instruct"  |


Advanced Usage

1. Vision-Enabled Web Browser

For tasks requiring visual analysis (e.g., image-based GAIA questions):

bash
webagent "Analyze the product images on example.com/sale and list prices" \
  --model-type "LiteLLMModel" \
  --model-id "gpt-4o"

2. Sandboxed Execution

Run untrusted code safely using E2B:

bash
smolagent "{PROMPT}" --sandbox

3. Custom Tools

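Load tools from the Hugging Face Hub (smolagents can also wrap Spaces and LangChain tools via Tool.from_space and Tool.from_langchain):

python
# In your Python script
from smolagents import Tool

# Hub tools execute code from the repository, so an explicit opt-in is required.
custom_tool = Tool.from_hub("username/my-custom-tool", trust_remote_code=True)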

Troubleshooting

| Issue                   | Solution                                     |
|-------------------------|----------------------------------------------|
| ModuleNotFoundError     | Ensure virtual env is activated              |
| API Key Errors          | Set HF_TOKEN/ANTHROPIC_API_KEY env vars      |
| Tool Execution Failures | Check tool dependencies in requirements.txt  |
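
API key errors are easier to catch before launching a long run. The sketch below checks the two variables named in the table. The provider labels in `REQUIRED_ENV` and the helper name `missing_keys` are hypothetical, so extend the mapping for whichever backends you actually use.

```python
import os

# Assumed mapping from a provider label to the variable holding its key.
# Both variable names come from the troubleshooting table above.
REQUIRED_ENV = {
    "huggingface": "HF_TOKEN",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_keys(providers, environ=None):
    """Return env var names that are unset or empty for the given providers."""
    environ = os.environ if environ is None else environ
    return [REQUIRED_ENV[p] for p in providers if not environ.get(REQUIRED_ENV[p])]


if __name__ == "__main__":
    missing = missing_keys(["huggingface", "anthropic"])
    print("missing keys:", ", ".join(missing) if missing else "none")
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment.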


Performance Notes

  • Code vs. JSON Agents: Code-based agents reach ~55% accuracy on the GAIA validation set, versus ~33% for JSON-based agents (source).
  • Speed: Typical response time is ~2-5 minutes for complex tasks (varies by model).
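
Because a single task can take several minutes, scripted runs benefit from an upper time limit so a stalled agent does not block a batch. This is a minimal sketch using the standard library. The helper name `run_with_timeout` and the 600-second default (twice the upper end of the range above) are assumptions, and the demo uses a stand-in command rather than a real agent call.

```python
import subprocess
import sys
import time


def run_with_timeout(argv, timeout_s=600):
    """Run a command; return (return code, or None on timeout, and elapsed seconds)."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the limit.
        returncode = subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        returncode = None
    return returncode, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with a
    # smolagent invocation such as ["smolagent", "your prompt", ...].
    code, elapsed = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=30)
    print(f"exit code {code} after {elapsed:.1f}s")
```

Logging the elapsed time per task also gives you your own latency numbers to compare across models.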

Community Contributions

To improve this project:

  1. Enhance Tools: Add PDF/Excel support to text_inspector.
  2. Optimize Browser: Implement vision-guided navigation.
  3. Benchmark: Submit results to GAIA Leaderboard.

By following this guide, you’ve replicated key components of OpenAI’s Deep Research using open-source tools. For updates, star the smolagents repo and join the Hugging Face community! 🚀
