Building a Knowledge-Sharing Chatbot: Turn Expertise Into an AI That Anyone Can Query

We all carry knowledge that others need. Whether you're a seasoned manager with institutional history, a technician with troubleshooting tricks learned over decades, or a founder with lessons from a hundred decisions—the problem is the same: your knowledge is trapped in your head, and it scales poorly.
You could write documentation, but documentation is static. It doesn't answer follow-up questions. It doesn't adapt to what someone actually needs in the moment. And most people don't read it anyway—they ask you.
What if you could clone the part of yourself that answers questions? Not a generic AI, but one trained on your knowledge, your processes, your edge cases?
That's what I built: a system that captures expertise through guided interviews, transforms it into structured documentation, and delivers a chatbot that anyone can query. The key constraint? Everything runs locally—no cloud APIs, no monthly fees, complete privacy.
The Real Problem: Knowledge Bottlenecks
Every expert becomes a bottleneck. Here's how it manifests:
For individuals:
- You answer the same questions repeatedly
- Your time gets consumed by knowledge transfer instead of high-value work
- When you're unavailable, decisions wait or go wrong
For organizations:
- Key person dependency creates risk
- Onboarding takes months instead of weeks
- Hard-won lessons get lost when people leave
For communities:
- Expertise remains siloed with a few individuals
- Newcomers struggle to get up to speed
- Knowledge fragments across chat logs, emails, and documents
Traditional solutions don't work well. Wikis go stale. Training videos are passive. Documentation requires people to know what to look for. What people actually want is conversation—the ability to ask questions and get answers tailored to their context.
The Solution: A Knowledge-Capture-to-Chatbot Pipeline
The system I built follows a simple but powerful pipeline:
```
Expert Interview → LLM Structuring → Vector Embeddings → Queryable Chatbot
                                            ↑                    ↓
                                            │       Unknown Questions → Expert Review
                                            │                    ↓
                                            └──────── New Knowledge Integrated
```
This creates a learning loop: the chatbot answers what it knows, flags what it doesn't, and gets smarter over time.
Why This Approach Works
- Interview-based capture: Experts don't have to write documentation—they just answer questions they already know the answers to
- LLM structuring: Raw responses get transformed into organized, readable documentation automatically
- Semantic search: Users ask questions naturally, not with exact keywords
- Dynamic learning: The system improves without manual updates
The Architecture in Practice
Let me show you how each component works, using real code from the implementation.
Phase 1: Capturing Expert Knowledge
The first challenge is getting knowledge out of people's heads. Most experts are too busy to write comprehensive documentation, but they'll answer focused questions.
The interview module uses a structured approach:
```python
# Structured interview questions for staff
INTERVIEW_QUESTIONS = [
    "What are the main tasks you do daily?",
    "What mistakes do new hires often make?",
    "Which documents or forms are essential for your role?",
    "Are there any edge cases you frequently encounter?",
    "What advice would you give to someone just starting in this role?"
]

# Keywords that indicate potential edge cases
EDGE_CASE_KEYWORDS = [
    "sometimes", "rarely", "depends", "if", "occasionally",
    "usually", "typically", "in rare cases", "edge case"
]

def detect_edge_cases(response_text: str) -> list:
    """Detect potential edge cases based on keywords in the response."""
    edge_cases = []
    sentences = response_text.split('. ')
    for sentence in sentences:
        sentence_lower = sentence.lower()
        for keyword in EDGE_CASE_KEYWORDS:
            if keyword in sentence_lower:
                edge_cases.append(sentence.strip())
                break
    return edge_cases
```
The interview process is deliberately conversational:
```python
def run_staff_interview(staff_name: str, role: str) -> StaffResponse:
    """Run an interactive staff interview via console input."""
    print(f"\n{'='*50}")
    print(f"Expert Interview: {staff_name} - {role}")
    print(f"{'='*50}\n")

    responses = []
    all_edge_cases = []

    for question in INTERVIEW_QUESTIONS:
        print(f"Question: {question}")
        answer = input("Answer: ").strip()

        if not answer:
            print("  (Skipped - no answer provided)")
            continue

        responses.append(f"Q: {question}\nA: {answer}")

        # Check for edge cases in the answer
        detected = detect_edge_cases(answer)
        all_edge_cases.extend(detected)

        print(f"  ✓ Recorded ({len(detected)} potential edge cases detected)\n")

    # Combine all responses into single text
    response_text = "\n\n".join(responses)

    # Create and save staff response
    session = get_session()
    staff_response = StaffResponse(
        staff_name=staff_name,
        role=role,
        response_text=response_text,
        edge_cases=all_edge_cases
    )
    session.add(staff_response)
    session.commit()

    print(f"\n{'='*50}")
    print(f"Interview complete! {len(all_edge_cases)} edge cases detected.")
    print(f"Responses saved for {staff_name} ({role})")
    print(f"{'='*50}\n")

    session.close()
    return staff_response
```
Key insight: The questions are designed to surface not just what to do, but what goes wrong. Questions about mistakes and edge cases capture the tacit knowledge that never makes it into formal documentation.
Phase 2: Structuring Knowledge with LLMs
Raw interview responses are valuable but unstructured. The LLM transforms them into coherent documentation:
```python
from db import StaffResponse, SOPDraft, get_session
from llm import generate_text, SOP_GENERATION_PROMPT

def generate_sop(staff_response_id: int) -> SOPDraft:
    """Generate an SOP draft from staff responses."""
    session = get_session()

    # Get staff response
    sr = session.query(StaffResponse).filter_by(id=staff_response_id).first()
    if not sr:
        print(f"Staff response with ID {staff_response_id} not found")
        session.close()
        return None

    print(f"Generating SOP for {sr.staff_name} ({sr.role})...")

    # Build prompt for SOP generation
    prompt = f"""Create a detailed Standard Operating Procedure (SOP) based on the following staff responses:

{sr.response_text}

Please organize this into a clear, professional SOP with:
1. Role Overview
2. Daily Tasks (step-by-step)
3. Common Mistakes to Avoid
4. Essential Forms/Documents
5. Edge Cases / Special Circumstances
"""

    # Generate SOP using Ollama
    sop_text = generate_text(
        prompt=prompt,
        system_prompt=SOP_GENERATION_PROMPT,
        temperature=0.3,
        max_tokens=1500
    )

    # Create SOP draft
    sop = SOPDraft(
        role=sr.role,
        sop_text=sop_text,
        metadata={
            "staff_name": sr.staff_name,
            "staff_response_id": sr.id
        }
    )
    session.add(sop)
    session.commit()

    print(f"SOP generated and saved for role: {sr.role}")
    session.close()
    return sop
```
The system prompt guides the LLM to create well-structured output:
```python
SOP_GENERATION_PROMPT = """You are an expert process engineer and technical writer.
Your task is to create clear, structured Standard Operating Procedures (SOPs) from staff responses.

Create well-organized documents with:
- Clear step-by-step instructions
- Checklists where appropriate
- Common mistakes to avoid
- Essential forms/documents needed
- Any edge cases or special circumstances

Format the output professionally with clear headings."""
```
Key insight: Using a low temperature (0.3) for SOP generation keeps the output factual and grounded in the actual expert responses, rather than letting the LLM "creatively" embellish.
Phase 3: Making Knowledge Searchable
Structured documentation is great, but only if people can find what they need. This is where vector embeddings transform static text into queryable knowledge.
The embedding module handles chunking and storage:
```python
import chromadb
from chromadb.config import Settings
from db import SOPDraft, get_session
from llm import get_embedding

# Chroma client setup
CHROMA_PERSIST_DIR = "./chroma_db"
client = chromadb.PersistentClient(path=CHROMA_PERSIST_DIR)

# Get or create collection
collection = client.get_or_create_collection(
    name="sop_collection",
    metadata={"description": "SOP embeddings for onboarding chatbot"}
)

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into chunks for embedding."""
    if len(text) <= chunk_size:
        return [text]

    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]

        if end < len(text):
            # Try to break at a sentence boundary.
            # Note: rfind returns an index relative to the chunk, not the full text.
            last_period = chunk.rfind('.')
            last_newline = chunk.rfind('\n')
            break_point = max(last_period, last_newline)
            if break_point > 0:
                chunk = chunk[:break_point + 1]
                start = start + break_point + 1
            else:
                start = end - overlap
        else:
            start = end

        chunks.append(chunk.strip())

    return chunks
```
Key insight: The chunking algorithm tries to break at sentence boundaries. Arbitrary cuts in the middle of sentences create embeddings that lose semantic coherence—breaking at natural boundaries preserves meaning.
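To see the boundary-aware behavior, here's a quick check you could run against the function above—the sample text and sizes are purely illustrative:
```python
sample = (
    "Open the front desk by 8am. Check the overnight log for incidents. "
    "Count the cash drawer and record the total. If the total is off by more "
    "than $5, notify the manager before processing any transactions."
)

# Small chunk size so the example produces multiple chunks
for i, chunk in enumerate(chunk_text(sample, chunk_size=120, overlap=20)):
    print(f"Chunk {i}: {chunk!r}")
# Each chunk ends at a period rather than mid-sentence, so its embedding
# captures complete thoughts instead of fragments.
```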
The embedding function itself is straightforward:
```python
def embed_sop(sop_id: int) -> bool:
    """Embed an SOP into the vector database."""
    session = get_session()
    sop = session.query(SOPDraft).filter_by(id=sop_id).first()

    if not sop:
        print(f"SOP with ID {sop_id} not found")
        session.close()
        return False

    print(f"Embedding SOP for role: {sop.role}")

    # Chunk the SOP text
    chunks = chunk_text(sop.sop_text)
    print(f"  Split into {len(chunks)} chunks")

    # Get existing IDs to avoid duplicates
    existing_ids = collection.get()["ids"]

    # Add each chunk to the collection
    for i, chunk in enumerate(chunks):
        chunk_id = f"{sop_id}_{i}"
        if chunk_id in existing_ids:
            continue

        embedding = get_embedding(chunk)
        if not embedding:
            print(f"  Warning: Failed to get embedding for chunk {i}")
            continue

        collection.add(
            documents=[chunk],
            metadatas=[{
                "sop_id": sop_id,
                "role": sop.role,
                "chunk_index": i,
                "source": "sop"
            }],
            ids=[chunk_id],
            embeddings=[embedding]
        )

    print(f"  Embedded {len(chunks)} chunks for role: {sop.role}")
    session.close()
    return True
```
Retrieval is where semantic search shines:
```python
def retrieve_sop_chunks(query: str, n_results: int = 3, role: str = None) -> dict:
    """Retrieve relevant SOP chunks for a query."""
    query_embedding = get_embedding(query)
    if not query_embedding:
        return {"documents": [], "metadatas": [], "distances": []}

    # Build where filter if role specified
    where = {"role": role} if role else None

    # Query collection
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
        where=where
    )
    return results
```
Key insight: The role parameter enables knowledge domain filtering. If someone is asking about front desk procedures, they shouldn't get maintenance procedures mixed in—even if the topics happen to share keywords.
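In practice the same question can be asked with or without that scope; a small example of both calls (the query text is illustrative, the "Front Desk" role comes from the bot's default):
```python
# Unscoped query: searches every embedded SOP
results = retrieve_sop_chunks("How do I handle a cash drawer discrepancy?")

# Scoped query: only chunks whose metadata role matches "Front Desk" are considered
results = retrieve_sop_chunks(
    "How do I handle a cash drawer discrepancy?",
    n_results=3,
    role="Front Desk"
)

# Chroma returns nested lists (one entry per query embedding)
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(meta["role"], "→", doc[:80])
```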
Phase 4: The Chatbot Interface
All of this infrastructure culminates in the chatbot—the interface where people actually interact with your knowledge.
```python
import os
import logging
from telegram import Update
from telegram.ext import (
    Application, CommandHandler, MessageHandler, filters, ContextTypes
)
from llm import generate_text, CHATBOT_PROMPT
from embedding import retrieve_sop_chunks
from edge_cases import detect_edge_case, get_simulated_staff_answer

# Configure logging
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)
logger = logging.getLogger(__name__)

# Bot configuration
TELEGRAM_BOT_TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN", "")
DEFAULT_ROLE = "Front Desk"

async def start_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle /start command."""
    await update.message.reply_text(
        "Welcome to the Knowledge Assistant! 🤖\n\n"
        "I'm here to help you learn about procedures and best practices. "
        "You can ask me questions about:\n"
        "- Daily tasks and procedures\n"
        "- Forms and documents\n"
        "- Common mistakes to avoid\n"
        "- Edge cases and special situations\n\n"
        "Just type your question and I'll do my best to help!"
    )

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle incoming messages."""
    user_message = update.message.text
    user_id = update.effective_user.id
    logger.info(f"Message from {user_id}: {user_message}")

    # Get user's role context
    role = context.user_data.get("role", DEFAULT_ROLE)

    # Check for edge cases first
    is_edge_case, confidence, chunks = detect_edge_case(user_message, role)

    if is_edge_case:
        # Handle as edge case
        await update.message.reply_text(
            "🤔 This is an interesting question that I don't have complete information about. "
            "I've flagged it for expert review."
        )
        # Get simulated expert answer for demo
        expert_answer = get_simulated_staff_answer(user_message, role)
        await update.message.reply_text(
            f"💡 Here's what an expert says: {expert_answer}"
        )
        logger.info(f"Edge case handled for user {user_id}")
    else:
        # Normal retrieval + generation
        if chunks:
            context_text = "\n\n".join(chunks)
            prompt = f"""Based on the following information, answer the user's question:

Reference Information:
{context_text}

User Question: {user_message}

Provide a clear, helpful answer. If the information doesn't fully answer the question, acknowledge that and provide what's available."""

            response = generate_text(
                prompt=prompt,
                system_prompt=CHATBOT_PROMPT,
                temperature=0.5,
                max_tokens=500
            )
        else:
            response = (
                "I don't have specific information about that yet. "
                "Would you like me to flag this question for an expert to review?"
            )

        await update.message.reply_text(response)
        logger.info(f"Response sent to user {user_id}")
```
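The entry point that wires these handlers up isn't shown above, but with the same python-telegram-bot Application API already imported, a minimal bootstrap would look roughly like this (the main function itself is my sketch, not a verbatim excerpt):
```python
def main():
    # Build the bot application and register the handlers defined above
    app = Application.builder().token(TELEGRAM_BOT_TOKEN).build()
    app.add_handler(CommandHandler("start", start_command))
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))

    # Long-poll Telegram for updates until interrupted
    app.run_polling()

if __name__ == "__main__":
    main()
```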
Key insight: The chatbot uses a higher temperature (0.5) for responses than for SOP generation (0.3). This allows for slightly more conversational responses while still being grounded in the source material.
Phase 5: The Learning Loop
The most powerful feature is the system's ability to learn from questions it can't answer:
```python
# Similarity threshold for edge case detection
EDGE_CASE_THRESHOLD = 0.7

def detect_edge_case(user_question: str, role: str = None) -> tuple:
    """Detect if a question is an edge case (not well answered by existing knowledge)."""
    from embedding import retrieve_sop_chunks

    results = retrieve_sop_chunks(user_question, n_results=3, role=role)

    if not results or not results.get("documents"):
        return True, 0.0, []

    # Chroma returns nested lists (one per query embedding); unwrap the first query's results
    documents = results.get("documents", [[]])[0]
    distances = results.get("distances", [[]])[0]
    if not distances:
        return True, 0.0, []

    best_distance = min(distances)
    confidence = 1.0 - best_distance
    is_edge_case = confidence < EDGE_CASE_THRESHOLD

    return is_edge_case, confidence, documents
```
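One caveat worth noting: confidence = 1.0 - distance only reads as a roughly 0-to-1 score when the collection uses cosine distance (where distance is 1 minus cosine similarity); Chroma's default space is L2, where distances can exceed 1. If you want the arithmetic above to hold, the collection can be created with cosine distance explicitly. The snippet below is a sketch of that configuration, not the article's original setup:
```python
# Create the collection with cosine distance so 1 - distance maps to a ~0..1 confidence
collection = client.get_or_create_collection(
    name="sop_collection",
    metadata={
        "description": "SOP embeddings for onboarding chatbot",
        "hnsw:space": "cosine"  # Chroma's default is "l2", where distances can exceed 1
    }
)
```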
When an edge case is identified, expert answers get integrated back:
```python
# The collection, embedding helper, and ORM models come from the modules shown earlier
from db import SOPDraft, get_session
from embedding import collection
from llm import get_embedding

def integrate_edge_case_answer(
    user_question: str,
    expert_answer: str,
    role: str
) -> bool:
    """
    Integrate an expert answer into the knowledge base.

    This is the "learning" part of the system - new knowledge gets added
    and becomes available for future queries.
    """
    session = get_session()

    # Find existing SOP for this role
    sop = session.query(SOPDraft).filter_by(role=role).first()

    if sop:
        # Append new Q&A to existing SOP
        edge_case_text = f"\n\n---\n\nQ: {user_question}\nA: {expert_answer}"
        sop.sop_text += edge_case_text
        session.commit()
        print(f"Updated SOP for role {role} with new knowledge")
    else:
        # Create new SOP if none exists
        sop = SOPDraft(
            role=role,
            sop_text=f"Q: {user_question}\nA: {expert_answer}",
            metadata={"source": "edge_case_integration"}
        )
        session.add(sop)
        session.commit()
        print(f"Created new knowledge base for role {role}")

    session.close()

    # Add to vector database
    existing = collection.get()
    new_chunk_id = f"edge_case_{len(existing['ids'])}"
    combined_text = f"Q: {user_question}\nA: {expert_answer}"

    embedding = get_embedding(combined_text)
    if embedding:
        collection.add(
            documents=[combined_text],
            metadatas=[{
                "role": role,
                "source": "edge_case",
                "question": user_question
            }],
            ids=[new_chunk_id],
            embeddings=[embedding]
        )
        print("Added knowledge to vector database")
        return True

    return False
```
Key insight: This creates a self-improving system. Every question that stumps the chatbot becomes a training opportunity. Over time, the system converges toward comprehensive coverage of what people actually ask about.
The Complete Data Flow
Here's the full lifecycle from interview to query (a minimal end-to-end sketch follows this list):
1. Knowledge Capture
Expert answers guided questions
→ System detects potential edge cases automatically
→ Responses stored in database
2. Knowledge Structuring
Raw responses fed to LLM
→ Structured documentation generated
→ Human can review and edit
3. Knowledge Embedding
Documentation split into chunks
→ Each chunk embedded using local model
→ Stored in Chroma vector database
4. Query Time
User asks question naturally
→ Question embedded
→ Semantic search finds relevant chunks
→ LLM generates answer grounded in context
5. Gap Filling
Low confidence answer detected
→ Question flagged for expert
→ Expert provides answer
→ Answer integrated into knowledge base
→ Future queries benefit
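Tying the capture-to-embedding half of that lifecycle together in code: the sketch below chains the functions shown earlier. The module names interview and sop are my assumption (the article doesn't name those files); embedding.embed_sop matches the function above, and the staff name is just an example.
```python
# Hypothetical driver script: module names `interview` and `sop` are assumptions
from interview import run_staff_interview
from sop import generate_sop
from embedding import embed_sop

# 1. Capture: guided console interview, edge cases flagged automatically
staff_response = run_staff_interview("Dana", "Front Desk")

# 2. Structure: turn the raw answers into an SOP draft with the local LLM
sop_draft = generate_sop(staff_response.id)

# 3. Embed: chunk the SOP and store it in Chroma for semantic search
if sop_draft:
    embed_sop(sop_draft.id)

print("Knowledge base ready - start the chatbot to query it.")
```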
Why Local-First Matters
This system runs entirely on your machine. Here's why that matters:
Privacy: Your knowledge base might contain proprietary processes, institutional details, or sensitive information. With local LLMs, nothing leaves your infrastructure.
Cost: No per-token API charges. No monthly subscription. No surprise bills when your chatbot gets popular.
Control: You choose which models to use. You can fine-tune on your data. You own everything.
Reliability: No API outages. No rate limits. No dependency on external services.
The LLM wrapper makes this simple:
```python
import requests
import os
from typing import Optional

OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
DEFAULT_GENERATION_MODEL = os.environ.get("OLLAMA_MODEL", "llama3.2")
DEFAULT_EMBEDDING_MODEL = os.environ.get("OLLAMA_EMBED_MODEL", "nomic-embed-text")

def generate_text(
    prompt: str,
    model: str = DEFAULT_GENERATION_MODEL,
    system_prompt: Optional[str] = None,
    temperature: float = 0.7,
    max_tokens: int = 1000
) -> str:
    """Generate text using local Ollama."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens
        }
    }
    if system_prompt:
        payload["system"] = system_prompt

    try:
        response = requests.post(
            f"{OLLAMA_BASE_URL}/api/generate",
            json=payload,
            timeout=120
        )
        response.raise_for_status()
        return response.json().get("response", "")
    except requests.exceptions.ConnectionError:
        return "Error: Cannot connect to Ollama. Make sure 'ollama serve' is running."

def get_embedding(text: str, model: str = DEFAULT_EMBEDDING_MODEL) -> list:
    """Get text embedding using local Ollama."""
    try:
        response = requests.post(
            f"{OLLAMA_BASE_URL}/api/embeddings",
            json={"model": model, "prompt": text},
            timeout=30
        )
        response.raise_for_status()
        return response.json().get("embedding", [])
    except requests.exceptions.ConnectionError:
        print("Error: Cannot connect to Ollama. Make sure 'ollama serve' is running.")
        return []
```
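A quick smoke test of the wrapper against a running Ollama instance might look like this (the prompt text is just an example; nomic-embed-text typically returns 768-dimensional vectors):
```python
if __name__ == "__main__":
    # Generate a short, factual answer with a low temperature
    answer = generate_text(
        "List three things a new front desk hire should check at opening.",
        temperature=0.3,
        max_tokens=200
    )
    print(answer)

    # Confirm the embedding endpoint is reachable and inspect the vector size
    vector = get_embedding("cash drawer discrepancy")
    print(f"Embedding dimensions: {len(vector)}")
```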
Use Cases Beyond Onboarding
While I built this for knowledge transfer, the pattern applies broadly:
Consultants: Package your methodology into a chatbot clients can query between sessions
Founders: Capture decision rationale so new team members understand not just what, but why
Researchers: Create a queryable knowledge base from your notes and papers
Support Teams: Turn ticket history into a chatbot that answers customer questions
Craftsmen: Document techniques and troubleshooting so apprentices can learn independently
Community Leaders: Capture institutional knowledge so it survives leadership transitions
Key Learnings
Building this system taught me:
1. Capture Must Be Frictionless
Experts won't write documentation, but they'll answer questions. The interview format is familiar and low-effort.
2. LLMs Are Great Structurers
The transformation from raw responses to organized documentation is where LLMs genuinely shine. They handle the tedious work of formatting and organizing.
3. Chunking Strategy Makes or Breaks Retrieval
Bad chunking = bad search. Breaking at sentence boundaries and including overlap preserves semantic meaning.
4. Edge Cases Are Features, Not Bugs
Every question the system can't answer is an opportunity to improve it. The learning loop is the most valuable part of the architecture.
5. Local-First Is Now Practical
Ollama has matured to the point where local LLMs are genuinely useful for production workloads. The quality is good enough, and the trade-offs (privacy, cost, control) often favor local.
6. Temperature Settings Matter
Different tasks need different creativity levels. SOP generation needs consistency (0.3), conversational responses need warmth (0.5), creative tasks might want more (0.7+).
What's Next
This MVP proves the concept. The production version could add:
- Web Interface: Replace Telegram with a web chat for broader accessibility
- Multi-Expert Knowledge Bases: Let multiple experts contribute to the same role
- Knowledge Graphs: Visualize how concepts connect across different domains
- Version History: Track how knowledge evolves over time
- Export Formats: Generate PDFs, wikis, or training videos from the knowledge base
- Confidence Calibration: Help users understand when to trust answers
Conclusion
The knowledge-sharing chatbot addresses a fundamental problem: expertise doesn't scale, but it can be captured.
The architecture is deliberately simple: interview → structure → embed → query → learn. Each step uses mature, well-understood technologies. What makes it work is the closed loop—the system gets smarter with every question it can't answer.
For anyone sitting on valuable knowledge that others need, this pattern offers a path forward. Not a generic AI, but one that knows what you know, answers how you would answer, and improves as it goes.
The future of knowledge transfer isn't better documentation—it's conversation.
