Building an Advanced AI Image-to-Book Pipeline: Multimodal Storytelling with LLaVA, ChromaDB, and Recursive Narrative Generation Using Ollama
Complete technical guide to creating an AI-powered narrative generation system that transforms static images into complete books using multimodal analysis, vector databases, and recursive storytelling with LangChain and Ollama.
Daniel Kliewer
Author, Sovereign AI


Introduction: Building an AI-Powered Narrative Generation System
This guide presents a comprehensive technical framework for transforming static images into coherent, long-form narratives using modern AI tools. The system combines multimodal perception, recursive context management, and human-in-the-loop editing to create stories that maintain stylistic consistency while evolving organically from a visual seed.
Core Philosophy
The architecture embodies three fundamental principles:
- Visual Semantics as Foundation: Every narrative element derives from image analysis
- Contextual Memory: Recursive retrieval maintains story continuity
- Creative Control: Human oversight guides AI generation
Key Components
1. Multimodal Perception Engine
- Input: JPEG/PNG images (max 10MB)
- Processing:
  - LLaVA (Local): Free OSS model via Ollama
  - GPT-4V (Cloud): Commercial API alternative
- Output: Structured JSON schema validated with Pydantic:
```python
from pydantic import BaseModel


class ImageAnalysis(BaseModel):
    setting: str                    # Primary environment description
    characters: list[str]           # Living entities (named if detectable)
    mood: str                       # Emotional tone, stored as text
    objects: list[str]              # Significant inanimate items
    potential_conflicts: list[str]  # Narrative tension sources
```
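Local models do not always return clean JSON, so it can help to isolate and check the object before validating it against the schema. The following is a minimal sketch using only the standard library; `extract_analysis` and `REQUIRED_FIELDS` are hypothetical names, and the sample reply is invented for illustration.

```python
import json

# Field names taken from the ImageAnalysis schema above.
REQUIRED_FIELDS = ("setting", "characters", "mood", "objects", "potential_conflicts")


def extract_analysis(raw_reply: str) -> dict:
    """Parse the span from the first '{' to the last '}' and check its keys.

    Hypothetical helper: it only confirms that the required keys exist.
    Type validation is still left to the Pydantic model.
    """
    start = raw_reply.find("{")
    end = raw_reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(raw_reply[start : end + 1])
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"analysis is missing fields: {missing}")
    return data


# Invented sample reply with chatter around the JSON object.
reply = (
    'Here is the analysis: {"setting": "harbor at dusk", '
    '"characters": ["fisherman"], "mood": "wistful", '
    '"objects": ["net", "lantern"], "potential_conflicts": ["approaching storm"]}'
)
analysis = extract_analysis(reply)
print(analysis["setting"])
```

The returned dictionary could then be passed to `ImageAnalysis(**analysis)` for full type validation.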
2. Context-Aware Generation System
- Vector Database: ChromaDB with cosine similarity search
- Chunking Strategy:
  - 500-token segments with metadata:
```json
{
  "chapter": 3,
  "active_characters": ["protagonist", "antagonist"],
  "location": "enchanted_forest",
  "mood_shift": 0.15
}
```
- Retrieval Logic: Hybrid semantic/keyword search
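The chunking strategy above can be sketched without any dependencies. This version assumes whitespace-separated words are an acceptable stand-in for model tokens and uses a 50-token overlap; `chunk_with_metadata` and the `chunk_index` field are hypothetical additions, not part of the original design.

```python
def chunk_with_metadata(
    text: str, metadata: dict, max_tokens: int = 500, overlap: int = 50
) -> list[dict]:
    """Split text into overlapping windows and attach a metadata copy to each."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    tokens = text.split()  # whitespace words as a rough proxy for model tokens
    step = max_tokens - overlap
    chunks: list[dict] = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + max_tokens]
        chunks.append(
            {
                "text": " ".join(window),
                # chunk_index is an assumed extra field for ordering chunks
                "metadata": {**metadata, "chunk_index": len(chunks)},
            }
        )
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


chapter_text = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_with_metadata(
    chapter_text, {"chapter": 3, "location": "enchanted_forest"}
)
print(len(chunks))
```

Each chunk carries its own metadata dictionary, so later edits to one chunk's metadata do not leak into the others.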
3. Recursive Narrative Engine
- Core Model: DeepSeek 70B via Ollama (4-bit quantized)
- Prompt Architecture:
```python
def build_prompt(context):
    return f"""
    You are {context['author_style']} writing a new chapter.
    Current Status: {context['summary']}
    Required Elements: {context['required']}
    Forbidden Tropes: {context['banned']}
    """
```
- Validation Layer:
  - Tone consistency checks
  - Plot hole detection
  - Character continuity verification
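A simple version of the validation layer can be sketched with plain string checks. This is an illustrative stand-in rather than the guide's implementation: `validate_chapter` is a hypothetical function, the 250-word minimum mirrors the length check used later in the story engine, and plot hole detection is not attempted here.

```python
def validate_chapter(
    text: str,
    active_characters: list[str],
    banned_phrases: list[str],
    min_words: int = 250,
) -> list[str]:
    """Return a list of human-readable issues; an empty list means the draft passes."""
    issues: list[str] = []
    lowered = text.lower()
    word_count = len(text.split())
    if word_count < min_words:
        issues.append(f"too short: {word_count} words (minimum {min_words})")
    # Character continuity: every active character should be mentioned.
    for name in active_characters:
        if name.lower() not in lowered:
            issues.append(f"active character never mentioned: {name}")
    # Forbidden tropes: flag any banned phrase that appears verbatim.
    for phrase in banned_phrases:
        if phrase.lower() in lowered:
            issues.append(f"banned trope present: {phrase}")
    return issues


draft = "Mara crossed the bridge. It was all a dream."
issues = validate_chapter(draft, ["Mara", "Tobin"], ["it was all a dream"])
for issue in issues:
    print(issue)
```

Returning a list of issues instead of raising on the first failure lets an editor or a retry loop see every problem in one pass.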
Workflow Overview
- Image → Structured Data
  - Multimodal model extracts 42 semantic features
  - Validation ensures narrative viability
- Initial Context Embedding
  - Store analysis in ChromaDB with initial metadata
- Recursive Generation Loop
```mermaid
graph TD
    A[Retrieve 3 Relevant Chunks] --> B(Build Generation Prompt)
    B --> C(Generate 300 Words)
    C --> D(Validate Output)
    D --> E{Chapter Complete?}
    E -->|Yes| F[Update Metadata]
    E -->|No| B
```
- Context Management
  - Dynamic summarization every 5 chapters
  - Attention window reset protocol
- Human Collaboration Interface
  - Real-time editing with version control
  - Multi-dimensional visualization:
    - Character relationship graphs
    - Emotional arc timelines
    - Location dependency trees
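The recursive generation loop in the workflow can be expressed as a small function with the model and retriever passed in as callables. This is a sketch under stated assumptions: `write_chapter`, the 900-word completion target, and the retry limit are hypothetical, and the stub functions stand in for ChromaDB and the LLM.

```python
from typing import Callable


def write_chapter(
    summary: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    target_words: int = 900,  # assumed completion criterion
    max_attempts: int = 10,   # assumed safety limit against endless retries
) -> str:
    """Build one chapter from roughly 300-word passages until it is long enough."""
    context_chunks = retrieve(summary, 3)  # A: retrieve 3 relevant chunks
    passages: list[str] = []
    for _ in range(max_attempts):
        prompt = "\n".join(  # B: build the generation prompt
            [
                f"Story so far: {summary}",
                "Relevant context:",
                *context_chunks,
                "Chapter so far:",
                *passages,
                "Continue the chapter in about 300 words.",
            ]
        )
        passage = generate(prompt)  # C: generate about 300 words
        if not validate(passage):   # D: validate output
            continue                # rejected: rebuild the prompt and retry
        passages.append(passage)
        if sum(len(p.split()) for p in passages) >= target_words:  # E: complete?
            break
    return "\n\n".join(passages)  # F (metadata update) is left to the caller


# Stubs standing in for the vector store and the language model.
def fake_retrieve(query: str, k: int) -> list[str]:
    return [f"note {i}" for i in range(k)]


def fake_generate(prompt: str) -> str:
    return " ".join(["word"] * 300)


def fake_validate(passage: str) -> bool:
    return len(passage.split()) >= 250


chapter = write_chapter("A lighthouse keeper finds a map.", fake_retrieve, fake_generate, fake_validate)
print(len(chapter.split()))
```

Passing the retriever, generator, and validator as arguments keeps the loop testable without a running model.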
Technical Highlights
- Performance Optimization
  - Quantized models (GGUF format) for CPU execution
  - Async generation with Celery workers
  - Context-aware batch processing
- Validation Suite
  - Automated tests:
```python
# MoodValidator and chapter3 are assumed to come from the project's
# validation code and test fixtures; they are not defined in this guide.
def test_mood_consistency():
    analyzer = MoodValidator()
    assert analyzer.check_chapter(chapter3) > 0.85
```
  - Human evaluation rubric (5-point scale)
- Deployment Architecture
  - Dockerized microservices
  - Redis-backed task queue
  - React/WebSocket frontend
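The mood consistency test in the validation suite relies on a `MoodValidator` that the guide does not define. One possible shape for it is a lexicon-based scorer, sketched below. The word lists and constructor arguments are assumptions made to keep the example self-contained (the guide's test constructs the validator with no arguments), and a production version would more likely use a sentiment or embedding model.

```python
import re


class MoodValidator:
    """Hypothetical lexicon-based scorer, not the guide's implementation."""

    def __init__(self, target_words: set[str], off_tone_words: set[str]):
        self.target_words = {w.lower() for w in target_words}
        self.off_tone_words = {w.lower() for w in off_tone_words}

    def check_chapter(self, text: str) -> float:
        """Return the share of mood-bearing words that match the target mood."""
        words = re.findall(r"[a-z']+", text.lower())
        on_tone = sum(w in self.target_words for w in words)
        off_tone = sum(w in self.off_tone_words for w in words)
        total = on_tone + off_tone
        # With no mood-bearing words there is nothing to contradict the mood.
        return 1.0 if total == 0 else on_tone / total


validator = MoodValidator(
    target_words={"gloom", "dread", "shadow"},
    off_tone_words={"cheerful", "sunny"},
)
score = validator.check_chapter(
    "Dread settled over the shadow of the pier, far from any sunny shore."
)
print(round(score, 2))
```

A score below a chosen threshold, such as the 0.85 used in the test above, would mark the chapter for regeneration or human review.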
Why This Approach Works
- Balanced Creativity
  - AI generates raw content
  - RAG enforces narrative rules
  - Humans guide artistic direction
- Scalable Foundation
  - Modular components allow:
    - Model swapping (e.g., Claude 3 for DeepSeek)
    - Database migration (Chroma → Pinecone)
    - Style transfer plugins
- Cost Efficiency
  - Local execution avoids API fees
  - Quantization enables consumer GPU use
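Model swapping is easiest when the rest of the pipeline depends on a small interface rather than on a specific client. The sketch below uses `typing.Protocol` for that purpose; `TextGenerator`, `EchoGenerator`, and `write_opening` are hypothetical names introduced only for illustration.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything with a generate(prompt) -> str method can back the pipeline."""

    def generate(self, prompt: str) -> str: ...


class EchoGenerator:
    """Stand-in backend for tests; a real one would call Ollama or a hosted API."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        return f"{self.prefix}{prompt}"


def write_opening(generator: TextGenerator, setting: str) -> str:
    # The caller never needs to know which backend produced the text.
    return generator.generate(f"Open a story set in {setting}.")


opening = write_opening(EchoGenerator("[draft] "), "a lighthouse")
print(opening)
```

Swapping DeepSeek for another model would then mean writing one more class with a `generate` method, with no changes to the calling code.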
Practical Applications
- Automated Storyboarding
- Personalized Content Generation
- Interactive Fiction Prototyping
- Therapeutic Narrative Construction
Guide Roadmap
This introduction precedes a detailed technical walkthrough covering:
- Local model deployment with Ollama
- ChromaDB schema design patterns
- LangChain recursive chain construction
- React visualization techniques
- Performance benchmarking strategies
The system demonstrates how modern AI components can be orchestrated into creative pipelines while maintaining technical rigor, which makes it a useful reference for developers exploring the intersection of generative AI and traditional storytelling.
```python
# --------------------------
# Backend Implementation
# --------------------------

# image_analysis.py
import ollama
from pydantic import BaseModel


class ImageAnalysis(BaseModel):
    setting: str
    characters: list[str]
    mood: str
    objects: list[str]
    potential_conflicts: list[str]


class MultimodalAnalyzer:
    def __init__(self, model="llava"):
        self.model = model

    def analyze(self, image_path):
        if self.model == "llava":
            return self._analyze_with_llava(image_path)
        return self._analyze_with_gpt4v(image_path)

    def _analyze_with_llava(self, image_path):
        prompt = """Describe this image in JSON format with:
        setting, characters, mood, objects, and potential_conflicts"""

        # The Ollama client accepts image file paths (or raw bytes) in `images`
        response = ollama.generate(
            model="llava",
            prompt=prompt,
            images=[image_path],
            format="json",
        )
        # The generated text is stored under the "response" key.
        # parse_raw is the Pydantic v1-style call; v2 prefers model_validate_json.
        return ImageAnalysis.parse_raw(response["response"])

    def _analyze_with_gpt4v(self, image_path):
        # The cloud alternative is not implemented in this guide
        raise NotImplementedError("GPT-4V analysis is not implemented")


# --------------------------
# RAG & Story Generation
# --------------------------

# rag_manager.py
import uuid

import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter


class NarrativeRAG:
    def __init__(self):
        self.client = chromadb.PersistentClient(path="./chroma_db")
        self.collection = self.client.get_or_create_collection("narrative")
        # Note: chunk_size counts characters here, not tokens
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,
            chunk_overlap=50,
        )

    def index_context(self, document: str, metadata: dict):
        chunks = self.text_splitter.split_text(document)
        ids = [str(uuid.uuid4()) for _ in chunks]
        self.collection.add(
            documents=chunks,
            metadatas=[metadata] * len(chunks),
            ids=ids,
        )

    def retrieve_context(self, query, k=3):
        results = self.collection.query(
            query_texts=[query],
            n_results=k,
        )
        return results["documents"][0]


# --------------------------
# LLM Story Generation
# --------------------------

# story_generator.py
from collections import Counter

from langchain_community.llms import Ollama


def extract_keywords(text, limit=10):
    # Simple stand-in: the most frequent words longer than four letters
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 4)
    return [word for word, _ in counts.most_common(limit)]


class StoryEngine:
    def __init__(self):
        # Model tag as given in this guide; confirm it exists in your Ollama library
        self.llm = Ollama(model="deepseek-llm:70b")
        self.rag = NarrativeRAG()

    def generate_chapter(self, context):
        # Fall back to a generic query while no summary exists yet
        query = context.get("latest_summary") or context["summary"] or "story opening"
        retrieved = self.rag.retrieve_context(query)
        prompt = self._build_prompt(context, retrieved)

        chapter = self.llm.invoke(prompt)
        self._validate_chapter(chapter)
        self._update_rag(chapter, context)

        return chapter

    def _build_prompt(self, context, retrieved):
        retrieved_text = "\n".join(retrieved)
        return f"""
        Write a 300-word story chapter continuing from:
        {context['summary']}

        Retrieved Context:
        {retrieved_text}

        Requirements:
        - Maintain {context['mood']} tone
        - Advance conflicts: {', '.join(context['conflicts'])}
        - End with a cliffhanger
        """

    def _validate_chapter(self, chapter):
        # Custom validation logic
        if len(chapter.split()) < 250:
            raise ValueError("Chapter too short")

    def _update_rag(self, chapter, context):
        self.rag.index_context(
            document=chapter,
            metadata={
                "chapter": context["current_chapter"],
                # Stored as one string so the metadata value stays a scalar
                "keywords": ", ".join(extract_keywords(chapter)),
            },
        )
```

```jsx
// --------------------------
// Frontend Components
// --------------------------

// story_editor.jsx
import ReactFlow, { Controls } from 'reactflow';
import 'reactflow/dist/style.css';
import { useStore } from './store';

export default function NarrativeGraph() {
  const nodes = useStore(state => state.nodes);
  const edges = useStore(state => state.edges);

  return (
    <ReactFlow
      nodes={nodes}
      edges={edges}
      fitView
    >
      <Controls />
    </ReactFlow>
  );
}
```

```yaml
# --------------------------
# Deployment & Orchestration
# --------------------------

# docker-compose.yml
version: '3.8'

services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data
    depends_on:
      - redis
      - ollama

  redis:
    image: redis:alpine

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama

volumes:
  ollama:
```
Implementation Workflow:
- Image Processing Pipeline
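```python
# pipeline.py
import ollama

# Module paths are assumptions; adjust them to your project layout
from image_analysis import MultimodalAnalyzer
from rag_manager import NarrativeRAG
from story_generator import StoryEngine


class NarrativePipeline:
    def run(self, image_path):
        # Step 1: Image Analysis
        analyzer = MultimodalAnalyzer()
        analysis = analyzer.analyze(image_path)

        # Step 2: Initialize RAG
        rag = NarrativeRAG()
        rag.index_context(
            document=analysis.json(),
            metadata={"type": "initial_analysis"},
        )

        # Step 3: Generate Story
        engine = StoryEngine()  # create once so the model client is reused
        story = []
        summary = analysis.setting  # seed the first chapter with the image setting
        for chapter_num in range(1, 6):
            context = {
                "current_chapter": chapter_num,
                "summary": summary,
                "latest_summary": summary,  # key read by generate_chapter for retrieval
                "mood": analysis.mood,
                "conflicts": analysis.potential_conflicts,
            }

            chapter = engine.generate_chapter(context)
            story.append(chapter)

            # Refresh the running summary every fifth chapter
            if chapter_num % 5 == 0:
                summary = self._summarize_story(story[-5:])

        return story

    def _summarize_story(self, chapters):
        summary_prompt = "Summarize this story arc in 3 sentences:\n"
        response = ollama.generate(
            model="deepseek-llm:70b",
            prompt=summary_prompt + "\n".join(chapters),
        )
        # Return the generated text rather than the whole response object
        return response["response"]
```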
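The "summarize every 5 chapters, then reset the window" idea can be isolated into a small helper. The sketch below is one possible reading of it: `RollingContext` is a hypothetical class, the summarizer is injected as a callable so a stub can replace the LLM, and keeping the most recent chapters verbatim alongside the summary is an assumption rather than something the pipeline above does.

```python
from typing import Callable


class RollingContext:
    """Keep recent chapters verbatim and fold them into a summary periodically."""

    def __init__(self, summarize: Callable[[str, list[str]], str], every: int = 5):
        self.summarize = summarize
        self.every = every
        self.summary = ""
        self.recent: list[str] = []

    def add_chapter(self, chapter: str) -> None:
        self.recent.append(chapter)
        if len(self.recent) == self.every:
            # Fold the full window into the summary, then reset the window.
            self.summary = self.summarize(self.summary, self.recent)
            self.recent = []

    def prompt_context(self) -> str:
        """Summary first, then any chapters written since the last reset."""
        return "\n\n".join(part for part in [self.summary, *self.recent] if part)


def fake_summarize(previous: str, chapters: list[str]) -> str:
    # Stub: a real summarizer would send the previous summary and chapters to the LLM.
    return f"[{len(chapters)} chapters summarized]"


ctx = RollingContext(summarize=fake_summarize, every=5)
for n in range(1, 7):
    ctx.add_chapter(f"chapter {n}")
print(ctx.prompt_context())
```

This keeps the prompt size bounded: at most one summary plus four full chapters are ever included.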
Directory Structure
```text
.
├── backend/
│   ├── api/
│   │   └── routers/
│   │       └── story.py
│   ├── core/
│   │   ├── image_analysis.py
│   │   └── story_generation.py
│   └── workers/
│       └── celery_tasks.py
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/
│       │   ├── StoryEditor.jsx
│       │   └── NarrativeGraph.jsx
│       └── stores/
│           └── useStore.js
├── models/
│   └── schemas.py
└── infrastructure/
    ├── docker-compose.yml
    └── nginx.conf
```
Key Implementation Details:
- Context-Aware Generation
  - Uses sliding window attention with summary injection
  - Dynamic prompt construction based on RAG results
  - Automatic conflict escalation through recursive feedback
- Optimized Retrieval
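```python
# Hybrid search implementation
def retrieve_context(self, query, current_chapter, k=3):
    # Semantic query combined with a metadata filter on recent chapters.
    # Chunks without a "chapter" field (such as the initial image
    # analysis) are not matched by this filter.
    results = self.collection.query(
        query_texts=[query],
        where={"chapter": {"$gte": current_chapter - 3}},
        n_results=k,
    )
    return results["documents"][0]
```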
- Validation Layer
```python
# validation.py
from pydantic import BaseModel, validator


class ChapterValidation(BaseModel):
    content: str
    mood_score: float
    conflict_count: int

    # Pydantic v1-style validator; v2 offers field_validator for the same purpose
    @validator('mood_score')
    def check_mood_consistency(cls, v):
        if v < 0.7:
            raise ValueError("Mood consistency too low")
        return v
```
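The retrieval query shown under Optimized Retrieval combines semantic search with a metadata filter. One way to add an explicit keyword component is to re-rank the retrieved chunks with a blended score. The sketch below is illustrative only: `hybrid_rank` and the `alpha` weight are hypothetical, and the two-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of distinct query words that also occur in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


def hybrid_rank(
    query: str,
    query_vector: list[float],
    candidates: list[tuple[str, list[float]]],
    alpha: float = 0.7,  # assumed weight: 1.0 is purely semantic, 0.0 purely keyword
) -> list[str]:
    """Order candidate texts by a weighted mix of semantic and keyword scores."""
    scored = [
        (
            alpha * cosine_similarity(query_vector, vector)
            + (1 - alpha) * keyword_overlap(query, text),
            text,
        )
        for text, vector in candidates
    ]
    return [text for _, text in sorted(scored, reverse=True)]


# Toy embeddings: the first chunk is slightly closer semantically,
# but only the second contains the query keywords.
candidates = [
    ("The forest gate stood open", [1.0, 0.0]),
    ("A storm broke over the harbor", [0.9, 0.1]),
]
ranked = hybrid_rank("harbor storm", [1.0, 0.0], candidates)
print(ranked[0])
```

Setting `alpha=1.0` reduces this to plain cosine ranking, which makes the effect of the keyword term easy to compare.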
Performance Optimization:
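```python
# quantization.py
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-70b.Q4_K_M.gguf",  # 4-bit quantized GGUF weights
    n_ctx=4096,                             # context window in tokens
    n_gpu_layers=40,                        # number of layers offloaded to the GPU
)
```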
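A quick estimate shows why quantization and partial GPU offload matter for a model of this size. The arithmetic below is a rough lower bound under stated assumptions: it counts weights only, treats the quantization as exactly 4 bits per weight (formats such as Q4_K_M use somewhat more on average), and ignores the KV cache and runtime overhead. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_memory_gb(70e9, 16)  # 16-bit weights
q4_gb = weight_memory_gb(70e9, 4)     # nominal 4-bit weights

print(f"70B at 16-bit: ~{fp16_gb:.0f} GB")
print(f"70B at 4-bit:  ~{q4_gb:.0f} GB")
```

Even at a nominal 4 bits, the weights come to roughly 35 GB, which is more than most single consumer GPUs hold. That is presumably why the configuration above offloads only some layers with `n_gpu_layers` and leaves the rest on the CPU.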
Testing Suite
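```python
# test_rag.py
from rag_manager import NarrativeRAG  # module path is an assumption


def test_retrieval_relevance():
    rag = NarrativeRAG()
    rag.index_context("Test document", {"test": True})
    results = rag.retrieve_context("test query")
    # The store is persistent, so earlier runs may leave extra chunks behind.
    # Check for the indexed text rather than an exact result count.
    assert 1 <= len(results) <= 3
    assert "Test document" in results
```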
This implementation provides:
- End-to-end narrative generation from images
- Context-aware continuation using RAG
- Self-correcting validation layer
- Scalable architecture with Docker
- Interactive visualization frontend
- Comprehensive testing suite
To run:
```bash
docker-compose up --build
curl -X POST -F "image=@cat.jpg" http://localhost:8000/generate
```
The system balances creative generation with technical rigor through:
- Multimodal input processing
- Contextual memory management
- Automated quality control
- Human-in-the-loop editing
- Scalable infrastructure design
