Building a Multimodal Story Generation System
3 min read
AI · Multimodal · Story Generation · Python · LLM

Multimodal Story Generation System
Transform visual inputs into structured narratives using cutting-edge AI technologies. This system combines computer vision and large language models to generate dynamic, multi-chapter stories from images.
Features
- 🖼️ Image Analysis - Extract narrative elements from images using LLaVA
- 📖 Adaptive Story Generation - Generate 5-chapter stories with Gemma2-27B
- 🧠 Context Awareness - Maintain narrative consistency with ChromaDB RAG
- 📊 Interactive Visualization - ReactFlow-powered story graph interface
- 🚀 Production Ready - Dockerized microservices architecture
Table of Contents
- Quick Start
- System Requirements
- Architecture
- Production Deployment
- Troubleshooting
Quick Start
Local Development Setup
- Clone Repository

```bash
git clone https://github.com/kliewerdaniel/ITB02
cd ITB02
```

- Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows
```

- Install Dependencies

```bash
pip install -r requirements.txt

# Apple Silicon special setup
pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
brew install libjpeg webp
```

- Initialize AI Models

```bash
ollama pull gemma2:27b
ollama pull llava
```

- Start Services

```bash
# Backend (FastAPI)
uvicorn backend.main:app --reload

# Frontend (new terminal)
cd frontend
npm install && npm run dev
```

- Verify Installation

```bash
curl http://localhost:8000/health
# Expected response: {"status":"healthy"}
```
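The health check in the last step only needs a trivial FastAPI route. Here is a minimal sketch of what `backend/main.py` might expose; the real module presumably mounts the story-generation routers as well, so treat this purely as an illustration of the response shape.

```python
# Minimal sketch of the /health route the curl check above expects.
# The actual backend.main module likely wires in additional routers.
from fastapi import FastAPI

app = FastAPI(title="Multimodal Story Generation API")

@app.get("/health")
def health() -> dict:
    """Liveness probe hit by `curl http://localhost:8000/health`."""
    return {"status": "healthy"}
```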
System Requirements
- Python 3.11+
- Node.js 18+
- Ollama runtime
- 16GB RAM (24GB+ recommended for GPU acceleration)
- 10GB+ Disk Space
Architecture
```text
[Frontend] ←HTTP→ [FastAPI]
                   ↓    ↑
              [Ollama] ←→ [ChromaDB]
                   ↓
                [Redis]
                   ↓
           [Celery Workers]
```
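The Redis and Celery layers in the diagram suggest that long-running generation calls are pushed off the request path. The sketch below shows how such a worker task could look; the task name, broker URLs, and prompt are assumptions for illustration, not code from the repository.

```python
# Sketch of a Redis-backed Celery task for asynchronous chapter generation.
# Broker/backend URLs, task name, and prompt are assumptions.
from celery import Celery
import ollama

celery_app = Celery(
    "story_tasks",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@celery_app.task
def generate_chapter(outline: str, chapter_number: int) -> str:
    """Generate one chapter with Gemma2 so the FastAPI process stays responsive."""
    response = ollama.chat(
        model="gemma2:27b",
        messages=[{
            "role": "user",
            "content": f"Write chapter {chapter_number} of a five-chapter story "
                       f"based on this outline:\n{outline}",
        }],
    )
    return response["message"]["content"]
```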
Key Components
| Component | Technology Stack | Function |
|---|---|---|
| Image Analysis | LLaVA, Pillow | Visual narrative extraction |
| Story Engine | Gemma2-27B, LangChain | Context-aware chapter generation |
| Knowledge Base | ChromaDB | Narrative consistency management |
| API Layer | FastAPI | REST endpoint management |
| Visualization | ReactFlow, Zustand | Interactive story mapping |
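To make the Image Analysis row concrete: Pillow can normalize the upload before LLaVA (via the Ollama Python client) extracts narrative elements. This is a hedged sketch; the function name and prompt are illustrative, not the project's actual code.

```python
# Sketch of the Image Analysis component: Pillow resizes the upload,
# LLaVA extracts story-relevant details. Names and prompt are illustrative.
import io

import ollama
from PIL import Image

def extract_narrative_elements(image_path: str) -> str:
    """Resize the image, then ask LLaVA for narrative elements."""
    img = Image.open(image_path).convert("RGB")
    img.thumbnail((1024, 1024))          # keep the payload small for the model
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG")

    response = ollama.generate(
        model="llava",
        prompt=(
            "Describe the characters, setting, mood, and possible conflict "
            "in this image as raw material for a story."
        ),
        images=[buffer.getvalue()],
    )
    return response["response"]
```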
Production Deployment
Docker Setup
```bash
# Build and launch all services
docker-compose up --build

# Initialize vector store
docker exec -it backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"
```
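The initialization command above instantiates `NarrativeRAG` from `backend.core.rag_manager`. Its actual implementation isn't shown in this post; the sketch below is one way a ChromaDB-backed version could look, with assumed method and collection names.

```python
# Sketch of a ChromaDB-backed narrative store. The class name echoes the
# NarrativeRAG used in the docker exec command; its methods are assumptions.
import chromadb

class NarrativeRAG:
    def __init__(self, persist_dir: str = "chroma_db"):
        self.client = chromadb.PersistentClient(path=persist_dir)
        self.collection = self.client.get_or_create_collection("narrative")

    def add_chapter(self, chapter_number: int, text: str) -> None:
        """Index a finished chapter so later prompts can reference it."""
        self.collection.add(ids=[f"chapter-{chapter_number}"], documents=[text])

    def relevant_context(self, query: str, n_results: int = 3) -> list[str]:
        """Retrieve the passages most relevant to the next chapter's outline."""
        results = self.collection.query(query_texts=[query], n_results=n_results)
        return results["documents"][0]
```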
Cluster Configuration
```yaml
# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'
```
Troubleshooting
Common Issues
- Missing Vector Store

```bash
rm -rf chroma_db && mkdir chroma_db
```

- Out-of-Memory Errors

```bash
export OLLAMA_MAX_LOADED_MODELS=2
```

- CUDA Compatibility Issues

```bash
pip uninstall torch
pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
```
Daniel Kliewer
GitHub Profile
AI Systems Developer