Vibe Coding Session Building a Local LLM-Powered Knowledge Graph

A vibe coding session exploring the creation of a local LLM-powered personal knowledge graph system using Next.js, FastAPI, and NetworkX.

Daniel Kliewer

Author, Sovereign AI

Tags: Vibe Coding, LLM, Knowledge Graph, Local AI, Next.js, FastAPI

From the Book

This is from Sovereign AI: Building Local-First Intelligent Systems.

Brainstorming

Today I'm starting my vibe coding session with a full-on vibe for the brainstorming prompt below. I keep it fairly vague so that I can get a feel for what kind of things it will come up with. I'm going to try to keep it local and I'm building a graph. Let's see what today brings!

what are some vibe coding projects which are related to LLMs about building a graph, I want to build a graph, I want to vibe code, I want a blog post to be created about the whole thing, but I am going to write the blog post, what I want from you are ideas on what to build and the technologies used and then I want you to formulate several options with the technologies listed and allow me to choose one. One requirement is that I want everything to be local, the databases and inference are all done locally

That prompt gave me a list of five ideas. I chose one that I liked, and ChatGPT also offered some options for it, which I passed back in my next prompt along with the following:


Now I want in .md form a full description, architecture and everything else needed to know in order to fill the context for the generation of the prompt I am going to give to CLIne, so I want full output from you, you are not creating a prompt for CLIne but you are just writing in .md form the full description of every aspect you can fit into your context.

That produced a document that I went on to edit and include here. It is very long, so I made it collapsible:


Click to expand the full document

Mind Map AI — Full Project Specification

Project: Mind Map AI, an LLM-powered Personal Knowledge Graph (All Local)

Target: Local-only stack (Next.js frontend, FastAPI backend, local LLM, SQLite, NetworkX graph)

Purpose: Convert notes/journals/markdown into a browsable, queryable, and editable knowledge graph; provide semantic search and visualization; all inference and storage stays local.


Table of Contents

  1. Overview & Goals
  2. User Stories & Flows
  3. High-Level Architecture
  4. Technology Choices (Rationale)
  5. Data Models & Storage Design
  6. LLM Strategy (Local Inference + Embeddings)
  7. API Design (FastAPI)
  8. Frontend (Next.js)
  9. Graph Processing & Transformation Logic
  10. Visualization Approach
  11. File Structure & Example Files
  12. Deployment / Local Dev Setup
  13. Testing & Validation Strategy
  14. Security & Privacy Considerations
  15. Performance & Scaling Notes
  16. Example Prompts & Extraction Templates
  17. CLIne Handoff Notes
  18. Stretch Goals / Extensions

1. Overview & Goals

What it does:

  • Accepts local markdown/text notes (or pasted text)
  • Uses a locally-hosted LLM to extract entities, concepts, relationships, and sentiment
  • Stores raw notes in SQLite, embeddings in a local vector store, and graph relationships in a NetworkX graph persisted to disk
  • Exposes an API for ingestion, querying, and editing
  • Frontend (Next.js) provides an interactive visualization and editor for nodes/edges and a semantic search UI

Constraints:

  • Everything local: inference, DB, vector store, UI served locally
  • Offline-capable development workflow where possible
  • Auditable transformations — every extraction stores source text and provenance

Primary users:

  • You (the developer / blogger) building and experimenting; audience for blog: fellow vibe coders

2. User Stories & Flows

User Stories:

  • As a user, I want to drop a folder of markdown into the app and have a graph generated automatically
  • As a user, I want to click on a node and see the source passages and the LLM's extraction/provenance
  • As a user, I want to semantically search my notes and get graph nodes as results
  • As a user, I want to edit nodes/edges manually and commit changes
  • As a user, I want exports: GraphML, GEXF, PNG snapshots

Typical Flow:

  1. Drop or upload notes/folder or paste text
  2. Backend reads files, extracts metadata, runs LLM extraction and embeddings
  3. Save raw text to SQLite, embeddings to local vector store (Chroma or local Faiss), create/append nodes & edges to NetworkX graph
  4. Frontend queries backend for graph and renders interactive visualization
  5. User inspects nodes, opens provenance panel with source text and extracted labels
  6. User edits a node/edge → backend updates NetworkX & SQLite
  7. User exports or runs graph analytics (connected components, centrality)

3. High-Level Architecture

text
[ Next.js (frontend) ] <---> [ FastAPI (backend) ] <---> [ Local LLM runtime (Ollama/Llama) ]
                                      |-- SQLite (raw notes + metadata)
                                      |-- Vector DB (local Chroma / Faiss) (embeddings)
                                      |-- NetworkX (graph persisted as .gpickle / GraphML)

Components:

  • Frontend: Next.js app (React). Interactive graph (react-cytoscapejs), note editor, search UI
  • Backend: FastAPI for ingestion, graph management, search endpoints, admin endpoints
  • LLM runtime: Ollama, llama.cpp, or a Dockerized local model backend (whichever you prefer). Used for extraction and for optional reasoning queries
  • Embeddings: local sentence-transformer model (e.g., all-MiniLM or similar) or Ollama embedding endpoint (local)
  • Graph persistence: NetworkX memory representation persisted to .gpickle / GraphML files, backed up in SQLite for quick metadata queries

4. Technology Choices (Rationale)

  • Next.js: you're familiar with it; great for building modern UIs, server-side rendering for initial page load; can run entirely locally with next dev or next start
  • FastAPI: lightweight, async, great for building REST APIs; easy to integrate with Python graph code and LLM libraries
  • NetworkX: excellent for in-memory graph algorithms and flexible node/edge attributes; easy persistence to gpickle or GraphML
  • SQLite: simple, file-based database for raw text and provenance; ACID, portable
  • Local LLM (Ollama / Llama): keeps inference local. Ollama provides an easy local server experience; alternatives: llama.cpp or locally run Mistral/Gemma via supported runtimes
  • Embeddings: local sentence-transformers or Ollama embeddings. Useful for fast semantic search
  • Vector DB: lightweight local Chroma or Faiss if you want faster vector search than scanning SQLite
  • Visualization: Cytoscape (via react-cytoscapejs) — good UX for graph exploration

5. Data Models & Storage Design

SQLite Schema (Simplified):

sql
-- notes table: raw source markdown / text
CREATE TABLE notes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  filename TEXT,
  content TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  source_path TEXT, -- original path on disk if uploaded
  hash TEXT, -- content hash for dedup
  processed BOOLEAN DEFAULT 0
);

-- extracts table: store entity extracts & provenance
CREATE TABLE extracts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  note_id INTEGER REFERENCES notes(id),
  extractor_model TEXT,
  extract_json TEXT, -- store raw JSON output from LLM (entities, relationships)
  score REAL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- metadata table (optional)
CREATE TABLE metadata (
  key TEXT PRIMARY KEY,
  value TEXT
);
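
The `hash` column is what makes deduplication on ingest possible. Here is a minimal sketch using only the standard-library `sqlite3` and `hashlib` modules. The `insert_note` signature, the `UNIQUE` constraint on `hash`, and the trimmed-down table are assumptions made for illustration, not part of the spec.

```python
import hashlib
import sqlite3

# Trimmed-down notes table. Assumption: hash is declared UNIQUE so the
# database itself also refuses duplicate content.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT,
    content TEXT,
    hash TEXT UNIQUE,
    processed BOOLEAN DEFAULT 0
);
"""


def insert_note(conn: sqlite3.Connection, filename: str, content: str) -> int:
    """Insert a note and return its id; return the existing id for duplicate content."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute("SELECT id FROM notes WHERE hash = ?", (digest,)).fetchone()
    if row is not None:
        return row[0]  # identical content was ingested before
    cur = conn.execute(
        "INSERT INTO notes (filename, content, hash) VALUES (?, ?, ?)",
        (filename, content, digest),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.executescript(SCHEMA)
first = insert_note(conn, "morning.md", "I slept poorly...")
second = insert_note(conn, "copy.md", "I slept poorly...")
note_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

Both calls return the same id, so re-dropping a folder of notes should not create duplicate rows or trigger a second extraction.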

NetworkX Graph Model:

  • Node attributes:

    • id (unique string; e.g., node:UUID or entity:<normalized_text>)
    • label (display name)
    • type (concept, person, place, idea, event, passage)
    • provenance (list of (note_id, span_start, span_end) tuples)
    • embedding (optional: vector; not stored directly in NetworkX but in vector DB with node id)
    • created_at, updated_at
  • Edge attributes:

    • type (related_to, causes, elaborates, contradicts, similar_to, part_of)
    • weight (confidence score)
    • extraction_id (id in extracts table)
    • provenance (source spans)

Persistence:

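  • Save NetworkX to disk with Python's pickle module, i.e. pickle.dump(G, f) (nx.write_gpickle and nx.read_gpickle were removed in NetworkX 3.0), or use nx.write_gexf(G, path) / nx.write_graphml(G, path) for export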
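
One thing to watch when exporting: GraphML stores only scalar attribute values, so the list-valued `provenance` attribute described above should be serialized first. The sketch below JSON-encodes container attributes on a copy of the graph before writing. The helper name `graphml_safe_copy` and the sample data are hypothetical.

```python
import json
import os
import tempfile

import networkx as nx


def graphml_safe_copy(G: nx.Graph) -> nx.Graph:
    """Return a copy whose list/tuple/dict attributes are JSON-encoded strings."""
    H = G.copy()
    for _, attrs in H.nodes(data=True):
        for key, value in list(attrs.items()):
            if isinstance(value, (list, tuple, dict)):
                attrs[key] = json.dumps(value)
    for _, _, attrs in H.edges(data=True):
        for key, value in list(attrs.items()):
            if isinstance(value, (list, tuple, dict)):
                attrs[key] = json.dumps(value)
    return H


G = nx.Graph()
G.add_node("sleep", label="sleep", type="concept", provenance=[(1, 120, 170)])
G.add_node("work", label="work energy", type="activity", provenance=[(1, 0, 15)])
G.add_edge("sleep", "work", type="affects", weight=0.87)

path = os.path.join(tempfile.mkdtemp(), "graph.graphml")
nx.write_graphml(graphml_safe_copy(G), path)

restored = nx.read_graphml(path)
provenance = json.loads(restored.nodes["sleep"]["provenance"])
```

The in-memory graph keeps its native Python lists; only the exported copy is flattened, and a reader decodes the attribute with `json.loads`.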

6. LLM Strategy (Local Inference + Embeddings)

Roles for LLM:

  1. Extraction — Given a text block, extract:

    • Entities (nouns, named entities)
    • Concepts (abstract ideas)
    • Relationships between entities/concepts with relation types and confidence
    • Short summaries for nodes or passages
    • Sentiment or metadata tags (mood, importance)
  2. Normalization — Normalize entity names (e.g., "AI", "artificial intelligence" → canonical node)

  3. Reasoning / Querying — Answer user questions by walking the graph and using the LLM to generate synthesis from node contents

  4. Rewrite / Summarize — Generate node summaries for UI display

Extraction Prompt Pattern:

  • Provide short instructions to extract JSON with a strict schema
  • Include examples
  • Ask model to return only JSON (machine-readable)

Example Expected JSON:

json
{
  "nodes": [
    {"label": "sleep", "type": "concept", "span": [120, 170], "confidence": 0.95},
    {"label": "work", "type": "activity", "span": [0, 15], "confidence": 0.9}
  ],
  "edges": [
    {"source": "sleep", "target": "work", "type": "affects", "confidence": 0.87}
  ],
  "summary": "This passage mentions that sleep affects work energy..."
}
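
Local models do not always return clean JSON, so the extractor probably needs a defensive parsing step. This is a minimal sketch of a `parse_output` function (the name comes from the testing section; the validation rules here are assumptions) that pulls the JSON object out of the raw text, checks the fields shown above, and drops edges that reference undeclared nodes.

```python
import json


def parse_output(raw: str) -> dict:
    """Parse raw LLM text into nodes/edges/summary, raising ValueError on bad output."""
    # Models often wrap JSON in prose or code fences; keep the outermost {...} only.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start : end + 1])  # JSONDecodeError is a ValueError

    nodes = data.get("nodes", [])
    labels = set()
    for node in nodes:
        if not {"label", "type"} <= node.keys():
            raise ValueError(f"node missing label/type: {node}")
        confidence = float(node.get("confidence", 0.0))
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {node}")
        labels.add(node["label"])

    # Drop edges that point at labels the model never declared as nodes.
    edges = [
        edge
        for edge in data.get("edges", [])
        if edge.get("source") in labels and edge.get("target") in labels
    ]
    return {"nodes": nodes, "edges": edges, "summary": data.get("summary", "")}


raw = (
    "Here is the JSON you asked for:\n"
    '{"nodes": [{"label": "sleep", "type": "concept", "confidence": 0.95},'
    ' {"label": "work", "type": "activity", "confidence": 0.9}],'
    ' "edges": [{"source": "sleep", "target": "work", "type": "affects"},'
    ' {"source": "sleep", "target": "mood", "type": "affects"}],'
    ' "summary": "Sleep affects work."}'
)
parsed = parse_output(raw)
```

In this example the edge to the undeclared `mood` node is discarded, while the raw model output can still be stored untouched in `extracts.extract_json` for auditing.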

Embeddings:

  • Use a local sentence-transformer model to embed each note and node label for semantic search
  • Store vectors in local Chroma/Faiss, keyed by node id or note id
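
Semantic search over these vectors comes down to ranking by similarity. As a rough sketch of what Chroma or Faiss do far more efficiently, here is a pure-Python cosine top-k over a small in-memory index. The three-dimensional vectors are made-up stand-ins for real embeddings, and `top_k` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 10) -> list[tuple[str, float]]:
    """Return the k most similar (key, score) pairs, best first."""
    scored = [(key, cosine(query_vec, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors keyed by node id or note id, as described above.
index = {
    "node:sleep": [0.9, 0.1, 0.0],
    "node:work": [0.1, 0.9, 0.0],
    "note:1": [0.7, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

A linear scan like this is fine for a few hundred notes; the vector store becomes worthwhile once the corpus grows.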

7. API Design (FastAPI)

Core Endpoints:

  • POST /api/ingest/file — upload a file or zip of markdown files
  • POST /api/ingest/text — post a text block for processing
  • GET /api/notes — list notes
  • GET /api/notes/{id} — get single note + extracts
  • POST /api/graph/build — force rebuild graph from extracts
  • GET /api/graph — get full graph or paginated
  • GET /api/graph/node/{id} — get node details + provenance
  • POST /api/graph/node — add/edit node
  • POST /api/graph/edge — add/edit edge
  • POST /api/search/semantic — body: {"q": "...", "top_k": 10}
  • GET /api/export/graph — returns GraphML / GEXF / gpickle
  • POST /api/query/llm — run a custom LLM prompt (local) — gated

Example Ingestion Workflow:

  1. POST /api/ingest/text with {"filename": "morning.md", "content": "I slept poorly..."}
  2. Backend saves to notes, returns note_id
  3. Backend calls extractor.process_note(note_id) which:
    • runs LLM extraction
    • writes extracts row
    • updates NetworkX nodes & edges
    • indexes embeddings
  4. Frontend polls GET /api/notes/{id} to check processed flag and show results

8. Frontend (Next.js)

Pages:

  • / — Dashboard / quick summary and recent notes
  • /graph — Full-screen interactive graph viewer
  • /note/[id] — Note viewer + extraction provenance + edit controls
  • /search — Semantic search interface
  • /settings — LLM settings, model selection, embedding model, import/export

Key Components:

  • GraphCanvas — react-cytoscapejs wrapper with pan/zoom, node click handlers
  • NodeDetailsPanel — shows node metadata, provenance passages, edit buttons
  • NoteUploader — drag & drop or folder selection
  • SemanticSearchBox — search input with results mapped to nodes/notes
  • ModelControl — choose local LLM / embeddings model, configure params

UX Interactions:

  • Double-click node → open NodeDetailsPanel with source passages highlighted
  • Right-click node → context menu: merge nodes, export node, delete node
  • Lasso select → group operations
  • Inline edit → on save, POST to /api/graph/node (the add/edit endpoint from the API design)

9. Graph Processing & Transformation Logic

Extraction Pipeline (per note):

  1. Read note content and optionally split into passages (by paragraphs or sliding window)
  2. For each passage:
    • Send to LLM extraction prompt (strict JSON output)
    • Receive nodes & edges list, normalize labels
    • Assign node IDs based on normalization (e.g., slugify + checksum)
  3. Merge nodes:
    • If normalized label already exists, merge provenance and update attributes (increment counts, update last_seen)
  4. Create/Update edges:
    • Attach extraction_id and confidence
  5. Store extracts and update notes.processed = TRUE
  6. Index embeddings for note and nodes
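
Step 1 of the pipeline matters for provenance: if each passage carries its character offsets into the original note, spans returned for a passage can be mapped back to the note. A minimal sketch of paragraph splitting with offsets, assuming blank lines separate passages (`split_passages` is a hypothetical helper):

```python
import re


def split_passages(text: str) -> list[tuple[int, int, str]]:
    """Split on blank lines; return (start, end, passage) with offsets into text."""
    passages = []
    pos = 0
    # The capturing group keeps separators in the result, so offsets stay exact.
    for i, piece in enumerate(re.split(r"(\n\s*\n)", text)):
        if i % 2 == 0 and piece.strip():
            lead = len(piece) - len(piece.lstrip())
            body = piece.strip()
            passages.append((pos + lead, pos + lead + len(body), body))
        pos += len(piece)
    return passages


text = "I slept poorly.\n\nWork energy was low.\n"
passages = split_passages(text)
```

A span reported relative to a passage can then be shifted by that passage's `start` before it is stored in the node's provenance list.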

Normalization Heuristics:

  • Lowercase normalization + stopword stripping for short labels
  • Use model to provide canonical name suggestion and disambiguation (LLM can propose canonical forms; store as canonical_label)
  • Keep alias list on node attributes
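
The first heuristic and the "slugify + checksum" ID scheme from the pipeline could look roughly like this. The stopword list is a tiny illustrative sample and the `entity:` prefix follows the node id example in the graph model; both helper names are hypothetical.

```python
import hashlib
import re

STOPWORDS = {"a", "an", "the", "of", "my"}  # illustrative only, not exhaustive


def normalize_label(label: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords (unless nothing would remain)."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    kept = [word for word in words if word not in STOPWORDS] or words
    return " ".join(kept)


def node_id(label: str) -> str:
    """Build a stable node id from the normalized label: slug plus short checksum."""
    canonical = normalize_label(label)
    slug = canonical.replace(" ", "-")
    checksum = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]
    return f"entity:{slug}-{checksum}"


same = node_id("The Work Energy") == node_id("work energy")
```

String normalization only catches surface variants. Mapping "AI" to "artificial intelligence" still needs the alias list or the LLM canonicalization step.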

Conflict Resolution:

  • Keep original extraction raw store
  • On conflicting edges (contradictory relations), create contradiction edge type or attach contradiction attribute with evidence list

10. Visualization Approach

Recommendation: Use react-cytoscapejs or cytoscape with cose or cola layout.

Key Visual Cues:

  • Node color by type (concept, person, event)
  • Node size by centrality (degree or eigenvector centrality)
  • Edge thickness by weight (confidence)
  • Hover tooltip shows top 1-2 provenance excerpts
  • Click to open panel with full provenance + raw extract JSON + ability to edit

Performance:

  • For large graphs, implement lazy loading and clustering. Only render subgraph around selected node by default (e.g., BFS to depth 2)
  • Provide client-side search that requests filtered nodes from backend
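
A sketch of the "subgraph around the selected node" idea, combined with sizing nodes by degree centrality. It uses NetworkX's `ego_graph` for the depth-limited neighborhood and emits Cytoscape-style `{"data": {...}}` elements. The function name, the pixel range for `size`, and the sample graph are assumptions.

```python
import networkx as nx


def subgraph_elements(G: nx.Graph, center: str, depth: int = 2) -> list[dict]:
    """Return Cytoscape-style elements for the neighborhood around center."""
    sub = nx.ego_graph(G, center, radius=depth)  # nodes within `depth` hops
    centrality = nx.degree_centrality(sub)
    elements = []
    for node, attrs in sub.nodes(data=True):
        elements.append({"data": {
            "id": node,
            "label": attrs.get("label", node),
            "type": attrs.get("type", "concept"),
            "size": 20 + 40 * centrality[node],  # arbitrary pixel range
        }})
    for source, target, attrs in sub.edges(data=True):
        elements.append({"data": {
            "id": f"{source}->{target}",
            "source": source,
            "target": target,
            "weight": attrs.get("weight", 1.0),
        }})
    return elements


G = nx.Graph()
G.add_edges_from([
    ("sleep", "work"), ("work", "energy"), ("energy", "mood"), ("mood", "exercise"),
])
elements = subgraph_elements(G, "sleep", depth=2)
node_ids = {e["data"]["id"] for e in elements if "source" not in e["data"]}
```

With depth 2 only `sleep`, `work`, and `energy` are returned, so the frontend renders a small neighborhood and requests more when the user selects another node.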

11. File Structure & Example Files

text
mindmap-ai/
├─ backend/
│  ├─ app/
│  │  ├─ main.py            # FastAPI app
│  │  ├─ api/
│  │  │  ├─ ingest.py
│  │  │  ├─ graph.py
│  │  │  ├─ search.py
│  │  ├─ services/
│  │  │  ├─ extractor.py    # LLM extraction logic
│  │  │  ├─ embeddings.py
│  │  │  ├─ graph_store.py  # NetworkX wrapper + persistence
│  │  ├─ db/
│  │  │  ├─ schema.sql
│  │  │  ├─ db.py           # sqlite connection functions
│  ├─ requirements.txt
│  ├─ Dockerfile
├─ frontend/
│  ├─ package.json
│  ├─ next.config.js
│  ├─ src/
│  │  ├─ pages/
│  │  │  ├─ index.js
│  │  │  ├─ graph.js
│  │  │  ├─ note/[id].js
│  │  ├─ components/
│  │  │  ├─ GraphCanvas.jsx
│  │  │  ├─ NodePanel.jsx
│  │  │  ├─ SearchBox.jsx
│  ├─ Dockerfile
├─ models/                  # local LLM or pointers to models
├─ data/
│  ├─ notes/                # sample markdown files
│  ├─ graph.gpickle
│  ├─ vectors/              # vector DB files (Chroma/Faiss)
└─ README.md

12. Deployment / Local Dev Setup

Development Steps (Summary):

  1. Install Python 3.10+ and Node 18+
  2. Backend:
    • cd backend
    • python -m venv .venv && source .venv/bin/activate
    • pip install -r requirements.txt
    • Setup SQLite DB: run app/db/schema.sql
    • Configure local LLM endpoint in app/config.py (e.g., http://localhost:11434 for Ollama)
    • uvicorn app.main:app --reload --port 8000
  3. Frontend:
    • cd frontend
    • npm install
    • npm run dev (by default http://localhost:3000)
  4. LLM:
    • Start Ollama or other local LLM runtime with the chosen model
  5. Try /api/ingest/text via Postman or frontend uploader

Docker (Optional):

  • Provide docker-compose with three services:
    • frontend (Next.js)
    • backend (FastAPI)
    • local LLM runtime (if using a docker-friendly image)
  • Volume mount ./data and ./models

13. Testing & Validation Strategy

Unit Tests:

  • Test SQLite insert/read operations
  • Test NetworkX persistence and loading
  • Test extractor.parse_output function with sample JSON outputs (simulate LLM)

Integration Tests:

  • Ingest sample markdown → run extraction → assert nodes count, edge count stable
  • Semantic search correctness: query fixture questions and check expected node returns

Manual QA:

  • Use a small set of notes with known relationships and ensure extraction and normalization produce expected outputs

14. Security & Privacy Considerations

  • Everything local — no remote calls unless explicitly configured (e.g., to an optional cloud LLM). Default config should disable external network
  • Raw notes stored in SQLite; consider encrypting the DB for extra privacy (e.g., using filesystem-level encryption or libs)
  • LLM sandboxing: if using containerized LLM, ensure it's not exposed outside localhost
  • Sanitize inputs to prevent injection-like threats into the backend shell or file system
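
For the last point, the main file-system risk is a filename such as `../../etc/passwd` arriving through the upload or ingest endpoints. A minimal sketch of a path check using `pathlib`, assuming uploads live under `data/notes` and only `.md` and `.txt` files are accepted (both are assumptions, as is the helper name):

```python
from pathlib import Path

NOTES_ROOT = Path("data/notes")  # assumed upload directory from the file layout
ALLOWED_SUFFIXES = {".md", ".txt"}  # assumed set of accepted note types


def safe_note_path(filename: str, root: Path = NOTES_ROOT) -> Path:
    """Resolve filename under root, rejecting traversal and unexpected extensions."""
    root_resolved = root.resolve()
    candidate = (root / filename).resolve()
    if root_resolved not in candidate.parents:
        raise ValueError(f"path escapes notes directory: {filename}")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {filename}")
    return candidate


ok = safe_note_path("journal/morning.md")
```

Resolving before comparing is the important part: it collapses `..` segments and absolute paths, so the check runs against where the file would actually land.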

15. Performance & Scaling Notes

  • For many notes (thousands), NetworkX in-memory may become heavy. Strategies:
    • Shard graph by topic or file
    • Use persistent graph DB (Neo4j) as an upgrade path
  • Vector search: Faiss or Chroma with on-disk indexes recommended for large corpora
  • Batch extractions: process notes in parallel but throttle LLM calls to avoid resource exhaustion
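
One simple way to throttle batch extraction is a thread pool whose worker count caps the number of LLM calls in flight. In this sketch `fake_extract` stands in for the real extractor so the code runs without a model; the function names and the default of two concurrent calls are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def extract_all(
    notes: dict[int, str],
    extract: Callable[[int, str], dict],
    max_concurrent: int = 2,
) -> dict[int, dict]:
    """Run extract(note_id, text) for every note, at most max_concurrent at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {nid: pool.submit(extract, nid, text) for nid, text in notes.items()}
        # result() re-raises any exception from the worker thread
        return {nid: future.result() for nid, future in futures.items()}


def fake_extract(note_id: int, text: str) -> dict:
    # Stand-in for the real LLM call
    return {"note_id": note_id, "nodes": text.split()}


notes = {1: "sleep work", 2: "exercise mood", 3: "energy"}
results = extract_all(notes, fake_extract, max_concurrent=2)
```

On a single-GPU machine a limit of one or two is likely enough, since the local runtime tends to be the bottleneck rather than Python.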

16. Example Prompts & Extraction Templates

Strict JSON Extractor Prompt (Short):

text
System: You are a JSON extractor. Receive a short passage and return a JSON with nodes, edges, and summary. Return only valid JSON, nothing else. Use the schema below.

{
  "nodes": [{"label":..., "type":..., "span":[start,end], "confidence":float}],
  "edges": [{"source": "label_or_id", "target":"label_or_id", "type":"affects|relates_to|contradicts", "confidence":float}],
  "summary":"one-sentence summary"
}

Example Instruction Body for Model:

text
Passage:
"""
I haven't been sleeping well, which makes my work energy low and irritability higher. I want to improve exercise and sleep routine.
"""

Return JSON following schema: nodes: detect "sleep", "work energy", "irritability", "exercise", their types (concept/activity), edges such as sleep -> work energy (affects), include span character indexes and confidence scores between 0 and 1.
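
Character offsets reported by a model may not line up with the passage, so it can help to recompute spans on the backend rather than trust them. A minimal sketch that searches for each label case-insensitively and overwrites the reported span (`realign_spans` is a hypothetical helper; labels that do not occur verbatim get `None`):

```python
def realign_spans(passage: str, nodes: list[dict]) -> list[dict]:
    """Replace model-reported spans with spans found by searching the passage."""
    lowered = passage.lower()
    fixed = []
    for node in nodes:
        label = node["label"]
        start = lowered.find(label.lower())
        span = [start, start + len(label)] if start != -1 else None
        fixed.append({**node, "span": span})
    return fixed


passage = "I haven't been sleeping well, which makes my work energy low and irritability higher."
nodes = [
    {"label": "sleep", "type": "concept", "span": [0, 5]},
    {"label": "work energy", "type": "concept", "span": [3, 9]},
    {"label": "exercise", "type": "activity", "span": [10, 18]},
]
realigned = realign_spans(passage, nodes)
```

Only the first occurrence is matched here, and a label the model paraphrased will not be found, so a `None` span is a useful signal that the node needs review.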

Normalization Prompt (if using LLM to canonicalize):

  • Provide candidate aliases and ask model to choose canonical label and provide justification

17. CLIne Handoff Notes

What to give CLIne later:

  • The full project README (this document)
  • Preferred languages: Python (FastAPI), JS/TS (Next.js)
  • Test data: a small data/notes/ folder with 4–6 markdown files exhibiting overlapping concepts (to validate dedup and merging)
  • Specify "All local" requirement and that LLM MUST be local; provide model preference (e.g., llama-3 via Ollama)
  • Ask for:
    • Implementation of the API endpoints described
    • Basic Next.js frontend with GraphCanvas & NodePanel
    • A minimal extraction prompt (as provided) and an extractor harness that can be swapped for different LLM endpoints easily
    • Provide acceptance tests:
      • Ingest sample notes and produce at least N nodes and M edges (numbers based on sample)
      • Export GraphML and confirm at least one node with provenance exists

18. Stretch Goals / Extensions

  • Graph analytics dashboard: centrality, communities (Louvain), timeline of nodes by created_at
  • Versioning & diffs: maintain history of node edits and allow rollback
  • Local fine-tuning: fine-tune an LLM locally on your own notes for improved extraction
  • Sync to Obsidian or local vault: keep files in sync
  • Biometric integration: pair node tags with daily metrics (sleep HR from device) — for the journaling use case
  • Export to Neo4j for larger-scale graph storage or use as a migration path

Appendix — Sample Code Snippets

FastAPI Ingestion Skeleton (Illustrative):

python
# backend/app/api/ingest.py
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

from ..services.extractor import process_text
from ..db.db import insert_note

router = APIRouter()


class IngestRequest(BaseModel):
    filename: str
    content: str


# Plain `def` (not `async def`) so FastAPI runs the blocking LLM call in its threadpool
@router.post("/text")
def ingest_text(payload: IngestRequest):
    # Reject empty notes before touching the database
    if not payload.content.strip():
        raise HTTPException(status_code=400, detail="content must not be empty")
    note_id = insert_note(payload.filename, payload.content)
    # process in background or synchronously depending on config:
    result = process_text(note_id, payload.content)  # calls LLM
    return {"note_id": note_id, "result": result}

NetworkX Persistence Example:

python
import pickle

import networkx as nx

G = nx.Graph()
G.add_node("sleep", type="concept", label="sleep")
G.add_node("work", type="activity", label="work energy")
G.add_edge("sleep", "work", type="affects", weight=0.95)

# nx.write_gpickle / nx.read_gpickle were removed in NetworkX 3.0; use pickle directly
with open("data/graph.gpickle", "wb") as f:
    pickle.dump(G, f)

# load:
with open("data/graph.gpickle", "rb") as f:
    G2 = pickle.load(f)

Example LLM Call (Pseudo):

python
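import requests

def call_local_llm(prompt: str, model: str = "llama3") -> dict:
    # Ollama's /api/generate endpoint; the model name here is a placeholder
    payload = {"model": model, "prompt": prompt, "stream": False}
    r = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
    r.raise_for_status()
    return r.json()  # the generated text is in the "response" field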

Final Notes

  • This project is perfect for vibe-coding: incremental wins (drop a note → see a node), clear visuals (graph grows as you feed it), and deep future-proofing (persisted graph + raw extracts)
  • Keep everything auditable — that'll make your blog narrative strong: "I fed my journal to a local LLM and watched my mind's topology appear"
  • Start small: one FastAPI endpoint + one Next.js page with a small sample Markdown folder. Graduate to background processing and better UI after you confirm extraction quality


Now I am going to open a new folder in VSCode and start a new project. In the root I am creating a file called spec.md and I am going to paste the above into it.


Generating the CLIne Prompt


The following prompt I am going to create will generate a master CLIne builder prompt which will establish the full rules and workflow for the project.

I guess this is part of "context engineering": picking, choosing, editing, and drafting the documents that outline what you want to vibe code.


I want you to generate the CLIne builder prompt in markdown, I want to save it to the root and be able to reference it so that I can list many iterative prompts which will be executed in sequence to complete the project, I want all the prompts to be fully formed to be able to be given to CLIne individually for each incremental improvement. I want deliverables, thresholds for completion in the prompts. I want each prompt to have a checkbox that is checked when completed as well as instructions to create a set of files to keep track of everything. These files include a checklist.md which includes each and every series of instructions which need to be followed in order to complete the application. I want .md files generated to aid in the project. I first want the documentation to be created for the application which will include vast and detailed documentation of the API, the functions, architecture, technologies, security, CI/CD, devops, deployment, testing, design as well as each and every other department within a tech company you can think of. I want all of these documents to be referenced in each and every CLIne prompt so that they are able to keep track, edit and adapt the application based on changes to these documents, the documents themselves will help aid in observability and help the model keep track of changes and needed improvements.

The purpose of this prompt is to build a basic workflow structure that CLIne can follow along and modify according to the project specs.

Ages ago in my vibe coding career I created a template repo for something similar, which was helpful here.

These days I generate the same set of documents with the method we are using today, as described in the previous prompt.

The previous prompt generated the output below. I saved it to the root folder, alongside the spec, as cline_master_prompt.md.


text
# CLINE_MASTER_PROMPT.md

# Mind Map AI: CLIne Master Builder Prompt

---

## 1. Project Overview

**Project Name:** Mind Map AI
**Purpose:** Build a fully local, LLM-powered personal knowledge graph system that ingests text or markdown notes, extracts entities, relationships, and concepts, stores them in a NetworkX graph with SQLite for provenance, and provides an interactive Next.js frontend for exploration and editing.

**Core Goals:**
- Entirely local: LLM inference, database, and graph processing are all on-device.
- Incremental iterative development via CLIne prompts.
- Maintain comprehensive documentation that guides development and adapts with project changes.
- Track every step, task, and deliverable in a `checklist.md` for observability and iterative progress.

---

## 2. Technologies & Architecture

**Frontend:** Next.js (React), react-cytoscapejs for graph visualization
**Backend:** FastAPI (Python)
**Graph Processing:** NetworkX, persisted with `.gpickle` / GraphML
**Database:** SQLite for raw text, metadata, and provenance
**LLM:** Local model (Ollama, LLaMA, or similar)
**Vector Embeddings:** Local sentence-transformers or Chroma/Faiss
**Visualization:** Cytoscape.js via React wrapper
**Optional DevOps Tools:** Docker for local containers, Git for version control

**High-Level Architecture Diagram:**

[ Next.js Frontend ] <--> [ FastAPI Backend ] <--> [ Local LLM Runtime ]
                                  |-- SQLite (notes + extracts)
                                  |-- NetworkX Graph (.gpickle / GraphML)
                                  |-- Vector DB (local Chroma/Faiss embeddings)

---

## 3. Documentation Framework

All documentation should be generated by CLIne initially and updated incrementally as the project evolves. Documentation must live in `/docs/`.

**Files to generate:**
- `architecture.md`: diagrams, system overview, folder structure
- `api-spec.md`: REST endpoint descriptions, request/response formats
- `database.md`: SQLite schema, table definitions
- `llm_prompting.md`: prompt schemas, extraction JSON format
- `security.md`: authentication, API security, local privacy
- `cicd_devops.md`: Docker setups, local dev environment, CI/CD notes
- `testing.md`: unit, integration, end-to-end testing guidelines
- `design_system.md`: UI/UX rules, graph visualization patterns
- `roadmap.md`: planned features, incremental goals
- `decisions.md`: track architectural and implementation choices
- `changelog.md`: versioned project changes

**Documentation Rules:**
- Every CLIne prompt must reference `/docs/` files.
- Changes to the system (API, DB, frontend) must be reflected in the corresponding documentation.
- Documentation must include sample JSON, diagrams, code snippets, and usage instructions.
- CLIne must create `.md` files even with placeholders for sections not yet implemented.

---

## 4. Checklist System (`checklist.md`)

CLIne must maintain a master checklist that includes **all tasks, deliverables, thresholds, and progress**.

**Checklist Format Example:**
```markdown
# Mind Map AI Checklist

## Prompt 1: Initialize Repository & Documentation
- [ ] Create folder structure (`/frontend`, `/backend`, `/docs`, `/data`, `/tests`)
- [ ] Generate initial documentation templates in `/docs/`
- [ ] Initialize `checklist.md` with this entry
- [ ] Commit initial repo structure
```

Rules for CLIne:
- Append each prompt and its tasks to checklist.md.
- Update checkboxes [ ] → [x] when tasks are complete.
- Reference documentation in each prompt to stay consistent with system changes.
- Include deliverables and completion thresholds in each prompt.

---

## 5. CLIne Workflow Rules

1. Documentation-First: CLIne must create and maintain /docs/ files before or alongside implementation.
2. Incremental Prompts: Each CLIne prompt must be fully executable on its own, without assumptions of external context, but referencing /docs/.
3. Deliverables & Thresholds: Each prompt must specify output files, expected behavior, and conditions for marking completion.
4. Observability: CLIne must log all actions, decisions, and modifications into decisions.md and changelog.md.
5. Version Control: CLIne must maintain clear commit points per prompt.
6. Local-Only Constraint: No remote LLM calls unless explicitly configured; default operation is fully offline.

---

## 6. Incremental CLIne Prompts

✅ Prompt 1: Initialize Repository & Documentation

Objective: Set up project structure, create documentation templates, initialize checklist.md.

Tasks:
- Create /frontend, /backend, /docs, /data, /tests.
- Generate documentation placeholders:
  - /docs/architecture.md
  - /docs/api-spec.md
  - /docs/database.md
  - /docs/llm_prompting.md
  - /docs/security.md
  - /docs/cicd_devops.md
  - /docs/testing.md
  - /docs/design_system.md
  - /docs/roadmap.md
  - /docs/decisions.md
  - /docs/changelog.md
- Initialize checklist.md and add this prompt as entry.
- Commit initial structure to Git.

Deliverables:
- Directory structure as above.
- Placeholder .md files in /docs/.
- Initialized checklist.md.

Completion Threshold:
- All directories and placeholder docs exist.
- Checklist entry added.
- CLIne logs creation in decisions.md.

---

✅ Prompt 2: Backend Setup & SQLite Schema

Objective: Initialize FastAPI backend and SQLite database with schema.

Tasks:
- Create /backend/app/main.py with FastAPI instance.
- Define /backend/app/db/schema.sql with notes, extracts, metadata tables.
- Implement DB connection module (/backend/app/db/db.py).
- Add initial requirements.txt for Python dependencies.
- Add tasks to checklist.md referencing backend initialization.

Deliverables:
- FastAPI project skeleton.
- SQLite schema created.
- DB connection module functional.

Completion Threshold:
- FastAPI server runs locally without errors.
- SQLite database can be created and queried.
- Checklist entry updated [x] when complete.

---

✅ Prompt 3: NetworkX Graph Setup & Persistence

Objective: Implement in-memory graph using NetworkX and persistence to disk.

Tasks:
- Create /backend/app/services/graph_store.py managing NetworkX graph.
- Implement node and edge creation, update, deletion.
- Persist graph to .gpickle and GraphML.
- Include sample load and save scripts.
- Document graph storage in /docs/database.md and /docs/architecture.md.

Deliverables:
- Fully functional NetworkX graph module.
- Sample persistence files.

Completion Threshold:
- Graph can be saved and reloaded.
- Checklist entry updated.

---

✅ Prompt 4: LLM Extraction Module

Objective: Implement local LLM integration for extracting nodes and edges from text.

Tasks:
- Create /backend/app/services/extractor.py.
- Implement function to call local LLM with structured JSON output.
- Handle canonicalization of node labels.
- Write unit tests to validate extraction.
- Document JSON schema in /docs/llm_prompting.md.

Deliverables:
- Extractor module functional.
- Test cases for extraction correctness.

Completion Threshold:
- Given sample text, LLM produces valid JSON nodes/edges.
- Checklist updated.

---

✅ Prompt 5: Embeddings & Vector Store

Objective: Add embeddings and semantic search.

Tasks:
- Generate embeddings for notes and nodes using local sentence-transformer.
- Store vectors in local Chroma or Faiss DB.
- Implement semantic search API endpoint (/api/search/semantic).
- Update /docs/architecture.md with vector store design.

Deliverables:
- Embedding module functional.
- Search API endpoint returns top-k results.

Completion Threshold:
- Test queries return expected nodes.
- Checklist updated.

---

✅ Prompt 6: Frontend Graph Visualization

Objective: Implement Next.js frontend for graph exploration.

Tasks:
- Create /frontend/src/pages/graph.js.
- Implement GraphCanvas component using react-cytoscapejs.
- Node click opens details panel.
- Sync frontend with backend API.
- Document UI design in /docs/design_system.md.

Deliverables:
- Interactive graph visualization.
- Node panel shows provenance and metadata.

Completion Threshold:
- Graph renders sample NetworkX data.
- Node interactions functional.
- Checklist updated.

---

239✅ Prompt 7 — Note Ingestion UI
240
241Objective: Allow users to upload markdown/text notes.
242
243Tasks:
244 • Implement /frontend/src/components/NoteUploader.jsx.
245 • Connect uploader to backend /api/ingest/text.
246 • Validate uploads and display ingestion progress.
247 • Document in /docs/design_system.md.
248
249Deliverables:
250 • Upload UI functional.
251 • Successful ingestion updates graph.
252
253Completion Threshold:
254 • Upload sample notes → nodes/edges appear in graph.
255 • Checklist updated.
256
257
258
259✅ Prompt 8 — API Security & Local Privacy
260
261Objective: Ensure API and local data are secure.
262
263Tasks:
264 • Add input validation for API endpoints.
265 • Implement authentication if multi-user mode.
266 • Ensure database files and embeddings are only accessible locally.
267 • Document security measures in /docs/security.md.
268
269Deliverables:
270 • API endpoints protected.
271 • Local-only constraints enforced.
272
273Completion Threshold:
274 • No unauthorized access possible.
275 • Checklist updated.
276
277
278
279✅ Prompt 9 — Testing Infrastructure
280
281Objective: Add unit, integration, and end-to-end testing.
282
283Tasks:
284 • Implement /tests/backend/test_db.py, /test_extractor.py, /test_graph.py.
285 • Frontend tests for key components.
286 • Integrate pytest or similar.
287 • Document testing strategy in /docs/testing.md.
288
289Deliverables:
290 • Tests pass locally.
291 • Checklist updated.
292
293Completion Threshold:
294 • 90% of core backend functions covered.
295 • Checklist updated.
296
297
298
299✅ Prompt 10 — CI/CD & DevOps
300
301Objective: Prepare local CI/CD workflow and devops scripts.
302
303Tasks:
304 • Add Dockerfiles for frontend and backend.
305 • Add docker-compose.yml for local stack.
306 • Document setup in /docs/cicd_devops.md.
307 • Include instructions for starting full local environment.
308
309Deliverables:
310 • Working Docker Compose environment.
311 • CI/CD documentation complete.
312
313Completion Threshold:
314 • Entire stack runs locally with one command.
315 • Checklist updated.
316
317
318
3197. Rules for All Future CLIne Prompts
 • Reference documentation /docs/ before making changes.
 • Update checklist.md and decisions.md with every change.
 • Provide clear deliverables and completion thresholds.
 • Use incremental, isolated prompts for every new feature or fix.
 • Maintain full local-only operation.
 • Always generate `.md` files for any new modules or updates.
 - Each prompt must include `[ ]` checkboxes for each task and mark `[x]` when complete.
 - CLIne must never overwrite existing files without backup or logging changes in `decisions.md`.
 - Each prompt should assume the previous prompts have executed successfully and the checklist is up-to-date.

---

## 8. Folder Structure Reference

/frontend
  /src
    /pages
    /components
  next.config.js
/backend
  /app
    main.py
    /db
      db.py
      schema.sql
    /services
      graph_store.py
      extractor.py
    /api
      endpoints.py
/data
  /graphs
  /embeddings
/docs
  architecture.md
  api-spec.md
  database.md
  llm_prompting.md
  security.md
  cicd_devops.md
  testing.md
  design_system.md
  roadmap.md
  decisions.md
  changelog.md
/tests
  backend
    test_db.py
    test_extractor.py
    test_graph.py
  frontend
    test_graph_ui.jsx
checklist.md
CLINE_MASTER_PROMPT.md

---

## 9. Deliverables Summary

- **`CLINE_MASTER_PROMPT.md`** – master instructions for CLIne.
- **`/docs/`** – full documentation suite.
- **`/backend/app`** – backend FastAPI + NetworkX + LLM modules.
- **`/frontend/src`** – Next.js frontend with graph visualization and note ingestion UI.
- **`checklist.md`** – iterative task tracker with checkboxes, deliverables, and thresholds.
- **`/data`** – persistent storage for SQLite, NetworkX graphs, and embeddings.
- **`/tests`** – unit, integration, and end-to-end tests.
- **Docker & DevOps files** – enable local reproducible environment.

---

## 10. Observability & Iterative Improvements

- **Decisions & Rationale**: `decisions.md` tracks architectural decisions and their justification.
- **Changelog**: `changelog.md` records versioned updates with date, author, and impact.
- **Checklist Updates**: Every CLIne prompt must update `checklist.md` with tasks completed.
- **Documentation Syncing**: CLIne must reference `/docs/` in every prompt to ensure consistency and update any affected sections.
- **Error Handling & Logging**: Backend modules should log all errors to `/data/logs.log` and notify via console during local execution.

---

## 11. Incremental Development Philosophy

1. **Documentation First** – Define architecture, API, DB schema, and LLM JSON formats before implementation.
2. **Local-Only** – All processing, embeddings, and inference must remain on-device.
3. **Modular Prompts** – Each CLIne prompt addresses a single subsystem or feature.
4. **Observability** – All tasks, decisions, and changes must be logged in `/docs/` or `checklist.md`.
5. **Checkpoints** – Deliverables and success thresholds must be explicitly stated for every task.

---

## 12. Example Future Prompts

- **Prompt 11** – Interactive Graph Editing: Add node/edge create, edit, delete in frontend and backend.
- **Prompt 12** – Semantic Search Enhancements: Advanced query filters, similarity scoring, and result ranking.
- **Prompt 13** – Graph Export/Import: Support JSON, CSV, and GraphML for interoperability.
- **Prompt 14** – UI Enhancements: Dark mode, node coloring, and cluster visualization.
- **Prompt 15** – Multi-user Mode: Local authentication and personal graph spaces (optional).
- **Prompt 16** – Advanced LLM Prompt Optimization: Modular prompt chaining and improved extraction accuracy.
- **Prompt 17** – Backup & Recovery: Automated local graph and DB backup routines.
- **Prompt 18** – Metrics & Observability Dashboard: Visualize ingestion rates, graph growth, extraction success.
- **Prompt 19** – Local Plugin System: Allow CLIne to extend modules via local plugin scripts.
- **Prompt 20** – Full Test Suite Integration: Coverage reports, CI scripts, and automated testing for all modules.

---

## 13. Notes for CLIne Execution

- Each prompt is **self-contained but sequentially aware**.
- All tasks require clear **deliverables** and a **success threshold**.
- CLIne must maintain **full observability** through `/docs/`, `checklist.md`, and `decisions.md`.
- Every prompt update should result in updated Markdown documentation reflecting the current system state.
- Prompts should **encourage modular, local-first development** to ensure reproducibility and offline operation.

---

## 14. CLIne Execution Log

Ok, so I can tell that this is a flawed prompt. It is partially my fault, and I need to refine it. You can see how I messed it up at the end of the full brainstorm chat here.

So how should I do that? I think what I am going to do is use NotebookLM for this next part. I am first going to compile as much documentation as I can give to it. The purpose of this is to help construct the key documents for our vibe coding session. Feeding the right context to CLIne is what will make the difference between working and not.

One reason to create the spec.md file first is that you can easily drift away from your initial intentions, so it is best to get as much detail down as possible before going any further in the brainstorming session.

As you saw, the context window for the model was eaten up by all the mistakes I made before the final prompt was generated, and you can see how it deviated from the initial spec. So I am going to use NotebookLM to help me write some more documentation to feed CLIne before we start vibe coding. Once I have the two files loaded, I can just enter a simple prompt like the following:


I want the prompts in cline_master_prompt.md to be rewritten according to the information in spec.md

So from that I got the below output which I am saving as prompt_add.md and putting in the root.


text
### Mind Map AI — CLIne Master Builder Prompt

--------------------------------------------------------------------------------

#### 1. Project Overview
**Project Name:** Mind Map AI — LLM-powered Personal Knowledge Graph (All Local)
**Purpose:** Build a fully local system designed to **convert notes/journals/markdown into a browsable, queryable, and editable knowledge graph**. The system must ingest text or markdown notes, use a local LLM to extract entities, concepts, relationships, and sentiment, store them, and provide an interactive Next.js frontend for visualization and editing.

**Core Goals:**
* **Entirely local:** LLM inference, database (SQLite), vector store, and graph processing must all operate on-device and remain local.
* **Auditable transformations:** Every extraction must store source text and provenance.
* **Support Semantic Search:** Implement semantic search capability for notes and nodes using local vector embeddings.
* **Interactive Editing:** Allow users to edit nodes/edges manually and commit changes.
* Incremental iterative development via CLIne prompts.
* Maintain comprehensive documentation that guides development and adapts with project changes.
* Track every step, task, and deliverable in a `checklist.md` for observability and iterative progress.

**Constraints:**
* The system must be offline-capable where possible.
* The LLM extraction must utilize a **strict JSON schema** defined in `llm_prompting.md`.

--------------------------------------------------------------------------------

#### 2. Technologies & Architecture
The architecture is defined as an all-local stack.

**Frontend:** Next.js (React), utilizing `react-cytoscapejs` for graph visualization.
**Backend:** FastAPI (Python), serving ingestion, graph management, search, and admin endpoints.
**Graph Processing:** NetworkX, representing the graph in memory.
**Graph Persistence:** NetworkX persisted to `.gpickle` or `GraphML` files on disk.
**Database:** SQLite for raw text, metadata, and provenance (source text/note data).
**LLM:** Local model (Ollama, Llama.cpp, or similar Dockerized local model backend).
**Vector Embeddings:** Local `sentence-transformers` model (e.g., all-MiniLM) or Ollama embedding endpoint.
**Vector DB:** Lightweight local Chroma or Faiss is recommended for storing vectors, keyed by node ID or note ID.

**High-Level Architecture Diagram (Detailed):**
[ Next.js Frontend ] <—> [ FastAPI Backend (Python logic, NetworkX) ] <—> [Local LLM Runtime (Ollama/Llama)]
|– SQLite (raw notes + extracts/provenance)
|– NetworkX Graph (.gpickle / GraphML)
|– Vector DB (local Chroma/Faiss embeddings, indexed by node/note ID)

--------------------------------------------------------------------------------

#### 3. Documentation Framework
All documentation should be generated by CLIne initially and updated incrementally as the project evolves. Documentation must live in `/docs/`.

**Files to generate and required content enhancements:**

* **`architecture.md`**: Diagrams, system overview, folder structure, and rationale for technology choices (Next.js, FastAPI, NetworkX, SQLite, Local LLM).
* **`api-spec.md`**: Detailed REST endpoint descriptions, request/response formats. Must define and specify the **Core Endpoints** including `/api/ingest/file`, `/api/ingest/text`, `/api/graph`, `/api/search/semantic`, and the mutation endpoints for nodes/edges.
* **`database.md`**: SQLite schema, table definitions, and the detailed **NetworkX Graph Model** (Node attributes: `id`, `label`, `type`, `provenance`, `embedding`, `created_at`; Edge attributes: `type`, `weight`, `extraction_id`, `provenance`).
* **`llm_prompting.md`**: Prompt schemas, including the **Extraction Prompt Pattern** (strict JSON output with examples) and the four primary **Roles for LLM** (Extraction, Normalization, Reasoning/Querying, Rewrite/Summarize).
* **`cicd_devops.md`**: Local Dev Setup, including environment dependencies (Python 3.10+, Node 18+), setup steps for backend (venv, requirements, SQLite schema), frontend (npm install/dev), and configuration for the local LLM endpoint (e.g., Ollama at `http://localhost:11434`).
* **`testing.md`**: Unit, integration, and end-to-end testing guidelines, covering tests for NetworkX persistence, SQLite operations, and the **Integration Test** flow (Ingest sample markdown → run extraction → assert nodes/edges count).
* **`security.md`**: Authentication, API security, and confirmation that the default configuration disables external network calls and that raw notes are stored locally in SQLite.
* **`design_system.md`**: UI/UX rules, including graph visualization patterns (Node color by type, Node size by centrality, Edge thickness by confidence), and key UX interactions (Double-click for provenance, Inline editing).
* **`roadmap.md` / `decisions.md` / `changelog.md`**: Standard project tracking documentation.

**Documentation Rules:**
* Every CLIne prompt must reference `/docs/` files.
* Changes to the system must be reflected in the corresponding documentation.
* Documentation must include sample JSON (for LLM output), diagrams, code snippets, and usage instructions.
* CLIne must create `.md` files even with placeholders for sections not yet implemented.

--------------------------------------------------------------------------------

#### 4. Checklist System (checklist.md)

CLIne must maintain a master checklist that includes **all tasks, deliverables, thresholds, and progress**.

---
### DETAILED CRITICAL CHECKLIST TASKS

The project progress must be tracked against the following phases: 0. Setup & Documentation, 1. Core API & Ingestion, 2. Extraction & Persistence, 3. Frontend & Visualization, and 4. Testing & Validation.

#### Phase 0: Setup & Documentation
| Task ID | Description | Deliverable / Threshold | Source |
| :--- | :--- | :--- | :--- |
| 0.1 | **Local Environment Setup** | Install Python 3.10+ and Node 18+. Create Python backend venv and install dependencies (`requirements.txt`). | |
| 0.2 | **LLM Configuration** | Configure local LLM endpoint in `app/config.py`, specifying the LLM server (e.g., Ollama at `http://localhost:11434`). | |
| 0.3 | **Documentation Initialization** | Generate initial versions of all 11 required documentation files in `/docs/`, including `architecture.md`, `api-spec.md`, and `llm_prompting.md`. | |
| 0.4 | **Database Schema Setup** | Run `app/db/schema.sql` to initialize the SQLite database structure for raw notes, extracts, and metadata. | |
| 0.5 | **Embeddings Setup** | Configure the backend to load the local sentence-transformer model (e.g., all-MiniLM) or configure the Ollama embedding endpoint. | |

#### Phase 1: Core API & Ingestion
| Task ID | Description | Deliverable / Threshold | Source |
| :--- | :--- | :--- | :--- |
| 1.1 | **Ingestion Endpoint (Text)** | Implement `POST /api/ingest/text` to accept content, save it to the SQLite notes table, and initiate the asynchronous processing workflow. | |
| 1.2 | **Ingestion Endpoint (File)** | Implement `POST /api/ingest/file` to handle file uploads (single file or zip of markdown files). | |
| 1.3 | **Graph Retrieval API** | Implement `GET /api/graph` (returns full graph or paginated results) and `GET /api/graph/node/{id}` (returns node details and provenance). | |
| 1.4 | **Graph Export API** | Implement `GET /api/export/graph` to return the NetworkX graph persisted as GraphML, GEXF, or gpickle. | |
| 1.5 | **Semantic Search API** | Implement `POST /api/search/semantic` which accepts a query `{"q": "..."}` and returns ranked nodes/notes based on local vector embeddings. | |
| 1.6 | **Mutation Endpoints** | Implement `POST /api/graph/node` and `POST /api/graph/edge` to allow manual editing and committing changes to the NetworkX graph and updating corresponding SQLite entries. | |

#### Phase 2: Extraction & Persistence
| Task ID | Description | Deliverable / Threshold | Source |
| :--- | :--- | :--- | :--- |
| 2.1 | **LLM Extraction Harness** | Create the minimal extractor component that sends text to the local LLM runtime and strictly enforces the **JSON output schema** defined in `llm_prompting.md`. | |
| 2.2 | **Core Ingestion Workflow** | Implement the full sequence within the backend: LLM extraction, writing extraction results to SQLite, updating/merging nodes/edges in NetworkX, and indexing vectors. | |
| 2.3 | **Node Merging Logic** | Implement the logic to assign unique node IDs (based on normalization) and merge nodes that represent the same entity, ensuring the `provenance` list is updated correctly. | |
| 2.4 | **Graph Persistence** | Implement periodic saving of the NetworkX graph using `nx.write_gpickle` to ensure state persistence across application restarts. | |
| 2.5 | **Provenance Tracking** | Ensure every extracted node stores the full provenance (source text spans, `note_id`). | |

#### Phase 3: Frontend & Visualization
| Task ID | Description | Deliverable / Threshold | Source |
| :--- | :--- | :--- | :--- |
| 3.1 | **Frontend Setup** | Initialize the Next.js application, including the basic required pages: `/graph`, `/note/[id]`, `/search`, and `/settings`. | |
| 3.2 | **GraphCanvas Component** | Create the `GraphCanvas` component using `react-cytoscapejs` that fetches graph data from `GET /api/graph` and implements basic pan/zoom functionality. | |
| 3.3 | **Visualization Cues** | Apply initial visualization rules: Node color by type (`concept`, `person`), Node size by centrality, and Edge thickness by confidence score (weight). | |
| 3.4 | **Node Details Panel** | Implement the `NodeDetailsPanel` component that displays node metadata, lists provenance passages, and provides edit buttons when a node is clicked. | |
| 3.5 | **Provenance Interaction** | Implement the key UX interaction: Double-click a node to open the `NodeDetailsPanel` showing source passages. | |

#### Phase 4: Testing & Validation
| Task ID | Description | Deliverable / Threshold | Source |
| :--- | :--- | :--- | :--- |
| 4.1 | **Unit Test Suite** | Implement Unit Tests for NetworkX loading/persistence and SQLite read/write operations. | |
| 4.2 | **Integration Test 1 (Ingestion)** | **Acceptance Test:** Ingest the provided sample notes folder (`data/notes/`) via `/api/ingest/file`. Assert that the process completes and the resulting NetworkX graph contains non-zero nodes (N) and edges (M). | |
| 4.3 | **Integration Test 2 (Export)** | **Acceptance Test:** Implement and run `GET /api/export/graph`. Confirm the exported GraphML/gpickle file contains at least one node with a populated `provenance` attribute. | |
| 4.4 | **Security Check** | Verify that the default configuration disables external network calls, ensuring the system remains entirely local. | |

This continuation details the essential technical specifications for the Mind Map AI project, focusing on data models, LLM requirements, core endpoints, and visualization specifications, as required by `spec.md`.

--------------------------------------------------------------------------------

#### 5. Data Models & Storage Design

The system utilizes SQLite for raw source text and metadata, and NetworkX for the graph structure. Persistence must use `nx.write_gpickle` or `nx.readwrite.gexf.write_gexf`.

##### 5.1. NetworkX Graph Model

The NetworkX graph must rigidly follow these attribute definitions:

**Node Attributes:**
* **id:** Unique string (e.g., `node:UUID` or `entity:<normalized_text>`).
* **label:** The display name.
* **type:** Categorization (e.g., `concept`, `person`, `place`, `idea`, `event`, `passage`).
* **provenance:** A list of tuples referencing source data: `(note_id, span_start, span_end)`.
* **embedding:** (Optional, reference to Vector DB) The vector, though the vector itself is usually stored in the local Vector DB and keyed by node id.
* **created_at, updated_at**.
* **alias list:** Should be kept on node attributes to aid normalization and merging.

**Edge Attributes:**
* **type:** Relationship category (e.g., `related_to`, `causes`, `elaborates`, `contradicts`, `similar_to`, `part_of`).
* **weight:** Confidence score of the extraction.
* **extraction_id:** ID referencing the entry in the SQLite extracts table.
* **provenance:** Source spans.

##### 5.2. Normalization Heuristics
The backend must implement logic to normalize entity names to ensure that different mentions (e.g., "AI," "artificial intelligence") map to a single canonical node. This process should utilize the LLM to propose canonical forms and disambiguation. When merging nodes, the `provenance` list must be correctly updated.

--------------------------------------------------------------------------------

#### 6. LLM Strategy & Extraction Pipeline

The entire LLM strategy must utilize a local model (Ollama, LLaMA, or similar).

##### 6.1. Roles for LLM
The local LLM will serve four primary roles:
1. **Extraction:** Extracting Entities, Concepts, Relationships (with relation types and confidence), Short summaries, and Sentiment/metadata tags from input text.
2. **Normalization:** Normalizing entity names (e.g., choosing a canonical label).
3. **Reasoning / Querying:** Answering user questions by synthesizing information from the graph.
4. **Rewrite / Summarize:** Generating display-ready summaries for nodes.

##### 6.2. Extraction Prompt Pattern
The extraction process must utilize a **strict JSON schema**. The prompt must include short instructions, examples, and explicitly ask the model to return *only* machine-readable JSON.

##### 6.3. Embeddings
A local `sentence-transformer` model (e.g., all-MiniLM) or an Ollama embedding endpoint must be used to embed each note and node label for semantic search functionality. These vectors must be stored in a lightweight local vector store (Chroma or Faiss).

--------------------------------------------------------------------------------

#### 7. API Design: Core Endpoints

The FastAPI backend must expose the following core endpoints:

| HTTP Method | Endpoint | Description |
| :--- | :--- | :--- |
| `POST` | `/api/ingest/file` | Upload a file or zip of markdown files. |
| `POST` | `/api/ingest/text` | Post a text block for asynchronous processing. |
| `GET` | `/api/graph` | Retrieve the full graph or paginated results for visualization. |
| `GET` | `/api/graph/node/{id}` | Retrieve specific node details and its provenance. |
| `POST` | `/api/graph/node` | Add or edit a specific node (manual user intervention). |
| `POST` | `/api/graph/edge` | Add or edit a specific edge (manual user intervention). |
| `POST` | `/api/search/semantic` | Accepts `{"q": "..."}` and returns ranked nodes/notes based on local vector embeddings. |
| `GET` | `/api/export/graph` | Returns the NetworkX graph in GraphML, GEXF, or gpickle format. |

**Ingestion Workflow Requirement:** Upon successful ingestion via `/api/ingest/text`, the backend must perform the sequence: save note to SQLite, run LLM extraction, write extracts, update/merge nodes/edges in NetworkX, and index embeddings.

--------------------------------------------------------------------------------

#### 8. Frontend & Visualization Requirements

The Next.js frontend must provide an interactive visualization and editing environment.

##### 8.1. Key Components & Pages
Critical pages include `/graph` (Full-screen interactive graph viewer) and `/note/[id]` (Note viewer + provenance). Key components are the **GraphCanvas** (`react-cytoscapejs` wrapper) and the **NodeDetailsPanel**.

##### 8.2. Visualization Cues
The visualization must use cues based on graph data:
* **Node color:** Determined by node `type` (e.g., `concept`, `person`, `event`).
* **Node size:** Determined by graph analytics results, specifically **centrality** (degree or eigenvector centrality).
* **Edge thickness:** Determined by the edge `weight` (confidence score).

##### 8.3. UX Interactions
Key interactions must include:
* **Double-click on a node:** Opens the **NodeDetailsPanel** showing source passages and LLM extraction provenance.
* **Inline editing:** Allows users to manually edit nodes/edges, triggering a `PATCH` request to the backend mutation endpoints.

##### 8.4. Performance
For large graphs, the visualization should only render a subgraph around the selected node (e.g., BFS to depth 2) to maintain performance.

--------------------------------------------------------------------------------

#### 9. Testing & Validation (Acceptance Criteria)

Testing must cover persistence, data operations, and extraction correctness.

##### 9.1. Acceptance Tests (Critical Thresholds)
1. **Ingestion/Extraction:** Successfully ingest the provided sample notes folder (`data/notes/`) and confirm the extraction process successfully produces non-zero nodes (N) and edges (M).
2. **Provenance Check:** Run `GET /api/export/graph` and confirm the exported GraphML/gpickle file contains at least one node with a full `provenance` attribute.

##### 9.2. Security Precondition
The default configuration must explicitly disable external network calls, ensuring the system remains entirely local and offline-capable.
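
The extraction and merging requirements in that output (strict JSON from the LLM, node IDs built from normalized text, provenance accumulated on merge) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation. The JSON field names (`nodes`, `edges`, `label`, `type`, `span`, `source`, `target`, `weight`) are hypothetical because the real schema is meant to live in `llm_prompting.md`, and a plain dict stands in for the NetworkX graph so the example runs on the standard library alone. The normalization here only handles case and whitespace; mapping "AI" to "artificial intelligence" would still need the LLM normalization step the spec describes.

```python
import json
import re

# Node types taken from the examples in section 5.1; treated here as a closed set (assumption).
ALLOWED_TYPES = {"concept", "person", "place", "idea", "event", "passage"}


def normalize(label: str) -> str:
    """Build a canonical node id of the form entity:<normalized_text>."""
    text = re.sub(r"\s+", " ", label.strip().lower())
    return f"entity:{text}"


def parse_extraction(raw: str) -> dict:
    """Parse LLM output and reject anything that does not match the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("extraction must be a JSON object")
    for node in data.get("nodes", []):
        if not isinstance(node, dict) or not isinstance(node.get("label"), str):
            raise ValueError(f"invalid node: {node!r}")
        if node.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unknown node type: {node!r}")
    for edge in data.get("edges", []):
        if not isinstance(edge, dict) or not edge.keys() >= {"source", "target", "type", "weight"}:
            raise ValueError(f"invalid edge: {edge!r}")
        if not 0.0 <= float(edge["weight"]) <= 1.0:
            raise ValueError(f"edge weight out of range: {edge!r}")
    return data


def merge_extraction(graph: dict, note_id: int, data: dict) -> None:
    """Merge validated nodes and edges into the graph, accumulating provenance."""
    for node in data.get("nodes", []):
        node_id = normalize(node["label"])
        entry = graph["nodes"].setdefault(
            node_id,
            {"label": node["label"], "type": node["type"], "aliases": set(), "provenance": []},
        )
        entry["aliases"].add(node["label"])
        start, end = node.get("span", (0, 0))
        entry["provenance"].append((note_id, start, end))
    for edge in data.get("edges", []):
        key = (normalize(edge["source"]), normalize(edge["target"]), edge["type"])
        graph["edges"][key] = {"weight": float(edge["weight"]), "note_id": note_id}


# Two notes mention the same concept with different casing and spacing.
graph = {"nodes": {}, "edges": {}}
raw_1 = '{"nodes": [{"label": "Knowledge Graph", "type": "concept", "span": [0, 15]}], "edges": []}'
raw_2 = '{"nodes": [{"label": "knowledge  graph", "type": "concept", "span": [4, 20]}], "edges": []}'
merge_extraction(graph, 1, parse_extraction(raw_1))
merge_extraction(graph, 2, parse_extraction(raw_2))
print(graph["nodes"]["entity:knowledge graph"]["provenance"])  # [(1, 0, 15), (2, 4, 20)]
```

If the graph is held in NetworkX, the same attributes would be attached to its nodes and edges instead of dict entries. One thing worth checking before following the generated prompt literally: `nx.write_gpickle` was removed in NetworkX 3.0, so on current releases persistence would likely go through the standard `pickle` module rather than that helper.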

Now I am going to build my final version of the master builder prompt. But this time I am going to use Anthropic. I save them for the end. I use ChatGPT to brainstorm, and then when I want something done really well I use my valuable free Anthropic usage to generate the final versions.

So for this generation I am going to use the Anthropic app, where I have enabled a lot of MCP servers and connected my GitHub, which allows me to reference the workflow.git I mentioned earlier as a template for creating the context for the prompt. You see, it serves as the one-shot example for the generation, that is, the single example given to the model. I really should few-shot it, that is, include multiple examples instead of just one. I will do that later.
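
For what the few-shot version might look like, here is a rough sketch of assembling several worked examples into one prompt. Everything in it is hypothetical: the `Example` class, the `build_few_shot_prompt` function, the section labels, and the sample specs are made up for illustration and are not part of workflow.git.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example: a project spec and the master prompt written for it."""
    spec: str
    prompt: str


def build_few_shot_prompt(instructions: str, examples: list[Example], new_spec: str) -> str:
    """Assemble instructions, numbered examples, and the new spec into one prompt."""
    parts = [instructions.strip()]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"### Example {i}\nSPEC:\n{ex.spec.strip()}\n\nMASTER PROMPT:\n{ex.prompt.strip()}"
        )
    # The final section repeats the example layout but leaves the answer open.
    parts.append(f"### Your task\nSPEC:\n{new_spec.strip()}\n\nMASTER PROMPT:")
    return "\n\n".join(parts)


examples = [
    Example(spec="A local to-do app.", prompt="Build a local to-do app in phases."),
    Example(spec="A local RSS reader.", prompt="Build a local RSS reader in phases."),
]
prompt = build_few_shot_prompt(
    "Write a master prompt for the spec.", examples, "A local knowledge graph."
)
print(prompt.count("### Example"))  # 2 examples here; one-shot would pass a single Example
```

Passing a list with one `Example` gives the one-shot setup I am using now; adding more entries is all it takes to go few-shot, at the cost of more context.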

So the prompt I am using to generate the final version of our master_prompt_builder.md is:


text
You are an expert in prompt engineering and are going to help me build a prompt for CLIne to vibe code an entire project.

I am giving you spec.md which has the full details of the project. It is the most authoritative version and should be referenced first.

cline_master_prompt.md is the current version of the master prompt. It is not perfect and needs to be improved.

prompt_add.md is a file that contains additional information that should be included in the master prompt.

I want you to use the attached workflow.git as a template for the context of the prompt.

The deliverable you are producing is the final version of the prompt I am going to give to CLIne that will instruct it as to each and every thing it needs to do. This prompt should be long and detailed and should include all the information from the spec.md, cline_master_prompt.md, and prompt_add.md which is relevant to the task. They are in the root folder of the project and can be referenced rather than directly quoted in the prompt in order to conserve context.

Be sure to research as much as is needed in order to provide the best possible prompt. Do not assume anything. Do not make anything up.

Well, not really. We all know the best prompt builders are LLMs, so instead I am going to use ChatGPT one more time to rewrite the prompt I am going to use with Anthropic, using something like this to generate it:


text
Rewrite the following prompt so that it includes all the standards of software engineering which will help me instruct the LLM to generate the best possible CLIne prompt for vibe coding. This is the prompt I have now which I want modified or augmented as necessary to include the best practices in software engineering: You are an expert in prompt engineering and are going to help me build a prompt for CLIne to vibe code an entire project.

I am giving you spec.md which has the full details of the project. It is the most authoritative version and should be referenced first.

cline_master_prompt.md is the current version of the master prompt. It is not perfect and needs to be improved.

prompt_add.md is a file that contains additional information that should be included in the master prompt.

I want you to use the attached workflow.git as a template for the context of the prompt.

The deliverable you are producing is the final version of the prompt I am going to give to CLIne that will instruct it as to each and every thing it needs to do. This prompt should be long and detailed and should include all the information from the spec.md, cline_master_prompt.md, and prompt_add.md which is relevant to the task. They are in the root folder of the project and can be referenced rather than directly quoted in the prompt in order to conserve context.

Be sure to research as much as is needed in order to provide the best possible prompt. Do not assume anything. Do not make anything up.

So now I get something like this, which is what I am going to give to Anthropic along with the context I mentioned, and hope that we get a good result.


text
You are an expert in **prompt engineering** and **software engineering best practices**, and your task is to help me build a **master prompt for CLIne** that will enable it to "vibe code" an entire project efficiently, accurately, and maintainably.

### Project Context
- **spec.md**: This is the authoritative specification for the project. All decisions, features, and workflows should reference this file first.
- **cline_master_prompt.md**: This is the current master prompt. It is functional but imperfect and must be improved.
- **prompt_add.md**: This file contains supplementary information that must be integrated into the final master prompt wherever relevant.
- **workflow.git**: This repository contains the workflow template to provide context on project structure, coding style, and operational logic. Reference it as needed for formatting, style, and organization guidance.

### Objective
Your deliverable is the **final, fully-formed CLIne master prompt** that:
1. Instructs CLIne on **all tasks, procedures, and coding decisions** necessary to build the project end-to-end.
2. Integrates relevant content from **spec.md**, **cline_master_prompt.md**, and **prompt_add.md**, without unnecessary duplication, preserving clarity and context.
3. Conforms to **software engineering best practices**, including but not limited to:
   - **Modularity**: Break tasks and code generation into clear, manageable components.
   - **Maintainability**: Ensure the prompt encourages clean, well-documented, and scalable code.
   - **Consistency**: Follow consistent naming conventions, formatting, and architectural patterns.
   - **Validation and Testing**: Include instructions for CLIne to generate appropriate unit tests, integration tests, or checks where applicable.
   - **Documentation**: Produce internal documentation and inline comments for all generated code.
   - **Error Handling**: Anticipate potential failures or edge cases and instruct CLIne to handle them gracefully.
   - **Iterative Development**: Plan for incremental improvements, verification steps, and checkpoints to ensure correctness.
   - **Version Control Awareness**: Encourage structured commit messages and proper branching strategies, if applicable.
   - **Security and Performance Awareness**: Include guidance for secure coding practices and optimization considerations.

### Instructions for Prompt Generation
1. **Read and Understand All Sources**: Thoroughly review **spec.md**, **cline_master_prompt.md**, and **prompt_add.md**. Do not assume anything; verify all information before including it.
2. **Reference, Don’t Overload**: Where possible, reference the files rather than quoting their full content to conserve context and readability.
3. **Structure the Prompt Logically**: Organize instructions in a step-by-step, hierarchical, or modular format that CLIne can follow reliably.
4. **Explicit Deliverables**: Specify clearly what CLIne should produce at each stage—files, code blocks, test scripts, configuration, documentation, or other artifacts.
5. **Iterative Verification**: Include checkpoints where CLIne should verify correctness against specifications or tests before proceeding to the next step.
6. **Comprehensiveness and Clarity**: Ensure the final prompt is detailed, unambiguous, and leaves no critical detail unaddressed.

### Constraints
- Do not invent features or requirements not present in **spec.md**.
- Avoid vague instructions; everything should be actionable.
- Ensure that the prompt can be fed to CLIne as-is to drive the coding workflow for the full project.

Your output should be a **ready-to-use CLIne master prompt** that combines the authority of **spec.md**, the structure of **cline_master_prompt.md**, the supplementary details from **prompt_add.md**, and the workflow logic from **workflow.git**, while fully adhering to software engineering best practices.

OK, so here is what the Anthropic model output. I am going to save it to the project root as final_prompt.md, and I think we are just about ready to start vibe coding. This is a massive prompt, but thankfully the free stealth model in CLIne has a one-million-token context window. I really should read over it and edit it first: you will kick yourself if you just YOLO it, only to find out a few hours later that you should have corrected it at the beginning and now have to start over.


# CLIne Master Prompt: Mind Map AI - Local Knowledge Graph System

## Meta-Instructions for CLIne

You are CLIne, an AI coding assistant tasked with building **Mind Map AI**, a fully local, LLM-powered personal knowledge graph system. This prompt is your single source of truth for all development decisions, procedures, and coding standards.

### Core Principles
1. **Specification Authority**: `spec.md` is the authoritative project specification. All features, architecture, and implementation decisions must align with it.
2. **Documentation-First**: Generate and maintain comprehensive documentation in `/docs/` before and during implementation.
3. **Incremental Development**: Complete each phase fully before proceeding to the next. Each phase has explicit deliverables and completion thresholds.
4. **Local-Only Constraint**: All LLM inference, databases, vector stores, and graph processing must operate locally. No external API calls unless explicitly configured by the user.
5. **Auditability**: Every extraction, transformation, and graph modification must preserve provenance and source text references.
6. **Best Practices**: Follow software engineering best practices for modularity, maintainability, testing, security, and documentation.

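Principle 5 is easier to check with a concrete picture of what a provenance reference resolves to. The following is a minimal sketch, assuming provenance is stored as `(note_id, span_start, span_end)` tuples (the shape the graph store later in this prompt uses) and assuming the spans are half-open character offsets into the note content, which the prompt does not state. `resolve_provenance` and the sample notes are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Provenance = Tuple[int, int, int]  # assumed: (note_id, span_start, span_end)


def resolve_provenance(
    notes: Dict[int, str],
    provenance: List[Provenance],
    context: int = 0,
) -> List[dict]:
    """Map each provenance tuple back to the source text it points at."""
    resolved = []
    for note_id, start, end in provenance:
        content: Optional[str] = notes.get(note_id)
        if content is None or not (0 <= start < end <= len(content)):
            # Dangling reference: keep it visible instead of dropping it silently.
            resolved.append({"note_id": note_id, "span": (start, end), "text": None})
            continue
        lo = max(0, start - context)
        hi = min(len(content), end + context)
        resolved.append({
            "note_id": note_id,
            "span": (start, end),
            "text": content[start:end],   # the exact extracted span
            "context": content[lo:hi],    # the span plus surrounding characters
        })
    return resolved


# Hypothetical sample data standing in for rows of the notes table.
notes = {
    1: "Ada Lovelace wrote the first program.",
    2: "Lovelace worked with Babbage.",
}
spans = resolve_provenance(notes, [(1, 0, 12), (2, 21, 28), (3, 0, 5)])
```

A reference to a missing note (note 3 here) comes back with `text` set to `None`, so a UI panel could flag it rather than hide it.
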
---

## Project Overview

**Name**: Mind Map AI
**Purpose**: Convert personal notes, journals, and markdown files into a browsable, queryable, and editable knowledge graph using local LLM inference.

**Tech Stack**:
- **Frontend**: Next.js (React) with `react-cytoscapejs` for graph visualization
- **Backend**: FastAPI (Python) for REST API, graph management, and LLM integration
- **Graph Engine**: NetworkX (in-memory graph, persisted to `.gpickle` or GraphML)
- **Database**: SQLite for raw notes, metadata, and provenance tracking
- **LLM**: Local model (Ollama, llama.cpp, or similar)
- **Embeddings**: Local sentence-transformers (e.g., all-MiniLM) or Ollama embedding endpoint
- **Vector Store**: Lightweight local Chroma or Faiss for semantic search

**Architecture**:

[Next.js Frontend] <-> [FastAPI Backend] <-> [Local LLM Runtime]
                        ├─ SQLite (notes + extracts + metadata)
                        ├─ NetworkX Graph (.gpickle / GraphML)
                        └─ Vector DB (Chroma/Faiss embeddings)


---

## File Structure

Maintain this exact directory structure:


mindmap-ai/
├── backend/
│   ├── app/
│   │   ├── main.py                 # FastAPI application entry point
│   │   ├── config.py               # Configuration (LLM endpoint, DB paths)
│   │   ├── api/
│   │   │   ├── __init__.py
│   │   │   ├── ingest.py           # Ingestion endpoints
│   │   │   ├── graph.py            # Graph query/mutation endpoints
│   │   │   └── search.py           # Semantic search endpoints
│   │   ├── services/
│   │   │   ├── __init__.py
│   │   │   ├── extractor.py        # LLM extraction logic
│   │   │   ├── embeddings.py       # Embedding generation
│   │   │   └── graph_store.py      # NetworkX wrapper + persistence
│   │   └── db/
│   │       ├── __init__.py
│   │       ├── db.py               # SQLite connection functions
│   │       └── schema.sql          # Database schema
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── package.json
│   ├── next.config.js
│   ├── src/
│   │   ├── pages/
│   │   │   ├── index.js            # Dashboard
│   │   │   ├── graph.js            # Graph visualization page
│   │   │   ├── note/[id].js        # Note detail page
│   │   │   ├── search.js           # Semantic search page
│   │   │   └── settings.js         # Configuration page
│   │   └── components/
│   │       ├── GraphCanvas.jsx         # Cytoscape graph component
│   │       ├── NodeDetailsPanel.jsx    # Node provenance panel
│   │       ├── NoteUploader.jsx        # File upload component
│   │       └── SearchBox.jsx           # Search interface
│   └── Dockerfile
├── data/
│   ├── notes/                      # Sample markdown files
│   ├── mindmap.db                  # SQLite database
│   ├── graph.gpickle               # Persisted NetworkX graph
│   └── vectors/                    # Vector DB files
├── docs/
│   ├── architecture.md
│   ├── api-spec.md
│   ├── database.md
│   ├── llm_prompting.md
│   ├── security.md
│   ├── cicd_devops.md
│   ├── testing.md
│   ├── design_system.md
│   ├── roadmap.md
│   ├── decisions.md
│   └── changelog.md
├── tests/
│   ├── backend/
│   │   ├── test_db.py
│   │   ├── test_extractor.py
│   │   └── test_graph.py
│   └── frontend/
│       └── test_graph_ui.jsx
├── checklist.md                    # Progress tracking
├── README.md
└── docker-compose.yml


---

## Development Workflow

### Phase 0: Setup & Documentation

**Objective**: Initialize project structure and generate comprehensive documentation templates.

**Tasks**:
1. Create all directories as specified in the file structure
2. Initialize Git repository: `git init`
3. Create `.gitignore` with entries for:
   - `__pycache__/`, `*.pyc`, `.venv/`, `node_modules/`, `.env`, `*.db`, `*.gpickle`, `vectors/`
4. Generate documentation templates in `/docs/`:
   - `architecture.md`: System overview, technology choices, folder structure, architecture diagrams
   - `api-spec.md`: REST endpoint specifications with request/response schemas
   - `database.md`: SQLite schema, NetworkX graph model, persistence strategy
   - `llm_prompting.md`: LLM roles, extraction prompt patterns, JSON schemas
   - `security.md`: Authentication, API security, local privacy measures
   - `cicd_devops.md`: Local dev setup, Docker configuration, environment dependencies
   - `testing.md`: Unit, integration, and acceptance test strategies
   - `design_system.md`: UI/UX patterns, visualization cues, interaction specifications
   - `roadmap.md`: Future features and enhancements
   - `decisions.md`: Architectural decision records (ADR format)
   - `changelog.md`: Version history with dates and changes
5. Create `checklist.md` with this phase as the first entry
6. Create `README.md` with project overview, setup instructions, and quick start guide

**Deliverables**:
- Complete directory structure
- All documentation templates with section headers and placeholders
- Initialized Git repository with `.gitignore`
- `checklist.md` with Phase 0 tasks listed

**Completion Threshold**:
- [ ] All directories exist
- [ ] All `.md` files in `/docs/` contain structured placeholders
- [ ] `README.md` contains project description and setup steps
- [ ] Initial commit made to Git
- [ ] Log creation in `decisions.md` with rationale for directory structure

**Documentation Standards**:
- Include code examples, diagrams (ASCII or markdown), and usage instructions
- Use consistent markdown formatting (headers, lists, code blocks)
- Reference other documentation files where appropriate using relative links

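The directory and documentation tasks in Phase 0 are mechanical enough to script. Here is a rough sketch, assuming the directory list is read off the File Structure section and that a one-line placeholder is an acceptable "structured placeholder"; `scaffold`, `PROJECT_DIRS`, and `DOC_FILES` are hypothetical names, and the script deliberately leaves `git init`, `.gitignore`, and the README to the tasks above.

```python
import tempfile
from pathlib import Path
from typing import List

# Directories taken from the File Structure section of this prompt.
PROJECT_DIRS = [
    "backend/app/api",
    "backend/app/services",
    "backend/app/db",
    "frontend/src/pages/note",
    "frontend/src/components",
    "data/notes",
    "data/vectors",
    "docs",
    "tests/backend",
    "tests/frontend",
]

# Documentation templates listed in Phase 0, task 4.
DOC_FILES = [
    "architecture.md", "api-spec.md", "database.md", "llm_prompting.md",
    "security.md", "cicd_devops.md", "testing.md", "design_system.md",
    "roadmap.md", "decisions.md", "changelog.md",
]


def scaffold(root: Path) -> List[Path]:
    """Create the directory tree and placeholder docs; return the docs created."""
    for rel in PROJECT_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)

    created = []
    for name in DOC_FILES:
        doc = root / "docs" / name
        if doc.exists():
            continue  # never overwrite documentation that already has content
        title = Path(name).stem.replace("_", " ").replace("-", " ").title()
        doc.write_text(f"# {title}\n\n_TODO: fill in._\n", encoding="utf-8")
        created.append(doc)
    return created


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than the real project.
    with tempfile.TemporaryDirectory() as tmp:
        docs = scaffold(Path(tmp) / "mindmap-ai")
        print(f"created {len(docs)} documentation templates")
```

Running it twice is harmless: existing directories and docs are left alone, which fits the incremental, checkpointed workflow the phases describe.
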
---

### Phase 1: Backend Core Infrastructure

**Objective**: Set up FastAPI backend, SQLite database, and basic configuration.

**Pre-requisites**: Phase 0 complete

**Tasks**:

#### 1.1 Database Setup
1. Create `backend/app/db/schema.sql` with the following tables:

```sql
-- Table: notes
-- Stores raw markdown/text content with metadata
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    source_path TEXT,
    hash TEXT UNIQUE,              -- Content hash for deduplication
    processed BOOLEAN DEFAULT 0    -- Flag for extraction completion
);

-- Table: extracts
-- Stores LLM extraction results with provenance
-- Note: ON DELETE CASCADE is only enforced when PRAGMA foreign_keys = ON
CREATE TABLE IF NOT EXISTS extracts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
    extractor_model TEXT NOT NULL, -- Model identifier (e.g., "llama3-8b")
    extract_json TEXT NOT NULL,    -- Raw JSON output from LLM
    score REAL,                    -- Confidence/quality score
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Table: metadata
-- Key-value store for system metadata
CREATE TABLE IF NOT EXISTS metadata (
    key TEXT PRIMARY KEY,
    value TEXT,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes for performance
-- (notes.hash is already indexed by its UNIQUE constraint)
CREATE INDEX IF NOT EXISTS idx_notes_processed ON notes(processed);
CREATE INDEX IF NOT EXISTS idx_extracts_note_id ON extracts(note_id);
```

2. Create `backend/app/db/db.py` with connection management:

```python
import hashlib
import json
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict, List, Optional

# backend/app/db/db.py -> four levels up is the project root
DB_PATH = Path(__file__).parent.parent.parent.parent / "data" / "mindmap.db"


def get_connection() -> sqlite3.Connection:
    """Get SQLite connection with row factory."""
    conn = sqlite3.connect(str(DB_PATH))
    conn.row_factory = sqlite3.Row
    # SQLite enforces foreign keys (and ON DELETE CASCADE) only when enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def init_database() -> None:
    """Initialize database with schema."""
    schema = (Path(__file__).parent / "schema.sql").read_text(encoding="utf-8")
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)  # data/ may not exist yet

    with closing(get_connection()) as conn:
        conn.executescript(schema)
        conn.commit()


def insert_note(filename: str, content: str, source_path: Optional[str] = None) -> int:
    """Insert note and return note_id. Skip if hash exists."""
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()

    with closing(get_connection()) as conn:
        # Check if note with same hash exists
        existing = conn.execute(
            "SELECT id FROM notes WHERE hash = ?", (content_hash,)
        ).fetchone()
        if existing:
            return existing["id"]

        cursor = conn.execute(
            "INSERT INTO notes (filename, content, source_path, hash) VALUES (?, ?, ?, ?)",
            (filename, content, source_path, content_hash),
        )
        conn.commit()
        return cursor.lastrowid


def insert_extract(
    note_id: int,
    extractor_model: str,
    extract_json: Dict,
    score: Optional[float] = None,
) -> int:
    """Insert extraction result."""
    with closing(get_connection()) as conn:
        cursor = conn.execute(
            "INSERT INTO extracts (note_id, extractor_model, extract_json, score) VALUES (?, ?, ?, ?)",
            (note_id, extractor_model, json.dumps(extract_json), score),
        )
        conn.commit()
        return cursor.lastrowid


def mark_note_processed(note_id: int) -> None:
    """Mark note as processed after extraction."""
    with closing(get_connection()) as conn:
        conn.execute(
            "UPDATE notes SET processed = 1, updated_at = CURRENT_TIMESTAMP WHERE id = ?",
            (note_id,),
        )
        conn.commit()


def get_note(note_id: int) -> Optional[Dict]:
    """Retrieve note by ID."""
    with closing(get_connection()) as conn:
        row = conn.execute("SELECT * FROM notes WHERE id = ?", (note_id,)).fetchone()
    return dict(row) if row else None


def get_all_notes() -> List[Dict]:
    """Retrieve all notes."""
    with closing(get_connection()) as conn:
        rows = conn.execute("SELECT * FROM notes ORDER BY created_at DESC").fetchall()
    return [dict(row) for row in rows]


def get_extracts_for_note(note_id: int) -> List[Dict]:
    """Retrieve all extracts for a given note."""
    with closing(get_connection()) as conn:
        rows = conn.execute(
            "SELECT * FROM extracts WHERE note_id = ? ORDER BY created_at DESC",
            (note_id,),
        ).fetchall()
    return [dict(row) for row in rows]
```

3. Update `docs/database.md` with:
   - Table schemas with column descriptions
   - NetworkX graph model specification (see spec.md Section 5.1)
   - Persistence strategy (gpickle vs GraphML tradeoffs)
   - Provenance tracking approach

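Two behaviours of the schema above are worth checking before relying on them: deduplication through the `UNIQUE` hash column, and `ON DELETE CASCADE`, which SQLite only enforces on connections where `PRAGMA foreign_keys = ON` has been issued. The sketch below runs against an in-memory database with a trimmed-down copy of the two tables; `open_db`, `add_note`, and `extracts_left_after_delete` are illustrative helpers, not part of `db.py`.

```python
import hashlib
import sqlite3

# Trimmed-down copy of the notes/extracts tables, for demonstration only.
SCHEMA = """
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content TEXT NOT NULL,
    hash TEXT UNIQUE
);
CREATE TABLE extracts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
    extract_json TEXT NOT NULL
);
"""


def open_db(enforce_foreign_keys: bool) -> sqlite3.Connection:
    """Open an in-memory database with foreign-key enforcement on or off."""
    conn = sqlite3.connect(":memory:")
    # Set the pragma explicitly in both cases so the demo does not depend on build defaults.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_foreign_keys else 'OFF'}")
    conn.executescript(SCHEMA)
    return conn


def add_note(conn: sqlite3.Connection, content: str) -> int:
    """Insert a note unless its content hash already exists; return its id."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO notes (content, hash) VALUES (?, ?)", (content, digest)
    )
    return conn.execute("SELECT id FROM notes WHERE hash = ?", (digest,)).fetchone()[0]


def extracts_left_after_delete(enforce_foreign_keys: bool) -> int:
    """Delete a note that has one extract and count the extracts that remain."""
    conn = open_db(enforce_foreign_keys)
    note_id = add_note(conn, "Graphs are fun")
    conn.execute(
        "INSERT INTO extracts (note_id, extract_json) VALUES (?, ?)", (note_id, "{}")
    )
    conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
    remaining = conn.execute("SELECT COUNT(*) FROM extracts").fetchone()[0]
    conn.close()
    return remaining
```

With enforcement on, deleting the note removes its extract; with it off, the extract is left orphaned, which would quietly break the provenance chain.
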
#### 1.2 FastAPI Application Setup

1. Create `backend/app/config.py`:


from pathlib import Path
from typing import List

from pydantic_settings import BaseSettings, SettingsConfigDict

# backend/app/config.py -> three levels up is the project root
PROJECT_ROOT = Path(__file__).parent.parent.parent


class Settings(BaseSettings):
    # pydantic-settings 2.x configuration (replaces the v1-style inner `class Config`)
    model_config = SettingsConfigDict(env_file=".env")

    # LLM Configuration
    llm_endpoint: str = "http://localhost:11434/api/generate"  # Default Ollama endpoint
    llm_model: str = "llama3"
    embedding_endpoint: str = "http://localhost:11434/api/embeddings"
    embedding_model: str = "all-minilm"

    # Database Paths
    db_path: Path = PROJECT_ROOT / "data" / "mindmap.db"
    graph_path: Path = PROJECT_ROOT / "data" / "graph.gpickle"
    vector_db_path: Path = PROJECT_ROOT / "data" / "vectors"

    # API Configuration
    # 0.0.0.0 listens on all interfaces (needed in Docker); use 127.0.0.1 to keep the API on this machine.
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    cors_origins: List[str] = ["http://localhost:3000"]

    # Processing Configuration
    max_batch_size: int = 10
    extraction_timeout: int = 300  # seconds


settings = Settings()

2. Create `backend/app/main.py`:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from .config import settings
from .db.db import init_database
from .api import ingest, graph, search

app = FastAPI(
    title="Mind Map AI",
    description="Local LLM-powered personal knowledge graph",
    version="0.1.0"
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize database on startup
@app.on_event("startup")
async def startup_event():
    init_database()
    # Initialize graph store (will be implemented in Phase 2)
    # from .services.graph_store import init_graph
    # init_graph()

# Include routers
app.include_router(ingest.router, prefix="/api/ingest", tags=["ingestion"])
app.include_router(graph.router, prefix="/api/graph", tags=["graph"])
app.include_router(search.router, prefix="/api/search", tags=["search"])

@app.get("/")
async def root():
    return {"message": "Mind Map AI API", "version": "0.1.0"}

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

3. Create stub router files (endpoints are implemented in later phases). `__init__.py` stays empty, but each router module must define `router = APIRouter()`, because `main.py` accesses `ingest.router`, `graph.router`, and `search.router` at import time:
   - `backend/app/api/__init__.py`
   - `backend/app/api/ingest.py`
   - `backend/app/api/graph.py`
   - `backend/app/api/search.py`

4. Create `backend/requirements.txt`:

fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic-settings==2.1.0
networkx==3.2.1
requests==2.31.0
sentence-transformers==2.3.1
chromadb==0.4.22
numpy==1.26.3
python-multipart==0.0.6
pytest  # required by the test steps below; left unpinned here

#### 1.3 Testing & Documentation

1. Create `tests/backend/test_db.py`:

import pytest

from backend.app.db import db


@pytest.fixture
def temp_db(tmp_path, monkeypatch):
    """Point the db module at a temporary database for the duration of a test."""
    # monkeypatch restores DB_PATH and tmp_path is cleaned up by pytest automatically
    monkeypatch.setattr(db, "DB_PATH", tmp_path / "test.db")
    db.init_database()
    return db.DB_PATH


def test_insert_note(temp_db):
    """Test note insertion."""
    note_id = db.insert_note("test.md", "Test content", "/path/to/test.md")
    assert note_id > 0

    note = db.get_note(note_id)
    assert note['filename'] == "test.md"
    assert note['content'] == "Test content"
    assert note['processed'] == 0


def test_duplicate_note_hash(temp_db):
    """Test that duplicate content returns existing note_id."""
    note_id_1 = db.insert_note("test1.md", "Same content")
    note_id_2 = db.insert_note("test2.md", "Same content")

    assert note_id_1 == note_id_2


def test_insert_extract(temp_db):
    """Test extract insertion."""
    note_id = db.insert_note("test.md", "Test content")
    extract_json = {"nodes": [], "edges": []}
    extract_id = db.insert_extract(note_id, "llama3", extract_json, 0.95)

    assert extract_id > 0

    extracts = db.get_extracts_for_note(note_id)
    assert len(extracts) == 1
    assert extracts[0]['extractor_model'] == "llama3"


def test_mark_note_processed(temp_db):
    """Test marking note as processed."""
    note_id = db.insert_note("test.md", "Test content")
    db.mark_note_processed(note_id)

    note = db.get_note(note_id)
    assert note['processed'] == 1

2. Update `docs/architecture.md` with:
   - Technology stack rationale
   - Backend architecture diagram (ASCII art or description)
   - Data flow from ingestion to graph
   - Module dependencies

3. Update `docs/cicd_devops.md` with:
   - Python environment setup (`venv`, dependencies)
   - Running the backend from `backend/`: `uvicorn app.main:app --reload`
   - Database initialization steps

**Deliverables**:
- `backend/app/db/schema.sql` with complete schema
- `backend/app/db/db.py` with all CRUD functions
- `backend/app/config.py` with settings management
- `backend/app/main.py` with FastAPI app initialization
- `backend/requirements.txt` with all dependencies
- `tests/backend/test_db.py` with passing unit tests
- Updated documentation in `docs/`

**Completion Threshold**:
- [ ] SQLite database can be created and queried
- [ ] FastAPI server runs locally without errors (from `backend/`): `uvicorn app.main:app --reload`
- [ ] All database unit tests pass (from the repository root, so that `backend.app` is importable): `python -m pytest tests/backend/test_db.py`
- [ ] `/health` endpoint returns 200 OK
- [ ] Update `checklist.md` with Phase 1 completion
- [ ] Log backend setup in `decisions.md`

---

### Phase 2: NetworkX Graph Store

**Objective**: Implement in-memory graph using NetworkX with disk persistence.

**Pre-requisites**: Phase 1 complete

**Tasks**:

#### 2.1 Graph Store Implementation

1. Create `backend/app/services/graph_store.py`:

import json
import pickle
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import networkx as nx

from ..config import settings

Provenance = List[Tuple[int, int, int]]  # (note_id, span_start, span_end) tuples


class GraphStore:
    """Manages NetworkX graph with disk persistence."""

    def __init__(self, graph_path: Optional[Path] = None):
        self.graph_path = graph_path or settings.graph_path
        self.graph = self._load_graph()

    def _load_graph(self) -> nx.Graph:
        """Load graph from disk or create new."""
        if self.graph_path.exists():
            try:
                # pickle can execute arbitrary code: only load graph files this app wrote.
                with open(self.graph_path, 'rb') as f:
                    return pickle.load(f)
            except Exception as e:
                print(f"Error loading graph: {e}. Creating new graph.")
        return nx.Graph()

    def save(self):
        """Persist graph to disk."""
        self.graph_path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.graph_path, 'wb') as f:
            pickle.dump(self.graph, f)

    def add_node(
        self,
        node_id: str,
        label: str,
        node_type: str,
        provenance: Optional[Provenance] = None,
        **kwargs
    ) -> str:
        """
        Add or update node in graph.

        Args:
            node_id: Unique node identifier
            label: Display name
            node_type: Type (concept, person, place, idea, event, passage)
            provenance: List of (note_id, span_start, span_end) tuples
            **kwargs: Additional attributes (embedding, metadata, etc.)

        Returns:
            node_id
        """
        now = datetime.now().isoformat()
        if self.graph.has_node(node_id):
            # Update existing node
            existing = self.graph.nodes[node_id]
            existing['label'] = label
            existing['type'] = node_type

            # Merge provenance
            existing_prov = existing.get('provenance', [])
            new_prov = provenance or []
            existing['provenance'] = existing_prov + [p for p in new_prov if p not in existing_prov]

            existing['updated_at'] = now
            existing.update(kwargs)
        else:
            # Add new node
            self.graph.add_node(
                node_id,
                label=label,
                type=node_type,
                provenance=provenance or [],
                created_at=now,
                updated_at=now,
                **kwargs
            )

        return node_id

    def add_edge(
        self,
        source: str,
        target: str,
        edge_type: str,
        weight: float = 1.0,
        extraction_id: Optional[int] = None,
        provenance: Optional[Provenance] = None,
        **kwargs
    ):
        """
        Add or update edge in graph.

        Args:
            source: Source node ID
            target: Target node ID
            edge_type: Relationship type (related_to, causes, elaborates, etc.)
            weight: Confidence score (0-1)
            extraction_id: Reference to extracts table
            provenance: Source spans
            **kwargs: Additional attributes
        """
        if not self.graph.has_node(source) or not self.graph.has_node(target):
            raise ValueError(f"Both nodes must exist before adding edge: {source} -> {target}")

        now = datetime.now().isoformat()
        if self.graph.has_edge(source, target):
            # Update existing edge
            existing = self.graph.edges[source, target]
            existing['type'] = edge_type
            existing['weight'] = weight
            existing['extraction_id'] = extraction_id

            # Merge provenance (as for nodes) so earlier source spans are not lost
            existing_prov = existing.get('provenance', [])
            new_prov = provenance or []
            existing['provenance'] = existing_prov + [p for p in new_prov if p not in existing_prov]

            existing['updated_at'] = now
            existing.update(kwargs)
        else:
            # Add new edge
            self.graph.add_edge(
                source,
                target,
                type=edge_type,
                weight=weight,
                extraction_id=extraction_id,
                provenance=provenance or [],
                created_at=now,
                updated_at=now,
                **kwargs
            )

    def get_node(self, node_id: str) -> Optional[Dict]:
        """Get node attributes."""
        if self.graph.has_node(node_id):
            data = dict(self.graph.nodes[node_id])
            data['id'] = node_id
            return data
        return None

    def get_all_nodes(self) -> List[Dict]:
        """Get all nodes with attributes."""
        return [
            {'id': node_id, **dict(attrs)}
            for node_id, attrs in self.graph.nodes(data=True)
        ]

    def get_edges(self, node_id: Optional[str] = None) -> List[Dict]:
        """Get edges, optionally filtered by node."""
        if node_id:
            edges = self.graph.edges(node_id, data=True)
        else:
            edges = self.graph.edges(data=True)

        return [
            {'source': u, 'target': v, **attrs}
            for u, v, attrs in edges
        ]

    def delete_node(self, node_id: str):
        """Remove node and associated edges."""
        if self.graph.has_node(node_id):
            self.graph.remove_node(node_id)

    def delete_edge(self, source: str, target: str):
        """Remove edge."""
        if self.graph.has_edge(source, target):
            self.graph.remove_edge(source, target)

    def get_neighbors(self, node_id: str, depth: int = 1) -> List[str]:
        """Get neighboring nodes up to specified depth (excluding node_id itself)."""
        if not self.graph.has_node(node_id):
            return []

        visited = {node_id}
        current_level = {node_id}

        for _ in range(depth):
            next_level = set()
            for node in current_level:
                next_level.update(self.graph.neighbors(node))
            next_level -= visited  # skip the start node and anything already reached
            visited |= next_level
            current_level = next_level

        visited.discard(node_id)
        return list(visited)

    def get_subgraph(self, node_id: str, depth: int = 2) -> Dict:
        """Get subgraph around node for visualization."""
        neighbors = self.get_neighbors(node_id, depth)
        nodes_to_include = [node_id] + neighbors

        subgraph = self.graph.subgraph(nodes_to_include)

        return {
            'nodes': [
                {'id': n, **dict(attrs)}
                for n, attrs in subgraph.nodes(data=True)
            ],
            'edges': [
                {'source': u, 'target': v, **attrs}
                for u, v, attrs in subgraph.edges(data=True)
            ]
        }

    def compute_centrality(self, metric: str = 'degree') -> Dict[str, float]:
        """Compute centrality metrics for visualization."""
        if metric == 'eigenvector':
            try:
                return nx.eigenvector_centrality(self.graph, max_iter=1000)
            except nx.NetworkXException:
                # No convergence or empty graph: fall back to degree centrality
                return nx.degree_centrality(self.graph)
        elif metric == 'betweenness':
            return nx.betweenness_centrality(self.graph)
        else:
            # 'degree' and any unknown metric
            return nx.degree_centrality(self.graph)

    def _export_copy(self) -> nx.Graph:
        """Copy the graph with non-scalar attributes JSON-encoded.

        The GraphML writer rejects list and None attribute values
        (e.g. provenance, extraction_id), so they are stored as JSON strings.
        """
        def encode(attrs: Dict) -> Dict:
            return {
                key: value if isinstance(value, (str, int, float, bool))
                else json.dumps(value, default=str)
                for key, value in attrs.items()
            }

        export = nx.Graph()
        export.add_nodes_from((n, encode(a)) for n, a in self.graph.nodes(data=True))
        export.add_edges_from((u, v, encode(a)) for u, v, a in self.graph.edges(data=True))
        return export

    def export_graphml(self, output_path: Path):
        """Export graph to GraphML format."""
        nx.write_graphml(self._export_copy(), str(output_path))

    def export_gexf(self, output_path: Path):
        """Export graph to GEXF format."""
        nx.write_gexf(self._export_copy(), str(output_path))

    def get_stats(self) -> Dict:
        """Get graph statistics."""
        return {
            'num_nodes': self.graph.number_of_nodes(),
            'num_edges': self.graph.number_of_edges(),
            'density': nx.density(self.graph),
            'connected_components': nx.number_connected_components(self.graph),
        }


# Global instance
_graph_store = None


def get_graph_store() -> GraphStore:
    """Get or create global graph store instance."""
    global _graph_store
    if _graph_store is None:
        _graph_store = GraphStore()
    return _graph_store


def init_graph():
    """Initialize graph store on startup."""
    global _graph_store
    _graph_store = GraphStore()

2. Uncomment graph initialization in `backend/app/main.py` startup event:

@app.on_event("startup")
async def startup_event():
    init_database()
    from .services.graph_store import init_graph
    init_graph()

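The neighbor and subgraph lookups in `graph_store.py` amount to a depth-limited breadth-first search. A minimal sketch of that idea follows, written over a plain adjacency dict as a stand-in for the NetworkX graph so that it runs without dependencies; `neighbors_within` and the sample `adjacency` are hypothetical. It records hop counts and tracks visited nodes, so the start node is never reported as its own neighbor.

```python
from collections import deque
from typing import Dict, Set


def neighbors_within(adjacency: Dict[str, Set[str]], start: str, depth: int) -> Dict[str, int]:
    """Return {node: hop_count} for every node within `depth` hops of `start`."""
    if start not in adjacency or depth < 1:
        return {}

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # already at the depth limit: do not expand further
        for nxt in adjacency.get(node, ()):
            if nxt not in distance:  # first visit is the shortest hop count in BFS
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    del distance[start]  # the start node is not its own neighbor
    return distance


# Hypothetical chain: node:1 - node:2 - node:3 - node:4
adjacency = {
    "node:1": {"node:2"},
    "node:2": {"node:1", "node:3"},
    "node:3": {"node:2", "node:4"},
    "node:4": {"node:3"},
}
two_hops = neighbors_within(adjacency, "node:1", 2)
```

On the real graph, NetworkX's `nx.single_source_shortest_path_length(G, source, cutoff=depth)` should yield the same hop counts, with the start node included at distance 0.
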
#### 2.2 Basic Graph API Endpoints

1. Implement `backend/app/api/graph.py`:

import shutil
import tempfile
from pathlib import Path
from typing import List, Literal, Optional

from fastapi import APIRouter, HTTPException, Query
from fastapi.responses import FileResponse
from pydantic import BaseModel
from starlette.background import BackgroundTask

from ..services.graph_store import get_graph_store

router = APIRouter()


class NodeCreate(BaseModel):
    id: str
    label: str
    type: str
    provenance: List[List[int]] = []  # each entry: [note_id, span_start, span_end]
    metadata: dict = {}


class EdgeCreate(BaseModel):
    source: str
    target: str
    type: str
    weight: float = 1.0
    extraction_id: Optional[int] = None


@router.get("/")
async def get_graph(
    node_id: Optional[str] = Query(None, description="Get subgraph around node"),
    depth: int = Query(2, description="Subgraph depth")
):
    """Get full graph or subgraph around a node."""
    graph_store = get_graph_store()

    if node_id:
        return graph_store.get_subgraph(node_id, depth)
    else:
        return {
            'nodes': graph_store.get_all_nodes(),
            'edges': graph_store.get_edges()
        }


@router.get("/node/{node_id}")
async def get_node(node_id: str):
    """Get specific node details."""
    graph_store = get_graph_store()
    node = graph_store.get_node(node_id)

    if not node:
        raise HTTPException(status_code=404, detail="Node not found")

    return node


@router.post("/node")
async def create_node(node: NodeCreate):
    """Create or update node."""
    graph_store = get_graph_store()

    node_id = graph_store.add_node(
        node.id,
        node.label,
        node.type,
        provenance=[tuple(p) for p in node.provenance],
        **node.metadata
    )

    graph_store.save()

    return {"node_id": node_id}


@router.post("/edge")
async def create_edge(edge: EdgeCreate):
    """Create or update edge."""
    graph_store = get_graph_store()

    try:
        graph_store.add_edge(
            edge.source,
            edge.target,
            edge.type,
            weight=edge.weight,
            extraction_id=edge.extraction_id
        )
        graph_store.save()
        return {"status": "success"}
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))


@router.delete("/node/{node_id}")
async def delete_node(node_id: str):
    """Delete node and associated edges."""
    graph_store = get_graph_store()
    graph_store.delete_node(node_id)
    graph_store.save()
    return {"status": "deleted"}


@router.delete("/edge")
async def delete_edge(source: str, target: str):
    """Delete edge."""
    graph_store = get_graph_store()
    graph_store.delete_edge(source, target)
    graph_store.save()
    return {"status": "deleted"}


@router.get("/stats")
async def get_stats():
    """Get graph statistics."""
    graph_store = get_graph_store()
    return graph_store.get_stats()


@router.get("/export")
async def export_graph(format: Literal["graphml", "gexf", "gpickle"] = Query("graphml")):
    """Export graph in specified format."""
    graph_store = get_graph_store()

    with tempfile.NamedTemporaryFile(delete=False, suffix=f".{format}") as tmp:
        tmp_path = Path(tmp.name)

    if format == "graphml":
        graph_store.export_graphml(tmp_path)
    elif format == "gexf":
        graph_store.export_gexf(tmp_path)
    else:  # gpickle
        graph_store.save()  # make sure the on-disk pickle exists and is current
        shutil.copy(graph_store.graph_path, tmp_path)

    return FileResponse(
        tmp_path,
        media_type="application/octet-stream",
        filename=f"mindmap_graph.{format}",
        background=BackgroundTask(tmp_path.unlink),  # remove the temp file after sending
    )

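To make the request shapes for the node and edge endpoints concrete, here is a small sketch that builds (but does not send) the two POST requests with the standard library. It assumes the backend runs on the default port 8000 from `config.py` with the `/api/graph` prefix from `main.py`; `build_post` and the sample IDs are made up for illustration.

```python
import json
from urllib.request import Request

# Assumed base URL: default api_port from config.py plus the router prefix from main.py.
BASE_URL = "http://localhost:8000/api/graph"


def build_post(path: str, payload: dict) -> Request:
    """Build a JSON POST request for one of the graph endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{BASE_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Matches the NodeCreate model: provenance is a list of [note_id, span_start, span_end].
node_req = build_post("/node", {
    "id": "concept:graphs",
    "label": "Graphs",
    "type": "concept",
    "provenance": [[1, 0, 6]],
    "metadata": {"source": "manual"},
})

# Matches the EdgeCreate model: both nodes must already exist or the API answers 400.
edge_req = build_post("/edge", {
    "source": "concept:graphs",
    "target": "concept:notes",
    "type": "related_to",
    "weight": 0.8,
})

# With the backend running, send a request with: urllib.request.urlopen(node_req)
```

Creating both nodes before the edge matters, since `add_edge` raises `ValueError` for unknown nodes and the endpoint turns that into a 400 response.
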
#### 2.3 Testing & Documentation

1. Create `tests/backend/test_graph.py`:

import pytest

from backend.app.services.graph_store import GraphStore


@pytest.fixture
def temp_graph(tmp_path):
    """Create a GraphStore backed by a file in pytest's temporary directory."""
    # The file does not exist yet, so GraphStore starts with an empty graph.
    return GraphStore(tmp_path / "graph.gpickle")


def test_add_node(temp_graph):
    """Test node addition."""
    node_id = temp_graph.add_node(
        "node:1",
        "Test Node",
        "concept",
        provenance=[(1, 0, 10)]
    )

    assert node_id == "node:1"
    assert temp_graph.graph.has_node("node:1")

    node = temp_graph.get_node("node:1")
    assert node['label'] == "Test Node"
    assert node['type'] == "concept"
    assert len(node['provenance']) == 1


def test_add_edge(temp_graph):
    """Test edge addition."""
    temp_graph.add_node("node:1", "Node 1", "concept")
    temp_graph.add_node("node:2", "Node 2", "concept")

    temp_graph.add_edge("node:1", "node:2", "related_to", weight=0.9)

    assert temp_graph.graph.has_edge("node:1", "node:2")

    edges = temp_graph.get_edges("node:1")
    assert len(edges) == 1
    assert edges[0]['type'] == "related_to"
    assert edges[0]['weight'] == 0.9


def test_persistence(temp_graph):
    """Test graph save and load."""
    temp_graph.add_node("node:1", "Test Node", "concept")
    temp_graph.add_node("node:2", "Test Node 2", "person")
    temp_graph.add_edge("node:1", "node:2", "related_to")

    temp_graph.save()

    # Create new instance with same path
    new_graph = GraphStore(temp_graph.graph_path)

    assert new_graph.graph.has_node("node:1")
    assert new_graph.graph.has_node("node:2")
    assert new_graph.graph.has_edge("node:1", "node:2")


def test_merge_provenance(temp_graph):
    """Test provenance merging on node update."""
    temp_graph.add_node("node:1", "Test", "concept", provenance=[(1, 0, 10)])
    temp_graph.add_node("node:1", "Test", "concept", provenance=[(2, 5, 15)])

    node = temp_graph.get_node("node:1")
    assert len(node['provenance']) == 2
    assert (1, 0, 10) in node['provenance']
    assert (2, 5, 15) in node['provenance']


def test_get_neighbors(temp_graph):
    """Test neighbor retrieval."""
    temp_graph.add_node("node:1", "Node 1", "concept")
    temp_graph.add_node("node:2", "Node 2", "concept")
    temp_graph.add_node("node:3", "Node 3", "concept")

    temp_graph.add_edge("node:1", "node:2", "related_to")
    temp_graph.add_edge("node:2", "node:3", "related_to")

    neighbors_d1 = temp_graph.get_neighbors("node:1", depth=1)
    assert "node:2" in neighbors_d1
    assert "node:3" not in neighbors_d1

    neighbors_d2 = temp_graph.get_neighbors("node:1", depth=2)
1006 assert "node:2" in neighbors_d2
1007 assert "node:3" in neighbors_d2
1008
1009def test_subgraph(temp_graph):
1010 """Test subgraph extraction."""
1011 temp_graph.add_node("node:1", "Node 1", "concept")
1012 temp_graph.add_node("node:2", "Node 2", "concept")
1013 temp_graph.add_node("node:3", "Node 3", "concept")
1014 temp_graph.add_node("node:4", "Node 4", "concept")
1015
1016 temp_graph.add_edge("node:1", "node:2", "related_to")
1017 temp_graph.add_edge("node:2", "node:3", "related_to")
1018 temp_graph.add_edge("node:3", "node:4", "related_to")
1019
1020 subgraph = temp_graph.get_subgraph("node:2", depth=1)
1021
1022 node_ids = [n['id'] for n in subgraph['nodes']]
1023 assert "node:2" in node_ids
1024 assert "node:1" in node_ids
1025 assert "node:3" in node_ids
1026 assert "node:4" not in node_ids
1027
1028def test_centrality(temp_graph):
1029 """Test centrality computation."""
1030 temp_graph.add_node("node:1", "Node 1", "concept")
1031 temp_graph.add_node("node:2", "Node 2", "concept")
1032 temp_graph.add_node("node:3", "Node 3", "concept")
1033
1034 temp_graph.add_edge("node:1", "node:2", "related_to")
1035 temp_graph.add_edge("node:1", "node:3", "related_to")
1036 temp_graph.add_edge("node:2", "node:3", "related_to")
1037
1038 centrality = temp_graph.compute_centrality("degree")
1039
1040 assert "node:1" in centrality
1041 assert "node:2" in centrality
1042 assert "node:3" in centrality
1043 assert centrality["node:1"] > 0

2. Update `docs/database.md` with:
   - NetworkX graph model (node/edge attributes)
   - Provenance tracking mechanism
   - Persistence strategy (gpickle advantages)
   - Graph merging and deduplication logic

3. Update `docs/api-spec.md` with:
   - All graph endpoints with request/response examples
   - Error codes and handling
   - Pagination considerations for large graphs
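
The provenance and deduplication items could be documented with a small sketch like the one below. It assumes provenance is a list of `(note_id, span_start, span_end)` tuples, as in `test_merge_provenance`; `merge_provenance` is a hypothetical helper name used for illustration, not a confirmed part of the `GraphStore` API.

```python
from typing import Iterable, List, Sequence, Tuple

Provenance = Tuple[int, int, int]  # (note_id, span_start, span_end)


def merge_provenance(
    existing: Iterable[Sequence[int]],
    incoming: Iterable[Sequence[int]],
) -> List[Provenance]:
    """Append new provenance entries, skipping exact duplicates and keeping order."""
    # Entries are coerced to tuples because JSON round-trips turn tuples into lists.
    merged: List[Provenance] = []
    seen = set()
    for entry in list(existing) + list(incoming):
        key = tuple(entry)
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


first = merge_provenance([], [(1, 0, 10)])
second = merge_provenance(first, [(2, 5, 15), [1, 0, 10]])
print(second)
```

Re-adding a node with the same ID then only has to call this helper on the stored and incoming lists, so repeated ingestion of one note does not inflate the provenance list.
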

**Deliverables**:
- `backend/app/services/graph_store.py` with full GraphStore class
- `backend/app/api/graph.py` with all CRUD endpoints
- `tests/backend/test_graph.py` with comprehensive tests
- Updated documentation

**Completion Threshold**:
- [ ] Graph can be saved and reloaded from disk
- [ ] All graph tests pass: `pytest tests/backend/test_graph.py`
- [ ] Graph API endpoints accessible via FastAPI
- [ ] `GET /api/graph` returns empty graph structure
- [ ] `GET /api/graph/stats` returns node/edge counts
- [ ] Update `checklist.md` with Phase 2 completion
- [ ] Log graph design decisions in `decisions.md`
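
For the "saved and reloaded from disk" check, here is a minimal sketch of one way to persist a picklable graph object safely. It uses the standard `pickle` module directly, since NetworkX 3.0 removed `write_gpickle` and `read_gpickle`. The helper names `save_atomic` and `load_or_default` are assumptions for illustration, and a plain dict stands in for the graph so the example runs without NetworkX.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_atomic(obj: Any, path: Path) -> None:
    """Pickle `obj` to a temp file in the target directory, then rename it over `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # do not leave a half-written temp file behind
        raise


def load_or_default(path: Path, default: Any) -> Any:
    """Return the unpickled object, or `default` if the file is missing or empty."""
    if not path.exists() or path.stat().st_size == 0:
        return default
    with path.open("rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp_dir:
    graph_path = Path(tmp_dir) / "graph.gpickle"
    stand_in_graph = {"nodes": {"node:1": {"label": "Test Node"}}, "edges": []}
    save_atomic(stand_in_graph, graph_path)
    reloaded = load_or_default(graph_path, default={"nodes": {}, "edges": []})
    print(reloaded == stand_in_graph)
```

Writing to a temporary file first means a crash mid-save leaves the previous graph file intact. Pickle files should only ever be loaded from trusted local paths.
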

---

### Phase 3: LLM Extraction Module

**Objective**: Implement local LLM integration for extracting entities, concepts, and relationships from text.

**Pre-requisites**: Phases 1 and 2 complete

**Tasks**:

#### 3.1 LLM Extraction Prompt Design

1. Update `docs/llm_prompting.md` with the extraction prompt schema:

# LLM Prompting Strategy

## Extraction Prompt Pattern

### System Instructions
You are a knowledge extraction assistant. Your task is to analyze text and extract structured information in strict JSON format.

### Required JSON Schema
{
  "nodes": [
    {
      "label": string,       // Entity or concept name
      "type": string,        // One of: concept, person, place, idea, event, passage
      "span": [int, int],    // Character position [start, end] in source text
      "confidence": float    // Score between 0 and 1
    }
  ],
  "edges": [
    {
      "source": string,      // Label of source node
      "target": string,      // Label of target node
      "type": string,        // Relationship type (see below)
      "confidence": float    // Score between 0 and 1
    }
  ],
  "summary": string          // One-sentence summary of passage
}

### Edge Types
- **related_to**: General association
- **causes**: Causal relationship
- **elaborates**: Provides detail or explanation
- **contradicts**: Conflicting information
- **similar_to**: Conceptual similarity
- **part_of**: Hierarchical relationship
- **precedes**: Temporal ordering
- **affects**: Impact or influence

### Example 1

**Input:**
I haven't been sleeping well, which makes my work energy low and irritability higher. I want to improve exercise and sleep routine.

**Output:**
{
  "nodes": [
    {"label": "sleep quality", "type": "concept", "span": [15, 28], "confidence": 0.95},
    {"label": "work energy", "type": "concept", "span": [45, 56], "confidence": 0.9},
    {"label": "irritability", "type": "concept", "span": [65, 77], "confidence": 0.9},
    {"label": "exercise", "type": "concept", "span": [104, 112], "confidence": 0.85},
    {"label": "sleep routine", "type": "concept", "span": [117, 130], "confidence": 0.85}
  ],
  "edges": [
    {"source": "sleep quality", "target": "work energy", "type": "affects", "confidence": 0.95},
    {"source": "sleep quality", "target": "irritability", "type": "affects", "confidence": 0.9},
    {"source": "exercise", "target": "sleep routine", "type": "related_to", "confidence": 0.8}
  ],
  "summary": "Poor sleep negatively impacts work performance and mood, prompting desire to improve health routines."
}

### Example 2

**Input:**
Artificial intelligence and machine learning are transforming software development. AI can assist with code generation, bug detection, and optimization.

**Output:**
{
  "nodes": [
    {"label": "artificial intelligence", "type": "concept", "span": [0, 23], "confidence": 0.98},
    {"label": "machine learning", "type": "concept", "span": [28, 44], "confidence": 0.98},
    {"label": "software development", "type": "concept", "span": [62, 82], "confidence": 0.95},
    {"label": "code generation", "type": "concept", "span": [103, 118], "confidence": 0.9},
    {"label": "bug detection", "type": "concept", "span": [120, 133], "confidence": 0.9},
    {"label": "optimization", "type": "concept", "span": [139, 151], "confidence": 0.85}
  ],
  "edges": [
    {"source": "artificial intelligence", "target": "machine learning", "type": "related_to", "confidence": 0.95},
    {"source": "artificial intelligence", "target": "software development", "type": "affects", "confidence": 0.9},
    {"source": "artificial intelligence", "target": "code generation", "type": "affects", "confidence": 0.88},
    {"source": "artificial intelligence", "target": "bug detection", "type": "affects", "confidence": 0.88},
    {"source": "artificial intelligence", "target": "optimization", "type": "affects", "confidence": 0.85}
  ],
  "summary": "AI and ML technologies are revolutionizing how software is developed through automated assistance."
}
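
Character spans reported by a model are easy to get wrong, so it can help to recompute or verify them against the source text. The sketch below assumes spans are `[start, end)` offsets (end exclusive). `locate_span` and `span_matches` are hypothetical helper names, not part of the extractor module.

```python
from typing import List, Optional


def locate_span(text: str, phrase: str) -> Optional[List[int]]:
    """Return [start, end) offsets of the first case-insensitive match, or None."""
    # Assumes lowercasing preserves string length, which holds for ASCII text.
    start = text.lower().find(phrase.lower())
    if start == -1:
        return None
    return [start, start + len(phrase)]


def span_matches(text: str, span: List[int], phrase: str) -> bool:
    """Check that slicing the source text with the span yields the phrase."""
    start, end = span
    return text[start:end].lower() == phrase.lower()


source = "Artificial intelligence and machine learning are transforming software development."
span = locate_span(source, "machine learning")
print(span, span_matches(source, span, "machine learning"))
```

A check like `span_matches` only works when the label is a literal substring of the note. Abstracted labels (for example "sleep quality" for "sleeping well") need the model's span to be trusted or validated by range alone.
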

## Normalization Prompt Pattern

### Task
Given multiple entity mentions, identify the canonical (preferred) form and list all aliases.

### Input Format
{
  "entities": ["AI", "artificial intelligence", "A.I.", "machine intelligence"]
}

### Output Format
{
  "canonical": "artificial intelligence",
  "aliases": ["AI", "A.I.", "machine intelligence"],
  "rationale": "Full expanded form is most descriptive and unambiguous"
}
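
One way the normalization output might be applied is to build a lookup from every alias to its canonical label before node IDs are generated. This is a sketch under that assumption; `build_alias_map` and `canonicalize` are illustrative names that do not appear in the spec.

```python
from typing import Dict, List


def build_alias_map(results: List[Dict]) -> Dict[str, str]:
    """Map each alias (and the canonical form itself) to the canonical label."""
    alias_map: Dict[str, str] = {}
    for result in results:
        canonical = result["canonical"]
        alias_map[canonical.lower()] = canonical
        for alias in result.get("aliases", []):
            alias_map[alias.lower()] = canonical
    return alias_map


def canonicalize(label: str, alias_map: Dict[str, str]) -> str:
    """Return the canonical label if one is known, otherwise the label unchanged."""
    return alias_map.get(label.strip().lower(), label)


normalization_output = {
    "canonical": "artificial intelligence",
    "aliases": ["AI", "A.I.", "machine intelligence"],
}
alias_map = build_alias_map([normalization_output])
print(canonicalize("A.I.", alias_map))
```

Running labels through such a map before hashing them into node IDs would make "AI" and "artificial intelligence" land on the same node.
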

## Implementation Notes
- Always validate JSON output before processing
- Handle extraction failures gracefully with empty nodes/edges arrays
- Store raw LLM output for debugging and refinement
- Implement timeout handling (max 300 seconds per extraction)
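
The first three notes above can be sketched as a tolerant parser that never raises and always hands back the raw output for logging. This is a minimal illustration, assuming a fallback to an empty extraction is acceptable. `safe_parse` is a hypothetical name and is looser than the `parse_extraction_output` validator in the extractor service.

```python
import json
import re
from typing import Dict, Tuple


def empty_extraction() -> Dict:
    """Fresh empty result so callers never share mutable lists."""
    return {"nodes": [], "edges": [], "summary": ""}


def safe_parse(raw_output: str) -> Tuple[Dict, str]:
    """Return (extraction, raw_output); fall back to an empty extraction on bad output."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if (
            isinstance(data, dict)
            and isinstance(data.get("nodes"), list)
            and isinstance(data.get("edges"), list)
        ):
            data.setdefault("summary", "")
            return data, raw_output
    return empty_extraction(), raw_output


good, _ = safe_parse('Here you go: {"nodes": [], "edges": [], "summary": "ok"} Done.')
bad, raw = safe_parse("Sorry, I cannot help with that.")
print(good["summary"], bad, raw)
```

Returning the raw string alongside the parsed result makes it straightforward to store the model's exact output next to the extract record.
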

#### 3.2 Extractor Service Implementation

1. Create `backend/app/services/extractor.py`:
1196
1197import requests
1198import json
1199from typing import Dict, List, Tuple, Optional
1200from ..config import settings
1201from ..db.db import insert_extract, mark_note_processed, get_note
1202from .graph_store import get_graph_store
1203import hashlib
1204import re
1205
1206EXTRACTION_PROMPT_TEMPLATE = """You are a knowledge extraction assistant. Analyze the following text and extract structured information in strict JSON format.
1207
1208Required JSON Schema:
1209{{
1210 "nodes": [
1211 {{"label": "string", "type": "concept|person|place|idea|event|passage", "span": [start, end], "confidence": 0.0-1.0}}
1212 ],
1213 "edges": [
1214 {{"source": "label", "target": "label", "type": "related_to|causes|elaborates|contradicts|similar_to|part_of|precedes|affects", "confidence": 0.0-1.0}}
1215 ],
1216 "summary": "one-sentence summary"
1217}}
1218
1219Edge types:
1220- related_to: General association
1221- causes: Causal relationship
1222- elaborates: Provides detail
1223- contradicts: Conflicting information
1224- similar_to: Conceptual similarity
1225- part_of: Hierarchical relationship
1226- precedes: Temporal ordering
1227- affects: Impact or influence
1228
1229Return ONLY valid JSON. No additional text.
1230
1231Text to analyze:
1232\"\"\"
1233{text}
1234\"\"\"
1235"""
1236
1237def normalize_label(label: str) -> str:
1238 """Normalize entity label for consistent node IDs."""
1239 # Lowercase, remove special chars, replace spaces with underscores
1240 normalized = re.sub(r'[^\w\s-]', '', label.lower())
1241 normalized = re.sub(r'\s+', '_', normalized)
1242 return normalized.strip('_')
1243
1244def generate_node_id(label: str) -> str:
1245 """Generate unique node ID from label."""
1246 normalized = normalize_label(label)
1247 # Use hash for uniqueness while keeping it deterministic
1248 hash_suffix = hashlib.md5(normalized.encode()).hexdigest()[:8]
1249 return f"node:{normalized}_{hash_suffix}"
1250
1251def call_local_llm(prompt: str, model: str = None) -> str:
1252 """
1253 Call local LLM endpoint (Ollama format).
1254
1255 Args:
1256 prompt: The prompt text
1257 model: Model name (defaults to settings.llm_model)
1258
1259 Returns:
1260 Generated text response
1261
1262 Raises:
1263 Exception: If LLM call fails
1264 """
1265 model = model or settings.llm_model
1266
1267 try:
1268 response = requests.post(
1269 settings.llm_endpoint,
1270 json={
1271 "model": model,
1272 "prompt": prompt,
1273 "stream": False,
1274 "options": {
1275 "temperature": 0.3, # Lower temperature for more consistent extraction
1276 "num_predict": 2048
1277 }
1278 },
1279 timeout=settings.extraction_timeout
1280 )
1281 response.raise_for_status()
1282
1283 result = response.json()
1284 return result.get("response", "")
1285
1286 except requests.exceptions.Timeout:
1287 raise Exception("LLM request timed out")
1288 except requests.exceptions.RequestException as e:
1289 raise Exception(f"LLM request failed: {str(e)}")
1290
1291def parse_extraction_output(llm_output: str) -> Dict:
1292 """
1293 Parse and validate LLM extraction output.
1294
1295 Args:
1296 llm_output: Raw LLM response string
1297
1298 Returns:
1299 Parsed and validated extraction dict
1300
1301 Raises:
1302 ValueError: If output is invalid JSON or missing required fields
1303 """
1304 # Try to extract JSON from output (handle cases where LLM adds extra text)
1305 json_match = re.search(r'\{.*\}', llm_output, re.DOTALL)
1306 if not json_match:
1307 raise ValueError("No JSON found in LLM output")
1308
1309 try:
1310 data = json.loads(json_match.group(0))
1311 except json.JSONDecodeError as e:
1312 raise ValueError(f"Invalid JSON: {str(e)}")
1313
1314 # Validate schema
1315 if "nodes" not in data or not isinstance(data["nodes"], list):
1316 raise ValueError("Missing or invalid 'nodes' field")
1317
1318 if "edges" not in data or not isinstance(data["edges"], list):
1319 raise ValueError("Missing or invalid 'edges' field")
1320
1321 if "summary" not in data:
1322 data["summary"] = "" # Optional field
1323
1324 # Validate node structure
1325 valid_node_types = {"concept", "person", "place", "idea", "event", "passage"}
1326 for node in data["nodes"]:
1327 if not all(k in node for k in ["label", "type", "span", "confidence"]):
1328 raise ValueError(f"Invalid node structure: {node}")
1329
1330 if node["type"] not in valid_node_types:
1331 raise ValueError(f"Invalid node type: {node['type']}")
1332
1333 if not isinstance(node["span"], list) or len(node["span"]) != 2:
1334 raise ValueError(f"Invalid span format: {node['span']}")
1335
1336 if not 0 <= node["confidence"] <= 1:
1337 raise ValueError(f"Invalid confidence score: {node['confidence']}")
1338
1339 # Validate edge structure
1340 valid_edge_types = {
1341 "related_to", "causes", "elaborates", "contradicts",
1342 "similar_to", "part_of", "precedes", "affects"
1343 }
1344 for edge in data["edges"]:
1345 if not all(k in edge for k in ["source", "target", "type", "confidence"]):
1346 raise ValueError(f"Invalid edge structure: {edge}")
1347
1348 if edge["type"] not in valid_edge_types:
1349 raise ValueError(f"Invalid edge type: {edge['type']}")
1350
1351 if not 0 <= edge["confidence"] <= 1:
1352 raise ValueError(f"Invalid confidence score: {edge['confidence']}")
1353
1354 return data
1355
1356def extract_from_text(text: str, note_id: int) -> Dict:
1357 """
1358 Extract entities and relationships from text using local LLM.
1359
1360 Args:
1361 text: Input text to analyze
1362 note_id: Associated note ID for provenance
1363
1364 Returns:
1365 Extraction result with nodes and edges
1366 """
1367 prompt = EXTRACTION_PROMPT_TEMPLATE.format(text=text)
1368
1369 # Call LLM
1370 llm_output = call_local_llm(prompt)
1371
1372 # Parse and validate
1373 extraction = parse_extraction_output(llm_output)
1374
1375 # Add note_id to provenance
1376 for node in extraction["nodes"]:
1377 node["note_id"] = note_id
1378
1379 return extraction
1380
1381def update_graph_from_extraction(extraction: Dict, note_id: int, extraction_id: int):
1382 """
1383 Update NetworkX graph with extraction results.
1384
1385 Args:
1386 extraction: Parsed extraction dict
1387 note_id: Source note ID
1388 extraction_id: Extract record ID
1389 """
1390 graph_store = get_graph_store()
1391
1392 # Track created node IDs for edge creation
1393 node_label_to_id = {}
1394
1395 # Add/update nodes
1396 for node_data in extraction["nodes"]:
1397 label = node_data["label"]
1398 node_id = generate_node_id(label)
1399
1400 span_start, span_end = node_data["span"]
1401 provenance = [(note_id, span_start, span_end)]
1402
1403 graph_store.add_node(
1404 node_id,
1405 label,
1406 node_data["type"],
1407 provenance=provenance,
1408 confidence=node_data["confidence"]
1409 )
1410
1411 node_label_to_id[label] = node_id
1412
1413 # Add edges
1414 for edge_data in extraction["edges"]:
1415 source_label = edge_data["source"]
1416 target_label = edge_data["target"]
1417
1418 # Get node IDs (may need to generate if referenced node doesn't exist in this extraction)
1419 source_id = node_label_to_id.get(source_label, generate_node_id(source_label))
1420 target_id = node_label_to_id.get(target_label, generate_node_id(target_label))
1421
1422 # Skip edge if either node doesn't exist in graph
1423 if not graph_store.graph.has_node(source_id) or not graph_store.graph.has_node(target_id):
1424 continue
1425
1426 graph_store.add_edge(
1427 source_id,
1428 target_id,
1429 edge_data["type"],
1430 weight=edge_data["confidence"],
1431 extraction_id=extraction_id
1432 )
1433
1434 # Save graph
1435 graph_store.save()
1436
1437def process_note(note_id: int) -> Dict:
1438 """
1439 Full extraction pipeline for a note.
1440
1441 Args:
1442 note_id: Note to process
1443
1444 Returns:
1445 Processing result with stats
1446 """
1447 # Get note content
1448 note = get_note(note_id)
1449 if not note:
1450 raise ValueError(f"Note {note_id} not found")
1451
1452 if note['processed']:
1453 return {"status": "already_processed", "note_id": note_id}
1454
1455 content = note['content']
1456
1457 # Extract
1458 try:
1459 extraction = extract_from_text(content, note_id)
1460 except Exception as e:
1461 return {
1462 "status": "extraction_failed",
1463 "note_id": note_id,
1464 "error": str(e)
1465 }
1466
1467 # Store extract
1468 extraction_id = insert_extract(
1469 note_id,
1470 settings.llm_model,
1471 extraction,
1472 score=None # Could compute average confidence
1473 )
1474
1475 # Update graph
1476 try:
1477 update_graph_from_extraction(extraction, note_id, extraction_id)
1478 except Exception as e:
1479 return {
1480 "status": "graph_update_failed",
1481 "note_id": note_id,
1482 "extraction_id": extraction_id,
1483 "error": str(e)
1484 }
1485
1486 # Mark as processed
1487 mark_note_processed(note_id)
1488
1489 return {
1490 "status": "success",
1491 "note_id": note_id,
1492 "extraction_id": extraction_id,
1493 "nodes_extracted": len(extraction["nodes"]),
1494 "edges_extracted": len(extraction["edges"]),
1495 "summary": extraction.get("summary", "")
1496 }
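
The `score=None` argument in `process_note` carries a note that an average confidence could be computed. A possible version is sketched below; `average_confidence` is a hypothetical helper, and treating nodes and edges with equal weight is an assumption.

```python
from statistics import fmean
from typing import Dict, Optional


def average_confidence(extraction: Dict) -> Optional[float]:
    """Mean confidence across all nodes and edges, or None if nothing was extracted."""
    scores = [
        item["confidence"]
        for key in ("nodes", "edges")
        for item in extraction.get(key, [])
    ]
    # Round to keep floating-point noise out of the stored score.
    return round(fmean(scores), 4) if scores else None


sample = {
    "nodes": [{"label": "sleep", "confidence": 1.0}],
    "edges": [{"source": "sleep", "target": "work", "confidence": 0.5}],
}
print(average_confidence(sample))
```

Passing the result as `score` would let low-confidence extractions be filtered or re-run later.
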

#### 3.3 Ingestion API Implementation

1. Implement `backend/app/api/ingest.py`:
1501
from fastapi import APIRouter, UploadFile, File, HTTPException, BackgroundTasks
from pydantic import BaseModel
from typing import List, Optional
from ..db.db import insert_note
from ..services.extractor import process_note
import zipfile
import io

router = APIRouter()

class IngestTextRequest(BaseModel):
    filename: str
    content: str
    source_path: Optional[str] = None  # Optional, so an explicit null is accepted
1516
1517class IngestResponse(BaseModel):
1518 note_id: int
1519 status: str
1520 message: str
1521
1522@router.post("/text", response_model=IngestResponse)
1523async def ingest_text(payload: IngestTextRequest, background_tasks: BackgroundTasks):
1524 """
1525 Ingest text content for processing.
1526
1527 Saves note to database and triggers asynchronous extraction.
1528 """
1529 try:
1530 # Insert note
1531 note_id = insert_note(
1532 payload.filename,
1533 payload.content,
1534 payload.source_path
1535 )
1536
1537 # Process in background
1538 background_tasks.add_task(process_note, note_id)
1539
1540 return IngestResponse(
1541 note_id=note_id,
1542 status="accepted",
1543 message="Note saved and queued for processing"
1544 )
1545
1546 except Exception as e:
1547 raise HTTPException(status_code=500, detail=str(e))
1548
@router.post("/file")
async def ingest_file(
    background_tasks: BackgroundTasks,
    file: UploadFile = File(...)
):
    """
    Ingest markdown or text file(s).

    Supports single .md/.txt files or .zip archives containing multiple .md/.txt files.
    """
    filename = file.filename or ""
    if not filename.lower().endswith(('.md', '.txt', '.zip')):
        raise HTTPException(
            status_code=400,
            detail="Only .md, .txt, or .zip files are supported"
        )

    content = await file.read()
    note_ids = []

    try:
        if filename.lower().endswith('.zip'):
            # Handle zip archive: each .md/.txt member becomes its own note
            with zipfile.ZipFile(io.BytesIO(content)) as zf:
                for member in zf.namelist():
                    if member.lower().endswith(('.md', '.txt')):
                        file_content = zf.read(member).decode('utf-8')
                        note_ids.append(insert_note(member, file_content, filename))
        else:
            # Single file
            file_content = content.decode('utf-8')
            note_ids.append(insert_note(filename, file_content, filename))
    except (zipfile.BadZipFile, UnicodeDecodeError) as e:
        # A malformed upload is a client error, not a server fault
        raise HTTPException(status_code=400, detail=f"Could not read upload: {e}")
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

    # Queue extraction for every saved note; tasks run after the response is sent
    for note_id in note_ids:
        background_tasks.add_task(process_note, note_id)

    return {
        "status": "accepted",
        "note_ids": note_ids,
        "message": f"Ingested {len(note_ids)} file(s), processing started"
    }
1599
1600@router.get("/status/{note_id}")
1601async def get_ingestion_status(note_id: int):
1602 """Check processing status of a note."""
1603 from ..db.db import get_note, get_extracts_for_note
1604
1605 note = get_note(note_id)
1606 if not note:
1607 raise HTTPException(status_code=404, detail="Note not found")
1608
1609 extracts = get_extracts_for_note(note_id)
1610
1611 return {
1612 "note_id": note_id,
1613 "processed": bool(note['processed']),
1614 "num_extracts": len(extracts),
1615 "created_at": note['created_at']
1616 }
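
The zip branch of the upload handler can be exercised without a running server. The sketch below is an illustration only, assuming that directories and non-UTF-8 members should be skipped rather than failing the whole upload; `read_text_members` is a hypothetical helper name.

```python
import io
import zipfile
from typing import List, Tuple


def read_text_members(
    archive_bytes: bytes,
    suffixes: Tuple[str, ...] = (".md", ".txt"),
) -> List[Tuple[str, str]]:
    """Return (name, text) pairs for matching members of an in-memory zip archive."""
    notes = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(suffixes):
                continue
            try:
                notes.append((info.filename, zf.read(info).decode("utf-8")))
            except UnicodeDecodeError:
                continue  # skip members that are not valid UTF-8
    return notes


# Build a small in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("journal/day1.md", "Slept badly, low energy.")
    zf.writestr("journal/photo.png", b"\x89PNG")
    zf.writestr("ideas.txt", "Try a fixed bedtime.")

members = read_text_members(buffer.getvalue())
print(members)
```

Keeping the archive handling in a plain function like this also makes it easy to unit test separately from the FastAPI route.
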

#### 3.4 Testing & Documentation

1. Create `tests/backend/test_extractor.py`:
1621
1622import pytest
1623from backend.app.services.extractor import (
1624 normalize_label,
1625 generate_node_id,
1626 parse_extraction_output
1627)
1628import json
1629
def test_normalize_label():
    """Test label normalization."""
    assert normalize_label("Artificial Intelligence") == "artificial_intelligence"
    assert normalize_label(" AI ") == "ai"
    # Hyphens survive normalization: the regex strips everything except \w, whitespace, and "-"
    assert normalize_label("Self-Driving Cars") == "self-driving_cars"
1635
1636def test_generate_node_id():
1637 """Test deterministic node ID generation."""
1638 id1 = generate_node_id("test concept")
1639 id2 = generate_node_id("test concept")
1640 id3 = generate_node_id("different concept")
1641
1642 assert id1 == id2 # Same label produces same ID
1643 assert id1 != id3 # Different labels produce different IDs
1644 assert id1.startswith("node:")
1645
1646def test_parse_extraction_valid():
1647 """Test parsing valid extraction JSON."""
1648 valid_json = json.dumps({
1649 "nodes": [
1650 {"label": "sleep", "type": "concept", "span": [0, 5], "confidence": 0.9}
1651 ],
1652 "edges": [
1653 {"source": "sleep", "target": "health", "type": "affects", "confidence": 0.8}
1654 ],
1655 "summary": "Sleep affects health"
1656 })
1657
1658 result = parse_extraction_output(valid_json)
1659
1660 assert len(result["nodes"]) == 1
1661 assert result["nodes"][0]["label"] == "sleep"
1662 assert len(result["edges"]) == 1
1663 assert result["summary"] == "Sleep affects health"
1664
1665def test_parse_extraction_invalid_node_type():
1666 """Test parsing with invalid node type."""
1667 invalid_json = json.dumps({
1668 "nodes": [
1669 {"label": "test", "type": "invalid_type", "span": [0, 4], "confidence": 0.9}
1670 ],
1671 "edges": [],
1672 "summary": ""
1673 })
1674
1675 with pytest.raises(ValueError, match="Invalid node type"):
1676 parse_extraction_output(invalid_json)
1677
1678def test_parse_extraction_missing_fields():
1679 """Test parsing with missing required fields."""
1680 invalid_json = json.dumps({
1681 "nodes": [
1682 {"label": "test", "type": "concept"} # Missing span and confidence
1683 ],
1684 "edges": []
1685 })
1686
1687 with pytest.raises(ValueError, match="Invalid node structure"):
1688 parse_extraction_output(invalid_json)
1689
def test_parse_extraction_with_extra_text():
    """Test parsing JSON embedded in text."""
    output_with_text = """
    Here is the extraction result:
    {"nodes": [{"label": "test", "type": "concept", "span": [0, 4], "confidence": 0.9}], "edges": [], "summary": "Test"}
    That's the analysis.
    """

    result = parse_extraction_output(output_with_text)

    assert len(result["nodes"]) == 1
    assert result["nodes"][0]["label"] == "test"
1702
# Mock LLM for integration testing
@pytest.fixture
def mock_llm_response(monkeypatch):
    """Mock LLM response for testing."""
    def mock_call_local_llm(prompt: str, model: str = None) -> str:
        # Node types must come from valid_node_types, and spans must fall
        # inside the 18-character test note "Sleep affects work".
        return json.dumps({
            "nodes": [
                {"label": "sleep", "type": "concept", "span": [0, 5], "confidence": 0.95},
                {"label": "work", "type": "concept", "span": [14, 18], "confidence": 0.9}
            ],
            "edges": [
                {"source": "sleep", "target": "work", "type": "affects", "confidence": 0.9}
            ],
            "summary": "Sleep impacts work performance"
        })

    from backend.app.services import extractor
    monkeypatch.setattr(extractor, "call_local_llm", mock_call_local_llm)
1721
1722def test_extract_from_text(mock_llm_response, temp_db):
1723 """Test full extraction from text."""
1724 from backend.app.services.extractor import extract_from_text
1725 from backend.app.db.db import insert_note
1726
1727 note_id = insert_note("test.md", "Sleep affects work")
1728
1729 result = extract_from_text("Sleep affects work", note_id)
1730
1731 assert len(result["nodes"]) == 2
1732 assert len(result["edges"]) == 1
1733 assert result["summary"] == "Sleep impacts work performance"
1734 assert all(node["note_id"] == note_id for node in result["nodes"])

2. Update `docs/llm_prompting.md` with complete extraction prompt templates and examples (as shown in Task 3.1)

3. Update `docs/api-spec.md` with ingestion endpoints:

## Ingestion Endpoints

### POST /api/ingest/text

Ingest text content for processing.

**Request Body:**
{
  "filename": "daily-journal-2024-01-15.md",
  "content": "Today I realized that consistent sleep patterns directly impact my productivity...",
  "source_path": "/optional/path/to/file"
}

**Response:**
{
  "note_id": 42,
  "status": "accepted",
  "message": "Note saved and queued for processing"
}

**Process:**
1. Content is saved to SQLite `notes` table
2. Note hash is computed for deduplication
3. Background task is queued to run LLM extraction
4. Extraction results are stored in `extracts` table
5. Graph is updated with nodes and edges
6. Note is marked as processed
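
Step 2 of the process above mentions a content hash for deduplication. The exact scheme is not shown in this section, so the following is only a sketch: SHA-256 over whitespace-normalized content is an assumption, and `NoteIndex` is an in-memory stand-in for the `notes` table, not the real database layer.

```python
import hashlib
from typing import Dict


def note_hash(content: str) -> str:
    """SHA-256 of the content after normalizing line endings and outer whitespace."""
    normalized = content.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class NoteIndex:
    """In-memory stand-in for the notes table, keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, int] = {}
        self._next_id = 1

    def insert(self, content: str) -> int:
        """Return the existing note ID for duplicate content, else assign a new one."""
        digest = note_hash(content)
        if digest in self._by_hash:
            return self._by_hash[digest]
        note_id = self._next_id
        self._by_hash[digest] = note_id
        self._next_id += 1
        return note_id


index = NoteIndex()
first_id = index.insert("Sleep affects work\r\n")
duplicate_id = index.insert("Sleep affects work")
other_id = index.insert("Exercise helps sleep")
print(first_id, duplicate_id, other_id)
```

In SQLite the same idea is usually expressed with a `UNIQUE` constraint on the hash column, so a duplicate insert can be detected without a separate lookup.
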

### POST /api/ingest/file

Upload markdown file(s) for processing.

**Request:**
- Content-Type: `multipart/form-data`
- Field: `file` (UploadFile)
- Supported formats: `.md`, `.txt`, `.zip`

**Response:**
{
  "status": "accepted",
  "note_ids": [42, 43, 44],
  "message": "Ingested 3 file(s), processing started"
}

**Zip Archive Support:**
- Upload a `.zip` containing multiple markdown files
- All `.md` and `.txt` files within the archive are extracted
- Each file is processed as a separate note

### GET /api/ingest/status/{note_id}

Check processing status of an ingested note.

**Response:**
{
  "note_id": 42,
  "processed": true,
  "num_extracts": 1,
  "created_at": "2024-01-15T10:30:00"
}

4. Update `docs/cicd_devops.md` with LLM configuration:

## Local LLM Setup

### Ollama Installation (Recommended)

1. Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh

2. Pull the required model:
ollama pull llama3

3. Start Ollama server (runs on http://localhost:11434):
ollama serve

4. Test the endpoint:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Extract entities from: The AI revolution is changing software.",
  "stream": false
}'

### Alternative: Llama.cpp

If you prefer llama.cpp for lower-level control (recent releases build with CMake and name the server binary `llama-server`; the commands below match older checkouts):

1. Clone and build:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

2. Download model (e.g., Llama-3-8B GGUF):
# Download from HuggingFace or other source

3. Run server:
./server -m models/llama-3-8b-q4_0.gguf --port 11434

The llama.cpp server does not implement Ollama's `/api/generate` route. It serves `/completion` and OpenAI-compatible `/v1/...` routes, so `LLM_ENDPOINT` and the request/response handling in `call_local_llm` need adjusting for this backend.

### Configuration

Update `backend/.env`:
LLM_ENDPOINT=http://localhost:11434/api/generate
LLM_MODEL=llama3
EMBEDDING_ENDPOINT=http://localhost:11434/api/embeddings
EMBEDDING_MODEL=all-minilm
EXTRACTION_TIMEOUT=300
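
Ollama and llama.cpp's bundled server use different HTTP routes and payload shapes, so switching backends takes more than changing the port. The sketch below shows one way to isolate that difference. The `backend` switch and both function names are assumptions for illustration, and the llama.cpp field names (`n_predict` in the request, `content` in the response) should be checked against the server README for your version.

```python
from typing import Dict, Tuple


def build_request(backend: str, base_url: str, model: str, prompt: str) -> Tuple[str, Dict]:
    """Return (url, json_payload) for the chosen local backend."""
    base_url = base_url.rstrip("/")
    if backend == "ollama":
        payload = {"model": model, "prompt": prompt, "stream": False}
        return f"{base_url}/api/generate", payload
    if backend == "llamacpp":
        # The llama.cpp server loads one model at startup, so no model name is sent.
        payload = {"prompt": prompt, "n_predict": 2048, "temperature": 0.3}
        return f"{base_url}/completion", payload
    raise ValueError(f"Unknown backend: {backend}")


def extract_text(backend: str, response_json: Dict) -> str:
    """Pull the generated text out of a backend response body."""
    key = "response" if backend == "ollama" else "content"
    return response_json.get(key, "")


url, payload = build_request("ollama", "http://localhost:11434/", "llama3", "Hello")
print(url, payload)
```

With helpers like these, the HTTP call itself can stay identical for both backends, and only the URL, payload, and response key vary.
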

**Deliverables**:
- `backend/app/services/extractor.py` with full extraction pipeline
- `backend/app/api/ingest.py` with ingestion endpoints
- `tests/backend/test_extractor.py` with unit tests
- Updated documentation in `/docs/`

**Completion Threshold**:
- [ ] Extraction function correctly parses LLM JSON output
- [ ] Mock-based tests pass: `pytest tests/backend/test_extractor.py`
- [ ] Manual test with local LLM: Ingest sample note and verify extraction in SQLite
- [ ] Graph is updated with nodes/edges after ingestion
- [ ] `POST /api/ingest/text` returns 200 with note_id
- [ ] Update `checklist.md` with Phase 3 completion
- [ ] Log LLM integration decisions in `decisions.md`

---

### Phase 4: Embeddings & Semantic Search

**Objective**: Implement local embeddings and vector-based semantic search.

**Pre-requisites**: Phases 1-3 complete

**Tasks**:

#### 4.1 Embeddings Service

1. Create `backend/app/services/embeddings.py`:
1875
1876from sentence_transformers import SentenceTransformer
1877import chromadb
1878from chromadb.config import Settings
1879from typing import List, Dict, Optional
1880from pathlib import Path
1881from ..config import settings
1882import numpy as np
1883
1884class EmbeddingStore:
1885 """Manages embeddings using sentence-transformers and ChromaDB."""
1886
1887 def __init__(self):
1888 # Initialize sentence transformer model
1889 self.model = SentenceTransformer('all-MiniLM-L6-v2')
1890
1891 # Initialize ChromaDB
1892 self.chroma_client = chromadb.Client(Settings(
1893 chroma_db_impl="duckdb+parquet",
1894 persist_directory=str(settings.vector_db_path)
1895 ))
1896
1897 # Get or create collections
1898 self.notes_collection = self.chroma_client.get_or_create_collection(
1899 name="notes",
1900 metadata={"description": "Note embeddings"}
1901 )
1902
1903 self.nodes_collection = self.chroma_client.get_or_create_collection(
1904 name="nodes",
1905 metadata={"description": "Node label embeddings"}
1906 )

    def embed_text(self, text: str) -> List[float]:
        """Generate embedding for text."""
        embedding = self.model.encode(text, convert_to_numpy=True)
        return embedding.tolist()

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for multiple texts."""
        embeddings = self.model.encode(texts, convert_to_numpy=True)
        return embeddings.tolist()

    def index_note(self, note_id: int, content: str, metadata: Optional[Dict] = None):
        """Index a note for semantic search."""
        embedding = self.embed_text(content)

        # upsert (rather than add) so re-indexing an existing note replaces
        # its entry; assumes a chromadb release that provides Collection.upsert
        self.notes_collection.upsert(
            ids=[f"note:{note_id}"],
            embeddings=[embedding],
            documents=[content],
            metadatas=[metadata or {}]
        )

    def index_node(self, node_id: str, label: str, node_type: str, metadata: Optional[Dict] = None):
        """Index a node for semantic search."""
        embedding = self.embed_text(label)

        # The same label can be extracted from several notes and maps to the
        # same node_id, so upsert instead of add. metadatas must be a list
        # with one dict per ID; node_type is stored alongside any metadata.
        self.nodes_collection.upsert(
            ids=[node_id],
            embeddings=[embedding],
            documents=[label],
            metadatas=[{**(metadata or {}), 'node_type': node_type}]
        )

    def search_notes(self, query: str, top_k: int = 10) -> List[Dict]:
        """
        Search notes by semantic similarity.

        Args:
            query: Search query
            top_k: Number of results to return

        Returns:
            List of results with note_id, content, and similarity score
        """
        # Clamp to the collection size; some Chroma versions raise when
        # n_results exceeds the number of indexed items
        count = self.notes_collection.count()
        if count == 0:
            return []

        query_embedding = self.embed_text(query)

        results = self.notes_collection.query(
            query_embeddings=[query_embedding],
            n_results=min(top_k, count)
        )

        if not results['ids'] or not results['ids'][0]:
            return []

        output = []
        for i, note_ref in enumerate(results['ids'][0]):
            note_id = int(note_ref.split(':')[1])
            output.append({
                'note_id': note_id,
                'content': results['documents'][0][i],
                'score': 1 - results['distances'][0][i],  # Convert distance to similarity
                'metadata': results['metadatas'][0][i] if results['metadatas'] else {}
            })

        return output

    def search_nodes(self, query: str, top_k: int = 10) -> List[Dict]:
        """
        Search nodes by semantic similarity.

        Args:
            query: Search query
            top_k: Number of results to return

        Returns:
            List of results with node_id, label, and similarity score
        """
        # Same clamp as search_notes
        count = self.nodes_collection.count()
        if count == 0:
            return []

        query_embedding = self.embed_text(query)

        results = self.nodes_collection.query(
            query_embeddings=[query_embedding],
            n_results=min(top_k, count)
        )

        if not results['ids'] or not results['ids'][0]:
            return []

        output = []
        for i, node_id in enumerate(results['ids'][0]):
            output.append({
                'node_id': node_id,
                'label': results['documents'][0][i],
                'score': 1 - results['distances'][0][i],  # Convert distance to similarity
                'metadata': results['metadatas'][0][i] if results['metadatas'] else {}
            })

        return output

    def delete_note(self, note_id: int):
        """Remove note from index."""
        try:
            self.notes_collection.delete(ids=[f"note:{note_id}"])
        except Exception:
            pass  # Note may not exist in index

    def delete_node(self, node_id: str):
        """Remove node from index."""
        try:
            self.nodes_collection.delete(ids=[node_id])
        except Exception:
            pass  # Node may not exist in index


# Global instance
_embedding_store: Optional[EmbeddingStore] = None


def get_embedding_store() -> EmbeddingStore:
    """Get or create global embedding store instance."""
    global _embedding_store
    if _embedding_store is None:
        _embedding_store = EmbeddingStore()
    return _embedding_store


def init_embeddings():
    """Initialize embedding store on startup."""
    global _embedding_store
    settings.vector_db_path.mkdir(parents=True, exist_ok=True)
    _embedding_store = EmbeddingStore()

2. Update `backend/app/services/extractor.py` to index embeddings after extraction:
2036
2037# Add this import at the top
2038from .embeddings import get_embedding_store
2039
2040# Update the update_graph_from_extraction function to include embedding indexing
2041def update_graph_from_extraction(extraction: Dict, note_id: int, extraction_id: int):
2042 """
2043 Update NetworkX graph with extraction results.
2044
2045 Args:
2046 extraction: Parsed extraction dict
2047 note_id: Source note ID
2048 extraction_id: Extract record ID
2049 """
2050 graph_store = get_graph_store()
2051 embedding_store = get_embedding_store()
2052
2053 # Track created node IDs for edge creation
2054 node_label_to_id = {}
2055
2056 # Add/update nodes
2057 for node_data in extraction["nodes"]:
2058 label = node_data["label"]
2059 node_id = generate_node_id(label)
2060
2061 span_start, span_end = node_data["span"]
2062 provenance = [(note_id, span_start, span_end)]
2063
2064 graph_store.add_node(
2065 node_id,
2066 label,
2067 node_data["type"],
2068 provenance=provenance,
2069 confidence=node_data["confidence"]
2070 )
2071
2072 # Index node embedding
2073 embedding_store.index_node(
2074 node_id,
2075 label,
2076 node_data["type"],
2077 metadata={'confidence': node_data['confidence']}
2078 )
2079
2080 node_label_to_id[label] = node_id
2081
2082 # Add edges (existing code)
2083 for edge_data in extraction["edges"]:
2084 source_label = edge_data["source"]
2085 target_label = edge_data["target"]
2086
2087 source_id = node_label_to_id.get(source_label, generate_node_id(source_label))
2088 target_id = node_label_to_id.get(target_label, generate_node_id(target_label))
2089
2090 if not graph_store.graph.has_node(source_id) or not graph_store.graph.has_node(target_id):
2091 continue
2092
2093 graph_store.add_edge(
2094 source_id,
2095 target_id,
2096 edge_data["type"],
2097 weight=edge_data["confidence"],
2098 extraction_id=extraction_id
2099 )
2100
2101 # Save graph
2102 graph_store.save()
2103
2104# Update process_note to index note embedding
2105def process_note(note_id: int) -> Dict:
2106 """
2107 Full extraction pipeline for a note.
2108
2109 Args:
2110 note_id: Note to process
2111
2112 Returns:
2113 Processing result with stats
2114 """
2115 # Get note content
2116 note = get_note(note_id)
2117 if not note:
2118 raise ValueError(f"Note {note_id} not found")
2119
2120 if note['processed']:
2121 return {"status": "already_processed", "note_id": note_id}
2122
2123 content = note['content']
2124
2125 # Index note embedding
2126 embedding_store = get_embedding_store()
2127 embedding_store.index_note(
2128 note_id,
2129 content,
2130 metadata={'filename': note['filename'], 'created_at': note['created_at']}
2131 )
2132
2133 # Extract (existing code continues...)
2134 try:
2135 extraction = extract_from_text(content, note_id)
2136 except Exception as e:
2137 return {
2138 "status": "extraction_failed",
2139 "note_id": note_id,
2140 "error": str(e)
2141 }
2142
2143 # Store extract
2144 extraction_id = insert_extract(
2145 note_id,
2146 settings.llm_model,
2147 extraction,
2148 score=None
2149 )
2150
2151 # Update graph
2152 try:
2153 update_graph_from_extraction(extraction, note_id, extraction_id)
2154 except Exception as e:
2155 return {
2156 "status": "graph_update_failed",
2157 "note_id": note_id,
2158 "extraction_id": extraction_id,
2159 "error": str(e)
2160 }
2161
2162 # Mark as processed
2163 mark_note_processed(note_id)
2164
2165 return {
2166 "status": "success",
2167 "note_id": note_id,
2168 "extraction_id": extraction_id,
2169 "nodes_extracted": len(extraction["nodes"]),
2170 "edges_extracted": len(extraction["edges"]),
2171 "summary": extraction.get("summary", "")
2172 }
2173
3. Update `backend/app/main.py` to initialize embeddings:

@app.on_event("startup")
async def startup_event():
    init_database()
    from .services.graph_store import init_graph
    from .services.embeddings import init_embeddings
    init_graph()
    init_embeddings()
2183
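The embedding store in 4.1 relies on two small conventions: note entries are keyed as `note:{id}`, and a query distance is turned into a score with `1 - distance`. Below is a minimal sketch of both in isolation, assuming the collections return cosine distances. The helper names `note_key`, `parse_note_key`, and `distance_to_score` are hypothetical and are not part of the spec.

```python
def note_key(note_id: int) -> str:
    """Build the vector-store ID for a note (hypothetical helper)."""
    return f"note:{note_id}"


def parse_note_key(key: str) -> int:
    """Recover the integer note ID from a 'note:<id>' key."""
    prefix, _, raw_id = key.partition(":")
    if prefix != "note" or not raw_id.isdigit():
        raise ValueError(f"Not a note key: {key!r}")
    return int(raw_id)


def distance_to_score(distance: float) -> float:
    """Convert a cosine distance to a cosine similarity.

    Assumes cosine distance (1 - cosine similarity). The same formula
    applied to an L2 distance would not give a bounded similarity.
    """
    return 1.0 - distance


round_trip = parse_note_key(note_key(42))
score = distance_to_score(0.25)
```

Validating the prefix when parsing keeps a stray node ID from being silently interpreted as a note ID.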
#### 4.2 Search API Implementation

1. Implement `backend/app/api/search.py`:

from typing import Dict, Literal

from fastapi import APIRouter, HTTPException, Query
from pydantic import BaseModel

from ..services.embeddings import get_embedding_store
from ..services.graph_store import get_graph_store

router = APIRouter()


class SemanticSearchRequest(BaseModel):
    q: str
    top_k: int = 10
    # Restricting the allowed values lets FastAPI reject unknown scopes
    search_type: Literal["notes", "nodes", "both"] = "both"


class SearchResult(BaseModel):
    type: str  # "note" or "node"
    id: str
    content: str
    score: float
    metadata: Dict = {}


@router.post("/semantic")
async def semantic_search(request: SemanticSearchRequest):
    """
    Semantic search across notes and/or nodes.

    Args:
        q: Search query
        top_k: Number of results to return
        search_type: Search scope ("notes", "nodes", or "both")

    Returns:
        Ranked list of results
    """
    embedding_store = get_embedding_store()
    results = []

    if request.search_type in ["notes", "both"]:
        note_results = embedding_store.search_notes(request.q, request.top_k)
        for r in note_results:
            # Truncate long note bodies to a 200-character preview
            content = r['content']
            if len(content) > 200:
                content = content[:200] + "..."
            results.append(SearchResult(
                type="note",
                id=str(r['note_id']),
                content=content,
                score=r['score'],
                metadata=r['metadata'] or {}
            ))

    if request.search_type in ["nodes", "both"]:
        node_results = embedding_store.search_nodes(request.q, request.top_k)
        graph_store = get_graph_store()

        for r in node_results:
            # Skip index entries whose node no longer exists in the graph
            node = graph_store.get_node(r['node_id'])
            if node:
                results.append(SearchResult(
                    type="node",
                    id=r['node_id'],
                    content=r['label'],
                    score=r['score'],
                    metadata={
                        'node_type': node.get('type'),
                        'provenance_count': len(node.get('provenance', []))
                    }
                ))

    # Sort by score descending
    results.sort(key=lambda x: x.score, reverse=True)

    # Limit to top_k
    results = results[:request.top_k]

    return {
        "query": request.q,
        "results": [r.dict() for r in results],
        "total": len(results)
    }


@router.get("/related/{node_id}")
async def get_related_nodes(
    node_id: str,
    top_k: int = Query(5, description="Number of related nodes to return")
):
    """
    Find semantically related nodes.

    Uses the node label as query to find similar nodes.
    """
    graph_store = get_graph_store()
    embedding_store = get_embedding_store()

    node = graph_store.get_node(node_id)
    if not node:
        # Return a real 404 instead of a 200 response with an error body
        raise HTTPException(status_code=404, detail="Node not found")

    # Search for similar nodes using label (one extra, since the query
    # node itself is expected to be the closest match)
    similar_nodes = embedding_store.search_nodes(node['label'], top_k + 1)

    # Filter out the query node itself
    similar_nodes = [n for n in similar_nodes if n['node_id'] != node_id][:top_k]

    return {
        "source_node": node_id,
        "related_nodes": similar_nodes
    }
2293
#### 4.3 Testing & Documentation

1. Create `tests/backend/test_embeddings.py`:
2297
2298import pytest
2299from backend.app.services.embeddings import EmbeddingStore
2300import tempfile
2301from pathlib import Path
2302import shutil
2303
2304@pytest.fixture
2305def temp_embedding_store():
2306 """Create temporary embedding store."""
2307 temp_dir = Path(tempfile.mkdtemp())
2308
2309 # Mock settings
2310 from backend.app import config
2311 original_path = config.settings.vector_db_path
2312 config.settings.vector_db_path = temp_dir
2313
2314 store = EmbeddingStore()
2315
2316 yield store
2317
2318 # Cleanup
2319 shutil.rmtree(temp_dir)
2320 config.settings.vector_db_path = original_path
2321
2322def test_embed_text(temp_embedding_store):
2323 """Test text embedding generation."""
2324 embedding = temp_embedding_store.embed_text("test content")
2325
2326 assert isinstance(embedding, list)
2327 assert len(embedding) == 384 # all-MiniLM-L6-v2 dimension
2328 assert all(isinstance(x, float) for x in embedding)
2329
2330def test_index_and_search_notes(temp_embedding_store):
2331 """Test note indexing and search."""
2332 # Index notes
2333 temp_embedding_store.index_note(1, "Machine learning is transforming AI")
2334 temp_embedding_store.index_note(2, "I love cooking pasta with fresh tomatoes")
2335 temp_embedding_store.index_note(3, "Neural networks and deep learning")
2336
2337 # Search
2338 results = temp_embedding_store.search_notes("artificial intelligence", top_k=2)
2339
2340 assert len(results) <= 2
2341 assert results[0]['note_id'] in [1, 3] # Should match AI-related notes
2342 assert 'score' in results[0]
2343
2344def test_index_and_search_nodes(temp_embedding_store):
2345 """Test node indexing and search."""
2346 # Index nodes
2347 temp_embedding_store.index_node("node:1", "machine learning", "concept")
2348 temp_embedding_store.index_node("node:2", "pasta", "concept")
2349 temp_embedding_store.index_node("node:3", "deep learning", "concept")
2350
2351 # Search
2352 results = temp_embedding_store.search_nodes("AI algorithms", top_k=2)
2353
2354 assert len(results) <= 2
2355 # Should prioritize ML-related nodes
2356 top_result_label = results[0]['label'].lower()
2357 assert any(term in top_result_label for term in ['machine', 'learning', 'deep'])
2358
2359def test_delete_note(temp_embedding_store):
2360 """Test note deletion from index."""
2361 temp_embedding_store.index_note(1, "test content")
2362
2363 # Verify indexed
2364 results = temp_embedding_store.search_notes("test", top_k=5)
2365 assert any(r['note_id'] == 1 for r in results)
2366
2367 # Delete
2368 temp_embedding_store.delete_note(1)
2369
2370 # Verify removed
2371 results = temp_embedding_store.search_notes("test", top_k=5)
2372 assert not any(r['note_id'] == 1 for r in results)
2373
2. Update `docs/architecture.md` with embeddings architecture:

## Embeddings & Vector Search

### Architecture

The system uses a two-tier embedding strategy:

1. **Note Embeddings**: Full note content is embedded for semantic document search
2. **Node Embeddings**: Individual node labels are embedded for entity-level search

### Technology Stack

- **Embedding Model**: sentence-transformers (`all-MiniLM-L6-v2`)
  - Dimension: 384
  - Fast inference on CPU
  - Good balance of speed and quality

- **Vector Store**: ChromaDB with DuckDB+Parquet backend
  - Persistent local storage
  - Efficient similarity search
  - No external services required

### Workflow

[New Note] → [Extract Text] → [Generate Embedding] → [Index in ChromaDB]

[User Query] → [Generate Query Embedding] → [Similarity Search] → [Ranked Results]

### Search Process

1. User submits a search query
2. The query is embedded using the same model as the indexed content
3. Cosine similarity is computed against the indexed vectors
4. Results are ranked by similarity score (`1 - distance`, higher is more similar)
5. Top-k results are returned with metadata
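
The search process above can be sketched without any vector database. This is only an illustration of the ranking step, assuming plain Python lists as vectors. The function names `cosine_similarity` and `rank_top_k` and the toy two-dimensional index are hypothetical, and ChromaDB performs the equivalent work with an approximate index rather than a full scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_top_k(
    query_vec: List[float],
    index: Dict[str, List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every indexed vector against the query and keep the best top_k."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: real embeddings from all-MiniLM-L6-v2 have 384 dimensions
toy_index = {
    "note:1": [1.0, 0.0],
    "note:2": [0.0, 1.0],
    "note:3": [1.0, 1.0],
}
ranked = rank_top_k([1.0, 0.0], toy_index, top_k=2)
ranked_ids = [item_id for item_id, _ in ranked]  # ["note:1", "note:3"]
```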

### Performance Considerations

- Embedding generation: ~50ms per note on CPU (approximate; depends on hardware and note length)
- Search latency: <100ms for 10k vectors (approximate)
- Index persistence: Automatic on collection update
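
The latency figures above are estimates, so it may be worth measuring them on the target machine. A minimal timing sketch follows, assuming any callable that takes a string. `time_calls` and `fake_embed` are hypothetical names, and `fake_embed` is only a stand-in for `EmbeddingStore.embed_text`.

```python
import time
from statistics import median
from typing import Callable, List


def time_calls(fn: Callable[[str], object], inputs: List[str], repeats: int = 3) -> float:
    """Return the median per-call latency of fn in milliseconds."""
    samples = []
    for _ in range(repeats):
        for item in inputs:
            start = time.perf_counter()
            fn(item)
            samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def fake_embed(text: str) -> List[float]:
    """Stand-in embedder; swap in the real embed_text to measure it."""
    return [float(len(text))] * 384


latency_ms = time_calls(fake_embed, ["short note", "a somewhat longer note body"])
```

Using the median rather than the mean keeps a single slow first call, such as model warm-up, from dominating the number.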

3. Update `docs/api-spec.md` with search endpoints:

## Search Endpoints

### POST /api/search/semantic

Semantic search across notes and/or nodes.

**Request Body:**
{
  "q": "how does sleep affect productivity",
  "top_k": 10,
  "search_type": "both"
}

**Parameters:**
- `q`: Search query (required)
- `top_k`: Number of results (default: 10)
- `search_type`: Scope - "notes", "nodes", or "both" (default: "both")

**Response:**
{
  "query": "how does sleep affect productivity",
  "results": [
    {
      "type": "node",
      "id": "node:sleep_quality_a3f9e2b1",
      "content": "sleep quality",
      "score": 0.92,
      "metadata": {
        "node_type": "concept",
        "provenance_count": 3
      }
    },
    {
      "type": "note",
      "id": "42",
      "content": "I've noticed that when I sleep poorly, my work performance drops significantly...",
      "score": 0.88,
      "metadata": {
        "filename": "journal-2024-01-15.md",
        "created_at": "2024-01-15T10:30:00"
      }
    }
  ],
  "total": 2
}
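
With `search_type` set to "both", note hits and node hits end up in one ranked list. A rough sketch of that merge step, assuming each hit is a dict with a `score` key. `merge_results` and `VALID_SEARCH_TYPES` are hypothetical names used only for illustration.

```python
from typing import Dict, List

VALID_SEARCH_TYPES = {"notes", "nodes", "both"}


def merge_results(
    note_hits: List[Dict],
    node_hits: List[Dict],
    top_k: int,
    search_type: str = "both",
) -> List[Dict]:
    """Combine hits for the requested scope, best score first, capped at top_k."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"Unknown search_type: {search_type!r}")

    merged: List[Dict] = []
    if search_type in ("notes", "both"):
        merged.extend(note_hits)
    if search_type in ("nodes", "both"):
        merged.extend(node_hits)

    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:top_k]


note_hits = [
    {"type": "note", "id": "42", "score": 0.88},
    {"type": "note", "id": "7", "score": 0.40},
]
node_hits = [
    {"type": "node", "id": "node:sleep_quality_a3f9e2b1", "score": 0.92},
]
top = merge_results(note_hits, node_hits, top_k=2)
top_ids = [hit["id"] for hit in top]
```

Note that note and node scores come from separate collections, so comparing them directly is a simplification that the merged ranking accepts.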

### GET /api/search/related/{node_id}

Find semantically related nodes.

**Parameters:**
- `node_id`: Source node ID
- `top_k`: Number of results (default: 5)

**Response:**
{
  "source_node": "node:sleep_quality_a3f9e2b1",
  "related_nodes": [
    {
      "node_id": "node:rest_patterns_b2c4d5e6",
      "label": "rest patterns",
      "score": 0.89
    },
    {
      "node_id": "node:circadian_rhythm_c3d4e5f6",
      "label": "circadian rhythm",
      "score": 0.85
    }
  ]
}
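
Because the source node's own label is used as the query, the source node is normally its own best match. A small sketch of the exclusion step, assuming the search was asked for `top_k + 1` hits. `exclude_source` is a hypothetical helper name.

```python
from typing import Dict, List


def exclude_source(source_id: str, hits: List[Dict], top_k: int) -> List[Dict]:
    """Drop the query node from its own results, then cap the list at top_k."""
    return [hit for hit in hits if hit["node_id"] != source_id][:top_k]


# Hits as they might come back when asking for top_k + 1 = 3 results
hits = [
    {"node_id": "node:sleep_quality_a3f9e2b1", "label": "sleep quality", "score": 1.0},
    {"node_id": "node:rest_patterns_b2c4d5e6", "label": "rest patterns", "score": 0.89},
    {"node_id": "node:circadian_rhythm_c3d4e5f6", "label": "circadian rhythm", "score": 0.85},
]
related = exclude_source("node:sleep_quality_a3f9e2b1", hits, top_k=2)
related_labels = [hit["label"] for hit in related]
```

Requesting one extra hit means the caller still receives `top_k` related nodes after the source is removed.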

**Deliverables**:
- `backend/app/services/embeddings.py` with full embedding functionality
- Updated `backend/app/services/extractor.py` to index embeddings
- `backend/app/api/search.py` with semantic search endpoints
- `tests/backend/test_embeddings.py` with unit tests
- Updated documentation

**Completion Threshold**:
- [ ] Embeddings are generated for notes and nodes during ingestion
- [ ] Semantic search returns relevant results: `pytest tests/backend/test_embeddings.py`
- [ ] `POST /api/search/semantic` returns ranked results
- [ ] Vector store persists across application restarts
- [ ] Update `checklist.md` with Phase 4 completion
- [ ] Log embedding strategy in `decisions.md`

---

### Phase 5: Frontend Setup & Graph Visualization

**Objective**: Create Next.js frontend with interactive graph visualization.

**Pre-requisites**: Phases 1-4 complete (backend functional)

**Tasks**:

#### 5.1 Next.js Project Setup

1. Initialize Next.js project:

cd frontend
npx create-next-app@latest . --typescript --tailwind --app --no-src-dir

2. Install dependencies (`@heroicons/react` is needed by `NodeDetailsPanel.tsx`; the legacy `react-query` package is dropped because the code imports from `@tanstack/react-query`):

npm install cytoscape react-cytoscapejs axios @tanstack/react-query @heroicons/react
npm install -D @types/cytoscape

3. Create `frontend/next.config.js`:
2528
2529/** @type {import('next').NextConfig} */
2530const nextConfig = {
2531 async rewrites() {
2532 return [
2533 {
2534 source: '/api/:path*',
2535 destination: 'http://localhost:8000/api/:path*',
2536 },
2537 ];
2538 },
2539};
2540
2541module.exports = nextConfig;
2542
4. Create `frontend/lib/api.ts`:
2544
2545import axios from 'axios';
2546
2547const API_BASE = process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000';
2548
2549const api = axios.create({
2550 baseURL: API_BASE,
2551 timeout: 30000,
2552 headers: {
2553 'Content-Type': 'application/json',
2554 },
2555});
2556
2557export interface Node {
2558 id: string;
2559 label: string;
2560 type: string;
2561 provenance: [number, number, number][];
2562 confidence?: number;
2563 created_at: string;
2564 updated_at: string;
2565}
2566
2567export interface Edge {
2568 source: string;
2569 target: string;
2570 type: string;
2571 weight: number;
2572 extraction_id?: number;
2573 created_at: string;
2574}
2575
2576export interface GraphData {
2577 nodes: Node[];
2578 edges: Edge[];
2579}
2580
2581export interface SearchResult {
2582 type: 'note' | 'node';
2583 id: string;
2584 content: string;
2585 score: number;
2586 metadata: Record<string, any>;
2587}
2588
2589// Graph API
2590export const graphAPI = {
2591 getGraph: async (nodeId?: string, depth?: number): Promise<GraphData> => {
2592 const params = new URLSearchParams();
2593 if (nodeId) params.append('node_id', nodeId);
2594 if (depth) params.append('depth', depth.toString());
2595
2596 const response = await api.get(`/api/graph?${params.toString()}`);
2597 return response.data;
2598 },
2599
2600 getNode: async (nodeId: string): Promise<Node> => {
2601 const response = await api.get(`/api/graph/node/${nodeId}`);
2602 return response.data;
2603 },
2604
2605 createNode: async (node: Partial<Node>): Promise<{ node_id: string }> => {
2606 const response = await api.post('/api/graph/node', node);
2607 return response.data;
2608 },
2609
2610 createEdge: async (edge: Partial<Edge>): Promise<{ status: string }> => {
2611 const response = await api.post('/api/graph/edge', edge);
2612 return response.data;
2613 },
2614
2615 getStats: async (): Promise<any> => {
2616 const response = await api.get('/api/graph/stats');
2617 return response.data;
2618 },
2619};
2620
2621// Search API
2622export const searchAPI = {
2623 semantic: async (query: string, topK: number = 10, searchType: string = 'both'): Promise<SearchResult[]> => {
2624 const response = await api.post('/api/search/semantic', {
2625 q: query,
2626 top_k: topK,
2627 search_type: searchType,
2628 });
2629 return response.data.results;
2630 },
2631
2632 related: async (nodeId: string, topK: number = 5): Promise<any> => {
2633 const response = await api.get(`/api/search/related/${nodeId}?top_k=${topK}`);
2634 return response.data;
2635 },
2636};
2637
2638// Ingestion API
2639export const ingestAPI = {
2640 ingestText: async (filename: string, content: string): Promise<{ note_id: number }> => {
2641 const response = await api.post('/api/ingest/text', {
2642 filename,
2643 content,
2644 });
2645 return response.data;
2646 },
2647
2648 ingestFile: async (file: File): Promise<{ note_ids: number[] }> => {
2649 const formData = new FormData();
2650 formData.append('file', file);
2651
2652 const response = await api.post('/api/ingest/file', formData, {
2653 headers: {
2654 'Content-Type': 'multipart/form-data',
2655 },
2656 });
2657 return response.data;
2658 },
2659
2660 getStatus: async (noteId: number): Promise<any> => {
2661 const response = await api.get(`/api/ingest/status/${noteId}`);
2662 return response.data;
2663 },
2664};
2665
2666export default api;
2667
#### 5.2 Graph Visualization Component

1. Create `frontend/components/GraphCanvas.tsx`:
2671
2672'use client';
2673
2674import React, { useEffect, useRef, useState } from 'react';
2675import CytoscapeComponent from 'react-cytoscapejs';
2676import Cytoscape from 'cytoscape';
2677import { GraphData, Node } from '@/lib/api';
2678
2679interface GraphCanvasProps {
2680 data: GraphData;
2681 onNodeClick?: (node: Node) => void;
2682 onNodeDoubleClick?: (node: Node) => void;
2683 selectedNodeId?: string;
2684}
2685
2686const GraphCanvas: React.FC<GraphCanvasProps> = ({
2687 data,
2688 onNodeClick,
2689 onNodeDoubleClick,
2690 selectedNodeId,
2691}) => {
2692 const cyRef = useRef<Cytoscape.Core | null>(null);
2693 const [elements, setElements] = useState<any[]>([]);
2694
2695 useEffect(() => {
2696 // Convert GraphData to Cytoscape elements
2697 const nodes = data.nodes.map((node) => ({
2698 data: {
2699 id: node.id,
2700 label: node.label,
2701 type: node.type,
2702 confidence: node.confidence || 1,
2703 provenanceCount: node.provenance?.length || 0,
2704 },
2705 }));
2706
2707 const edges = data.edges.map((edge, idx) => ({
2708 data: {
2709 id: `edge-${idx}`,
2710 source: edge.source,
2711 target: edge.target,
2712 label: edge.type,
2713 weight: edge.weight,
2714 },
2715 }));
2716
2717 setElements([...nodes, ...edges]);
2718 }, [data]);
2719
2720 useEffect(() => {
2721 if (cyRef.current && selectedNodeId) {
2722 // Highlight selected node
2723 cyRef.current.nodes().removeClass('selected');
2724 cyRef.current.getElementById(selectedNodeId).addClass('selected');
2725 }
2726 }, [selectedNodeId]);
2727
2728 const stylesheet: Cytoscape.Stylesheet[] = [
2729 {
2730 selector: 'node',
2731 style: {
2732 'background-color': (ele: any) => {
2733 const type = ele.data('type');
2734 const colors: Record<string, string> = {
2735 concept: '#3b82f6',
2736 person: '#10b981',
2737 place: '#f59e0b',
2738 idea: '#8b5cf6',
2739 event: '#ef4444',
2740 passage: '#6b7280',
2741 };
2742 return colors[type] || '#9ca3af';
2743 },
2744 'label': 'data(label)',
2745 'width': (ele: any) => {
2746 const provCount = ele.data('provenanceCount') || 1;
2747 return Math.min(20 + provCount * 5, 60);
2748 },
2749 'height': (ele: any) => {
2750 const provCount = ele.data('provenanceCount') || 1;
2751 return Math.min(20 + provCount * 5, 60);
2752 },
2753 'font-size': '12px',
2754 'color': '#fff',
2755 'text-valign': 'center',
2756 'text-halign': 'center',
2757 'text-wrap': 'wrap',
2758 'text-max-width': '80px',
2759 },
2760 },
2761 {
2762 selector: 'node.selected',
2763 style: {
2764 'border-width': 3,
2765 'border-color': '#fbbf24',
2766 },
2767 },
2768 {
2769 selector: 'edge',
2770 style: {
2771 'width': (ele: any) => {
2772 const weight = ele.data('weight') || 0.5;
2773 return 1 + weight * 3;
2774 },
2775 'line-color': '#cbd5e1',
2776 'target-arrow-color': '#cbd5e1',
2777 'target-arrow-shape': 'triangle',
2778 'curve-style': 'bezier',
2779 'label': 'data(label)',
2780 'font-size': '10px',
2781 'text-rotation': 'autorotate',
2782 'text-margin-y': -10,
2783 },
2784 },
2785 ];
2786
2787 const layout = {
2788 name: 'cose',
2789 animate: true,
2790 animationDuration: 500,
2791 fit: true,
2792 padding: 30,
2793 nodeRepulsion: 8000,
2794 idealEdgeLength: 100,
2795 edgeElasticity: 100,
2796 nestingFactor: 1.2,
2797 };
2798
  const handleCyReady = (cy: Cytoscape.Core) => {
    cyRef.current = cy;

    // The cy callback can fire again on re-render, so drop any handlers
    // registered earlier to avoid duplicate calls and stale `data` closures
    cy.off('tap', 'node');
    cy.off('dbltap', 'node');

    // Node click handler
    cy.on('tap', 'node', (evt) => {
      const node = evt.target;
      const nodeData = data.nodes.find((n) => n.id === node.id());
      if (nodeData && onNodeClick) {
        onNodeClick(nodeData);
      }
    });

    // Node double-click handler
    cy.on('dbltap', 'node', (evt) => {
      const node = evt.target;
      const nodeData = data.nodes.find((n) => n.id === node.id());
      if (nodeData && onNodeDoubleClick) {
        onNodeDoubleClick(nodeData);
      }
    });
  };
2820
2821 return (
2822 <div className="w-full h-full bg-gray-900 rounded-lg overflow-hidden">
2823 {elements.length > 0 ? (
2824 <CytoscapeComponent
2825 elements={elements}
2826 stylesheet={stylesheet}
2827 layout={layout}
2828 style={{ width: '100%', height: '100%' }}
2829 cy={handleCyReady}
2830 zoom={1}
2831 pan={{ x: 0, y: 0 }}
2832 minZoom={0.3}
2833 maxZoom={3}
2834 wheelSensitivity={0.2}
2835 />
2836 ) : (
2837 <div className="flex items-center justify-center h-full text-gray-400">
2838 No graph data available. Ingest some notes to get started.
2839 </div>
2840 )}
2841 </div>
2842 );
2843};
2844
2845export default GraphCanvas;
2846
2. Create `frontend/components/NodeDetailsPanel.tsx`:
2848
2849'use client';
2850
2851import React, { useEffect, useState } from 'react';
2852import { Node, graphAPI } from '@/lib/api';
2853import { XMarkIcon } from '@heroicons/react/24/outline';
2854
2855interface NodeDetailsPanelProps {
2856 nodeId: string;
2857 onClose: () => void;
2858}
2859
2860const NodeDetailsPanel: React.FC<NodeDetailsPanelProps> = ({ nodeId, onClose }) => {
2861 const [node, setNode] = useState<Node | null>(null);
2862 const [loading, setLoading] = useState(true);
2863 const [error, setError] = useState<string | null>(null);
2864
2865 useEffect(() => {
2866 const fetchNode = async () => {
2867 try {
2868 setLoading(true);
2869 const nodeData = await graphAPI.getNode(nodeId);
2870 setNode(nodeData);
2871 setError(null);
2872 } catch (err) {
2873 setError('Failed to load node details');
2874 console.error(err);
2875 } finally {
2876 setLoading(false);
2877 }
2878 };
2879
2880 fetchNode();
2881 }, [nodeId]);
2882
2883 if (loading) {
2884 return (
2885 <div className="w-96 bg-gray-800 text-white p-6 shadow-lg">
2886 <div className="animate-pulse">
2887 <div className="h-4 bg-gray-700 rounded w-3/4 mb-4"></div>
2888 <div className="h-4 bg-gray-700 rounded w-1/2"></div>
2889 </div>
2890 </div>
2891 );
2892 }
2893
2894 if (error || !node) {
2895 return (
2896 <div className="w-96 bg-gray-800 text-white p-6 shadow-lg">
2897 <div className="flex justify-between items-start mb-4">
2898 <h2 className="text-xl font-bold text-red-400">Error</h2>
2899 <button onClick={onClose} className="text-gray-400 hover:text-white">
2900 <XMarkIcon className="w-6 h-6" />
2901 </button>
2902 </div>
2903 <p>{error || 'Node not found'}</p>
2904 </div>
2905 );
2906 }
2907
2908 return (
2909 <div className="w-96 bg-gray-800 text-white p-6 shadow-lg overflow-y-auto max-h-screen">
2910 <div className="flex justify-between items-start mb-4">
2911 <h2 className="text-2xl font-bold">{node.label}</h2>
2912 <button onClick={onClose} className="text-gray-400 hover:text-white">
2913 <XMarkIcon className="w-6 h-6" />
2914 </button>
2915 </div>
2916
2917 <div className="space-y-4">
2918 {/* Node Type */}
2919 <div>
2920 <h3 className="text-sm font-semibold text-gray-400 uppercase mb-1">Type</h3>
2921 <span className="inline-block px-3 py-1 bg-blue-600 rounded-full text-sm">
2922 {node.type}
2923 </span>
2924 </div>
2925
2926 {/* Confidence */}
        {node.confidence != null && (
2928 <div>
2929 <h3 className="text-sm font-semibold text-gray-400 uppercase mb-1">Confidence</h3>
2930 <div className="flex items-center">
2931 <div className="flex-1 bg-gray-700 rounded-full h-2 mr-2">
2932 <div
2933 className="bg-green-500 h-2 rounded-full"
2934 style={{ width: `${node.confidence * 100}%` }}
2935 ></div>
2936 </div>
2937 <span className="text-sm">{(node.confidence * 100).toFixed(0)}%</span>
2938 </div>
2939 </div>
2940 )}
2941
2942 {/* Provenance */}
2943 <div>
2944 <h3 className="text-sm font-semibold text-gray-400 uppercase mb-2">
2945 Provenance ({node.provenance?.length || 0} sources)
2946 </h3>
2947 {node.provenance && node.provenance.length > 0 ? (
2948 <div className="space-y-2">
2949 {node.provenance.map((prov, idx) => (
2950 <div key={idx} className="bg-gray-700 p-3 rounded text-sm">
2951 <div className="flex justify-between text-xs text-gray-400 mb-1">
2952 <span>Note ID: {prov[0]}</span>
2953 <span>Span: {prov[1]}-{prov[2]}</span>
2954 </div>
2955 <button
2956 className="text-blue-400 hover:text-blue-300 text-xs"
2957 onClick={() => {
2958 // TODO: Navigate to note or show excerpt
2959 console.log('View note:', prov[0]);
2960 }}
2961 >
2962 View source →
2963 </button>
2964 </div>
2965 ))}
2966 </div>
2967 ) : (
2968 <p className="text-gray-500 text-sm">No provenance data available</p>
2969 )}
2970 </div>
2971
2972 {/* Metadata */}
2973 <div>
2974 <h3 className="text-sm font-semibold text-gray-400 uppercase mb-2">Metadata</h3>
2975 <div className="bg-gray-700 p-3 rounded text-xs space-y-1">
2976 <div className="flex justify-between">
2977 <span className="text-gray-400">ID:</span>
2978 <span className="font-mono">{node.id}</span>
2979 </div>
2980 <div className="flex justify-between">
2981 <span className="text-gray-400">Created:</span>
2982 <span>{new Date(node.created_at).toLocaleString()}</span>
2983 </div>
2984 <div className="flex justify-between">
2985 <span className="text-gray-400">Updated:</span>
2986 <span>{new Date(node.updated_at).toLocaleString()}</span>
2987 </div>
2988 </div>
2989 </div>
2990
2991 {/* Actions */}
2992 <div className="pt-4 border-t border-gray-700">
2993 <button className="w-full bg-blue-600 hover:bg-blue-700 text-white py-2 px-4 rounded mb-2">
2994 Edit Node
2995 </button>
2996 <button className="w-full bg-gray-700 hover:bg-gray-600 text-white py-2 px-4 rounded">
2997 Find Related
2998 </button>
2999 </div>
3000 </div>
3001 </div>
3002 );
3003};
3004
3005export default NodeDetailsPanel;
3006
#### 5.3 Graph Page Implementation

1. Create `frontend/app/graph/page.tsx`:
3010
3011'use client';
3012
import React, { useState } from 'react';
3014import { useQuery } from '@tanstack/react-query';
3015import GraphCanvas from '@/components/GraphCanvas';
3016import NodeDetailsPanel from '@/components/NodeDetailsPanel';
3017import { graphAPI, GraphData, Node } from '@/lib/api';
3018
3019export default function GraphPage() {
3020 const [selectedNodeId, setSelectedNodeId] = useState<string | null>(null);
3021 const [showPanel, setShowPanel] = useState(false);
3022
3023 const { data: graphData, isLoading, error } = useQuery<GraphData>({
3024 queryKey: ['graph'],
3025 queryFn: () => graphAPI.getGraph(),
3026 refetchInterval: 30000, // Refresh every 30 seconds
3027 });
3028
3029 const handleNodeClick = (node: Node) => {
3030 setSelectedNodeId(node.id);
3031 };
3032
3033 const handleNodeDoubleClick = (node: Node) => {
3034 setSelectedNodeId(node.id);
3035 setShowPanel(true);
3036 };
3037
3038 const handleClosePanel = () => {
3039 setShowPanel(false);
3040 };
3041
3042 if (isLoading) {
3043 return (
3044 <div className="flex items-center justify-center h-screen bg-gray-900">
3045 <div className="text-white text-xl">Loading graph...</div>
3046 </div>
3047 );
3048 }
3049
3050 if (error) {
3051 return (
3052 <div className="flex items-center justify-center h-screen bg-gray-900">
3053 <div className="text-red-400 text-xl">Error loading graph</div>
3054 </div>
3055 );
3056 }
3057
3058 return (
3059 <div className="flex h-screen bg-gray-900">
3060 {/* Main Graph Area */}
3061 <div className="flex-1 relative">
3062 <div className="absolute top-4 left-4 z-10 bg-gray-800 text-white p-4 rounded-lg shadow-lg">
3063 <h1 className="text-xl font-bold mb-2">Mind Map AI</h1>
3064 <div className="text-sm text-gray-400">
3065 <p>Nodes: {graphData?.nodes.length || 0}</p>
3066 <p>Edges: {graphData?.edges.length || 0}</p>
3067 </div>
3068 </div>
3069
3070 <div className="absolute top-4 right-4 z-10 bg-gray-800 text-white p-2 rounded-lg shadow-lg">
3071 <div className="text-xs space-y-1">
3072 <div className="flex items-center">
3073 <div className="w-3 h-3 bg-blue-500 rounded-full mr-2"></div>
3074 <span>Concept</span>
3075 </div>
3076 <div className="flex items-center">
3077 <div className="w-3 h-3 bg-green-500 rounded-full mr-2"></div>
3078 <span>Person</span>
3079 </div>
3080 <div className="flex items-center">
3081 <div className="w-3 h-3 bg-yellow-500 rounded-full mr-2"></div>
3082 <span>Place</span>
3083 </div>
3084 <div className="flex items-center">
3085 <div className="w-3 h-3 bg-purple-500 rounded-full mr-2"></div>
3086 <span>Idea</span>
3087 </div>
3088 <div className="flex items-center">
3089 <div className="w-3 h-3 bg-red-500 rounded-full mr-2"></div>
3090 <span>Event</span>
3091 </div>
3092 </div>
3093 </div>
3094
3095 {graphData && (
3096 <GraphCanvas
3097 data={graphData}
3098 onNodeClick={handleNodeClick}
3099 onNodeDoubleClick={handleNodeDoubleClick}
3100 selectedNodeId={selectedNodeId || undefined}
3101 />
3102 )}
3103 </div>
3104
3105 {/* Side Panel */}
3106 {showPanel && selectedNodeId && (
3107 <div className="border-l border-gray-700">
3108 <NodeDetailsPanel nodeId={selectedNodeId} onClose={handleClosePanel} />
3109 </div>
3110 )}
3111 </div>
3112 );
3113}
3114
2. Create `frontend/app/layout.tsx`:

import './globals.css';
import type { Metadata } from 'next';
import { Inter } from 'next/font/google';
import Providers from './providers';

const inter = Inter({ subsets: ['latin'] });

export const metadata: Metadata = {
  title: 'Mind Map AI - Personal Knowledge Graph',
  description: 'Local LLM-powered knowledge graph for personal notes',
};

// Server component: React Query is set up inside the client-side Providers wrapper.
export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body className={inter.className}>
        <Providers>{children}</Providers>
      </body>
    </html>
  );
}
3143
31443. Create `frontend/app/providers.tsx`:
3145
3146'use client';
3147
3148import { QueryClient, QueryClientProvider } from '@tanstack/react-query';
3149import { useState } from 'react';
3150
3151export default function Providers({ children }: { children: React.ReactNode }) {
3152 const [queryClient] = useState(
3153 () =>
3154 new QueryClient({
3155 defaultOptions: {
3156 queries: {
3157 staleTime: 60 * 1000, // 1 minute
3158 refetchOnWindowFocus: false,
3159 },
3160 },
3161 })
3162 );
3163
3164 return (
3165 <QueryClientProvider client={queryClient}>
3166 {children}
3167 </QueryClientProvider>
3168 );
3169}
3170
3171#### 5.4 Testing & Documentation
3172
31731. Update `docs/design_system.md`:
3174
3175# Frontend Design System
3176
3177## Visual Design Principles
3178
3179### Color Palette
3180
3181**Node Colors (by type):**
3182- Concept: `#3b82f6` (Blue)
3183- Person: `#10b981` (Green)
3184- Place: `#f59e0b` (Amber)
3185- Idea: `#8b5cf6` (Purple)
3186- Event: `#ef4444` (Red)
3187- Passage: `#6b7280` (Gray)
3188
3189**UI Colors:**
3190- Background: `#111827` (Gray-900)
3191- Panel: `#1f2937` (Gray-800)
3192- Accent: `#fbbf24` (Yellow-400)
3193- Text Primary: `#ffffff`
3194- Text Secondary: `#9ca3af` (Gray-400)
3195
3196### Visualization Cues
3197
3198**Node Size:**
3199- Based on provenance count (number of source references)
3200- Formula: `min(20 + provenance_count * 5, 60)` pixels
3201- Larger nodes indicate concepts mentioned across multiple notes
3202
3203**Edge Thickness:**
3204- Based on confidence weight (0-1)
3205- Formula: `1 + weight * 3` pixels
3206- Thicker edges indicate stronger relationships
3207
3208**Node Selection:**
3209- Selected nodes have yellow (`#fbbf24`) border, 3px width
3210- Click to select, double-click to open details panel
3211
3212### Layout Algorithm
3213
3214**Graph Layout: COSE (Compound Spring Embedder)**
3215- Organic, force-directed layout
3216- Parameters:
3217 - Node repulsion: 8000
3218 - Ideal edge length: 100
3219 - Edge elasticity: 100
3220 - Animation duration: 500ms
3221
3222### Interactions
3223
3224**Primary Interactions:**
32251. **Single Click Node**: Select node, highlight in graph
32262. **Double Click Node**: Open NodeDetailsPanel with provenance
32273. **Pan**: Click and drag on background
32284. **Zoom**: Mouse wheel or pinch gesture
32295. **Hover Node**: Show tooltip with label and type
3230
3231**NodeDetailsPanel:**
3232- Slides in from right side
3233- Shows: Type, confidence, provenance list, metadata
3234- Actions: Edit node, find related nodes, view source notes
3235
3236### Responsive Design
3237
3238**Breakpoints:**
3239- Desktop: > 1024px (full graph + side panel)
3240- Tablet: 768-1024px (graph only, panel as overlay)
3241- Mobile: < 768px (not prioritized in Phase 5)
3242
3243### Accessibility
3244
3245- Keyboard navigation: Tab through nodes
3246- ARIA labels on interactive elements
3247- Sufficient color contrast (WCAG AA)
3248- Screen reader support for node metadata
3249
3250## Component Structure
3251
3252GraphPage
3253├── GraphCanvas (Cytoscape visualization)
3254│ ├── Node rendering
3255│ ├── Edge rendering
3256│ └── Event handlers
3257└── NodeDetailsPanel (Side panel)
3258 ├── Node metadata
3259 ├── Provenance list
3260 └── Action buttons
3261
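The node-size and edge-thickness rules in the design system above reduce to two small formulas. A minimal Python sketch follows; the function names `node_size_px` and `edge_width_px` are hypothetical, and clamping out-of-range weights is an assumption (the spec only says the weight lies in 0-1):

```python
def node_size_px(provenance_count: int) -> int:
    """Node diameter: min(20 + provenance_count * 5, 60) pixels."""
    return min(20 + provenance_count * 5, 60)


def edge_width_px(weight: float) -> float:
    """Edge thickness: 1 + weight * 3 pixels for a confidence weight in [0, 1]."""
    # Clamping is an assumption; the design doc does not say what happens
    # to weights outside the 0-1 range.
    clamped = max(0.0, min(1.0, weight))
    return 1 + clamped * 3


if __name__ == "__main__":
    for count in (0, 3, 8, 20):
        print(f"provenance_count={count} -> {node_size_px(count)}px")
    for weight in (0.0, 0.5, 1.0):
        print(f"weight={weight} -> {edge_width_px(weight)}px")
```

With these constants the size cap is reached at eight source references (20 + 8 * 5 = 60), so every node mentioned more often than that renders at the same size.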
32622. Update `docs/testing.md` with frontend testing strategy:
3263
3264## Frontend Testing
3265
3266### Component Testing (React Testing Library)
3267
3268Test coverage for:
3269- GraphCanvas render with sample data
3270- NodeDetailsPanel data display
3271- User interactions (click, double-click)
3272- Loading and error states
3273
3274### E2E Testing (Playwright - Future Phase)
3275
3276Critical user flows:
32771. Load graph page → View graph → Click node → View details
32782. Search for node → Select from results → Navigate to graph
32793. Upload note → Wait for processing → Verify graph updated
3280
3281**Deliverables**:
3282- Complete Next.js frontend setup
3283- `GraphCanvas` component with Cytoscape integration
3284- `NodeDetailsPanel` with provenance display
3285- `/graph` page with full visualization
3286- API client library (`lib/api.ts`)
3287- Updated documentation
3288
3289**Completion Threshold**:
3290- [ ] Frontend runs: `npm run dev` on port 3000
3291- [ ] Graph page loads and displays empty state
3292- [ ] Sample graph data (manually added via API) renders correctly
3293- [ ] Node click and double-click handlers work
3294- [ ] NodeDetailsPanel displays node metadata and provenance
3295- [ ] Update `checklist.md` with Phase 5 completion
3296- [ ] Log frontend architecture in `decisions.md`
3297
3298---
3299
3300### Phase 6: Note Upload & Integration Testing
3301
3302**Objective**: Complete note ingestion UI and run end-to-end integration tests.
3303
3304**Pre-requisites**: Phases 1-5 complete
3305
3306**Tasks**:
3307
3308#### 6.1 Note Upload Component
3309
1. Create `frontend/components/NoteUploader.tsx`:

'use client';

import React, { useState, useCallback } from 'react';
import { useDropzone } from 'react-dropzone';
import { CloudArrowUpIcon, DocumentTextIcon } from '@heroicons/react/24/outline';
import { ingestAPI } from '@/lib/api';
import { useMutation, useQueryClient } from '@tanstack/react-query';

const NoteUploader: React.FC = () => {
  const [uploadStatus, setUploadStatus] = useState<string>('');
  const queryClient = useQueryClient();

  const uploadMutation = useMutation({
    mutationFn: (file: File) => ingestAPI.ingestFile(file),
    onSuccess: (data) => {
      setUploadStatus(`Successfully uploaded ${data.note_ids.length} note(s)`);
      // Invalidate the graph and the dashboard stats so both refetch
      queryClient.invalidateQueries({ queryKey: ['graph'] });
      queryClient.invalidateQueries({ queryKey: ['graph-stats'] });
    },
    onError: (error) => {
      // Show the error message rather than the stringified error object
      const message = error instanceof Error ? error.message : String(error);
      setUploadStatus(`Upload failed: ${message}`);
    },
  });

  const onDrop = useCallback((acceptedFiles: File[]) => {
    if (acceptedFiles.length > 0) {
      const file = acceptedFiles[0];
      setUploadStatus(`Uploading ${file.name}...`);
      uploadMutation.mutate(file);
    }
  }, [uploadMutation]);

  const { getRootProps, getInputProps, isDragActive } = useDropzone({
    onDrop,
    accept: {
      'text/markdown': ['.md'],
      'text/plain': ['.txt'],
      'application/zip': ['.zip'],
    },
    multiple: false,
  });

  return (
    <div className="w-full max-w-2xl mx-auto p-6">
      <div
        {...getRootProps()}
        className={`border-2 border-dashed rounded-lg p-12 text-center cursor-pointer transition-colors ${
          isDragActive
            ? 'border-blue-500 bg-blue-50'
            : 'border-gray-300 hover:border-gray-400'
        }`}
      >
        <input {...getInputProps()} />

        <CloudArrowUpIcon className="w-16 h-16 mx-auto mb-4 text-gray-400" />

        {isDragActive ? (
          <p className="text-lg text-blue-600">Drop the file here...</p>
        ) : (
          <div>
            <p className="text-lg text-gray-700 mb-2">
              Drag & drop a markdown file or zip archive here
            </p>
            <p className="text-sm text-gray-500">
              or click to select file
            </p>
            <p className="text-xs text-gray-400 mt-4">
              Supported: .md, .txt, .zip
            </p>
          </div>
        )}
      </div>

      {uploadStatus && (
        <div className="mt-4 p-4 bg-gray-100 rounded-lg">
          <p className="text-sm text-gray-700">{uploadStatus}</p>
        </div>
      )}

      {/* isLoading is the TanStack Query v4 flag; v5 renames it to isPending */}
      {uploadMutation.isLoading && (
        <div className="mt-4">
          <div className="animate-pulse flex items-center">
            <DocumentTextIcon className="w-5 h-5 mr-2 text-blue-500" />
            <span className="text-sm text-gray-600">Processing...</span>
          </div>
        </div>
      )}
    </div>
  );
};

export default NoteUploader;
3404
34052. Create `frontend/app/page.tsx` (Dashboard):
3406
3407'use client';
3408
3409import React from 'react';
3410import { useQuery } from '@tanstack/react-query';
3411import { graphAPI } from '@/lib/api';
3412import NoteUploader from '@/components/NoteUploader';
3413import Link from 'next/link';
3414
3415export default function HomePage() {
3416 const { data: stats } = useQuery({
3417 queryKey: ['graph-stats'],
3418 queryFn: () => graphAPI.getStats(),
3419 });
3420
3421 return (
3422 <div className="min-h-screen bg-gray-50">
3423 <header className="bg-white shadow-sm">
3424 <div className="max-w-7xl mx-auto px-4 py-4 sm:px-6 lg:px-8">
3425 <h1 className="text-3xl font-bold text-gray-900">Mind Map AI</h1>
3426 <p className="text-sm text-gray-600 mt-1">
3427 Your personal knowledge graph, powered by local LLM
3428 </p>
3429 </div>
3430 </header>
3431
3432 <main className="max-w-7xl mx-auto px-4 py-8 sm:px-6 lg:px-8">
3433 {/* Stats */}
3434 <div className="grid grid-cols-1 md:grid-cols-3 gap-6 mb-8">
3435 <div className="bg-white p-6 rounded-lg shadow">
3436 <h3 className="text-sm font-medium text-gray-500 uppercase">Nodes</h3>
3437 <p className="text-3xl font-bold text-gray-900 mt-2">
3438 {stats?.num_nodes || 0}
3439 </p>
3440 </div>
3441 <div className="bg-white p-6 rounded-lg shadow">
3442 <h3 className="text-sm font-medium text-gray-500 uppercase">Edges</h3>
3443 <p className="text-3xl font-bold text-gray-900 mt-2">
3444 {stats?.num_edges || 0}
3445 </p>
3446 </div>
3447 <div className="bg-white p-6 rounded-lg shadow">
3448 <h3 className="text-sm font-medium text-gray-500 uppercase">Density</h3>
3449 <p className="text-3xl font-bold text-gray-900 mt-2">
3450 {stats?.density?.toFixed(3) || '0.000'}
3451 </p>
3452 </div>
3453 </div>
3454
3455 {/* Upload Section */}
3456 <div className="bg-white p-8 rounded-lg shadow mb-8">
3457 <h2 className="text-2xl font-bold text-gray-900 mb-4">
3458 Upload Notes
3459 </h2>
3460 <NoteUploader />
3461 </div>
3462
3463 {/* Quick Actions */}
3464 <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
3465 <Link
3466 href="/graph"
3467 className="block p-6 bg-blue-600 text-white rounded-lg shadow hover:bg-blue-700 transition"
3468 >
3469 <h3 className="text-xl font-bold mb-2">Explore Graph</h3>
3470 <p className="text-blue-100">
3471 Visualize and interact with your knowledge graph
3472 </p>
3473 </Link>
3474
3475 <Link
3476 href="/search"
3477 className="block p-6 bg-purple-600 text-white rounded-lg shadow hover:bg-purple-700 transition"
3478 >
3479 <h3 className="text-xl font-bold mb-2">Semantic Search</h3>
3480 <p className="text-purple-100">
3481 Find related concepts and notes
3482 </p>
3483 </Link>
3484 </div>
3485 </main>
3486 </div>
3487 );
3488}
3489
34903. Install additional dependency:
3491
3492cd frontend
3493npm install react-dropzone
3494
3495#### 6.2 Integration Testing
3496
34971. Create sample test data in `data/notes/`:
3498
3499mkdir -p data/notes
3500
35012. Create `data/notes/sample1.md`:
3502
3503# Daily Journal - January 15, 2024
3504
3505I've been thinking a lot about productivity and how sleep affects my work. When I don't get enough rest, my focus drops significantly. I've noticed that exercise helps improve both my sleep quality and energy levels during the day.
3506
3507Key takeaways:
3508- Better sleep leads to better productivity
3509- Regular exercise improves sleep
3510- Morning routines set the tone for the entire day
3511
35123. Create `data/notes/sample2.md`:
3513
3514# Artificial Intelligence Research Notes
3515
3516Machine learning and deep learning are transforming software development. Neural networks can now generate code, detect bugs, and optimize performance. The recent advances in large language models like GPT and Claude have made AI assistants incredibly useful for developers.
3517
3518Important concepts:
3519- Neural networks process information in layers
3520- Transformers use attention mechanisms
3521- Fine-tuning adapts models to specific tasks
3522
35234. Create `data/notes/sample3.md`:
3524
3525# Project Planning - Mind Map AI
3526
3527Building a local knowledge graph system that extracts entities and relationships from personal notes. The system uses NetworkX for graph storage and a local LLM for extraction.
3528
3529Technical decisions:
3530- FastAPI for backend REST API
3531- SQLite for provenance tracking
3532- Cytoscape.js for visualization
3533- Sentence transformers for semantic search
3534
3535The goal is complete local operation with no cloud dependencies.
3536
5. Create `tests/integration/test_full_pipeline.py`:

import pytest
import requests
import time
from pathlib import Path

API_BASE = "http://localhost:8000"

def test_health_check():
    """Test API health endpoint."""
    response = requests.get(f"{API_BASE}/health")
    assert response.status_code == 200
    assert response.json()["status"] == "healthy"

def test_ingestion_pipeline():
    """
    Integration test: Ingest sample notes and verify graph creation.

    This test validates the complete pipeline:
    1. Upload markdown file
    2. Wait for processing
    3. Verify nodes and edges created
    4. Check graph statistics
    """
    # Read sample note
    sample_path = Path(__file__).parent.parent.parent / "data" / "notes" / "sample1.md"

    with open(sample_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Ingest text
    response = requests.post(
        f"{API_BASE}/api/ingest/text",
        json={
            "filename": "sample1.md",
            "content": content
        }
    )

    assert response.status_code == 200
    data = response.json()
    note_id = data["note_id"]

    # Poll for processing completion (30 attempts x 2 s = 60 s)
    max_attempts = 30
    for attempt in range(max_attempts):
        status_response = requests.get(f"{API_BASE}/api/ingest/status/{note_id}")
        status_data = status_response.json()

        if status_data["processed"]:
            break

        time.sleep(2)
    else:
        # for/else: runs only when the loop finished without hitting break
        pytest.fail("Processing timed out after 60 seconds")

    # Verify graph updated
    graph_response = requests.get(f"{API_BASE}/api/graph")
    assert graph_response.status_code == 200
    graph_data = graph_response.json()

    assert len(graph_data["nodes"]) > 0, "No nodes created from extraction"
    # An empty edge list is valid, but the field must be a list
    assert isinstance(graph_data["edges"], list), "Graph should expose an edge list, even if empty"

    # Verify node types
    node_types = [node["type"] for node in graph_data["nodes"]]
    valid_types = {"concept", "person", "place", "idea", "event", "passage"}
    assert all(t in valid_types for t in node_types), f"Invalid node types: {node_types}"

    # Verify provenance exists
    for node in graph_data["nodes"]:
        assert "provenance" in node, f"Node {node['id']} missing provenance"
        assert len(node["provenance"]) > 0, f"Node {node['id']} has empty provenance"

def test_semantic_search():
    """Test semantic search functionality."""
    # Ensure some data exists
    graph_response = requests.get(f"{API_BASE}/api/graph")
    graph_data = graph_response.json()

    if len(graph_data["nodes"]) == 0:
        pytest.skip("No graph data available for search test")

    # Perform search
    search_response = requests.post(
        f"{API_BASE}/api/search/semantic",
        json={
            "q": "productivity and sleep",
            "top_k": 5,
            "search_type": "both"
        }
    )

    assert search_response.status_code == 200
    search_data = search_response.json()

    assert "results" in search_data
    assert isinstance(search_data["results"], list)

    # Verify result structure
    for result in search_data["results"]:
        assert "type" in result
        assert result["type"] in ["note", "node"]
        assert "score" in result
        assert 0 <= result["score"] <= 1

def test_graph_export():
    """Test graph export functionality."""
    # Export as GraphML
    export_response = requests.get(f"{API_BASE}/api/export?format=graphml")
    assert export_response.status_code == 200
    assert len(export_response.content) > 0

    # Verify GraphML content
    content = export_response.content.decode('utf-8')
    assert '<?xml' in content
    assert '<graphml' in content

def test_full_batch_ingestion():
    """
    Test batch ingestion of all sample notes.

    This is the acceptance test from Phase 2.
    """
    notes_dir = Path(__file__).parent.parent.parent / "data" / "notes"

    if not notes_dir.exists():
        pytest.skip("Sample notes directory not found")

    note_ids = []

    # Ingest all markdown files
    for md_file in notes_dir.glob("*.md"):
        with open(md_file, 'r', encoding='utf-8') as f:
            content = f.read()

        response = requests.post(
            f"{API_BASE}/api/ingest/text",
            json={
                "filename": md_file.name,
                "content": content
            }
        )

        assert response.status_code == 200
        note_ids.append(response.json()["note_id"])

    # Wait for all processing to complete
    max_wait = 120  # 2 minutes
    start_time = time.time()

    while time.time() - start_time < max_wait:
        all_processed = True

        for note_id in note_ids:
            status_response = requests.get(f"{API_BASE}/api/ingest/status/{note_id}")
            if not status_response.json()["processed"]:
                all_processed = False
                break

        if all_processed:
            break

        time.sleep(3)
    else:
        # while/else: runs only when the deadline passed without a break
        pytest.fail("Batch processing timed out")

    # Get final graph stats
    stats_response = requests.get(f"{API_BASE}/api/graph/stats")
    stats = stats_response.json()

    # Acceptance criteria
    assert stats["num_nodes"] > 0, "No nodes created from sample notes"
    assert stats["num_edges"] >= 0, "Invalid edge count"

    print(f"\n✓ Successfully ingested {len(note_ids)} notes")
    print(f"✓ Created {stats['num_nodes']} nodes")
    print(f"✓ Created {stats['num_edges']} edges")

    # Export and verify provenance
    export_response = requests.get(f"{API_BASE}/api/export?format=graphml")
    assert export_response.status_code == 200

    export_content = export_response.content.decode('utf-8')
    assert 'provenance' in export_content, "Exported graph missing provenance data"

    print("✓ Exported graph contains provenance data")
3725
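Both ingestion tests above poll the status endpoint until a note is marked as processed. That pattern can be factored into one helper. This is only a sketch: `wait_until` and its parameters are hypothetical names, not part of the test suite described here.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, interval: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it returns True or `timeout` seconds have elapsed.

    Returns True if check() succeeded, False if the deadline passed first.
    The `sleep` parameter exists so tests can replace the real delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

In a test this would read as `assert wait_until(lambda: requests.get(status_url).json()["processed"], timeout=60, interval=2), "Processing timed out"`, where `status_url` stands for the status endpoint of the note being checked.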
37266. Create pytest configuration `pytest.ini` in project root:
3727
3728[pytest]
3729testpaths = tests
3730python_files = test_*.py
3731python_classes = Test*
3732python_functions = test_*
3733addopts = -v --tb=short
3734
3735#### 6.3 Run Integration Tests
3736
37371. Update `docs/testing.md` with integration test instructions:
3738
3739## Integration Testing
3740
3741### Setup
3742
37431. Ensure backend is running:
3744cd backend
3745source .venv/bin/activate
3746uvicorn app.main:app --reload
3747
37482. Ensure local LLM is running (Ollama):
3749ollama serve
3750
37513. Run integration tests:
3752pytest tests/integration/test_full_pipeline.py -v
3753
3754### Acceptance Tests
3755
3756#### Test 1: Sample Notes Ingestion
3757
3758**Objective**: Verify complete pipeline from ingestion to graph creation.
3759
3760**Steps**:
37611. Ingest all files from `data/notes/`
37622. Wait for processing completion
37633. Verify graph contains nodes (N > 0) and edges (M ≥ 0)
3764
3765**Success Criteria**:
3766- All notes marked as processed
3767- Graph contains extracted nodes
3768- Each node has provenance data
3769
3770#### Test 2: Provenance Verification
3771
3772**Objective**: Ensure exported graph contains full provenance.
3773
3774**Steps**:
37751. Export graph as GraphML
37762. Parse and verify structure
37773. Check for provenance attributes on nodes
3778
3779**Success Criteria**:
3780- Export completes successfully
3781- GraphML contains valid XML
3782- At least one node has provenance attribute with source reference
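The provenance check in Acceptance Test 2 can go beyond a substring match by actually parsing the export. A minimal sketch using only the standard library; it assumes provenance is exported as a node attribute literally named `provenance` (the spec does not fix the attribute name), and `nodes_with_provenance` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import List, Union

# Default XML namespace used by GraphML documents.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def nodes_with_provenance(graphml: Union[str, bytes],
                          attr_name: str = "provenance") -> List[str]:
    """Return the ids of nodes carrying a non-empty attribute named attr_name."""
    root = ET.fromstring(graphml)
    # GraphML declares attributes as <key id="..." for="node" attr.name="...">
    # and nodes refer to them through <data key="...">.
    key_ids = {
        key.get("id")
        for key in root.findall("g:key", GRAPHML_NS)
        if key.get("for") == "node" and key.get("attr.name") == attr_name
    }
    found = []
    for node in root.iterfind(".//g:node", GRAPHML_NS):
        for data in node.findall("g:data", GRAPHML_NS):
            if data.get("key") in key_ids and (data.text or "").strip():
                found.append(node.get("id"))
                break
    return found
```

Passing `export_response.content` to this function and asserting that the result is non-empty would cover the "at least one node has provenance" criterion.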
3783
3784### Manual Testing Checklist
3785
3786- [ ] Upload single markdown file via frontend
3787- [ ] Verify note appears in database: `sqlite3 data/mindmap.db "SELECT * FROM notes;"`
3788- [ ] Verify extraction in database: `sqlite3 data/mindmap.db "SELECT * FROM extracts;"`
3789- [ ] Navigate to `/graph` page and verify visualization
3790- [ ] Click node and verify details panel opens
3791- [ ] Perform semantic search and verify results
3792- [ ] Export graph and verify file downloads
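The two `sqlite3` checks in the checklist above can also be scripted. The sketch below only counts rows, because the column layout of the `notes` and `extracts` tables is not shown in this section; `table_counts` is a hypothetical helper name.

```python
import sqlite3
from typing import Dict, Iterable


def table_counts(conn: sqlite3.Connection,
                 tables: Iterable[str] = ("notes", "extracts")) -> Dict[str, int]:
    """Return {table_name: row_count} for each requested table."""
    counts = {}
    for table in tables:
        # Table names come from a fixed tuple in code, not from user input,
        # so building the statement with an f-string is acceptable here.
        row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        counts[table] = row[0]
    return counts
```

Against the real database this would be called as `table_counts(sqlite3.connect("data/mindmap.db"))`; after one successful upload both counts should be greater than zero.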
3793
3794**Deliverables**:
3795- `NoteUploader` component with drag-and-drop
3796- Dashboard page with stats and upload UI
3797- Sample test data in `data/notes/`
3798- Integration test suite in `tests/integration/`
3799- Updated testing documentation
3800
3801**Completion Threshold**:
3802- [ ] Frontend upload UI functional
3803- [ ] Sample notes can be uploaded via UI
3804- [ ] Integration tests pass: `pytest tests/integration/test_full_pipeline.py`
3805- [ ] **Acceptance Test 1**: Ingest sample notes → N nodes and M edges created
3806- [ ] **Acceptance Test 2**: Export graph → Contains provenance data
3807- [ ] Manual verification: Upload note → See graph update in real-time
3808- [ ] Update `checklist.md` with Phase 6 completion
3809- [ ] Log integration testing results in `decisions.md`
3810
3811---
3812
3813### Phase 7: Security & Deployment
3814
3815**Objective**: Apply security best practices and prepare for deployment.
3816
3817**Pre-requisites**: Phases 1-6 complete
3818
3819**Tasks**:
3820
3821#### 7.1 Security Implementation
3822
1. Update `backend/app/config.py` with security settings:

from pathlib import Path

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Load overrides from .env; environment variables such as LLM_ENDPOINT
    # take precedence over the defaults below.
    model_config = SettingsConfigDict(env_file=".env")

    # LLM Configuration
    llm_endpoint: str = "http://localhost:11434/api/generate"
    llm_model: str = "llama3"
    embedding_endpoint: str = "http://localhost:11434/api/embeddings"
    embedding_model: str = "all-minilm"

    # Database Paths
    db_path: Path = Path(__file__).parent.parent.parent / "data" / "mindmap.db"
    graph_path: Path = Path(__file__).parent.parent.parent / "data" / "graph.gpickle"
    vector_db_path: Path = Path(__file__).parent.parent.parent / "data" / "vectors"

    # API Configuration
    # 0.0.0.0 binds all interfaces (needed inside Docker);
    # set API_HOST=127.0.0.1 for a localhost-only run.
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    cors_origins: list[str] = ["http://localhost:3000"]

    # Security
    max_upload_size: int = 10 * 1024 * 1024  # 10MB
    allowed_extensions: set[str] = {".md", ".txt"}
    disable_external_llm: bool = True  # Force local-only operation

    # Processing Configuration
    max_batch_size: int = 10
    extraction_timeout: int = 300

settings = Settings()
3858
2. Add input validation to ingestion endpoints in `backend/app/api/ingest.py`:

from fastapi import APIRouter, UploadFile, File, HTTPException, BackgroundTasks
from pydantic import BaseModel, field_validator
from typing import List, Optional
from ..db.db import insert_note
from ..services.extractor import process_note
from ..config import settings
import zipfile
import io

router = APIRouter()

class IngestTextRequest(BaseModel):
    filename: str
    content: str
    source_path: Optional[str] = None

    # field_validator is the Pydantic v2 API; pydantic-settings,
    # which config.py imports, requires Pydantic v2.
    @field_validator('filename')
    @classmethod
    def validate_filename(cls, v: str) -> str:
        """Validate filename extension."""
        if not any(v.endswith(ext) for ext in settings.allowed_extensions):
            raise ValueError(f"Invalid file extension. Allowed: {settings.allowed_extensions}")
        return v

    @field_validator('content')
    @classmethod
    def validate_content_length(cls, v: str) -> str:
        """Validate content size."""
        if len(v.encode('utf-8')) > settings.max_upload_size:
            raise ValueError(f"Content exceeds maximum size of {settings.max_upload_size} bytes")
        return v

# ... rest of the endpoints remain the same but with validation
3892
3. Add rate limiting middleware in `backend/app/main.py`:

from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
from .config import settings
from .db.db import init_database
from .api import ingest, graph, search

limiter = Limiter(key_func=get_remote_address)
app = FastAPI(
    title="Mind Map AI",
    description="Local LLM-powered personal knowledge graph",
    version="0.1.0"
)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize database on startup
@app.on_event("startup")
async def startup_event():
    init_database()
    from .services.graph_store import init_graph
    from .services.embeddings import init_embeddings
    init_graph()
    init_embeddings()

# Include routers
app.include_router(ingest.router, prefix="/api/ingest", tags=["ingestion"])
app.include_router(graph.router, prefix="/api/graph", tags=["graph"])
app.include_router(search.router, prefix="/api/search", tags=["search"])

@app.get("/")
async def root():
    return {"message": "Mind Map AI API", "version": "0.1.0"}

@app.get("/health")
@limiter.limit("10/minute")
async def health_check(request: Request):
    return {"status": "healthy"}
3944
39454. Install security dependency:
3946
3947cd backend
3948pip install slowapi
3949pip freeze > requirements.txt
3950
39515. Update `docs/security.md`:
3952
3953# Security Best Practices
3954
3955## Local-Only Architecture
3956
3957**Critical Constraint**: The system operates entirely locally by default.
3958
3959### Configuration
3960
3961- `DISABLE_EXTERNAL_LLM=true` prevents any external LLM API calls
3962- LLM endpoint must be localhost or explicitly whitelisted
3963- All data (notes, graph, vectors) stored locally in `data/` directory
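The "localhost or explicitly whitelisted" rule above can be enforced with a small check at startup. A sketch under stated assumptions: `is_local_endpoint` and `allowed_hosts` are hypothetical names, and the spec does not say how the whitelist is configured.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str, allowed_hosts: frozenset = frozenset()) -> bool:
    """Return True if url targets this machine or an explicitly allowed host."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost" or host in allowed_hosts:
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback address ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal and not an allowed name.
        return False
```

When `DISABLE_EXTERNAL_LLM` is true, the backend could refuse to start if `is_local_endpoint(settings.llm_endpoint)` is false. In the Docker setup, `host.docker.internal` would have to be in the allowed set.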
3964
3965### Input Validation
3966
3967**File Upload:**
3968- Maximum size: 10MB (configurable via `MAX_UPLOAD_SIZE`)
3969- Allowed extensions: `.md`, `.txt`, `.zip`
3970- Filename sanitization prevents path traversal
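The validators shown earlier check the extension and size but not the path itself, so the sanitization bullet above needs its own step. One way to do it, as a sketch: `sanitize_filename` is a hypothetical helper, and the extension set simply mirrors the list above.

```python
from pathlib import PureWindowsPath

ALLOWED_EXTENSIONS = {".md", ".txt", ".zip"}  # mirrors the list above


def sanitize_filename(raw: str) -> str:
    """Drop directory components and reject names with a disallowed extension."""
    # PureWindowsPath splits on both "/" and "\\" and strips drive letters,
    # so "../../etc/passwd.txt" and "..\\..\\secret.md" both lose their prefix.
    name = PureWindowsPath(raw).name
    if name in ("", ".", ".."):
        raise ValueError("empty or relative filename")
    suffix = PureWindowsPath(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension {suffix!r} is not allowed")
    return name
```

Only the returned base name should ever be joined onto the upload directory; the raw client-supplied value is never used to build a path.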
3971
3972**Text Ingestion:**
3973- Content size validation
3974- UTF-8 encoding enforcement
3975- SQL injection prevention via parameterized queries
3976
3977### Rate Limiting
3978
3979- Health endpoint: 10 requests/minute per IP
3980- Ingestion endpoints: 5 requests/minute per IP
3981- Search endpoints: 20 requests/minute per IP
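The project delegates rate limiting to `slowapi`, as shown in `main.py`. To make the per-IP limits above concrete, here is a standalone sliding-window sketch. It illustrates the idea only and is not how `slowapi` is implemented; `SlidingWindowLimiter` and `LIMITERS` are hypothetical names.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of recent calls

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Requests per minute from the list above, keyed by endpoint group.
LIMITERS = {
    "health": SlidingWindowLimiter(10),
    "ingest": SlidingWindowLimiter(5),
    "search": SlidingWindowLimiter(20),
}
```

A request handler would call `LIMITERS["ingest"].allow(client_ip)` and answer with HTTP 429 when it returns false.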
3982
3983### Data Security
3984
3985**SQLite Database:**
3986- File permissions: 600 (owner read/write only)
3987- No remote access
3988- Regular backups recommended
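The 600 permission on the database file can be set and verified from Python on POSIX systems. A short sketch; the helper names are hypothetical, and the demo runs on a throwaway file rather than `data/mindmap.db`:

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> None:
    """Set mode 600 (owner read/write only) on a file. POSIX only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path: str) -> bool:
    """Return True if group and others have no permissions on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    # Create a temporary file, loosen its mode, then tighten it again.
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(demo_path, 0o644)
    print("before:", is_owner_only(demo_path))
    restrict_to_owner(demo_path)
    print("after:", is_owner_only(demo_path))
    os.remove(demo_path)
```

Calling `restrict_to_owner` right after the database file is first created would keep the security checklist item true without a manual `chmod`.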
3989
3990**Graph & Vector Store:**
3991- Persistent files in `data/` directory
3992- No network exposure
3993- Access controlled via filesystem permissions
3994
3995### API Security
3996
3997**CORS:**
3998- Restricted to `http://localhost:3000` by default
3999- Configure `CORS_ORIGINS` for additional allowed origins
4000
4001**Headers:**
4002- No sensitive data in headers
4003- Standard security headers applied
4004
4005### Threat Model
4006
4007**In Scope:**
4008- Local file access control
4009- Input validation and sanitization
4010- Resource exhaustion (rate limiting)
4011
4012**Out of Scope:**
4013- Authentication (single-user system)
4014- Network-based attacks (local-only)
4015- Encryption at rest (relies on OS-level encryption)
4016
4017### Recommended Deployment Practices
4018
40191. Run backend and frontend on localhost only
40202. Use OS-level firewall to block external access
40213. Enable disk encryption for `data/` directory
40224. Regularly backup graph and database files
40235. Keep dependencies updated for security patches
4024
4025### Security Checklist
4026
4027- [ ] `DISABLE_EXTERNAL_LLM=true` in configuration
4028- [ ] File upload size limits enforced
4029- [ ] Rate limiting active on all endpoints
4030- [ ] CORS restricted to known origins
4031- [ ] Database file permissions set to 600
4032- [ ] No sensitive data logged
4033- [ ] Dependencies scanned for vulnerabilities
4034
4035#### 7.2 Docker Configuration
4036
40371. Create `backend/Dockerfile`:
4038
4039FROM python:3.11-slim
4040
4041WORKDIR /app
4042
4043# Install system dependencies
4044RUN apt-get update && apt-get install -y \
4045 build-essential \
4046 && rm -rf /var/lib/apt/lists/*
4047
4048# Copy requirements and install
4049COPY requirements.txt .
4050RUN pip install --no-cache-dir -r requirements.txt
4051
4052# Copy application code
4053COPY app/ ./app/
4054
4055# Create data directory
4056RUN mkdir -p /data
4057
4058# Set environment variables
4059ENV PYTHONUNBUFFERED=1
4060ENV DB_PATH=/data/mindmap.db
4061ENV GRAPH_PATH=/data/graph.gpickle
4062ENV VECTOR_DB_PATH=/data/vectors
4063
4064# Expose port
4065EXPOSE 8000
4066
4067# Run application
4068CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
4069
2. Create `frontend/Dockerfile`:

FROM node:18-alpine AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci

# Copy source code
COPY . .

# Build application
RUN npm run build

# Production image
FROM node:18-alpine

WORKDIR /app

# Copy built assets
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/package*.json ./
COPY --from=builder /app/node_modules ./node_modules

# Expose port
EXPOSE 3000

# Run application
CMD ["npm", "start"]

3. Create `docker-compose.yml` in project root:

version: '3.8'

services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    volumes:
      - ./data:/data
    environment:
      - LLM_ENDPOINT=http://host.docker.internal:11434/api/generate
      - DB_PATH=/data/mindmap.db
      - GRAPH_PATH=/data/graph.gpickle
      - VECTOR_DB_PATH=/data/vectors
    # On Linux, host.docker.internal is not defined by default;
    # this maps it to the host gateway so the backend can reach Ollama
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - mindmap

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_URL=http://localhost:8000
    depends_on:
      - backend
    networks:
      - mindmap

networks:
  mindmap:
    driver: bridge

volumes:
  data:

4. Create `.dockerignore` in backend and frontend:

**backend/.dockerignore**:
__pycache__/
*.pyc
.venv/
.env
*.db
*.gpickle
vectors/

**frontend/.dockerignore**:
node_modules/
.next/
.env.local

5. Update `docs/cicd_devops.md`:

# CI/CD & DevOps

## Local Development Setup

### Prerequisites

- Python 3.10+
- Node.js 18+
- Ollama (or alternative local LLM runtime)

### Backend Setup

cd backend
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

### Frontend Setup

cd frontend
npm install
npm run dev

### LLM Setup (Ollama)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull model
ollama pull llama3

# Start server
ollama serve

## Docker Deployment

### Build and Run with Docker Compose

# Build images
docker-compose build

# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

### Individual Service Management

# Backend only
docker build -t mindmap-backend ./backend
docker run -p 8000:8000 -v $(pwd)/data:/data mindmap-backend

# Frontend only
docker build -t mindmap-frontend ./frontend
docker run -p 3000:3000 mindmap-frontend

## Environment Variables

Create `.env` file in backend directory:

LLM_ENDPOINT=http://localhost:11434/api/generate
LLM_MODEL=llama3
EMBEDDING_MODEL=all-minilm
MAX_UPLOAD_SIZE=10485760
EXTRACTION_TIMEOUT=300
CORS_ORIGINS=["http://localhost:3000"]

## Production Considerations

### Performance

- Use production ASGI server (Gunicorn with Uvicorn workers)
- Enable Next.js production build
- Configure proper logging
- Monitor resource usage

### Backup Strategy

# Backup data directory
tar -czf backup-$(date +%Y%m%d).tar.gz data/

# Automated backup (crontab)
0 2 * * * tar -czf /backups/mindmap-$(date +\%Y\%m\%d).tar.gz /path/to/data/

### Monitoring

- Health check endpoint: `GET /health`
- Graph stats: `GET /api/graph/stats`
- Log aggregation (stdout/stderr)

## Deployment Checklist

- [ ] Environment variables configured
- [ ] Data directory persistent volume mounted
- [ ] Local LLM accessible from backend
- [ ] CORS origins properly set
- [ ] Rate limiting enabled
- [ ] Backup strategy implemented
- [ ] Health checks configured
- [ ] Logging configured

**Deliverables**:
- Security configuration and input validation
- Rate limiting implementation
- Dockerfiles for backend and frontend
- Docker Compose configuration
- Updated security and deployment documentation

**Completion Threshold**:
- [ ] Input validation prevents oversized uploads
- [ ] Rate limiting blocks excessive requests
- [ ] Local-only constraint enforced (`DISABLE_EXTERNAL_LLM`)
- [ ] Docker images build successfully
- [ ] `docker-compose up` starts full stack
- [ ] Security audit passes (no external network calls)
- [ ] Update `checklist.md` with Phase 7 completion
- [ ] Log security measures in `decisions.md`

---

## Final Checklist & Validation

### Complete System Acceptance Test

Run this final validation before considering the project complete:

1. **Environment Setup**:
   - [ ] Ollama running with llama3 model
   - [ ] Backend running on port 8000
   - [ ] Frontend running on port 3000

2. **Core Functionality**:
   - [ ] Upload `data/notes/sample1.md` via frontend
   - [ ] Wait for processing (check `/api/ingest/status`)
   - [ ] Navigate to `/graph` page
   - [ ] Verify graph visualization renders
   - [ ] Click a node and verify details panel opens
   - [ ] Verify provenance is displayed

3. **Search Functionality**:
   - [ ] Navigate to `/search` page (if implemented)
   - [ ] Perform semantic search
   - [ ] Verify results are returned and ranked

4. **Data Persistence**:
   - [ ] Stop backend
   - [ ] Restart backend
   - [ ] Verify graph data persists
   - [ ] Verify can query existing nodes

5. **Export**:
   - [ ] Export graph as GraphML
   - [ ] Verify file downloads
   - [ ] Open in text editor and verify provenance data present

### Documentation Completeness

Verify all documentation files are complete:

- [ ] `docs/architecture.md` - System overview and diagrams
- [ ] `docs/api-spec.md` - All endpoints documented with examples
- [ ] `docs/database.md` - Schema and graph model documented
- [ ] `docs/llm_prompting.md` - Extraction prompts and examples
- [ ] `docs/security.md` - Security measures documented
- [ ] `docs/cicd_devops.md` - Setup and deployment instructions
- [ ] `docs/testing.md` - Test strategy and instructions
- [ ] `docs/design_system.md` - UI/UX patterns documented
- [ ] `docs/roadmap.md` - Future features listed
- [ ] `docs/decisions.md` - Key decisions logged
- [ ] `docs/changelog.md` - Version history maintained

### Code Quality

- [ ] All unit tests pass: `pytest tests/backend/`
- [ ] Integration tests pass: `pytest tests/integration/`
- [ ] No TODO comments in production code
- [ ] All functions have docstrings
- [ ] Code follows PEP 8 (Python) and consistent JS style

### README Completeness

Ensure `README.md` contains:

- [ ] Project description
- [ ] Features list
- [ ] Installation instructions
- [ ] Quick start guide
- [ ] Usage examples
- [ ] Architecture overview
- [ ] Contributing guidelines (if applicable)
- [ ] License information

---

## Post-Development: Knowledge Capture

After completing all phases, capture the development experience:

1. **Update `docs/decisions.md`** with:
   - Final architectural decisions
   - Trade-offs made
   - Lessons learned
   - Known limitations

2. **Create blog post outline** covering:
   - Project motivation
   - Technology choices
   - LLM integration challenges
   - Graph visualization approach
   - Local-first philosophy
   - Future enhancements

3. **Document common issues** in README:
   - LLM connection problems
   - Graph visualization performance
   - Extraction quality tuning

---

## Maintenance & Evolution

### Regular Maintenance Tasks

- Update dependencies monthly
- Review and improve extraction prompts
- Monitor graph growth and performance
- Backup data directory weekly

### Future Enhancement Priorities

Reference `docs/roadmap.md` for planned features. Priority order:

1. **Graph Analytics Dashboard**: Centrality metrics, community detection
2. **Advanced Search**: Filters, boolean operators, temporal queries
3. **Note Versioning**: Track changes to nodes/edges over time
4. **Export Formats**: JSON, CSV, Obsidian-compatible markdown
5. **UI Enhancements**: Dark mode, custom node colors, layout algorithms
6. **Multi-user Support**: Authentication, personal graph spaces (optional)

---

## Success Criteria Summary

The Mind Map AI project is complete when:

✅ All 7 phases are marked complete in `checklist.md`
✅ Acceptance Test 1 passes: Sample notes → N nodes, M edges
✅ Acceptance Test 2 passes: Export contains provenance data
✅ All documentation files are comprehensive and accurate
✅ Local-only constraint is enforced
✅ System runs via Docker Compose
✅ Frontend and backend integration is seamless
✅ Graph visualization is interactive and performant
✅ Semantic search returns relevant results

---

**End of CLIne Master Prompt**
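
The security checklist in the spec asks for `DISABLE_EXTERNAL_LLM=true`, but this part of the spec does not show how that flag would be enforced. One possible approach, as a minimal sketch: refuse to start when the configured LLM endpoint does not point at the local machine. The helper names `is_local_endpoint` and `check_llm_endpoint`, and the choice to treat `host.docker.internal` and private LAN addresses as local, are assumptions for illustration and are not part of the generated project.

```python
import ipaddress
from urllib.parse import urlparse

# Hostnames treated as "local" in this sketch (an assumption, not from the spec)
LOCAL_HOSTNAMES = {"localhost", "host.docker.internal"}


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is a known local name or a loopback/private IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host.lower() in LOCAL_HOSTNAMES:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name that is not on the allow-list: treat it as external
        return False
    return addr.is_loopback or addr.is_private


def check_llm_endpoint(url: str, disable_external: bool = True) -> None:
    """Raise ValueError when external endpoints are disabled and url is not local."""
    if disable_external and not is_local_endpoint(url):
        raise ValueError(f"External LLM endpoint blocked: {url}")
```

Calling `check_llm_endpoint(...)` with the configured endpoint at startup would make a misconfigured remote endpoint fail fast. Dropping the `is_private` check would restrict "local" to loopback addresses only.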


Vibe Coding


OK, so now it is time to YOLO it. I saved that prompt as final_prompt.md in the project root and copy-pasted the whole thing into the chat for CLIne, since it is a large-context model. CLIne then restarts the task with the proper context, which is why I do it this way: it is pretty good at figuring that part out. The initial prompt it created from that context is as follows:


## Current Work
The user has provided a comprehensive master prompt for building Mind Map AI, a local LLM-powered personal knowledge graph system. This is a complex multi-phase project that needs to be implemented systematically.

## Key Technical Concepts
- **Backend**: FastAPI with Python for REST API, graph management, and LLM integration
- **Graph Engine**: NetworkX for in-memory graph with disk persistence (.gpickle or GraphML)
- **Database**: SQLite for raw notes, metadata, and provenance tracking
- **LLM**: Local model (Ollama, Llama.cpp) for entity/relationship extraction
- **Embeddings**: Local sentence-transformers with Chroma/Faiss for semantic search
- **Frontend**: Next.js with react-cytoscapejs for interactive graph visualization
- **Architecture**: Local-only constraint, no external API calls unless configured

## Relevant Files and Code
The project starts from scratch with only documentation files currently present:
- cline_master_prompt.md (the master specification)
- spec.md (project specification)
- prompt_add.md and final_prompt.md (additional context)

## Problem Solving
This is a complex multi-phase implementation that requires:
1. Systematic phase-by-phase development
2. Comprehensive documentation before implementation
3. Local-only architecture constraints
4. Integration between multiple technologies (Python backend, Next.js frontend, LLM integration)

## Pending Tasks and Next Steps
**Phase 0: Setup & Documentation** (Starting now)
1. Create complete directory structure as specified
2. Initialize Git repository with proper .gitignore
3. Generate comprehensive documentation templates in /docs/
4. Create checklist.md for progress tracking
5. Create README.md with project overview

**Subsequent Phases** (After Phase 0 completion):
- Phase 1: Backend Core Infrastructure (FastAPI, SQLite)
- Phase 2: NetworkX Graph Store implementation
- Phase 3: LLM Extraction Module
- Phase 4: Embeddings & Semantic Search
- Phase 5: Frontend Setup & Graph Visualization
- Phase 6: Note Upload & Integration Testing
- Phase 7: Security & Deployment

The project follows strict local-only constraints and requires comprehensive documentation at each phase.


So the first prompt generated the following repo

This has the documents that outline the entire project. At this point it is a good idea to read through the generated documents and refine them before moving on.

Now I will start the next prompt, which begins the real coding. After each prompt, remember to stop, read, test, and correct.


### Phase 1: Backend Core Infrastructure

**Objective**: Set up FastAPI backend, SQLite database, and basic configuration.

**Pre-requisites**: Phase 0 complete

**Tasks**:

#### 1.1 Database Setup
1. Create `backend/app/db/schema.sql` with the following tables:


-- IF NOT EXISTS keeps this script safe to run on every startup,
-- since init_database() executes it each time the app boots.

-- Table: notes
-- Stores raw markdown/text content with metadata
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    source_path TEXT,
    hash TEXT UNIQUE, -- Content hash for deduplication
    processed BOOLEAN DEFAULT 0 -- Flag for extraction completion
);

-- Table: extracts
-- Stores LLM extraction results with provenance
CREATE TABLE IF NOT EXISTS extracts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
    extractor_model TEXT NOT NULL, -- Model identifier (e.g., "llama3-8b")
    extract_json TEXT NOT NULL, -- Raw JSON output from LLM
    score REAL, -- Confidence/quality score
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Table: metadata
-- Key-value store for system metadata
CREATE TABLE IF NOT EXISTS metadata (
    key TEXT PRIMARY KEY,
    value TEXT,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes for performance
CREATE INDEX IF NOT EXISTS idx_notes_hash ON notes(hash);
CREATE INDEX IF NOT EXISTS idx_notes_processed ON notes(processed);
CREATE INDEX IF NOT EXISTS idx_extracts_note_id ON extracts(note_id);


2. Create `backend/app/db/db.py` with connection management:


import sqlite3
from pathlib import Path
from typing import Optional, Dict, List, Any
import hashlib
import json

# backend/app/db/db.py -> four levels up is the project root
DB_PATH = Path(__file__).parent.parent.parent.parent / "data" / "mindmap.db"

def get_connection() -> sqlite3.Connection:
    """Get SQLite connection with row factory."""
    # Make sure the data directory exists before SQLite creates the file
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(DB_PATH))
    conn.row_factory = sqlite3.Row
    # SQLite only enforces ON DELETE CASCADE when foreign keys are enabled per connection
    conn.execute("PRAGMA foreign_keys = ON")
    return conn

def init_database():
    """Initialize database with schema."""
    schema_path = Path(__file__).parent / "schema.sql"
    with open(schema_path) as f:
        schema = f.read()

    conn = get_connection()
    conn.executescript(schema)
    conn.commit()
    conn.close()

def insert_note(filename: str, content: str, source_path: Optional[str] = None) -> int:
    """Insert note and return note_id. Skip if hash exists."""
    content_hash = hashlib.sha256(content.encode()).hexdigest()

    conn = get_connection()
    cursor = conn.cursor()

    # Check if note with same hash exists
    cursor.execute("SELECT id FROM notes WHERE hash = ?", (content_hash,))
    existing = cursor.fetchone()

    if existing:
        conn.close()
        return existing[0]

    cursor.execute(
        "INSERT INTO notes (filename, content, source_path, hash) VALUES (?, ?, ?, ?)",
        (filename, content, source_path, content_hash)
    )
    note_id = cursor.lastrowid
    conn.commit()
    conn.close()

    return note_id

def insert_extract(note_id: int, extractor_model: str, extract_json: Dict, score: Optional[float] = None) -> int:
    """Insert extraction result."""
    conn = get_connection()
    cursor = conn.cursor()

    cursor.execute(
        "INSERT INTO extracts (note_id, extractor_model, extract_json, score) VALUES (?, ?, ?, ?)",
        (note_id, extractor_model, json.dumps(extract_json), score)
    )
    extract_id = cursor.lastrowid
    conn.commit()
    conn.close()

    return extract_id

def mark_note_processed(note_id: int):
    """Mark note as processed after extraction."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("UPDATE notes SET processed = 1, updated_at = CURRENT_TIMESTAMP WHERE id = ?", (note_id,))
    conn.commit()
    conn.close()

def get_note(note_id: int) -> Optional[Dict]:
    """Retrieve note by ID."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM notes WHERE id = ?", (note_id,))
    row = cursor.fetchone()
    conn.close()

    return dict(row) if row else None

def get_all_notes() -> List[Dict]:
    """Retrieve all notes."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM notes ORDER BY created_at DESC")
    rows = cursor.fetchall()
    conn.close()

    return [dict(row) for row in rows]

def get_extracts_for_note(note_id: int) -> List[Dict]:
    """Retrieve all extracts for a given note."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM extracts WHERE note_id = ? ORDER BY created_at DESC", (note_id,))
    rows = cursor.fetchall()
    conn.close()

    return [dict(row) for row in rows]


3. Update `docs/database.md` with:
   - Table schemas with column descriptions
   - NetworkX graph model specification (see spec.md Section 5.1)
   - Persistence strategy (gpickle vs GraphML tradeoffs)
   - Provenance tracking approach

#### 1.2 FastAPI Application Setup

1. Create `backend/app/config.py`:


from pydantic_settings import BaseSettings, SettingsConfigDict
from pathlib import Path

class Settings(BaseSettings):
    # Read overrides from .env; ignore .env keys that are not declared as fields below
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # LLM Configuration
    llm_endpoint: str = "http://localhost:11434/api/generate" # Default Ollama endpoint
    llm_model: str = "llama3"
    embedding_endpoint: str = "http://localhost:11434/api/embeddings"
    embedding_model: str = "all-minilm"

    # Database Paths (backend/app/config.py -> three levels up is the project root)
    db_path: Path = Path(__file__).parent.parent.parent / "data" / "mindmap.db"
    graph_path: Path = Path(__file__).parent.parent.parent / "data" / "graph.gpickle"
    vector_db_path: Path = Path(__file__).parent.parent.parent / "data" / "vectors"

    # API Configuration
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    cors_origins: list[str] = ["http://localhost:3000"]

    # Processing Configuration
    max_batch_size: int = 10
    extraction_timeout: int = 300 # seconds

settings = Settings()


2. Create `backend/app/main.py`:


from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from .config import settings
from .db.db import init_database
from .api import ingest, graph, search

app = FastAPI(
    title="Mind Map AI",
    description="Local LLM-powered personal knowledge graph",
    version="0.1.0"
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize database on startup
@app.on_event("startup")
async def startup_event():
    init_database()
    # Initialize graph store (will be implemented in Phase 2)
    # from .services.graph_store import init_graph
    # init_graph()

# Include routers
app.include_router(ingest.router, prefix="/api/ingest", tags=["ingestion"])
app.include_router(graph.router, prefix="/api/graph", tags=["graph"])
app.include_router(search.router, prefix="/api/search", tags=["search"])

@app.get("/")
async def root():
    return {"message": "Mind Map AI API", "version": "0.1.0"}

@app.get("/health")
async def health_check():
    return {"status": "healthy"}


3. Create placeholder router files (to be implemented in later phases). Each module except `__init__.py` should define an empty `router = APIRouter()` so the `include_router` calls in `main.py` resolve:
   - `backend/app/api/__init__.py`
   - `backend/app/api/ingest.py`
   - `backend/app/api/graph.py`
   - `backend/app/api/search.py`

4. Create `backend/requirements.txt`:


fastapi==0.109.0
uvicorn[standard]==0.27.0
pydantic-settings==2.1.0
networkx==3.2.1
requests==2.31.0
sentence-transformers==2.3.1
chromadb==0.4.22
numpy==1.26.3
python-multipart==0.0.6


#### 1.3 Testing & Documentation

1. Create `tests/backend/test_db.py`:


import pytest
from pathlib import Path
import tempfile
import shutil
from backend.app.db import db

@pytest.fixture
def temp_db():
    """Create temporary database for testing."""
    temp_dir = tempfile.mkdtemp()
    original_db_path = db.DB_PATH
    db.DB_PATH = Path(temp_dir) / "test.db"
    db.init_database()

    yield db.DB_PATH

    # Cleanup
    shutil.rmtree(temp_dir)
    db.DB_PATH = original_db_path

def test_insert_note(temp_db):
    """Test note insertion."""
    note_id = db.insert_note("test.md", "Test content", "/path/to/test.md")
    assert note_id > 0

    note = db.get_note(note_id)
    assert note['filename'] == "test.md"
    assert note['content'] == "Test content"
    assert note['processed'] == 0

def test_duplicate_note_hash(temp_db):
    """Test that duplicate content returns existing note_id."""
    note_id_1 = db.insert_note("test1.md", "Same content")
    note_id_2 = db.insert_note("test2.md", "Same content")

    assert note_id_1 == note_id_2

def test_insert_extract(temp_db):
    """Test extract insertion."""
    note_id = db.insert_note("test.md", "Test content")
    extract_json = {"nodes": [], "edges": []}
    extract_id = db.insert_extract(note_id, "llama3", extract_json, 0.95)

    assert extract_id > 0

    extracts = db.get_extracts_for_note(note_id)
    assert len(extracts) == 1
    assert extracts[0]['extractor_model'] == "llama3"

def test_mark_note_processed(temp_db):
    """Test marking note as processed."""
    note_id = db.insert_note("test.md", "Test content")
    db.mark_note_processed(note_id)

    note = db.get_note(note_id)
    assert note['processed'] == 1


2. Update `docs/architecture.md` with:
   - Technology stack rationale
   - Backend architecture diagram (ASCII art or description)
   - Data flow from ingestion to graph
   - Module dependencies

3. Update `docs/cicd_devops.md` with:
   - Python environment setup (`venv`, dependencies)
   - Running the backend: `uvicorn app.main:app --reload`
   - Database initialization steps

**Deliverables**:
- `backend/app/db/schema.sql` with complete schema
- `backend/app/db/db.py` with all CRUD functions
- `backend/app/config.py` with settings management
- `backend/app/main.py` with FastAPI app initialization
- `backend/requirements.txt` with all dependencies
- `tests/backend/test_db.py` with passing unit tests
- Updated documentation in `docs/`

**Completion Threshold**:
- [ ] SQLite database can be created and queried
- [ ] FastAPI server runs locally without errors: `uvicorn app.main:app --reload`
- [ ] All database unit tests pass: `pytest tests/backend/test_db.py`
- [ ] `/health` endpoint returns 200 OK
- [ ] Update `checklist.md` with Phase 1 completion
- [ ] Log backend setup in `decisions.md`

---
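
Two details in this phase are worth checking by hand. `init_database()` runs on every startup, so the schema script has to be safe to execute twice, and SQLite only honours `ON DELETE CASCADE` when foreign keys are switched on for the connection. Here is a minimal sketch of both points using trimmed-down versions of the `notes` and `extracts` tables; the helper names `open_db` and `init_schema` are hypothetical and not part of the generated code.

```python
import sqlite3

# Trimmed-down schema for illustration only; the real tables have more columns
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS extracts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign key enforcement turned on."""
    conn = sqlite3.connect(path)
    # Foreign keys are off by default in SQLite and must be enabled per connection
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def init_schema(conn: sqlite3.Connection) -> None:
    """Create tables if missing; safe to call on every startup."""
    conn.executescript(SCHEMA)


conn = open_db()
init_schema(conn)
init_schema(conn)  # second call is a no-op thanks to IF NOT EXISTS

note_id = conn.execute("INSERT INTO notes (content) VALUES ('hello')").lastrowid
conn.execute("INSERT INTO extracts (note_id) VALUES (?)", (note_id,))
conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
remaining = conn.execute("SELECT COUNT(*) FROM extracts").fetchone()[0]
```

With the pragma enabled, deleting the note also removes its extract row. Without it, the extract would be left behind as an orphan.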



OK, so we have our first lines of actual code. Now it is really time to read. Getting this initial setup right is what will save you a lot of heartache later.

The first thing I did was replace the LLM model name with one that I actually have installed. Second, I noticed that the files in the api folder are all placeholders. Keep that in mind as you go: you want to make sure all placeholder logic is either completed or removed.
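
Whether a model name is actually installed can be checked rather than guessed. Ollama exposes `GET /api/tags`, which lists the models available locally. Here is a small sketch; the helper names are hypothetical, and the base URL assumes Ollama's default port, the same one used in the config in the prompt.

```python
import json
from urllib.request import urlopen

# Default Ollama address (assumption: Ollama runs locally on its standard port)
OLLAMA_BASE_URL = "http://localhost:11434"


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    """Call Ollama's GET /api/tags, which lists locally available models."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def installed_models(tags: dict) -> list[str]:
    """Extract model names such as 'llama3:latest' from the /api/tags payload."""
    return [m["name"] for m in tags.get("models", [])]


def is_installed(wanted: str, names: list[str]) -> bool:
    """Match 'llama3' against 'llama3:latest'; an explicit tag must match exactly."""
    target = wanted if ":" in wanted else f"{wanted}:latest"
    return target in names
```

With Ollama running, `is_installed("llama3", installed_models(fetch_tags()))` comes back `False` when the model still needs an `ollama pull`.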

Now I will simply repeat with the remaining prompts and test as I go. Hopefully this all works. If not, I will blame it on using a free model and not a state-of-the-art one like Anthropic's Claude Sonnet 4.5 or whatever is now the best. I hope this gives you a foundation for the entire process.



Results After All Prompts Run



So a basic frontend loads, but does it actually function? Let's run it and see.

The initial run of docker compose ends with this error:

0.412 > next build --turbopack
0.412
0.881 ▲ Next.js 15.5.6 (Turbopack)
0.881
0.901 Creating an optimized production build ...
26.00 ✓ Finished writing to disk in 34ms
26.01 ✓ Compiled successfully in 24.9s
26.02 Linting and checking validity of types ...
27.26
27.26 Failed to compile.
27.26
27.26 ./components/GraphCanvas.tsx
27.26 9:1 Warning: Unused eslint-disable directive (no problems were reported from '@typescript-eslint/no-explicit-any').
27.26 60:21 Error: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any
27.26 64:35 Error: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any
27.26 77:24 Error: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any
27.26 81:25 Error: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any
27.26 103:24 Error: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any
27.26
27.26 ./lib/api.ts
27.26 42:28 Error: Unexpected any. Specify a different type. @typescript-eslint/no-explicit-any
27.26 88:84 Warning: Unused eslint-disable directive (no problems were reported from '@typescript-eslint/no-explicit-any').
27.26
27.26 info - Need to disable some ESLint rules? Learn more here: https://nextjs.org/docs/app/api-reference/config/eslint#disabling-rules
27.27 npm notice
27.27 npm notice New major version of npm available! 10.8.2 -> 11.6.2
27.27 npm notice Changelog: https://github.com/npm/cli/releases/tag/v11.6.2
27.27 npm notice To update run: npm install -g npm@11.6.2
[+] Running 1/2
 ✔ Service backend Built 1.1s
 ⠏ Service frontend Building 27.9s
failed to solve: process "/bin/sh -c npm run build" did not complete successfully: exit code: 1

What does it mean?

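It means the production build compiled, but the lint and type-check step that follows failed. ESLint's `@typescript-eslint/no-explicit-any` rule flagged `any` types in `components/GraphCanvas.tsx` and `lib/api.ts`, and `next build` treats those lint errors as fatal, so `npm run build` exited with code 1 and Docker stopped building the frontend image.

It also means this is the next prompt I enter, praying that it fixes everything with no thinking on my part. Not really. Do not do that. It will end in a broken heart and you will end up wearing all black for a year or two.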



It basically runs. You can even load a document. It does not function like it is supposed to, though. I imagine it is filled with pseudocode. This is why you do not do this.

So why do you vibe code?

Now I have something to work with. Now is when the real work begins. I do this as part of my learning process. After I have generated something like this, I go through all the functions and analyze them to try to get the project to function the way I initially intended.

Hey I got further than last time at least.
