Summary — LLM Wiki by Andrej Karpathy
Source: LLM Wiki — Andrej Karpathy
Overview
This gist by andrej-karpathy proposes a pattern for building personal knowledge bases using LLMs. Rather than treating LLMs as on-demand question-answering tools that re-derive knowledge from raw documents each time (the rag paradigm), Karpathy argues for an approach where the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that compounds knowledge over time. The wiki sits as an intermediate layer between raw source documents and the user, functioning as a pre-compiled knowledge artifact.
The Core Insight
The central thesis is that most current LLM-document workflows (including rag, NotebookLM, and ChatGPT file uploads) suffer from a fundamental limitation: they rediscover knowledge from scratch on every query. There is no accumulation, no compounding. A question that requires synthesizing information across five documents forces the LLM to locate and assemble the relevant fragments every single time.
The llm-knowledge-base pattern inverts this. When a new source is added, the LLM does not merely index it for later retrieval. It reads the source, extracts key information, and integrates it into an existing wiki — updating entity pages, revising topic summaries, flagging contradictions, and strengthening cross-references. The knowledge is compiled once and kept current, rather than re-derived per query. This is the key distinction explored in compiled-wiki-vs-rag.
Three-Layer Architecture
The pattern defines three layers (a hypothetical layout sketch follows the list):
- Raw sources — an immutable, curated collection of source documents (articles, papers, images, data files). The LLM reads from these but never modifies them. This is the ground truth.
- The wiki — a directory of LLM-generated markdown files including summaries, entity pages, concept pages, comparisons, and an overview. The LLM owns this layer entirely. It creates, updates, and cross-references pages as new sources arrive. The user reads the wiki; the LLM writes it.
- The schema — a configuration document (e.g., CLAUDE.md or AGENTS.md) that defines the wiki’s structure, conventions, and workflows. This is what transforms the LLM from a generic chatbot into a disciplined wiki-schema maintainer. The schema co-evolves with the wiki as the user and LLM discover what works for their domain.
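The gist deliberately leaves directory structure open. As one hypothetical instantiation (names and nesting are illustrative, not prescribed), the three layers could sit side by side in a single project folder:

```
project/
├── CLAUDE.md               # the schema: structure, conventions, workflows
├── sources/                # raw sources: immutable ground truth, never edited by the LLM
│   ├── attention-paper.pdf
│   └── scaling-laws-notes.md
└── wiki/                   # the wiki: LLM-written, human-read
    ├── index.md            # catalog of every page (see Indexing and Navigation)
    ├── log.md              # append-only operation log
    ├── scaling-laws.md     # a concept page
    └── andrej-karpathy.md  # an entity page
```

Placing the schema file at the root matches the convention coding agents already use to discover CLAUDE.md or AGENTS.md.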
Operations
Karpathy defines three core wiki-operations:
- Ingest: Drop a new source into the raw collection, and the LLM processes it — reading the source, creating a summary page, updating the index, creating or updating entity and concept pages, and logging the operation. A single source might touch 10–15 wiki pages. The user can stay involved (reviewing summaries, guiding emphasis) or batch-ingest with less supervision. One way the schema might spell out these steps is sketched after this list.
- Query: Ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Crucially, valuable answers can be filed back into the wiki as new pages — comparisons, analyses, discoveries. This means explorations compound in the knowledge base just like ingested sources do.
- Lint: Periodically health-check the wiki for contradictions, stale claims, orphan pages, missing concept pages, broken cross-references, and data gaps. The LLM can also suggest new questions to investigate and new sources to seek out.
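The gist treats these operations abstractly; how they are triggered and specified is left to the schema. A minimal sketch, assuming the schema is a CLAUDE.md with one workflow section per operation (the step list and wording here are assumptions, not Karpathy's):

```
## Workflow: ingest
When a new file lands in sources/:
1. Read it in full.
2. Write a summary page under wiki/ and link it from index.md.
3. Create or update the entity and concept pages it touches.
4. Add cross-references between the new and existing pages.
5. Flag any contradiction with claims already in the wiki.
6. Append an INGEST entry to log.md.
```

Query and lint would get analogous sections, so that invoking an operation is just a short prompt ("ingest the new paper") that the schema expands into the full checklist.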
Indexing and Navigation
Two special files support navigation:
- index.md — a content-oriented catalog of every wiki page, organized by category, with links and one-line summaries. The LLM reads this first when answering queries. At moderate scale (~100 sources, hundreds of pages), this works surprisingly well without embedding-based infrastructure.
- log.md — a chronological, append-only record of operations (ingests, queries, lint passes). Designed to be parseable with simple Unix tools (grep, tail) when entries follow a consistent prefix format, as in the example below.
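Neither file's format is pinned down in the gist. As one illustrative invention that satisfies the stated constraints, index.md might be a categorized catalog with one-line summaries:

```
## Concepts
- [scaling-laws](scaling-laws.md): compute-optimal training tradeoffs
- [rag](rag.md): retrieval-augmented generation, contrasted with the wiki

## Sources
- [scaling-laws-notes](summaries/scaling-laws-notes.md): summary of ingested notes
```

and log.md might use a date-plus-operation prefix per line (again, a made-up convention):

```
2025-01-12 INGEST sources/scaling-laws-notes.md (11 pages touched)
2025-01-14 QUERY "chinchilla vs kaplan" -> filed as scaling-comparison.md
2025-01-20 LINT 2 orphan pages flagged, 1 stale claim
```

With a consistent prefix, `grep INGEST log.md` pulls out every ingest and `tail -n 5 log.md` shows the most recent operations, with no tooling beyond the shell.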
Tooling Ecosystem
The gist recommends obsidian as the primary interface for browsing the wiki, using its graph view to visualize connections and the Web Clipper extension to capture sources. Additional tools mentioned include marp for markdown-based slide decks, Dataview for querying page frontmatter, and qmd for hybrid BM25/vector search over wiki pages as the wiki grows beyond what the index file can support.
Why It Works
What makes the pattern sustainable is a division of labor: humans abandon knowledge bases because the maintenance burden grows faster than the value. Updating cross-references, keeping summaries current, flagging contradictions — this is tedious bookkeeping that compounds over time. LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in a single pass. The maintenance cost drops to near zero, which means the wiki stays healthy.
The human’s role is to curate sources, direct analysis, ask good questions, and do the actual thinking. The LLM handles everything else.
Historical Context
Karpathy draws a connection to vannevar-bush’s Memex concept (1945) — a proposed personal knowledge store with associative trails between documents. Bush’s vision was private, actively curated, with the connections between documents as valuable as the documents themselves. The part Bush couldn’t solve was who does the maintenance. The LLM fills that gap.
Scope and Intent
The document is intentionally abstract and implementation-agnostic. It describes a pattern, not a specific system. Directory structure, schema conventions, page formats, and tooling are all left to the user and their LLM to instantiate based on domain and preference. The gist’s stated goal is to communicate the pattern so that an LLM agent can build out the specifics collaboratively.
Related Pages
- llm-knowledge-base
- rag
- compiled-wiki-vs-rag
- wiki-operations
- wiki-schema
- andrej-karpathy
- obsidian
- vannevar-bush
- qmd
- marp
- memex
- knowledge-compounding
- notebooklm
- notebooklm-vs-wiki