Methodology
How the AI Safety Compendium is actually built — what’s automated, what’s human-reviewed, and what discipline keeps it honest.
Proof-of-concept stage. The repository is currently private; the workflow described below runs in single-maintainer mode. Once coverage and the citation rule have been validated end-to-end, the source repo, lint output, and operation log will open up. Until then, any artifact referenced on this page can be shared on request.
Source code, in one paragraph
The Compendium is an Obsidian vault under wiki/, compiled by Quartz 4 into a static site and deployed on Cloudflare Pages. A Python 3.12 ingestion pipeline fetches arXiv, RSS, and LessWrong feeds each week, dedupes the results, and stages candidates into an inbox. A weekly cron job then opens a review issue with a checkbox manifest. Approved candidates are compiled into wiki pages by an LLM-assisted maintainer skill (atlas-maintainer).
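To make the dedupe step concrete, here is a minimal sketch of how candidates from several feeds might be collapsed onto one canonical URL. It is an illustration under stated assumptions, not the pipeline's actual code: the `Candidate` class, `canonical_url`, `dedupe`, and the normalisation rules (force https, drop `www.`, drop fragments, strip arXiv version suffixes) are all hypothetical.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit

# Hypothetical record type; the real pipeline's schema is not shown on this page.
@dataclass(frozen=True)
class Candidate:
    title: str
    url: str
    source: str  # e.g. "arxiv", "rss", "lesswrong"

# Assumption: arXiv abstract URLs may carry a version suffix such as "v2".
_ARXIV_VERSIONED = re.compile(r"^(/abs/\d{4}\.\d{4,5})v\d+$")

def canonical_url(url: str) -> str:
    """Normalise a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    if netloc == "arxiv.org":
        path = _ARXIV_VERSIONED.sub(r"\1", path)
    # Keep the query string, drop the fragment, and force https (an assumption).
    return urlunsplit(("https", netloc, path, parts.query, ""))

def dedupe(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the first candidate seen for each canonical URL, preserving order."""
    seen: set[str] = set()
    unique: list[Candidate] = []
    for candidate in candidates:
        key = canonical_url(candidate.url)
        if key not in seen:
            seen.add(key)
            unique.append(candidate)
    return unique
```

Keeping the first occurrence preserves feed order, so whichever source is fetched first wins when the same item appears in two feeds.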
Page types
Four canonical page types. The schema is defined in the project’s CLAUDE.md (kept inside the private repo at this stage; available on request):
- Concepts (/concepts): single ideas (deceptive alignment, RLHF, scalable oversight). Sections: Definition / Why it matters / Key results / Open questions / Related agendas / Related concepts / Related Pages.
- Agendas (/agendas): research programs. Sections: What the agenda is / Lead orgs & people / Current state / Recent papers / Historical foundations / Open problems / Related Pages.
- Summaries (/summaries): one per ingested source. Sections: Source metadata / TL;DR / Key claims / Methods / Limitations / How this updates our concepts/agendas / Related Pages.
- Entities (/entities): thin routing pages for orgs and researchers, so cross-references resolve.
Schema rules are enforced by scripts/lint.py. The headline lint rule: every factual claim in a concept or agenda page must cite a primary URL.
Weekly cadence
Total time per week: ~4–5 hours.
| Day | Step | Time |
|---|---|---|
| Mon | Auto: weekly-sweep.yml cron runs at 06:00 UTC; opens GitHub issue with checkbox manifest | — |
| Tue | Maintainer reviews manifest, checks/unchecks candidates, closes issue | 30 min |
| Tue | apply-approved workflow runs; locally invoke atlas-maintainer skill to ingest approved URLs | 1.5–2h |
| Thu | Write or expand one concept/agenda page from a recent summary | 2h |
| Fri | Run lint, fix any ERRORs, push, deploy | 15 min |
The cron is non-negotiable. If a week's review slips, the following week covers both weeks' work.
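The Monday and Tuesday steps hinge on a checkbox manifest in a GitHub issue. A rough sketch of that round trip, assuming the manifest uses GitHub task-list syntax (`- [ ]` / `- [x]`) with one URL per line, might look like the following. The function names and the choice to start every box unchecked are assumptions, not the actual weekly-sweep or apply-approved workflow code.

```python
import re

# GitHub task-list item: "- [ ] text" (unchecked) or "- [x] text" (checked).
CHECKBOX = re.compile(r"^- \[( |x|X)\] (\S+)")

def build_manifest(urls: list[str]) -> str:
    """Render candidate URLs as an unchecked task list for the issue body."""
    return "\n".join(f"- [ ] {url}" for url in urls)

def approved_urls(issue_body: str) -> list[str]:
    """Return the URLs whose checkbox was ticked by the reviewer."""
    approved: list[str] = []
    for line in issue_body.splitlines():
        match = CHECKBOX.match(line.strip())
        if match and match.group(1).lower() == "x":
            approved.append(match.group(2))
    return approved
```

Parsing the issue body rather than a separate approval file keeps the review record and the ingestion input in one place.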
LLM assistance — explicit
The Compendium uses LLMs (currently Claude) substantially in compilation. The split:
LLM-driven (with maintainer review):
- Reading staged candidates and writing summary pages from them.
- Drafting concept and agenda updates when a new summary affects them.
- Suggesting wikilinks to maintain the cross-reference graph.
- Lint diagnostics and proposed fixes.
Maintainer-driven:
- Approving/rejecting weekly manifest candidates (whether to ingest at all).
- Deciding what counts as in-scope (scope is defined in the suggest page and the about page).
- Final review of every change before commit. The maintainer reads each edit, not just the diff stats.
- Curating the seed list of high-priority concepts and agendas.
Always human-only:
- Editorial-note synthesis (anything not directly cited).
- Contradiction-handling decisions (which positions to present, how).
- Strategic decisions about scope, pillar tags, content priorities.
- Corrections triage and acknowledgement.
LLMs are tools, not editors. Every page is committed by Kevin Huysegoms.
Citation discipline
The lint enforces:
- Broken wikilinks are ERRORs that block deployment.
- Per-claim primary URL in concept and agenda pages — the rule that defines the Compendium’s editorial character.
- Source-URL in summary frontmatter — every summary identifies its primary source explicitly.
- Bidirectional links — every double-bracket wikilink in a body must be reflected in the target page’s Related Pages section.
Lint warnings (orphan concepts, drifted summaries) are logged but don’t block deployment. They feed the quarterly hygiene sweep.
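The two wikilink rules above can be sketched together. The example below is a simplified illustration, not the project's lint. It assumes pages are held in a dict keyed by page name, that wikilinks follow Obsidian's `[[Page]]`, `[[Page|alias]]`, and `[[Page#Heading]]` forms, and that the Related Pages section starts at a `## Related Pages` heading. The function names are hypothetical.

```python
import re

# Captures the page name from [[Page]], [[Page|alias]] and [[Page#Heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
RELATED_MARKER = "## Related Pages"  # assumed heading format

def split_related(page: str) -> tuple[str, str]:
    """Split a page into (main body, Related Pages section)."""
    main, _, related = page.partition(RELATED_MARKER)
    return main, related

def link_targets(text: str) -> set[str]:
    return {name.strip() for name in WIKILINK.findall(text)}

def check_links(pages: dict[str, str]) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (broken, one_way) lists of (source page, target page) pairs."""
    broken: list[tuple[str, str]] = []
    one_way: list[tuple[str, str]] = []
    for name, page in sorted(pages.items()):
        main, _ = split_related(page)
        for target in sorted(link_targets(main)):
            if target not in pages:
                broken.append((name, target))  # link to a page that does not exist
                continue
            _, related = split_related(pages[target])
            if name not in link_targets(related):
                one_way.append((name, target))  # target does not link back
    return broken, one_way
```

Under the rules as stated, anything in `broken` would block deployment. How a one-way link is classified (ERROR or warning) is not specified here, so the sketch only reports it.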
Auditability
Three artifacts make the methodology checkable. While the project is in the proof-of-concept stage, the repository itself is private, so the audit surface available externally is the rendered site plus on-request access to internal artifacts:
- Rendered site (public): the citation rule is verifiable directly from any concept or agenda page, where every factual claim links to a primary source. The sitemap, llms.txt, and atlas-index.json are emitted on every build.
- log.md (private during PoC, available on request): chronological, append-only operation log. Records every weekly sweep, ingestion run, and lint sweep.
- Source repo (private during PoC): Python pipeline, schema, lint, and weekly-sweep workflows. Will be made public once the schema and weekly cadence have stabilised. Until then, the maintainer shares specific files on request when a reader needs the audit trail behind a particular page.
If the methodology drifts from what’s described here, the drift is visible in the on-request commit history and weekly logs. Once the repo opens up, that visibility becomes one click from every page.
What this is not
- Not finished. The Compendium is a proof of concept. Coverage is partial, the schema is still being validated against real ingests, and breaking changes to URLs or page structure are possible.
- Not yet open to outside contributors. The repository is private during the PoC stage; pull requests and GitHub issues are not accepted. Feedback flows by email to kevin@itforhumanity.be.
- Not peer-reviewed. The substitute for peer review is citation discipline, an on-request audit trail, and an open invitation to send corrections by email.
- Not exhaustive. The weekly sweep covers the major arXiv categories and tracked feeds, but it misses material, particularly work behind paywalls, in non-English venues, or in rapidly rotating informal channels.
- Not neutral. Page-creation choices reflect the maintainer’s judgment about what matters. The editorial policy (editorial-policy) tries to make this disciplined; it does not eliminate the bias.