<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
    <channel>
      <title>AI Safety Compendium</title>
      <link>https://aiforhumanity.eu</link>
      <description>Last 10 notes on the AI Safety Compendium</description>
      <generator>Quartz -- quartz.jzhao.xyz</generator>
      <item>
    <title>Connect AI Tools</title>
    <link>https://aiforhumanity.eu/connect</link>
    <guid>https://aiforhumanity.eu/connect</guid>
    <description><![CDATA[ Connect your AI tool. Proof-of-concept stage. ]]></description>
    <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>About</title>
    <link>https://aiforhumanity.eu/about</link>
    <guid>https://aiforhumanity.eu/about</guid>
    <description><![CDATA[ About the AI Safety Compendium. Status: proof of concept. The Compendium is an early-stage prototype maintained by one person. ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Capability Evals</title>
    <link>https://aiforhumanity.eu/agendas/capability-evals</link>
    <guid>https://aiforhumanity.eu/agendas/capability-evals</guid>
    <description><![CDATA[ Capability Evals. What the agenda is: The capability evals agenda builds tools that measure whether AI systems have crossed risk-relevant capability thresholds — autonomous replication, deception, weapons-related uplift, situational awareness, long-horizon planning — before deployment. ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Chain of Thought Monitoring</title>
    <link>https://aiforhumanity.eu/agendas/chain-of-thought-monitoring</link>
    <guid>https://aiforhumanity.eu/agendas/chain-of-thought-monitoring</guid>
    <description><![CDATA[ Chain of Thought Monitoring. What the agenda is: The chain of thought (CoT) monitoring agenda supervises an AI’s natural-language reasoning trace to detect misalignment, scheming, or reward hacking — before the model takes a harmful action. ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Character Training and Persona Steering</title>
    <link>https://aiforhumanity.eu/agendas/character-training-and-persona-steering</link>
    <guid>https://aiforhumanity.eu/agendas/character-training-and-persona-steering</guid>
    <description><![CDATA[ Character Training and Persona Steering. What the agenda is: The character training and persona steering agenda aims to map, shape, and control the personae or characters that language models embody, so that deployed models exhibit desirable traits (honesty, helpfulness, calibration) rather than undes... ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Control</title>
    <link>https://aiforhumanity.eu/agendas/control</link>
    <guid>https://aiforhumanity.eu/agendas/control</guid>
    <description><![CDATA[ Control. What the agenda is: The AI control agenda asks: how do we safely deploy AI systems even if they are actively misaligned and trying to subvert our safety measures? Rather than betting on aligning the model’s goals, control assumes the worst case (the model is scheming) and develops deployment-... ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>AI Safety via Debate</title>
    <link>https://aiforhumanity.eu/agendas/debate</link>
    <guid>https://aiforhumanity.eu/agendas/debate</guid>
    <description><![CDATA[ AI Safety via Debate. What the agenda is: The debate agenda exploits a structural asymmetry between truth and falsehood: in the limit, it should be easier to compellingly argue for true claims than for false claims. ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Iterative Alignment at Post-Train-Time</title>
    <link>https://aiforhumanity.eu/agendas/iterative-alignment-at-post-train-time</link>
    <guid>https://aiforhumanity.eu/agendas/iterative-alignment-at-post-train-time</guid>
    <description><![CDATA[ Iterative Alignment at Post-Train-Time. What the agenda is: The iterative alignment at post-train-time agenda is the dominant practical alignment program at frontier labs: modify a pretrained model’s weights via post-training procedures — RLHF, DPO, Constitutional AI / RLAIF, instruction-tuning, prefe... ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Reverse Engineering (Mechanistic Interpretability)</title>
    <link>https://aiforhumanity.eu/agendas/reverse-engineering</link>
    <guid>https://aiforhumanity.eu/agendas/reverse-engineering</guid>
    <description><![CDATA[ Reverse Engineering (Mechanistic Interpretability). What the agenda is: The reverse engineering agenda — broadly synonymous with mechanistic interpretability — aims to decompose trained neural networks into their functional, interacting components (circuits, features, attribution graphs), formally des... ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Weak-to-Strong Generalization</title>
    <link>https://aiforhumanity.eu/agendas/weak-to-strong-generalization</link>
    <guid>https://aiforhumanity.eu/agendas/weak-to-strong-generalization</guid>
    <description><![CDATA[ Weak-to-Strong Generalization. What the agenda is: The weak-to-strong generalization (W2S) agenda asks: can a weaker model effectively supervise a stronger model, and if so, by how much can the strong model exceed the weak supervisor’s accuracy ceiling? The premise: when AI systems become more capable... ]]></description>
    <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
  </item>
    </channel>
  </rss>