Alignment Techniques

Methods for making AI systems pursue goals consistent with human intentions. The pages tagged here cover the active techniques (RLHF, Constitutional AI, IDA, debate, scalable-oversight, and W2S, i.e. weak-to-strong generalization) plus the operational counterpart, control, which bounds the consequences when alignment cannot be guaranteed. This cluster is what most of the safety field works on day to day.

The technical question that organises this pillar is: how do we keep alignment intact as capability scales beyond human evaluator capacity? See superalignment for the program-level framing and outer-vs-inner-alignment for the foundational decomposition.

Pages tagged here