AI Safety Atlas Ch.3 — Definitions

“Before we can build solutions, we must agree on the problem.” This subchapter disambiguates four overlapping but distinct concepts: AI safety, AI alignment, AI ethics, and AI control. The wiki’s existing pages all benefit from these clean definitions.

AI Safety (Definition 3.1)

“Ensuring that AI systems do not inadvertently or deliberately cause harm or danger to humans or the environment, through research that identifies causes of unintended AI behavior and develops tools for safe and reliable operation.”

The broadest category. Encompasses robustness, monitoring, capability control, and more. While ai-alignment addresses goals and intentions, safety covers a wider set of concerns. See ai-safety.

AI Alignment (Definition 3.2)

“The problem of building machines that faithfully try to do what we want them to do (or what we ought to want them to do).” — Christiano, 2024

A subset of safety, focused on objective/value matching. The Atlas highlights an important reframing: systems could theoretically be aligned but unsafe (competently pursuing the goals we specified, which turn out to be harmful) or safe but unaligned (constrained so that misaligned objectives cannot cause harm).

Key open questions:

  • Broader vs. narrower definitions — does alignment include creating beneficial outcomes, or only intent-matching?
  • “Aligned to whom?” — operators, designers, specific groups, all humanity, ethical principles, hypothetical informed preferences? See alignment-to-whom.
  • Applying “trying” / “intent” to AI — non-trivial; optimization objectives don’t directly translate to human-like intentions (a toy sketch follows below).

See ai-alignment.
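
To make this “intent” gap concrete, here is a toy sketch. Everything in it (function names, numbers) is invented for illustration and is not from the Atlas: an optimizer can competently maximize a proxy objective and diverge from the designer’s intent without anything resembling human-like “trying.”

```python
# Toy illustration (hypothetical): optimizing an objective is not the
# same as "trying" to do what the designer intended.

def proxy_score(article_length: int) -> float:
    """Proxy objective the designer specified: longer articles score higher."""
    return float(article_length)

def intended_quality(article_length: int) -> float:
    """What the designer actually wanted: quality peaks at moderate length."""
    return -(article_length - 800) ** 2  # best around 800 words

candidates = range(100, 5001, 100)

# The optimizer competently maximizes the objective it was given...
best_for_proxy = max(candidates, key=proxy_score)
# ...which diverges from what the designer meant.
best_for_intent = max(candidates, key=intended_quality)

print(f"Proxy-optimal length:    {best_for_proxy} words")   # 5000
print(f"Intended-optimal length: {best_for_intent} words")  # 800
```

The system never “intended” anything; it simply optimized the objective it was handed, which is why translating talk of “trying” into optimization terms is non-trivial.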

AI Ethics (Definition 3.3)

“The study and application of moral principles to AI development and deployment, addressing questions of fairness, transparency, accountability, privacy, autonomy, and other human values that AI systems should respect or promote.” — Huang et al., 2023

Complements technical safety with normative guidance. While alignment ensures systems pursue intended objectives, ethics determines which objectives are worth pursuing. Includes digital rights, potential AI rights, fairness, and bias mitigation.

This page-level definition is new to the wiki; it connects to the near-term-harms-vs-x-risk tension, where the AI ethics and AI safety communities often debate priorities.

AI Control (Definition 3.4)

“The technical and procedural measures designed to prevent AI systems from causing unacceptable outcomes, even if these systems actively attempt to subvert safety measures. Control focuses on maintaining human oversight regardless of whether the AI’s objectives align with human intentions.” — Greenblatt et al., 2024

A strategy distinct from alignment: alignment prevents preference divergence by design, while control creates security layers that function even when alignment fails. Methods include monitoring actions, restricting capabilities, human auditing, and termination mechanisms.
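
As a rough sketch of how these layers might compose (the class and method names below are hypothetical, not drawn from Greenblatt et al. or the Atlas), a control wrapper can gate every proposed action through monitoring, capability restriction, and human escalation, with a termination path that does not depend on the wrapped model’s objectives:

```python
# Hypothetical control-layer sketch: monitoring, capability restriction,
# human auditing, and a termination mechanism that work regardless of
# whether the underlying model's objectives are aligned.
from dataclasses import dataclass, field

@dataclass
class ControlledAgent:
    allowed_actions: set[str]                            # capability restriction
    audit_log: list[str] = field(default_factory=list)   # human auditing trail
    terminated: bool = False

    def terminate(self, reason: str) -> None:
        """Termination mechanism: a hard stop independent of the model."""
        self.audit_log.append(f"TERMINATED: {reason}")
        self.terminated = True

    def execute(self, proposed_action: str, risk_score: float) -> str:
        if self.terminated:
            return "refused: agent terminated"
        # Monitoring: every proposal is logged before anything runs.
        self.audit_log.append(f"proposed: {proposed_action} (risk={risk_score})")
        if proposed_action not in self.allowed_actions:
            self.terminate(f"disallowed action {proposed_action!r}")
            return "refused: outside allowed capabilities"
        if risk_score > 0.8:
            # Escalation: high-risk actions pause for human review.
            self.audit_log.append("escalated to human reviewer")
            return "paused: awaiting human approval"
        return f"executed: {proposed_action}"

agent = ControlledAgent(allowed_actions={"summarize", "translate"})
print(agent.execute("summarize", risk_score=0.2))   # executed
print(agent.execute("send_email", risk_score=0.1))  # refused, then terminated
```

The point of the sketch is that each layer functions even if the wrapped policy is misaligned, which is exactly what distinguishes control from alignment.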

The Atlas frames alignment and control as “complementary approaches,” which substantially clarifies the wiki’s existing ai-control page, centered on buck-shlegeris’s framing.

The Four-Way Decomposition

| Aspect | AI Safety | AI Alignment | AI Ethics | AI Control |
| --- | --- | --- | --- | --- |
| Scope | Broadest | Goal-matching | Normative | Operational containment |
| Key question | Does it harm? | Does it pursue our goals? | Should it pursue this? | Can we stop it if it doesn’t? |
| Failure mode | Unsafe deployment | Misaligned objectives | Wrong objectives | Loss of oversight |

Connection to Wiki

This subchapter provides clean definitional anchors for the wiki’s existing pages:

  • ai-safety — Definition 3.1 is now the canonical definition
  • ai-alignment — Definition 3.2 (Christiano 2024); the broader/narrower distinction
  • ai-control — Definition 3.4 (Greenblatt et al. 2024) adds rigor to the existing page
  • New AI Ethics treatment fills a gap — the concept was referenced loosely in near-term-harms-vs-x-risk but never previously defined