Coherent Extrapolated Volition (and CAV / CBV)

A family of three competing frameworks for what values to align artificial superintelligence (ASI) to — addressing the deep philosophical question of which humans, which values, and whose extrapolation. The AI Safety Atlas (Ch. 3 appendix on long-term questions) treats these as foundational alternatives.

CEV — Coherent Extrapolated Volition

Originally Eliezer Yudkowsky's framing, proposed in 2004.

Align ASI to what humans would want if we were “smarter, more informed, and more morally developed.”

Rather than encoding current preferences (which embed biases, shortsightedness, and unexamined moral errors), CEV proposes programming AI to determine our idealized future preferences. The intuition: a “wise version of humanity” knows things we don’t and would rationally endorse outcomes our current selves cannot articulate.

Implementation difficulties (substantial):

  • Requires sophisticated modeling of human psychology, ethics, social dynamics
  • Beyond current capabilities
  • Infinite-regress concerns: how does the ASI know what “smarter, more informed, more morally developed” means without a meta-CEV?
  • Verification problem: how do we check that the ASI’s extrapolation matches what we’d actually endorse?

CEV remains aspirational and theoretical: it has strong intuitive appeal, but it defers the operational difficulty to an idealized extrapolation process nobody knows how to build.

CAV — Coherent Aggregated Volition

Proposed by Ben Goertzel as a more pragmatic alternative.

Focus on current human values without extrapolating their development.

CAV acknowledges that fundamental value differences may persist despite “enlightenment” — there is no guarantee humans would converge given more knowledge. Instead, CAV creates coherent aggregations balancing diverse perspectives without predicting evolution.

Trade-offs:

  • More implementable than CEV (no extrapolation step)
  • Risks locking in current values, including their flaws (per value-lock-in)
  • Aggregation rules become contested (whose preferences count, how weighted)

CAV is the realistic-but-conservative variant: it doesn’t promise to elevate human values, just to coordinate their current expression.
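The contested-aggregation point can be made concrete with a toy sketch (hypothetical, not from the source — the stakeholders, scores, and weighted-sum rule are all illustrative assumptions): under one weighting rule the aggregation picks one option, and merely changing the weights flips the outcome, so the aggregation rule itself is doing the value-laden work.

```python
# Toy CAV-style aggregation of *current* preferences (no extrapolation).
# Hypothetical names and numbers; the point is that the weighting rule,
# not the preferences alone, determines the outcome.

def aggregate(prefs: dict[str, dict[str, float]],
              weights: dict[str, float]) -> str:
    """Pick the option with the highest weighted sum of stated preferences."""
    options = next(iter(prefs.values())).keys()
    return max(options,
               key=lambda o: sum(weights[p] * prefs[p][o] for p in prefs))

# Two stakeholders score two policy options on a 0-1 scale.
prefs = {
    "alice": {"A": 0.9, "B": 0.2},
    "bob":   {"A": 0.3, "B": 0.8},
}

# Equal weights favour A; weighting bob more flips the result to B.
print(aggregate(prefs, {"alice": 0.5, "bob": 0.5}))  # A
print(aggregate(prefs, {"alice": 0.3, "bob": 0.7}))  # B
```

The same stated preferences yield different "coherent aggregations" depending on whose voice is weighted how — exactly the contested design choice the bullet above flags.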

CBV — Coherent Blended Volition

Emphasizes creative, human-guided value blending rather than algorithmic averaging. Drawing on conceptual blending theory, CBV aims to produce harmonious value systems that participants recognize as adequately representing their contributions.

Distinguishing properties:

  • Addresses AI paternalism concerns by keeping value determination human-directed
  • Accepts that some “blending” produces emergent values neither party originally held
  • Aimed at producing systems people endorse rather than ones that maximize aggregated preference satisfaction

Real-world precedent: vTaiwan (digital democracy platform) demonstrates CBV-like processes in technology policy contexts — bottom-up value synthesis through structured deliberation.

Comparing the Three

| Framework | Source of Values | Strength | Weakness |
|-----------|------------------|----------|----------|
| CEV | Idealized future humans | Captures moral progress | Implementation impossible without meta-CEV |
| CAV | Current humans | Pragmatically tractable | Locks in current values |
| CBV | Human-guided blending | Preserves human agency | Outcome unpredictable; relies on process design |

Connection to “Alignment to Whom?”

These three frameworks address what values to align to. They’re complementary to the whom question covered in alignment-to-whom (single-single, single-multi, multi-single, multi-multi). Both questions matter and don’t reduce to each other.

Why This Matters

The Atlas’s treatment is honest: most current technical alignment work focuses on intent alignment (the AI faithfully tries to do what its operator wants) — but whose intent, extrapolated how, is unsolved at the philosophical level.

If ASI alignment succeeds technically but the meta-question is unanswered, the result depends on whichever values happen to be loaded — a major instance of value-lock-in risk. This is why the “alignment to what” question is part of the strategy chapter, not just a philosophical aside: it shapes what successful alignment actually looks like.
