Coherent Extrapolated Volition (and CAV / CBV)

A family of three competing frameworks for what values to align artificial superintelligence (ASI) to — addressing the deep philosophical question of which humans, which values, and whose extrapolation. The AI Safety Atlas (Ch. 3 appendix on long-term questions) treats these as foundational alternatives.

CEV — Coherent Extrapolated Volition

Originally Eliezer Yudkowsky's framing, proposed in 2004.

Align ASI to what humans would want if we were “smarter, more informed, and more morally developed.”

Rather than encoding current preferences (which embed biases, shortsightedness, and unexamined moral errors), CEV proposes programming AI to determine our idealized future preferences. The intuition: a “wise version of humanity” knows things we don’t and would rationally endorse outcomes our current selves cannot articulate.

Implementation difficulties (substantial):

  • Requires sophisticated modeling of human psychology, ethics, social dynamics
  • Beyond current capabilities
  • Infinite-regress concerns: how does the ASI know what “smarter, more informed, more morally developed” means without a meta-CEV?
  • Verification problem: how do we check that the ASI’s extrapolation matches what we’d actually endorse?

CEV remains aspirational and theoretical: it has strong intuitive appeal, but it defers the operational difficulty to an idealized extrapolation process nobody knows how to build.

CAV — Coherent Aggregated Volition

Proposed by Ben Goertzel as a more pragmatic alternative.

Focus on current human values without extrapolating their development.

CAV acknowledges that fundamental value differences may persist despite “enlightenment” — there is no guarantee humans would converge given more knowledge. Instead, CAV creates coherent aggregations balancing diverse perspectives without predicting evolution.

Trade-offs:

  • More implementable than CEV (no extrapolation step)
  • Risks locking in current values, including their flaws (per value-lock-in)
  • Aggregation rules become contested (whose preferences count, how weighted)

CAV is the realistic-but-conservative variant: it doesn’t promise to elevate human values, just to coordinate their current expression.
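The contested-aggregation point can be made concrete with a toy sketch (hypothetical, not from the source — the stakeholders, scores, and weighted-sum rule are all illustrative assumptions): under one weighting rule the aggregation picks one option, and merely changing the weights flips the outcome, so the aggregation rule itself is doing the value-laden work.

```python
# Toy CAV-style aggregation of *current* preferences (no extrapolation).
# Hypothetical names and numbers; the point is that the weighting rule,
# not the preferences alone, determines the outcome.

def aggregate(prefs: dict[str, dict[str, float]],
              weights: dict[str, float]) -> str:
    """Pick the option with the highest weighted sum of stated preferences."""
    options = next(iter(prefs.values())).keys()
    return max(options,
               key=lambda o: sum(weights[p] * prefs[p][o] for p in prefs))

# Two stakeholders score two policy options on a 0-1 scale.
prefs = {
    "alice": {"A": 0.9, "B": 0.2},
    "bob":   {"A": 0.3, "B": 0.8},
}

# Equal weights favour A; weighting bob more flips the result to B.
print(aggregate(prefs, {"alice": 0.5, "bob": 0.5}))  # A
print(aggregate(prefs, {"alice": 0.3, "bob": 0.7}))  # B
```

The same stated preferences yield different "coherent aggregations" depending on whose voice is weighted how — exactly the contested design choice the bullet above flags.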

CBV — Coherent Blended Volition

Emphasizes creative, human-guided value blending rather than algorithmic averaging. Drawing on conceptual blending theory, CBV aims to produce harmonious value systems that participants recognize as adequately representing their contributions.

Distinguishing properties:

  • Addresses AI paternalism concerns by keeping value determination human-directed
  • Accepts that some “blending” produces emergent values neither party originally held
  • Aimed at producing systems people endorse rather than ones that maximize aggregated preference satisfaction

Real-world precedent: vTaiwan (digital democracy platform) demonstrates CBV-like processes in technology policy contexts — bottom-up value synthesis through structured deliberation.

Comparing the Three

| Framework | Source of Values | Strength | Weakness |
|-----------|------------------|----------|----------|
| CEV | Idealized future humans | Captures moral progress | Implementation impossible without meta-CEV |
| CAV | Current humans | Pragmatically tractable | Locks in current values |
| CBV | Human-guided blending | Preserves human agency | Outcome unpredictable; relies on process design |

Connection to “Alignment to Whom?”

These three frameworks address what values to align to. They’re complementary to the whom question covered in alignment-to-whom (single-single, single-multi, multi-single, multi-multi). Both questions matter and don’t reduce to each other.

Why This Matters

The Atlas’s treatment is honest: most current technical alignment work focuses on intent alignment (the AI faithfully tries to do what its operator wants) — but whose intent, extrapolated how, is unsolved at the philosophical level.

If ASI alignment succeeds technically but the meta-question is unanswered, the result depends on whichever values happen to be loaded — a major instance of value-lock-in risk. This is why the “alignment to what” question is part of the strategy chapter, not just a philosophical aside: it shapes what successful alignment actually looks like.
