AI Safety Atlas Ch.3 — Appendix: Long-term Questions

Source: Appendix: Long-term Questions

Beyond the immediate strategies of Ch.3, the appendix surfaces philosophical questions that ASI development cannot avoid: what to align to, whom to align to, and whether to prioritize survival or flourishing.

Prioritize Flourishing or Survival?

The field has historically emphasized survival during the transition to superintelligence. MacAskill argues for a complementary focus on flourishing — ensuring the future’s quality, not merely its existence.

The flourishing concern: catastrophe-free futures might still fall drastically short of their potential. Without deliberate guidance, societies risk settling into mediocrity or embedding subtle moral errors.

Viatopia — rather than prescribing a specific utopian vision, this concept describes a state where societies possess the wisdom, coordination, and stability to guide themselves toward optimal futures. Conditions:

  • Minimal existential risk
  • Diverse moral viewpoints
  • Reflective collective decision-making capacity

The strategic choice: should AI systems function solely as catastrophe-prevention tools, or should development prioritize systems enhancing human reasoning, coordination, and value deliberation?

Alignment to What?

Three competing frameworks for value-loading ASI — see coherent-extrapolated-volition:

CEV — Coherent Extrapolated Volition

Yudkowsky’s proposal: align to what humans would want if we were “smarter, more informed, and more morally developed.” The AI is programmed to determine our idealized future preferences rather than our current ones, which embed biases and shortsightedness. Implementation remains speculative.

CAV — Coherent Aggregated Volition

Goertzel’s alternative: focus on current human values without extrapolating their development. Acknowledges that fundamental value differences may persist; a coherent aggregation balances diverse perspectives without predicting how they will evolve. More implementable than CEV.

CBV — Coherent Blended Volition

Creative, human-guided value blending rather than algorithmic averaging, drawing on conceptual blending theory. Addresses concerns about AI paternalism by keeping value determination human-directed. Real-world precedent: vTaiwan’s deliberative process in technology policy.

Alignment to Whom?

Four-way structural decomposition — see alignment-to-whom:

  • Single-Single — individual AI to individual human. Current approach focuses on intent alignment (interpreting intended meaning, not literal commands). Foundational, unsolved.
  • Single-Multi — multiple AIs to one human. Even ASI composed of cooperating smaller intelligences requires solving single-single first. “Ideally, no individual or small group should control superintelligence.”
  • Multi-Single — one AI to many humans. Aggregating preferences creates contradictions (see the sketch after this list). Promising approach: align to higher-level institutional principles and values, mirroring democratic institutions.
  • Multi-Multi — multiple AIs interacting with multiple humans. The misalignment/misuse distinction blurs (AI gaining power vs. humans gaining power via AI).
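
A minimal sketch of the Multi-Single aggregation problem, using a hypothetical three-principal, three-option example (the names and rankings are illustrative, not from the source): each individual ranking is transitive, yet pairwise majority voting yields a cyclic collective preference, so there is no single option the AI can serve that some majority does not disprefer.

```python
from itertools import permutations

# Hypothetical toy example (not from the source): three principals, three options.
# Each individual ranking is perfectly transitive.
rankings = {
    "alice": ["A", "B", "C"],   # A > B > C
    "bob":   ["B", "C", "A"],   # B > C > A
    "carol": ["C", "A", "B"],   # C > A > B
}

def majority_prefers(x, y):
    """True if a strict majority of principals ranks option x above option y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings.values())
    return votes > len(rankings) / 2

for x, y in permutations("ABC", 2):
    if majority_prefers(x, y):
        print(f"majority prefers {x} over {y}")

# Prints: A over B, B over C, C over A.
# The collective preference is cyclic (Condorcet's paradox): whichever option the
# single AI optimizes for, a majority of principals prefers some alternative.
# This is one motivation for aligning to higher-level institutional principles
# rather than to a direct aggregate of individual preferences.
```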

Perfect individual alignment ≠ safe collective behavior. Different principals may have conflicting interests, and systems may fail to coordinate despite aligned goals. “Perfect driver-law alignment doesn’t prevent traffic jams.” Multi-agent failures constitute a distinct class of failure modes.
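
A minimal sketch of the traffic-jam point, with hypothetical payoff numbers chosen purely for illustration: two route-planning AIs, each perfectly aligned with its own principal, both pick the individually dominant action, and the resulting congestion leaves both principals worse off than a coordinated alternative.

```python
# Hypothetical toy payoffs (minutes of delay, lower is better), not from the source.
# Two AIs each route their own principal's car; the highway is fast unless crowded.
delay = {
    ("highway", "highway"): (30, 30),   # both on the highway: congestion
    ("highway", "side"):    (10, 40),
    ("side",    "highway"): (40, 10),
    ("side",    "side"):    (15, 15),   # both coordinate on side streets
}
options = ["highway", "side"]

def best_response(i, their_choice):
    """An aligned AI minimizes only its own principal's delay."""
    def my_delay(action):
        profile = (action, their_choice) if i == 0 else (their_choice, action)
        return delay[profile][i]
    return min(options, key=my_delay)

# Whatever the other AI does, "highway" is the best response for each principal.
for their_choice in options:
    assert best_response(0, their_choice) == "highway"
    assert best_response(1, their_choice) == "highway"

print("individually aligned outcome:", delay[("highway", "highway")])  # (30, 30)
print("coordinated alternative:", delay[("side", "side")])             # (15, 15)
# Both AIs serve their principals perfectly, yet the joint outcome (30, 30) is
# worse for everyone than (15, 15): a coordination failure, not a misalignment.
```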

Long-Term Philosophical Questions

The appendix surfaces questions that ASI strategy cannot avoid:

  • What values should ASI pursue? Given human value diversity, is agreement achievable? Can humans agree on deliberative processes even without agreement on substantive values?
  • Human perpetuity vs. worthy successor? Should alignment prioritize indefinite human survival, or allow the creation of superior successors? Dan Faggella’s worthy successor concept — ASI exceeding humanity’s moral and cognitive value. Richard Sutton: succession to AI “mind children” is “inevitable and highly desirable.” Deeply contested.
  • AI consciousness and rights? Could advanced systems become conscious? If so, what moral status? Foundation for s-risks (alternative-risk-categories).
  • Non-human interests? Should alignment include animal welfare, ecosystem preservation, other life forms?

These questions divide AI safety researchers as deeply as technical questions do. “Whether the goal involves indefinite consciousness continuation regardless of physical substrate — as explored in Tegmark’s Life 3.0 — drives vastly different strategic priorities for ASI development.”

Connection to Wiki

This appendix introduces several concepts new to the wiki, including viatopia, coherent aggregated volition (CAV), coherent blended volition (CBV), and the worthy successor question.

It connects to existing wiki content:

  • longtermism — the flourishing argument is canonical longtermism
  • will-macaskill — viatopia framing is his
  • population-ethics — relevant to non-human interests and worthy successor questions
  • summary-macaskill-effective-altruism — MacAskill’s foundational EA philosophy
  • alternative-risk-categories — s-risks connect to AI consciousness
  • ai-alignment — adds the alignment-to-whom dimension
  • summary-ea-in-age-of-agi — the third-way approach is essentially viatopian