EA and AI Safety Books Reference

This summary covers a reference list of five key books at the intersection of effective altruism, existential risk, and AI safety. Together they represent the core reading list for anyone seeking to understand why AI alignment is treated as a priority cause area within EA.

The Five Books

The Precipice: Existential Risk and the Future of Humanity (2020) — Toby Ord

Toby Ord examines the full landscape of existential risk — from nuclear war and climate change to engineered pandemics and unaligned AI. He argues we live at a uniquely pivotal time (the “precipice”) and have a moral duty to safeguard humanity’s future. Key contributions include a quantitative risk table estimating probabilities for different existential catastrophes, the concept of the Long Reflection (a future period of careful deliberation about humanity’s values), and a framework for thinking about risk that has become standard in EA.

Key concepts: Existential risk, longtermism, the Long Reflection, risk landscape

A companion website (theprecipice.com) provides additional data. See also precipice-revisited for Ord’s later updates.

What We Owe the Future (2022) — William MacAskill

William MacAskill’s philosophical case for longtermism — the idea that positively influencing the long-term future is a key moral priority of our time. The book covers population ethics, value lock-in (the risk that a narrow set of values becomes permanently dominant), and how to think about the moral significance of future generations. MacAskill argues that because the future could contain vastly more people than the present, and because our actions now could have lasting effects on their welfare, the long-term future deserves significant moral weight.

Key concepts: Longtermism, value lock-in, moral weight of future people, trajectory changes

Superintelligence: Paths, Dangers, Strategies (2014) — Nick Bostrom

Nick Bostrom’s examination of what happens when machines surpass human intelligence. Covers different paths to superintelligence (whole brain emulation, biological enhancement, AI), the control problem (how to maintain meaningful human oversight of a superintelligent system), and strategic considerations. Introduced or popularized several concepts now central to AI safety discourse:

  • Intelligence explosion: a superintelligent AI could rapidly improve itself
  • Orthogonality thesis: intelligence and goals are independent (a superintelligent AI need not share human values)
  • Instrumental convergence: most goals lead to similar sub-goals (self-preservation, resource acquisition)
  • Treacherous turn: an AI might behave cooperatively until it is powerful enough not to

Key concepts: Intelligence explosion, orthogonality thesis, instrumental convergence, treacherous turn, control problem

Human Compatible: Artificial Intelligence and the Problem of Control (2019) — Stuart Russell

Stuart Russell proposes a new framework for AI development based on uncertainty about human preferences. He argues that the standard model of AI — optimizing a fixed objective — is fundamentally flawed, because we cannot fully specify human values. His alternative: build AI systems that are uncertain about what humans want and actively seek to learn those preferences. This leads to cooperative inverse reinforcement learning and assistance games, where the AI’s goal is to help rather than to optimize.

Key concepts: Value alignment, inverse reinforcement learning, assistance games, benefit of uncertainty

The Alignment Problem: Machine Learning and Human Values (2020) — Brian Christian

A narrative history and exploration of the AI alignment problem. Christian traces the intellectual history from fairness and bias in ML systems through reinforcement learning to modern interpretability research. The book is more accessible than Superintelligence or Human Compatible and serves as a good entry point for understanding why alignment is hard.

Key concepts: Reward hacking, specification gaming, inverse reward design, interpretability, fairness

How the Five Books Fit Together

Book                    | Primary Focus                 | Scope
The Precipice           | Existential risk landscape    | Broad — all x-risks
What We Owe the Future  | Moral case for longtermism    | Philosophy
Superintelligence       | Technical AI risk scenarios   | AI-specific
Human Compatible        | How to build safe AI          | AI-specific (solutions)
The Alignment Problem   | History of alignment research | AI-specific (narrative)

Reading order for newcomers:

  1. The Alignment Problem (accessible narrative)
  2. The Precipice (broad context)
  3. What We Owe the Future (philosophical case)
  4. Superintelligence (technical depth)
  5. Human Compatible (solutions framework)

Significance for the Library

These five books are the canonical long-form introductions to the ideas that motivate the EA and AI safety communities. They provide the arguments that underlie the career recommendations from 80,000 Hours and the cause prioritization discussions on the EA Forum (see summary-ea-forum-key-posts).