EA and AI Safety Books Reference
This summary covers five key books at the intersection of effective-altruism, existential-risk, and ai-safety. Together they form the core reading list for anyone seeking to understand why AI alignment is treated as a priority cause area within EA.
The Five Books
The Precipice: Existential Risk and the Future of Humanity (2020) — Toby Ord
toby-ord examines the full landscape of existential-risk — from nuclear war and climate change to engineered pandemics and unaligned AI. He argues we live at a uniquely pivotal time (the “precipice”) and have a moral duty to safeguard humanity’s future. Key contributions include a quantitative risk table estimating probabilities for different existential catastrophes, the concept of the Long Reflection (a future period of careful deliberation about humanity’s values), and a framework for thinking about risk that has become standard in EA.
Key concepts: Existential risk, longtermism, the Long Reflection, risk landscape
A companion website (theprecipice.com) provides additional data. See also precipice-revisited for Ord’s later updates.
What We Owe the Future (2022) — William MacAskill
will-macaskill’s philosophical case for longtermism — the idea that positively influencing the long-term future is a key moral priority of our time. The book covers population-ethics, value lock-in (the risk that a narrow set of values becomes permanently dominant), and how to think about the moral significance of future generations. MacAskill argues that because the future could contain vastly more people than the present, and because our actions now could have lasting effects on their welfare, the long-term future deserves significant moral weight.
Key concepts: Longtermism, value lock-in, moral weight of future people, trajectory changes
Superintelligence: Paths, Dangers, Strategies (2014) — Nick Bostrom
nick-bostrom’s examination of what happens when machines surpass human intelligence. Covers different paths to superintelligence (whole brain emulation, biological enhancement, AI), the control problem (how to maintain meaningful human oversight of a superintelligent system), and strategic considerations. Introduced or popularized several concepts now central to AI safety discourse:
- Intelligence explosion: a superintelligent AI could rapidly improve itself
- Orthogonality thesis: intelligence and goals are independent (a superintelligent AI need not share human values)
- Instrumental convergence: most goals lead to similar sub-goals (self-preservation, resource acquisition)
- Treacherous turn: an AI might behave cooperatively until it is powerful enough not to
Key concepts: Intelligence explosion, orthogonality thesis, instrumental convergence, treacherous turn, control problem
Human Compatible: Artificial Intelligence and the Problem of Control (2019) — Stuart Russell
stuart-russell proposes a new framework for AI development based on uncertainty about human preferences. He argues that the standard model of AI, in which a system optimizes a fixed objective, is fundamentally flawed: we cannot fully specify human values. His alternative: build AI systems that are uncertain about what humans want and actively seek to learn those preferences. This leads to cooperative inverse reinforcement learning and assistance games, in which the AI's objective is to be helpful to humans rather than to maximize a fixed goal of its own (a toy sketch of this idea follows below).
Key concepts: Value alignment, inverse reinforcement learning, assistance games, benefit of uncertainty
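To make the assistance-game idea more concrete, here is a minimal toy sketch in Python, assuming a Boltzmann-rational human and a hypothetical two-drink reward function; it illustrates the general idea of acting under preference uncertainty, not Russell's actual formalism. The robot starts unsure which drink the human prefers, asks when its posterior is too uncertain, and serves a drink once watching the human's own choices has shifted the posterior enough.

```python
"""Toy illustration (not code from the book): a robot that is *uncertain*
about which of two drinks the human prefers, updates a Bayesian posterior
from observed human choices, and acts to maximize expected human reward.
The reward function, the Boltzmann choice model, and the 'ask' penalty are
hypothetical simplifications of the assistance-game idea."""
import math

DRINKS = ["coffee", "tea"]

def reward(preference: str, drink: str) -> float:
    # Hypothetical human reward: +1 for the preferred drink, 0 otherwise.
    return 1.0 if drink == preference else 0.0

def choice_likelihood(observed: str, preference: str, beta: float = 2.0) -> float:
    # Likelihood that a noisily-rational human picks `observed` given
    # `preference` (softmax over rewards with inverse temperature `beta`).
    scores = [math.exp(beta * reward(preference, d)) for d in DRINKS]
    return math.exp(beta * reward(preference, observed)) / sum(scores)

def update_posterior(posterior: dict, observed: str) -> dict:
    # Bayes update of P(preference) after watching the human choose a drink.
    unnorm = {p: posterior[p] * choice_likelihood(observed, p) for p in posterior}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

def robot_act(posterior: dict, ask_cost: float = 0.3) -> str:
    # Pick the action with the highest *expected* human reward under the
    # posterior. "ask" resolves the uncertainty but mildly bothers the human.
    expected = {d: sum(posterior[p] * reward(p, d) for p in posterior) for d in DRINKS}
    expected["ask"] = 1.0 - ask_cost   # after asking, the robot serves the right drink
    return max(expected, key=expected.get)

posterior = {"coffee": 0.5, "tea": 0.5}
print(robot_act(posterior))             # 'ask' -- too uncertain to guess
posterior = update_posterior(posterior, observed="tea")
print(posterior, robot_act(posterior))  # posterior shifts toward 'tea', robot serves tea
```

Even in this toy, the design point the book emphasizes is visible: because the robot is uncertain about what the human wants, deferring to or querying the human can have higher expected value than acting on its current best guess.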
The Alignment Problem: Machine Learning and Human Values (2020) — Brian Christian
A narrative history and exploration of the ai-alignment problem. Christian traces the intellectual history from fairness and bias in ML systems through reinforcement learning to modern interpretability research. The book is more accessible than Superintelligence or Human Compatible and serves as a good entry point for understanding why alignment is hard.
Key concepts: Reward hacking, specification gaming, inverse reward design, interpretability, fairness
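One recurring theme is that systems optimize the objective they are given, not the objective their designers intend, so a measurable proxy that is pushed hard enough can come apart from the true goal (reward hacking and specification gaming, in the spirit of Goodhart's law). The sketch below is a hypothetical numerical illustration of that failure mode, not an example from the book: a greedy optimizer keeps improving its proxy while the true utility first rises and then collapses.

```python
"""Toy numerical illustration (hypothetical, not from the book) of
over-optimizing a proxy: the proxy correlates with the true objective only
up to a point, so pushing it ever higher eventually destroys true value."""

def proxy_reward(x: float) -> float:
    # The measurable stand-in the agent actually optimizes (e.g. "clicks").
    return x

def true_utility(x: float) -> float:
    # What the designer really cares about: peaks at x = 3, then degrades.
    return -(x - 3.0) ** 2 + 9.0

# Greedy hill-climbing on the proxy: every step looks like an improvement,
# because the proxy always says "more is better".
x = 0.0
for step in range(10):
    candidate = x + 1.0
    if proxy_reward(candidate) > proxy_reward(x):
        x = candidate
    print(f"step {step}: x={x:.0f}  proxy={proxy_reward(x):5.1f}  true={true_utility(x):6.1f}")
# True utility rises until x = 3, then falls: the proxy has been "gamed".
```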
How the Five Books Fit Together
| Book | Primary Focus | Scope |
|---|---|---|
| The Precipice | Existential risk landscape | Broad — all x-risks |
| What We Owe the Future | Moral case for longtermism | Philosophy |
| Superintelligence | Technical AI risk scenarios | AI-specific |
| Human Compatible | How to build safe AI | AI-specific (solutions) |
| The Alignment Problem | History of alignment research | AI-specific (narrative) |
Reading order for newcomers: The Alignment Problem (accessible narrative), then The Precipice (broad context), then What We Owe the Future (philosophical case), then Superintelligence (technical depth), and finally Human Compatible (solutions framework).
Significance for the Library
These five books are the canonical long-form introductions to the ideas that motivate the EA and AI safety communities. They provide the arguments that underlie the career recommendations from 80,000 Hours and the cause prioritization discussions on the EA Forum (see summary-ea-forum-key-posts).
Related Pages
- existential-risk
- longtermism
- ai-safety
- ai-alignment
- effective-altruism
- nick-bostrom
- toby-ord
- will-macaskill
- stuart-russell
- summary-peter-singer-books
- academic-papers-index
- precipice-revisited
- interpretability
- population-ethics
- summary-ea-forum-key-posts
- instrumental-convergence
- rationality
- peter-singer
- ai-in-context-videos
- summary-bostrom-ai-policy
- summary-bostrom-existential-risk-priority
- summary-bostrom-existential-risks
- ea-content-library-inventory
- summary-ea-in-age-of-agi
- rationality-ai-to-zombies