EA and AI Safety Books Reference
This summary covers five key books at the intersection of effective-altruism, existential-risk, and ai-safety. Together they form the core reading list for anyone seeking to understand why AI alignment is treated as a priority cause area within EA.
The Five Books
The Precipice: Existential Risk and the Future of Humanity (2020) — Toby Ord
toby-ord examines the full landscape of existential-risk — from nuclear war and climate change to engineered pandemics and unaligned AI. He argues we live at a uniquely pivotal time (the “precipice”) and have a moral duty to safeguard humanity’s future. Key contributions include a quantitative risk table estimating probabilities for different existential catastrophes, the concept of the Long Reflection (a future period of careful deliberation about humanity’s values), and a framework for thinking about risk that has become standard in EA.
Key concepts: Existential risk, longtermism, the Long Reflection, risk landscape
A companion website (theprecipice.com) provides additional data. See also precipice-revisited for Ord’s later updates.
What We Owe the Future (2022) — William MacAskill
will-macaskill’s philosophical case for longtermism — the idea that positively influencing the long-term future is a key moral priority of our time. The book covers population-ethics, value lock-in (the risk that a narrow set of values becomes permanently dominant), and how to think about the moral significance of future generations. MacAskill argues that because the future could contain vastly more people than the present, and because our actions now could have lasting effects on their welfare, the long-term future deserves significant moral weight.
Key concepts: Longtermism, value lock-in, moral weight of future people, trajectory changes
Superintelligence: Paths, Dangers, Strategies (2014) — Nick Bostrom
nick-bostrom’s examination of what happens when machines surpass human intelligence. Covers different paths to superintelligence (whole brain emulation, biological enhancement, AI), the control problem (how to maintain meaningful human oversight of a superintelligent system), and strategic considerations. Introduced or popularized several concepts now central to AI safety discourse:
- Intelligence explosion: a superintelligent AI could rapidly improve itself
- Orthogonality thesis: intelligence and goals are independent (a superintelligent AI need not share human values)
- Instrumental convergence: most goals lead to similar sub-goals (self-preservation, resource acquisition)
- Treacherous turn: an AI might behave cooperatively until it is powerful enough not to
Key concepts: Intelligence explosion, orthogonality thesis, instrumental convergence, treacherous turn, control problem
Human Compatible: Artificial Intelligence and the Problem of Control (2019) — Stuart Russell
stuart-russell proposes a new framework for AI development based on uncertainty about human preferences. He argues that the standard model of AI, in which a system optimizes a fixed objective, is fundamentally flawed: we cannot fully specify human values. His alternative: build AI systems that are uncertain about what humans want and actively seek to learn those preferences. This leads to cooperative inverse reinforcement learning and assistance games, in which the AI's objective is to be helpful to humans rather than to maximize a fixed goal of its own (a toy sketch of this idea follows below).
Key concepts: Value alignment, inverse reinforcement learning, assistance games, benefit of uncertainty
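To make the assistance-game idea more concrete, here is a minimal toy sketch in Python, assuming a Boltzmann-rational human and a hypothetical two-drink reward function; it illustrates the general idea of acting under preference uncertainty, not Russell's actual formalism. The robot starts unsure which drink the human prefers, asks when its posterior is too uncertain, and serves a drink once watching the human's own choices has shifted the posterior enough.

```python
"""Toy illustration (not code from the book): a robot that is *uncertain*
about which of two drinks the human prefers, updates a Bayesian posterior
from observed human choices, and acts to maximize expected human reward.
The reward function, the Boltzmann choice model, and the 'ask' penalty are
hypothetical simplifications of the assistance-game idea."""
import math

DRINKS = ["coffee", "tea"]

def reward(preference: str, drink: str) -> float:
    # Hypothetical human reward: +1 for the preferred drink, 0 otherwise.
    return 1.0 if drink == preference else 0.0

def choice_likelihood(observed: str, preference: str, beta: float = 2.0) -> float:
    # Likelihood that a noisily-rational human picks `observed` given
    # `preference` (softmax over rewards with inverse temperature `beta`).
    scores = [math.exp(beta * reward(preference, d)) for d in DRINKS]
    return math.exp(beta * reward(preference, observed)) / sum(scores)

def update_posterior(posterior: dict, observed: str) -> dict:
    # Bayes update of P(preference) after watching the human choose a drink.
    unnorm = {p: posterior[p] * choice_likelihood(observed, p) for p in posterior}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

def robot_act(posterior: dict, ask_cost: float = 0.3) -> str:
    # Pick the action with the highest *expected* human reward under the
    # posterior. "ask" resolves the uncertainty but mildly bothers the human.
    expected = {d: sum(posterior[p] * reward(p, d) for p in posterior) for d in DRINKS}
    expected["ask"] = 1.0 - ask_cost   # after asking, the robot serves the right drink
    return max(expected, key=expected.get)

posterior = {"coffee": 0.5, "tea": 0.5}
print(robot_act(posterior))             # 'ask' -- too uncertain to guess
posterior = update_posterior(posterior, observed="tea")
print(posterior, robot_act(posterior))  # posterior shifts toward 'tea', robot serves tea
```

Even in this toy, the design point the book emphasizes is visible: because the robot is uncertain about what the human wants, deferring to or querying the human can have higher expected value than acting on its current best guess.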
The Alignment Problem: Machine Learning and Human Values (2020) — Brian Christian
A narrative history and exploration of the ai-alignment problem. Christian traces the intellectual history from fairness and bias in ML systems through reinforcement learning to modern interpretability research. The book is more accessible than Superintelligence or Human Compatible and serves as a good entry point for understanding why alignment is hard.
Key concepts: Reward hacking, specification gaming, inverse reward design, interpretability, fairness
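One recurring theme is that systems optimize the objective they are given, not the objective their designers intend, so a measurable proxy that is pushed hard enough can come apart from the true goal (reward hacking and specification gaming, in the spirit of Goodhart's law). The sketch below is a hypothetical numerical illustration of that failure mode, not an example from the book: a greedy optimizer keeps improving its proxy while the true utility first rises and then collapses.

```python
"""Toy numerical illustration (hypothetical, not from the book) of
over-optimizing a proxy: the proxy correlates with the true objective only
up to a point, so pushing it ever higher eventually destroys true value."""

def proxy_reward(x: float) -> float:
    # The measurable stand-in the agent actually optimizes (e.g. "clicks").
    return x

def true_utility(x: float) -> float:
    # What the designer really cares about: peaks at x = 3, then degrades.
    return -(x - 3.0) ** 2 + 9.0

# Greedy hill-climbing on the proxy: every step looks like an improvement,
# because the proxy always says "more is better".
x = 0.0
for step in range(10):
    candidate = x + 1.0
    if proxy_reward(candidate) > proxy_reward(x):
        x = candidate
    print(f"step {step}: x={x:.0f}  proxy={proxy_reward(x):5.1f}  true={true_utility(x):6.1f}")
# True utility rises until x = 3, then falls: the proxy has been "gamed".
```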
How the Five Books Fit Together
| Book | Primary Focus | Scope |
|---|---|---|
| The Precipice | Existential risk landscape | Broad — all x-risks |
| What We Owe the Future | Moral case for longtermism | Philosophy |
| Superintelligence | Technical AI risk scenarios | AI-specific |
| Human Compatible | How to build safe AI | AI-specific (solutions) |
| The Alignment Problem | History of alignment research | AI-specific (narrative) |
Reading order for newcomers: The Alignment Problem (accessible narrative), then The Precipice (broad context), then What We Owe the Future (philosophical case), then Superintelligence (technical depth), and finally Human Compatible (solutions framework).
Significance for the Library
These five books are the canonical long-form introductions to the ideas that motivate the EA and AI safety communities. They provide the arguments that underlie the career recommendations from 80,000 Hours and the cause prioritization discussions on the EA Forum (see summary-ea-forum-key-posts).
Related Pages
- existential-risk
- longtermism
- ai-safety
- ai-alignment
- effective-altruism
- nick-bostrom
- toby-ord
- will-macaskill
- stuart-russell
- summary-peter-singer-books
- academic-papers-index
- precipice-revisited
- interpretability
- population-ethics
- summary-ea-forum-key-posts
- instrumental-convergence
- rationality
- peter-singer
- ai-in-context-videos
- summary-bostrom-ai-policy
- summary-bostrom-existential-risk-priority
- summary-bostrom-existential-risks
- ea-content-library-inventory
- summary-ea-in-age-of-agi
- rationality-ai-to-zombies