Value Lock-In

Value lock-in is the risk that AI systems could permanently encode a particular set of values — whether good, bad, or merely narrow — into the trajectory of civilization, making future course correction extremely difficult or impossible. Unlike scenarios focused on extinction, value lock-in concerns the permanent foreclosure of humanity’s ability to change its mind.

The Core Risk

Holden Karnofsky articulates a nuanced concern: even AI systems that are technically aligned to their operators could be catastrophic for humanity at large. An AI that faithfully executes the values of a small group — a corporation, a government faction, a single ideology — could lock in those values for all of civilization. Alignment to the wrong principal is not safety.

This is particularly dangerous because AI-enabled power concentration would likely be self-reinforcing and hard to reverse. Whoever controls powerful AI systems will use those systems to maintain their position, creating a stability that historical power structures never achieved. A regime backed by AI-powered surveillance and enforcement could be far more durable than any past authoritarian system — a scenario explored in depth under stable-totalitarianism.

How Lock-In Could Happen

Several mechanisms could lead to value lock-in:

  • Population explosion dynamics: Karnofsky argues that AI does not need to be superhuman to concentrate power. If AI systems can be copied millions of times, running faster than humans, they eventually constitute the majority of “thinking” happening on Earth. Their programmed values become the dominant force shaping civilization.
  • Economic displacement: As AI automates cognitive labor, human workers lose economic and political leverage. Power shifts to whoever controls the AI systems, reducing the ability of the broader population to contest decisions.
  • Military advantage: Superintelligent AI deployed in military contexts could make opposition physically impossible, not just politically difficult.
  • Information control: AI systems controlling major information channels could shape public discourse so thoroughly that alternative value systems cannot even be articulated.

Beyond Extinction

Value lock-in expands the landscape of AI risk beyond the binary of “humans survive” versus “humans go extinct.” Karnofsky makes the underappreciated point that even worst-case takeover scenarios do not necessarily mean human extinction. Outcomes could include:

  • Humans surviving but marginalized, with no meaningful agency over their collective future.
  • A world optimized for values that are subtly wrong — not catastrophically evil, but permanently suboptimal.
  • Permanent foreclosure of moral progress, locking in the ethical understanding of one particular moment in history.

Connection to Alignment

Value lock-in reveals a dimension of the ai-alignment problem that purely technical solutions may miss. Even a perfectly aligned AI — one that does exactly what its principal wants — can be a vector for lock-in if the principal’s values are narrow, mistaken, or self-serving. This motivates research into not just aligning AI to individuals but ensuring that AI development preserves humanity’s collective ability to deliberate, disagree, and update its values over time.
