LLM AGI will have memory, and memory changes alignment
Seth Herd — 2025-04-04 — LessWrong
Summary
Argues that LLM-based AGI will likely have memory systems enabling learning during deployment, and that this learning can functionally change alignment as new beliefs and goals accumulate. Proposes empirical investigation of how beliefs and values evolve in LLMs equipped with memory.
Source
- Link: https://lesswrong.com/posts/aKncW36ZdEnzxLo8A/llm-agi-will-have-memory-and-memory-changes-alignment
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- behavior-alignment-theory — Theory / Corrigibility