Can LLMs read long documents like humans do?
How might mimicking human reading strategies—storing gist memories and looking up details on demand—help language models handle documents beyond their effective context window?
LLMs are limited not only by an explicit context window but also by performance that degrades on long inputs well before that limit is reached. ReadAgent's premise is that humans read differently: exact wording is forgotten quickly, while the gist (the substance irrespective of exact words) persists, and reading is interactive, in that we look back when we need a detail. ReadAgent implements this as a simple prompting system that (1) decides what content to store together as a memory episode, (2) compresses each episode into a short gist memory, and (3) looks up the original passages only when a task requires the details. This extends effective context length 3–20× and outperforms retrieval baselines on QuALITY, NarrativeQA, and QMSum.
The key takeaway is that the LLM can generate broadly useful gist memories before knowing the task (compression need not be query-conditioned to be useful) and can then reason interactively over those gists to decide what to retrieve. Gist-first-then-lookup is a long-context strategy distinct from both stuffing the window and pure retrieval.
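The gist-then-lookup loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ReadAgent's actual implementation: the `LLM` callable, the prompt wording, the `Episode` type, and the function names are all hypothetical, and a fixed word budget stands in for the step where the model itself decides what to store together. `toy_llm` is a deterministic stub so the sketch runs without a real model.

```python
from dataclasses import dataclass
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


@dataclass
class Episode:
    text: str  # original passage, kept so it can be looked up later
    gist: str  # short compressed memory of the passage


def paginate(paragraphs: List[str], max_words: int) -> List[str]:
    """Step 1: group consecutive paragraphs into episodes.

    Assumption: a simple word budget replaces the model-driven choice
    of where one episode ends and the next begins.
    """
    pages: List[str] = []
    current: List[str] = []
    count = 0
    for p in paragraphs:
        n = len(p.split())
        if current and count + n > max_words:
            pages.append("\n".join(current))
            current, count = [], 0
        current.append(p)
        count += n
    if current:
        pages.append("\n".join(current))
    return pages


def build_memory(pages: List[str], llm: LLM) -> List[Episode]:
    """Step 2: compress each episode into a gist before any task is known."""
    return [Episode(p, llm("Shorten this passage to its gist:\n" + p)) for p in pages]


def answer(question: str, memory: List[Episode], llm: LLM, max_lookups: int = 2) -> str:
    """Step 3: reason over gists, re-read only the pages the model asks for."""
    gists = "\n".join(f"[{i}] {e.gist}" for i, e in enumerate(memory))
    reply = llm(
        f"Gists:\n{gists}\n\nQuestion: {question}\n"
        "Which pages should be re-read? Reply with page numbers."
    )
    picked: List[int] = []
    for tok in reply.replace(",", " ").split():
        if tok.isdecimal() and int(tok) < len(memory) and int(tok) not in picked:
            picked.append(int(tok))
    picked = picked[:max_lookups]
    # Selected pages are expanded to full text; all others stay as gists.
    context = "\n".join(
        e.text if i in picked else e.gist for i, e in enumerate(memory)
    )
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if prompt.startswith("Shorten"):
        body = prompt.split("\n", 1)[1]
        return " ".join(body.split()[:4])  # "gist" = first four words
    if "Which pages" in prompt:
        return "1"  # always asks to re-read page 1
    return "blue" if "blue" in prompt else "unknown"
```

Note that `build_memory` never sees the question, which mirrors the point that gists are produced before the task is known. Setting `max_lookups=0` reduces the sketch to answering from gists alone, which makes it easy to see what the lookup step contributes.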
This is the gist-compression member of the vault's long-context/memory cluster. It shares the compress-then-act move with "Can agents compress their own memory without losing critical details?" and the bounded-state philosophy of "Can agents fail from weak memory control rather than missing knowledge?", but it is applied to reading documents rather than managing agent state, and it sets out to beat retrieval rather than replace it.
Inquiring lines that use this note as a source 11
This note is a source for these synthesized inquiries.
- How does gist-first lookup compare to pure retrieval or context stuffing?
- How should memory systems split between short-term and long-term storage?
- Can task-agnostic compression of documents remain broadly useful for later queries?
- Why do LLMs degrade on long inputs before hitting context limits?
- Does recurrent memory or gist compression work better for ultra-long context?
- Can recurrent state mechanisms process longer sequences than attention-based working memory approaches?
- How do adaptive memory modules compare to feedback-based working memory for long context?
- What document layouts benefit most from bounding box representations?
- What makes procedural knowledge in documents generalize better than facts?
- How do recurrent memory systems handle ultra-long context differently than attention?
- Can fixed-size latent states losslessly store arbitrary input context?
Related concepts in this collection 3
- Can agents compress their own memory without losing critical details?
  Explores whether agents can autonomously consolidate interaction history into structured memory schemas that reduce token overhead while preserving information needed for long-horizon reasoning and strategic reflection.
  same compress-then-act move, applied to agent state rather than document reading
- Can agents fail from weak memory control rather than missing knowledge?
  As multi-turn agent workflows grow longer, performance degrades, but is this due to insufficient context or poor memory management? This explores whether memory *control* is the real bottleneck.
  shared bounded-memory philosophy; ReadAgent gists documents, ACC commits agent state
- Can recurrent memory scale where attention fails on ultra-long text?
  GPT-4 and RAG plateau around 10,000 tokens and rely heavily on the first quarter of input. Can recurrent memory augmentation overcome these limits and enable reasoning across millions of tokens?
  alternative long-context route (recurrent state) vs ReadAgent's gist-and-lookup
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
- Faith and Fate: Limits of Transformers on Compositionality
- Long-context LLMs Struggle with Long In-context Learning
- Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
- The AI Hippocampus: How Far are We From Human Memory?
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
- Context Embeddings for Efficient Answer Generation in RAG
Original note title
a human-inspired reading agent compresses documents into gist memories and looks up details on demand extending effective context twentyfold