Can LLMs read long documents like humans do?

How might mimicking human reading strategies—storing gist memories and looking up details on demand—help language models handle documents beyond their effective context window?

Synthesis note · 2026-06-03 · sourced from Memory

LLMs are limited not only by an explicit context window but by degrading performance on long inputs well before that limit. ReadAgent's premise is that humans read differently: exact wording is forgotten quickly while gist — the substance irrespective of exact words — persists, and reading is interactive (we look back when we need a detail). It implements this as a simple prompting system that (1) decides what content to store together as a memory episode, (2) compresses each episode into a short gist memory, and (3) looks up the original passages only when a task requires the details. This extends effective context 3–20× and outperforms retrieval baselines on QuALITY, NarrativeQA, and QMSum.
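The three steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not ReadAgent's actual implementation: the `LLM` callable, the `Episode` class, the function names, and the prompt wording are all hypothetical, and a fixed word budget stands in for step (1), where the note has the model decide which content belongs together.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Episode:
    index: int
    text: str        # original passage, kept for on-demand lookup
    gist: str = ""   # short compressed memory of the passage


def paginate(paragraphs: List[str], max_words: int) -> List[Episode]:
    """Step 1 (simplified): group consecutive paragraphs into episodes.

    Assumption: a word budget replaces the model-chosen break points.
    """
    episodes: List[Episode] = []
    current: List[str] = []
    count = 0
    for para in paragraphs:
        words = len(para.split())
        # Close the current episode if adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            episodes.append(Episode(len(episodes), "\n".join(current)))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        episodes.append(Episode(len(episodes), "\n".join(current)))
    return episodes


def gist_all(episodes: List[Episode], llm: LLM) -> None:
    """Step 2: compress each episode into a gist, before any task is known."""
    for ep in episodes:
        ep.gist = llm(f"Shorten the following passage to its gist:\n{ep.text}")


def answer(question: str, episodes: List[Episode], llm: LLM,
           max_lookups: int = 2) -> str:
    """Step 3: reason over the gists, then re-read only the pages asked for."""
    gist_memory = "\n".join(f"[{ep.index}] {ep.gist}" for ep in episodes)
    reply = llm(
        "Gists of a long document, one per page:\n"
        f"{gist_memory}\n"
        f"Question: {question}\n"
        "List the page numbers worth re-reading, comma-separated."
    )
    # Parse page numbers, drop duplicates and out-of-range values, cap the count.
    wanted = [int(tok) for tok in reply.replace(",", " ").split() if tok.isdigit()]
    wanted = [i for i in dict.fromkeys(wanted) if 0 <= i < len(episodes)][:max_lookups]
    # Swap the selected gists for their original text; keep gists elsewhere.
    context = "\n".join(
        ep.text if ep.index in wanted else ep.gist for ep in episodes
    )
    return llm(f"{context}\nQuestion: {question}\nAnswer:")
```

In this sketch `gist_all` never sees the question, while `answer` uses the question only to choose which original passages to bring back, so the final prompt holds a mix of gists and a few full passages rather than the whole document.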

The key takeaway is that the LLM can generate broadly useful gist memories before knowing the task (compression need not be query-conditioned to be useful) and can then reason interactively over those gists to decide what to retrieve. Gist-first-then-lookup is a different long-context strategy from both stuffing the window and pure retrieval.

This is the gist-compression member of the vault's long-context/memory cluster. It shares the compress-then-act move with "Can agents compress their own memory without losing critical details?" and the bounded-state philosophy of "Can agents fail from weak memory control rather than missing knowledge?", but it is applied to reading documents rather than managing agent state, and to beating retrieval rather than replacing it.

Related concepts in this collection

Can agents compress their own memory without losing critical details? Explores whether agents can autonomously consolidate interaction history into structured memory schemas that reduce token overhead while preserving information needed for long-horizon reasoning and strategic reflection.
same compress-then-act move, applied to agent state rather than document reading
Can agents fail from weak memory control rather than missing knowledge? As multi-turn agent workflows grow longer, performance degrades—but is this due to insufficient context or poor memory management? This explores whether memory *control* is the real bottleneck.
shared bounded-memory philosophy; ReadAgent gists documents, ACC commits agent state
Can recurrent memory scale where attention fails on ultra-long text? GPT-4 and RAG plateau around 10,000 tokens and rely heavily on the first quarter of input. Can recurrent memory augmentation overcome these limits and enable reasoning across millions of tokens?
alternative long-context route (recurrent state) vs ReadAgent's gist-and-lookup
