Can models store unlimited facts without growing larger?
Does external tool use let language models recall facts without being constrained by parameter count? This matters because it could reshape how we scale knowledge capacity beyond architectural limits.
Tool-augmented models are everywhere, but the theoretical case for why they help has been thin. This paper supplies it for factual recall. The number of facts a model can store purely in its weights is fundamentally bounded by its parameter count — so scaling knowledge capacity by enlarging the model is inherently inefficient. By contrast, a simple, efficient circuit construction proves that tool-use (external retrieval) enables unbounded factual recall without growing parameters.
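The intuition behind the in-weight bound can be sketched with a generic counting argument. This is an illustration only, not the paper's own proof or its stated bound; the symbols $P$ (parameter count), $b$ (bits per parameter), $N$ (number of facts), and $V$ (possible values per fact) are assumptions introduced here. A model that must be able to store any assignment of $N$ facts needs at least as many distinct weight settings as there are assignments:

```latex
\[
\underbrace{2^{bP}}_{\text{distinct weight settings}}
\;\ge\;
\underbrace{V^{N}}_{\text{distinct fact assignments}}
\quad\Longrightarrow\quad
N \;\le\; \frac{bP}{\log_2 V}.
\]
```

Under these assumptions the number of storable facts grows at most linearly in $P$, whereas a model that only has to emit a lookup query keeps $P$ fixed while the external store grows.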
The empirical half sharpens it into a phase transition: in-weight models need ever-larger architectures to memorize growing datasets, while tool-augmented models rapidly shift to rule-based querying once they observe enough diversity — decoupling memory capacity from model size. And the cost of the wrong choice is concrete: in-weight finetuning for factual recall degrades general capabilities, because limited capacity forces new facts to overwrite prior knowledge. Tool-based externalization preserves core skills, cuts training cost, and introduces minimal behavioral drift.
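The contrast above can be caricatured in a few lines of Python. This is a toy sketch under loud assumptions, not the paper's experimental setup: `InWeightMemory`, `ToolUser`, and the eviction rule are hypothetical stand-ins, with a fixed slot count playing the role of parameter count and a plain dict playing the role of the external tool.

```python
from collections import OrderedDict


class InWeightMemory:
    """Hypothetical stand-in for a fixed-size model: at most `capacity` facts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()

    def learn(self, key, value):
        # When full, a new fact overwrites the oldest one (assumed eviction rule).
        if key not in self.slots and len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)
        self.slots[key] = value

    def recall(self, key):
        return self.slots.get(key)


class ToolUser:
    """Hypothetical stand-in for a tool-augmented model: one fixed query rule."""

    def __init__(self, database):
        self.database = database  # lives outside the "model"

    def recall(self, key):
        # The learned behavior is the rule "look it up", not the facts themselves.
        return self.database.get(key)


def accuracy(model, facts):
    """Fraction of facts the model recalls correctly."""
    return sum(model.recall(k) == v for k, v in facts.items()) / len(facts)


facts = {f"entity_{i}": f"value_{i}" for i in range(1000)}

in_weight = InWeightMemory(capacity=100)
for key, value in facts.items():
    in_weight.learn(key, value)

tool_user = ToolUser(facts)

in_weight_acc = accuracy(in_weight, facts)
tool_acc = accuracy(tool_user, facts)
```

In this toy the in-weight store retains only the last 100 facts, so earlier knowledge is overwritten, while the tool user's recall does not depend on how many facts the external store holds.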
This gives a formal floor to the vault's harness thesis. The note "Where does agent reliability actually come from?" makes the empirical case that system design shifts cognitive work outside the model; in-tool learning is the provable version for factual recall: externalize facts to tools rather than burning parameters and overwriting capability. It also predicts the failure mode behind "Does repeated sensitive data in fine-tuning cause memorization?": finetuning facts in is exactly what memorizes and overwrites.
Inquiring lines that use this note as a source (11)
This note is a source for these synthesized inquiries.
- Why does the right structural prior matter more than raw model capacity?
- Why does in-weight memorization fail compared to tool-based fact access?
- Why does attending to own latents work better than bolted-on external memory stores?
- What causes overfitting when forcing new facts into model weights?
- How does in-weight memorization scale with model parameter count?
- What is the theoretical capacity limit before memorization saturates?
- Is forgetting in language models reversible or permanent knowledge loss?
- How do newly learned facts become accessible after gradient updates?
- Does finetuning facts into weights overwrite existing model capabilities?
- What makes factual memorization less efficient than tool-based retrieval?
- Why does tool use decouple factual capacity from model parameter count?
Related concepts in this collection (3)
- Do tools actually expand what language models can reason about?
  Explores whether tool access fundamentally breaks through reasoning limits in pure-text models, or merely optimizes existing capabilities. Understanding this distinction clarifies whether tools are luxury features or a necessity for genuine capability growth.
  Companion formal result: that proof is about reasoning support, this one about factual capacity.
- Where does agent reliability actually come from?
  Explores whether LLM agent performance depends on larger models or on thoughtful system design choices like memory, skills, and protocols that shift cognitive work outside the model.
  The empirical thesis this proof formalizes for factual recall.
- Can agents fail from weak memory control rather than missing knowledge?
  As multi-turn agent workflows grow longer, performance degrades, but is this due to insufficient context or poor memory management? Explores whether memory control is the real bottleneck.
  Both relocate capability from parameters to external, managed state.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Provable Benefits of In-Tool Learning for Large Language Models
- How much do language models memorize?
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- Spurious Forgetting in Continual Learning of Language Models
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
- Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Original note title
tool use provably decouples factual-recall capacity from parameter count — in-weight memorization is bounded by model size while tool-use is unbounded