Why do LLMs strip applicability conditions during memory abstraction?

This explores why, when an LLM compresses a specific experience into a general, reusable memory or rule, it tends to discard the "this only holds when X" qualifiers that made the original instance correct.

This reads the question as being about a specific failure of generalization: an LLM takes a particular success, abstracts it into a rule worth remembering, and in doing so drops the preconditions that made it work in the first place. The corpus has no note titled "memory abstraction," but several notes converge on the problem from different directions, and the most direct fit is the frame problem. Work on unstated preconditions (Do language models fail at identifying unstated preconditions?) shows that the failure isn't missing knowledge: models don't *bring background conditions forward as relevant constraints*. Abstraction is precisely the moment where those conditions are most fragile. The model keeps the part that looks like the lesson and silently sheds the part it never learned to treat as load-bearing. Tellingly, forcing explicit enumeration of preconditions lifts accuracy from 30% to 85%, so the conditions were knowable, just not surfaced.

A second angle explains *why* the stripping goes unnoticed: explanation and application appear to run on separate tracks. The potemkin-understanding pattern (Can LLMs understand concepts they cannot apply?) and the "comprehension without competence" split-brain finding (Can language models understand without actually executing correctly?) both show models that can state a principle correctly yet fail to apply it; the split-brain work measures 87% accuracy on explanations against 64% in action. An abstracted memory lives on the explanation side: it reads as a clean, confident rule, while the applicability conditions belong to the execution side, which the model is worst at preserving. So the abstraction *sounds* well-formed precisely because the conditional scaffolding that would complicate it has been left behind.

There's also a sense in which stripping context is what abstraction *is*, and the corpus shows this cuts both ways. LLM Programs deliberately hide step-irrelevant context to make reasoning tractable (Can algorithms control LLM reasoning better than LLMs alone?), and the abstraction-only optimization paradigm (Should LLMs handle abstraction only in optimization?) argues that models are at their best when restricted to translating messy input into clean formal structure. There, discarding detail is the feature. In memory, though, the applicability condition is not noise to be hidden; it is the most important payload. The same compression reflex that makes abstraction useful for planning makes it lossy for memory, because the model has no reliable way to tell a removable detail from a governing precondition.

What makes this dangerous rather than merely imperfect is that the loss compounds quietly. In tests of 19 models across 52 domains, even frontier systems degraded documents by roughly 25% over long relay workflows, with errors accumulating and not plateauing through 50 round-trips (Do frontier LLMs silently corrupt documents in long workflows?). A memory store that abstracts, re-abstracts, and recalls over many turns is exactly such a relay, with each pass an opportunity to shed one more qualifier. And the model can't audit its way out. Work on internal structure shows that improving one dimension, such as accuracy, reliably degrades others, such as faithfulness and calibration (What actually happens inside a language model?), and self-improvement is formally bounded by the gap between generating and verifying (What stops large language models from improving themselves?). The model that strips a condition is not equipped to notice that it stripped one.
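
The compounding can be made concrete with a deliberately crude toy model. Assume, purely for illustration and not as anything measured in the cited study, that each abstract-and-recall pass independently keeps a given qualifier with a fixed probability `keep_prob`. The chance that the qualifier survives `passes` passes is then `keep_prob ** passes`, which keeps shrinking as passes accumulate. The function name and the 0.95 figure are hypothetical.

```python
def qualifier_survival(keep_prob: float, passes: int) -> float:
    """Toy model (an assumption, not a measured result): probability that one
    applicability condition survives `passes` independent abstraction passes,
    each of which keeps it with probability `keep_prob`."""
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    if passes < 0:
        raise ValueError("passes must be non-negative")
    # Independent passes multiply: every pass is another chance to drop it.
    return keep_prob ** passes


# Hypothetical per-pass retention of 0.95; the number is illustrative only.
for n in (1, 10, 50):
    print(n, round(qualifier_survival(0.95, n), 3))
```

With these assumed numbers, survival falls from 0.95 after one pass to about 0.6 after ten and below 0.08 after fifty, even though no single pass looks very lossy.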

The quietly useful takeaway: stripping applicability conditions isn't a quirk of any one memory system. It is the predictable intersection of three things the corpus documents independently: the frame problem (preconditions are never surfaced as constraints), the explanation/execution split (the rule survives, the conditions don't), and the fact that abstraction is defined by discarding context. The implied fix mirrors the frame-problem result. Don't trust the model to retain conditions implicitly; make the memory schema *force* the precondition to be written down alongside the rule, the same way explicit enumeration produced the jump from 30% to 85%.
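
A minimal sketch of that fix, assuming a simple in-process memory store: the record type refuses to exist without preconditions, and recall checks them against the current situation. The names `MemoryEntry` and `applies`, and the choice to represent conditions as plain strings matched against a set of known facts, are hypothetical simplifications rather than any particular memory system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    """Hypothetical memory record: a rule is only storable with its preconditions."""
    rule: str
    preconditions: tuple[str, ...]

    def __post_init__(self) -> None:
        # Refuse to store a bare rule: the schema forces the "only when X" part.
        if not self.preconditions:
            raise ValueError(f"rule {self.rule!r} has no preconditions")
        if any(not p.strip() for p in self.preconditions):
            raise ValueError("preconditions must be non-empty strings")

    def applies(self, facts: set[str]) -> bool:
        # Recall the rule only if every stored precondition holds right now.
        return all(p in facts for p in self.preconditions)


entry = MemoryEntry(
    rule="retry the request",
    preconditions=("request is idempotent", "previous attempt timed out"),
)
print(entry.applies({"request is idempotent", "previous attempt timed out"}))  # True
print(entry.applies({"previous attempt timed out"}))  # False

try:
    MemoryEntry(rule="retry the request", preconditions=())
except ValueError as err:
    print("rejected:", err)
```

The design choice is that a missing precondition is a construction error rather than an optional field left blank, so the compression step cannot silently emit a bare rule. Exact string matching is the weakest part of the sketch; a real system would need some way to judge whether a stated condition actually holds.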

Sources (8 notes)

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Should LLMs handle abstraction only in optimization?

LLMs plateau at constraint satisfaction regardless of scale, but excel at natural-language-to-formal-structure translation. The productive architecture restricts LLMs to reading input and emitting solver code, leaving numeric iteration to deterministic solvers.

Do frontier LLMs silently corrupt documents in long workflows?

Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.

What actually happens inside a language model?

Research shows that LLMs can achieve the same output through different internal mechanisms, and improvements in one dimension like accuracy reliably degrade others like faithfulness and calibration. Internal structure matters even when behavior appears identical.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.
