What makes a problem instance unfamiliar to a language model?

This explores what actually makes a specific problem hard for a language model — and the corpus points to a surprising answer: not how complex the problem is, but how closely it resembles things the model has already seen.

The most direct answer in the collection is also the most counterintuitive: reasoning models don't break down at some complexity threshold; they break at the boundary of *novelty*. A model can carry a long chain of reasoning flawlessly if it has seen similar instances, and stumble on a logically trivial one it hasn't. The work on instance-level unfamiliarity ("Do language models fail at reasoning due to complexity or novelty?") frames this sharply: models fit patterns tied to specific instances rather than learning a general algorithm, so familiarity, not difficulty, is the real axis along which they succeed or fail.
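One way to make "familiarity" concrete, offered as a hypothetical illustration rather than the measure used in the cited work, is to score a new instance by how far it sits from its nearest training example. The sketch below assumes instances are plain strings and uses token-set Jaccard similarity as a crude stand-in for whatever similarity a real study would use; `jaccard`, `novelty_score`, and the toy training set are all made up for this example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def novelty_score(instance: str, training_set: list[str]) -> float:
    """Hypothetical novelty score: 1 - similarity to the nearest training example.

    0.0 means some training example has exactly the same token set; 1.0 means
    the instance shares no tokens with any training example. The score says
    nothing about how long or logically complex the instance is.
    """
    tokens = set(instance.lower().split())
    best = max(
        (jaccard(tokens, set(t.lower().split())) for t in training_set),
        default=0.0,
    )
    return 1.0 - best


# Toy "training data" (assumption, for illustration only).
train = [
    "move disk 1 from peg A to peg C",
    "move disk 2 from peg A to peg B",
]
familiar = "move disk 1 from peg A to peg B then move disk 2 from peg A to peg C"
novel = "swap the two blue tokens"

print(novelty_score(familiar, train))
print(novelty_score(novel, train))
```

The longer, multi-step instance scores as less novel than the short one because it reuses training vocabulary, which mirrors the claim that familiarity and complexity are separate axes. Token overlap ignores word order and meaning, so a real study would need a far better similarity measure.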

Why would that be? Another line of work treats the model as an autoregressive probability machine and predicts failure from the statistics of the target answer ("Can we predict where language models will fail?"). Tasks whose correct response is *low-probability* under training, such as counting letters or reciting the alphabet backwards, are systematically hard even though a child could do them. So "unfamiliar" can mean two overlapping things: an instance unlike the training examples, and an answer that is improbable given everything the model absorbed. Both describe the shape of the training distribution, not the logical hardness of the task.
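The probability framing can be illustrated with a deliberately tiny stand-in for a language model: a character bigram model with add-one smoothing, trained on a toy corpus in which the alphabet only ever appears in forward order. An autoregressive model scores a target string as log P(y) = Σ_t log P(y_t | y_<t); the bigram model truncates that context to one previous character. The corpus, the smoothing choice, and the function names `train_bigram` and `sequence_logprob` are assumptions for illustration, not anything taken from the cited work.

```python
import math
import string
from collections import Counter


def train_bigram(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count character bigrams and how often each character appears as left context."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    for text in corpus:
        for prev, nxt in zip(text, text[1:]):
            bigrams[(prev, nxt)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def sequence_logprob(text: str, bigrams: Counter, contexts: Counter, vocab_size: int) -> float:
    """Sum of log P(next char | previous char) with add-one (Laplace) smoothing.

    The first character is not scored, so strings of equal length are compared
    over the same number of transitions.
    """
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        p = (bigrams[(prev, nxt)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(p)
    return total


# Toy "training data": the alphabet only ever appears forwards.
alphabet = string.ascii_lowercase
corpus = [alphabet] * 50
bigrams, contexts = train_bigram(corpus)

forward = sequence_logprob(alphabet, bigrams, contexts, vocab_size=26)
backward = sequence_logprob(alphabet[::-1], bigrams, contexts, vocab_size=26)
print(f"forward:  {forward:.1f}")
print(f"backward: {backward:.1f}")
```

Both strings contain the same 26 letters and are equally "simple", yet the backwards one gets a far lower log-probability because none of its transitions occur in training. A real model conditions on much more context, so this only captures the qualitative idea.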

The interesting twist is that models seem to *register* this unfamiliarity internally before they act on it. As tasks grow harder and less familiar, hidden states become sparser in a localized, systematic way that tracks task novelty; the sparsification behaves like adaptive filtering that stabilizes performance under out-of-distribution shift, not like a breakdown ("Do language models sparsify their activations under difficult tasks?"). Difficulty itself turns out to be linearly decodable from internal representations before reasoning even begins ("Can models recognize question difficulty before they reason?"). The model "knows" it is in unfamiliar territory; it just doesn't always change its behavior accordingly, and it still overthinks simple questions. That gap, perception without commitment, is its own failure mode.
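To make "sparser hidden states" concrete, here is one standard way to score the sparsity of an activation vector, the Hoyer measure: (√n − ‖h‖₁/‖h‖₂) / (√n − 1), which is 0 when every entry has the same magnitude and 1 when exactly one entry is nonzero. The cited work may use a different metric (for example, counting activations below a threshold), so treat the choice of measure, the function name `hoyer_sparsity`, and the example vectors as assumptions; the vectors are made up, not real model activations.

```python
import math


def hoyer_sparsity(h: list[float]) -> float:
    """Hoyer sparsity of a vector, in [0, 1].

    0.0: every entry has the same magnitude (maximally dense).
    1.0: exactly one nonzero entry (maximally sparse).
    """
    n = len(h)
    if n < 2:
        raise ValueError("need at least two dimensions")
    l1 = sum(abs(x) for x in h)
    l2 = math.sqrt(sum(x * x for x in h))
    if l2 == 0.0:
        raise ValueError("sparsity is undefined for the zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


# Made-up activation vectors for one layer, chosen only to contrast the two regimes.
dense_state = [0.9, -0.8, 1.1, -1.0, 0.7, 0.95, -0.85, 1.05]
sparse_state = [0.0, 2.4, 0.0, 0.05, 0.0, -1.9, 0.0, 0.0]

print(f"dense:  {hoyer_sparsity(dense_state):.2f}")
print(f"sparse: {hoyer_sparsity(sparse_state):.2f}")
```

Tracking a score like this per layer across familiar and shifted inputs is one way to look for the pattern described above; whether it reproduces the published effect would need checking against the paper's own setup.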

Unfamiliarity also isn't only about raw novelty. An instance can become unfamiliar when its correct handling depends on something *unstated*: a background precondition the model never brings forward. The frame-problem work shows accuracy jumping from 30% to 85% simply by prompting the model to enumerate the implicit constraints it would otherwise skip ("Do language models fail at identifying unstated preconditions?"). An instance can also be made unfamiliar by *conflict*: when the prompt supplies information that contradicts strong training-time associations, the parametric prior wins and the model effectively treats the in-context fact as noise ("Why do language models ignore information in their context?"). That work reports that textual prompting alone cannot override such priors; causal intervention in the representations is required. Familiarity, in other words, can override the evidence right in front of it.
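The precondition-enumeration intervention is, at bottom, a change in prompt structure. Below is a hypothetical two-pass template in that spirit: first ask the model to list the unstated conditions a scenario relies on, then ask it to answer with those conditions in view. The template wording is an assumption, not the prompt used in the cited study; `call_model` is a placeholder for whatever client you use (it is not a real library function), and `fake_model` exists only to show the data flow without calling an LLM.

```python
from typing import Callable

# Hypothetical wording; the cited study's exact prompt is not reproduced here.
ENUMERATE_TEMPLATE = (
    "Scenario: {question}\n\n"
    "Before answering, list every background condition that must hold for "
    "this scenario to work as described, including ones that are not stated. "
    "Write one condition per line, starting each with '- '."
)

ANSWER_TEMPLATE = (
    "Scenario: {question}\n\n"
    "Conditions that must hold:\n{conditions}\n\n"
    "Check each condition against the scenario, then give your answer."
)


def answer_with_preconditions(question: str, call_model: Callable[[str], str]) -> str:
    """Two-pass prompting: enumerate implicit preconditions, then answer with them in view."""
    listing = call_model(ENUMERATE_TEMPLATE.format(question=question))
    # Keep only lines that look like list items, so stray prose is dropped.
    conditions = "\n".join(
        line.strip() for line in listing.splitlines() if line.strip().startswith("- ")
    )
    return call_model(ANSWER_TEMPLATE.format(question=question, conditions=conditions))


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to demonstrate the flow."""
    if "Before answering" in prompt:
        return "Sure:\n- The match is dry.\n- There is oxygen in the room."
    return prompt  # echo the second prompt so it can be inspected


final_prompt = answer_with_preconditions(
    "Can I light this match to see in the dark room?", fake_model
)
print(final_prompt)
```

Splitting enumeration and answering into separate calls makes the surfaced conditions an explicit input to the second step rather than something the model may or may not attend to; a single prompt asking for both is a simpler alternative.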

The thread connecting all of this is that "unfamiliar" describes the relationship between an instance and the training distribution, not a property of the problem in isolation. That is also why a model can explain a concept perfectly and then fail to apply it to a novel case: the explanation pathway is familiar territory, the execution pathway isn't ("Can language models understand without actually executing correctly?"; "Can LLMs understand concepts they cannot apply?"). If you want the map of how these distribution-edge failures cluster together, the survey of epistemic failure modes is the doorway ("How do LLMs fail to know what they seem to understand?"). The takeaway you might not have gone looking for: making a problem "easier" for a model often means making it *more familiar*, not simpler, and those are different levers.

Sources (9 notes)

Do language models fail at reasoning due to complexity or novelty?

LRMs (large reasoning models) don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if the model was trained on similar instances, regardless of length.

Can we predict where language models will fail?

By framing LLMs as autoregressive probability machines, researchers predicted that tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed predictions such as poor performance on reciting the alphabet backwards and on counting letters.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Can models recognize question difficulty before they reason?

Linear probes successfully decode difficulty from LRM representations before reasoning begins, yet models still overthink simple questions. This reveals an action-commitment failure rather than a perception failure.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an AI researcher re-evaluating what makes problem instances unfamiliar to language models. The question remains open: is unfamiliarity primarily about *training distribution mismatch*, *low-probability answers*, *unstated preconditions*, or *parametric priors overriding context*—or has the regime shifted?

What a curated library found — and when (dated claims, not current truth): Findings span 2023–2026 and include:
• Reasoning failures cluster at instance-level novelty, not task complexity; models fit instance patterns rather than learn general algorithms (2025–2026).
• Correct answers with low probability under training (e.g., letter counting, backward alphabet) are systematically hard regardless of logical simplicity (~2025).
• Hidden states sparsify predictably under out-of-distribution shift, signaling task novelty before the model acts on it; problem difficulty is linearly decodable from representations pre-reasoning (2026).
• Accuracy jumps 30%→85% when unstated preconditions are enumerated; models skip implicit constraints (2025).
• Parametric priors override in-context facts when training associations are strong; context integration fails under conflict (2025).

Anchor papers (verify; mind their dates):
• arXiv:2507.10624 (2025-07): Comprehension Without Competence — explanation pathways vs. execution pathways.
• arXiv:2603.03415 (2026-03): Farther the Shift, Sparser the Representation — OOD mechanisms.
• arXiv:2602.06176 (2026-02): Large Language Model Reasoning Failures — synthesis survey.
• arXiv:2501.11721 (2025-01): Explain-Query-Test — self-evaluation via comprehension discrepancy.

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding, determine whether post-2026 scaling (model size, instruction tuning, synthetic data), retrieval augmentation, chain-of-thought variants, or multi-turn scaffolding have **narrowed or closed** the novelty gap. Separately flag: Does unfamiliarity still predict failure better than task complexity? Do low-probability answers still fail predictably? Are hidden-state sparsity markers still reliable across new model families?
(2) Surface the strongest work from the last 6 months that **reconciles or contradicts** the claim that distribution-match >> logical difficulty. Look for papers showing: general algorithmic learning despite distribution shift; or conversely, deeper evidence that in-context learning remains fundamentally instance-bound.
(3) Propose 2 research questions that **assume the regime may have moved**: (a) If fine-tuning or retrieval can make unfamiliar instances familiar, what is the sample complexity and is there a theoretical floor? (b) Can a model dynamically **detect** unfamiliarity and **invoke** a different reasoning strategy (e.g., search, verification) to overcome it?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
