INQUIRING LINE

How does cognitive fit theory explain why different tasks need different knowledge structures?

This explores cognitive fit theory — the idea that performance improves when the way information is represented matches what the task actually demands — and what the corpus shows about why no single knowledge structure works for everything.


Cognitive fit theory claims that you reason best when the *shape* of your knowledge matches the *shape* of the task, and that mismatched structures impose a hidden cost. The clearest demonstration in the collection is StructRAG (see "Can routing queries to task-matched structures improve RAG reasoning?"), which makes the theory operational. Instead of feeding the same retrieved text chunks to every question, it trains a router with DPO to pick a structure (a table, a graph, an algorithm, a catalogue, or plain chunks) based on what the query is asking for. A comparison question wants a table; a multi-hop question wants a graph. Matching the structure to the demand improves knowledge-intensive reasoning over standard retrieval, which is cognitive fit theory's core prediction made concrete.
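The routing idea can be sketched in a few lines. This is only an illustration under stated assumptions: StructRAG's router is trained with DPO, whereas the sketch below swaps in a keyword heuristic, and the cue phrases, the `route` function, and the pairing of the algorithm and catalogue structures with particular query types are hypothetical choices made for the example. Only the table-for-comparison and graph-for-multi-hop pairings come from the text above.

```python
from enum import Enum


class Structure(Enum):
    TABLE = "table"
    GRAPH = "graph"
    ALGORITHM = "algorithm"
    CATALOGUE = "catalogue"
    CHUNK = "chunk"


# Hypothetical cue phrases. A keyword lookup stands in for the trained
# router here; it is NOT how StructRAG itself decides.
CUES = [
    (Structure.TABLE, ("compare", "versus", " vs ")),
    (Structure.GRAPH, ("connected to", "linked to", "related to")),
    (Structure.ALGORITHM, ("how do i", "steps to")),
    (Structure.CATALOGUE, ("summarize", "overview of")),
]


def route(query: str) -> Structure:
    """Pick the structure whose cue phrases appear in the query."""
    q = query.lower()
    for structure, cues in CUES:
        if any(cue in q for cue in cues):
            return structure
    # No cue matched: fall back to plain retrieved chunks.
    return Structure.CHUNK


if __name__ == "__main__":
    for question in (
        "Compare the revenue of firm A and firm B",
        "How is Alice connected to Carol?",
        "What is the capital of France?",
    ):
        print(question, "->", route(question).value)
```

The point of the sketch is the interface rather than the heuristic: the structure is chosen per query, before any knowledge is assembled, so a learned router could replace `route` without changing anything downstream.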

Why would a mismatch cost anything? Two notes suggest the cost is baked into how the model is built. One finds that knowledge retrieval lives in the lower layers of the network while reasoning adjustment happens in the higher ones (see "Why does reasoning training help math but hurt medical tasks?"). This split explains why training a model harder on reasoning sharpens math but can *degrade* knowledge-heavy domains like medicine: the structure that serves one task can interfere with the other. A related finding shows the two kinds of knowledge are sourced differently in the first place. Reasoning draws on broad, transferable *procedural* knowledge, while factual recall depends on narrow, document-specific memorization (see "Does procedural knowledge drive reasoning more than factual retrieval?"). A 'how-to' task and a 'what-is' task aren't just different questions; they pull from different machinery.

The corpus also shows the fit principle working at the level of architecture, not just retrieval. Splitting a problem-solver into a separate decomposer and solver outperforms one monolithic model (see "Does separating planning from execution improve reasoning accuracy?"), because planning and execution want different representations and interfere when fused. Notably, the decomposition skill transfers across domains while the solving skill doesn't. The knowledge-graph work pushes the same idea further. Externalizing reasoning into graph triples lets even a small model handle complex tasks (see "Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?"), and deriving symbolic rules from a graph's topology gives reasoning a navigational structure that retrieval based on semantic similarity alone can't provide (see "Can symbolic rules from knowledge graphs guide complex reasoning?"). When the task is structural, a structured representation pays off.
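To make "a structured representation pays off" concrete, here is a minimal sketch of why a multi-hop question becomes cheap once facts are stored as triples. It is not the KGoT or SymAgent implementation. The facts, the relation names, and the helpers `build_index` and `follow` are all hypothetical, chosen only to show the traversal.

```python
from collections import defaultdict

# Hypothetical facts, stored as (subject, relation, object) triples.
TRIPLES = [
    ("Ada", "works_at", "Acme"),
    ("Acme", "headquartered_in", "Lyon"),
    ("Lyon", "located_in", "France"),
]


def build_index(triples):
    """Index objects by (subject, relation) so each hop is one lookup."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[(subj, rel)].append(obj)
    return index


def follow(index, start, relations):
    """Follow a fixed chain of relations outward from a start node."""
    frontier = {start}
    for rel in relations:
        frontier = {
            obj
            for node in frontier
            for obj in index.get((node, rel), [])
        }
    return frontier


INDEX = build_index(TRIPLES)

# "In which country is Ada's employer headquartered?" is three hops.
answer = follow(INDEX, "Ada", ["works_at", "headquartered_in", "located_in"])
print(answer)  # {'France'}
```

With the same three facts held as unrelated text chunks, each hop would need its own retrieval step and the links between hops would have to be inferred. In the graph, the relation path itself is the plan.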

There's a sharper edge here worth noticing. A cluster of chain-of-thought critiques finds that models often learn the *form* of reasoning rather than genuine inference. Illogical CoT exemplars matched valid ones on BIG-Bench Hard (see "Does logical validity actually drive chain-of-thought gains?"), and CoT fails systematically once task, length, or format shifts away from the training distribution (see "Does chain-of-thought reasoning actually generalize beyond training data?"). Read alongside cognitive fit, this is a warning: if a model is pattern-matching the *shape* of reasoning rather than reasoning, then giving it the right-shaped structure may be doing more of the work than we credit. The structure isn't just a convenience; for these systems it may be load-bearing in a way it isn't for a human expert.

If you want the wider frame, Marr's three levels of analysis (see "Can cognitive science methods unlock how LLMs actually work?") supply a compatible lens from cognitive science: the habit of asking what computation a task requires before asking how to implement it. That's ultimately what 'different tasks need different knowledge structures' means: the task defines the computation, and the right structure is the one that makes that computation cheap.


Sources (9 notes)

Can routing queries to task-matched structures improve RAG reasoning?

StructRAG demonstrates that selecting the knowledge structure type based on query demands, via a DPO-trained router that chooses among tables, graphs, algorithms, catalogues, and chunks, improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load theory and cognitive fit theory from cognitive science.

Why does reasoning training help math but hurt medical tasks?

A two-phase inference model shows that knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?

Knowledge Graph of Thoughts (KGoT) achieves 29% improvement on GAIA Level 3 tasks using GPT-4o mini by externalizing reasoning into iteratively constructed KG triples. The approach improves transparency, reduces bias, and enables quality control over reasoning steps.

Can symbolic rules from knowledge graphs guide complex reasoning?

SymAgent derives symbolic rules from KG structure using LLM reasoning to create navigational plans that align natural language with graph topology. This approach captures structural reasoning patterns explicitly, outperforming retrieval methods that rely on semantic similarity alone.

Does logical validity actually drive chain-of-thought gains?

Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Can cognitive science methods unlock how LLMs actually work?

Cognitive science's 70-year toolkit of behavioral probes, causal interventions, and representational analysis transfers directly to LLM interpretation. Marr's computational, algorithmic, and implementation levels reframe the problem structurally and enable layered rather than monolithic explanation.
