INQUIRING LINE

How do knowledge layers differ functionally from reasoning layers in networks?

This explores what the corpus shows about a literal architectural split inside LLMs — knowledge living in the lower network layers, reasoning happening in the higher ones — and why that division has practical consequences.


This reads the question literally: not knowledge vs. reasoning as ideas, but as different jobs done in different physical layers of the network. The clearest finding in the collection is exactly that: a two-phase picture in which the lower layers retrieve stored facts and the higher layers perform reasoning adjustment on top of them (see: Why does reasoning training help math but hurt medical tasks?). The payoff of that separation is a concrete, almost surprising prediction: training a model harder to reason improves math but can actually *degrade* knowledge-heavy domains like medicine, because you're tuning the upper machinery in ways that disturb the lower retrieval it depends on. Knowledge and reasoning aren't just different skills; they're different real estate, and you can damage one by over-developing the other.

That division turns out to be fragile in a deeper way. Mechanistic interpretability work warns that a mechanism that looks like a clean functional layer may not be what actually drives outputs: two models can hit identical accuracy while carrying radically different internal representations, so a tidy 'knowledge here, reasoning there' story can be real or can be a comfortable illusion the metrics don't expose (see: What actually happens inside the minds of language models?). So the layer-separation finding is best held as a useful working model, not a settled map of the territory.
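
A toy illustration of that warning, offered as a minimal sketch and not as anything taken from the cited note: the two hand-built "models" below, along with their names, weights, and inputs, are hypothetical. Both decide whether x1 + x2 is positive and score the same accuracy, yet they encode each input differently in their hidden layer, so accuracy alone cannot tell them apart.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, float]


def model_a_hidden(x: Vector) -> Vector:
    """Hypothetical model A: the hidden layer keeps the raw features."""
    return (x[0], x[1])


def model_b_hidden(x: Vector) -> Vector:
    """Hypothetical model B: the hidden layer stores the sum and the difference."""
    return (x[0] + x[1], x[0] - x[1])


# Readout weights chosen by hand so both models compute sign(x1 + x2).
A_WEIGHTS: Vector = (1.0, 1.0)  # x1 + x2
B_WEIGHTS: Vector = (1.0, 0.0)  # picks out the "sum" unit, ignores the "difference" unit


def predict(hidden: Vector, weights: Vector) -> int:
    """Linear readout followed by a threshold at zero."""
    score = hidden[0] * weights[0] + hidden[1] * weights[1]
    return 1 if score > 0 else 0


def accuracy(
    hidden_fn: Callable[[Vector], Vector],
    weights: Vector,
    inputs: Sequence[Vector],
    labels: Sequence[int],
) -> float:
    """Fraction of inputs classified correctly."""
    hits = sum(predict(hidden_fn(x), weights) == y for x, y in zip(inputs, labels))
    return hits / len(inputs)


# Illustrative inputs; each label is 1 exactly when x1 + x2 > 0.
INPUTS: List[Vector] = [(2.0, 1.0), (-3.0, 1.0), (0.5, -2.0), (4.0, -1.0)]
LABELS: List[int] = [1, 0, 0, 1]

if __name__ == "__main__":
    print("accuracy A:", accuracy(model_a_hidden, A_WEIGHTS, INPUTS, LABELS))
    print("accuracy B:", accuracy(model_b_hidden, B_WEIGHTS, INPUTS, LABELS))
    for x in INPUTS:
        print(x, "-> hidden A:", model_a_hidden(x), "| hidden B:", model_b_hidden(x))
```

In this toy, a probe that looked for an "x1 unit" would find one in model A and nothing matching in model B, even though the two behave identically on every test input.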

The more interesting move the corpus makes is to stop assuming that the reasoning has to live *inside* the network at all. If knowledge and reasoning are entangled and hard to separate cleanly in the weights, you can pull the reasoning *out* and externalize it into an explicit knowledge graph the model reads and writes. Small models become capable of hard tasks when their reasoning is structured as graph triples rather than buried in activations: Knowledge Graph of Thoughts (KGoT) reports a 29% improvement on GAIA Level 3 tasks using GPT-4o mini (see: Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?). Symbolic rules drawn from a graph's structure can also supply the navigational plan the network's internal reasoning struggles to hold (see: Can symbolic rules from knowledge graphs guide complex reasoning?). Structured knowledge composition can even outdo raw scale: a 32B model fine-tuned on 24,000 reasoning tasks derived from paths through a medical knowledge graph reaches state-of-the-art performance across fifteen medical domains (see: Can knowledge graphs teach models deep domain expertise?).
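
The externalization idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the KGoT or SymAgent implementation: `TripleStore`, `follow_plan`, and the example facts are hypothetical names chosen for the sketch, and the relation plan is hard-coded where a real system would have a model propose it.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy external knowledge graph holding (subject, relation, object) triples."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.trace: List[Triple] = []  # every written step, kept for inspection

    def write(self, subject: str, relation: str, obj: str) -> None:
        """Record one reasoning step as an explicit triple."""
        self._objects[(subject, relation)].add(obj)
        self.trace.append((subject, relation, obj))

    def read(self, subject: str, relation: str) -> Set[str]:
        """Return all objects linked to `subject` by `relation` (empty if none)."""
        return set(self._objects.get((subject, relation), ()))


def follow_plan(store: TripleStore, start: str, plan: Sequence[str]) -> Set[str]:
    """Walk the graph from `start`, applying one relation per hop.

    `plan` stands in for a symbolic navigational plan; here it is simply a
    fixed list of relation names (an assumption made for this sketch).
    """
    frontier = {start}
    for relation in plan:
        frontier = {obj for node in frontier for obj in store.read(node, relation)}
    return frontier


# Illustrative facts, written to the graph one triple at a time.
store = TripleStore()
store.write("Paris", "capital_of", "France")
store.write("France", "member_of", "European Union")
store.write("Berlin", "capital_of", "Germany")

# Two-hop question: which union does the country whose capital is Paris belong to?
answer = follow_plan(store, "Paris", ["capital_of", "member_of"])

if __name__ == "__main__":
    print(answer)
    for step in store.trace:
        print(step)
```

Because every step is an explicit triple in `store.trace`, each hop can be inspected or rejected, which is the kind of quality control over reasoning steps that the KGoT note describes.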

There's a sharper reframing waiting underneath all this. If you believe the higher layers do 'reasoning,' it's worth asking what that reasoning even is. Two notes argue that chain-of-thought is pattern-guided imitation, not formal logic: format shapes the output far more than logical content, and structurally invalid prompts work about as well as valid ones (see both notes titled: What makes chain-of-thought reasoning actually work?). From that angle the 'reasoning layers' may be doing something more like fluent retrieval-and-recombination than the deliberate inference the name implies, which blurs the very distinction the question starts from.

So the honest answer is a layered one. Functionally, the corpus's best single finding is concrete and actionable: lower layers fetch, upper layers reason, and that's why reasoning training has uneven side effects across domains. But the collection immediately complicates it: the separation may be representationally illusory, the 'reasoning' may be imitation rather than inference, and the most promising engineering response is to externalize reasoning into explicit graph structure rather than trying to keep it cleanly partitioned inside the weights.


Sources (7 notes)

Why does reasoning training help math but hurt medical tasks?

Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

What actually happens inside the minds of language models?

LLMs can achieve identical accuracy while maintaining radically different internal representations, and mechanisms that appear interpretable may not causally drive outputs. This decoupling means performance metrics alone mask crucial differences in how models actually work.

Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?

Knowledge Graph of Thoughts (KGoT) achieves 29% improvement on GAIA Level 3 tasks using GPT-4o mini by externalizing reasoning into iteratively constructed KG triples. The approach improves transparency, reduces bias, and enables quality control over reasoning steps.

Can symbolic rules from knowledge graphs guide complex reasoning?

SymAgent derives symbolic rules from KG structure using LLM reasoning to create navigational plans that align natural language with graph topology. This approach captures structural reasoning patterns explicitly, outperforming retrieval methods that rely on semantic similarity alone.

Can knowledge graphs teach models deep domain expertise?

Fine-tuning a 32B model on 24,000 reasoning tasks derived from medical knowledge graph paths produces state-of-the-art performance across 15 medical domains, demonstrating that structured knowledge composition matters more than scale.

What makes chain-of-thought reasoning actually work?

Research shows training format shapes reasoning strategy 7.5× more than domain, demo position swings accuracy 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.

What makes chain-of-thought reasoning actually work?

CoT systems reproduce the form of reasoning through pattern matching rather than performing genuine logical inference. This explains why format effects dominate content, why structurally invalid prompts succeed, and why stronger reasoning models become less instruction-compliant.
