Can pruning half of LLM layers affect knowledge retrieval performance?

This explores whether you can cut a large fraction of an LLM's layers without hurting its ability to recall facts — and what that would reveal about how knowledge is stored across the network.

This reads the question as asking whether knowledge retrieval survives aggressive layer pruning, which is really a question about how concentrated or redundant a model's knowledge actually is. Worth saying directly: the corpus has no paper that prunes half the layers and measures retrieval, so I can't hand you that experiment. But several notes circle the underlying issue — how much of the network does real work, and where knowledge actually lives — and together they make pruning's likely effects less mysterious.

The most suggestive piece is the finding that hidden states sparsify under hard, unfamiliar tasks [Do language models sparsify their activations under difficult tasks?]. As difficulty rises, the model's internal activations get much sparser in a localized, systematic way, which the note reads as selective filtering rather than a failure mode. Sparse activations are not the same thing as redundant layers, but they point at the kind of slack that layer-pruning research exploits: if only a fraction of the network is carrying the load for a given query, you can often remove a lot of it cheaply. So the corpus offers indirect support for the intuition that knowledge retrieval might be more robust to pruning than you'd fear.
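To make "sparser" concrete, one simple option is the fraction of hidden-state entries whose magnitude falls below a small threshold. This is a minimal sketch under assumptions: the name `activation_sparsity`, the threshold `eps`, and the two toy vectors are all invented for illustration, and the cited note may define sparsity differently.

```python
def activation_sparsity(hidden, eps=1e-3):
    """Fraction of entries with |x| < eps.

    Hypothetical metric chosen for illustration; not taken from the note.
    """
    if not hidden:
        raise ValueError("hidden state must be non-empty")
    near_zero = sum(1 for x in hidden if abs(x) < eps)
    return near_zero / len(hidden)


# Toy vectors, invented for illustration (not measured from any model).
easy = [0.8, -0.5, 0.3, 0.0, 0.6, -0.2, 0.4, 0.1]
hard = [0.0, 0.0, 1.9, 0.0, 0.0, -2.1, 0.0, 0.0]

print(activation_sparsity(easy))  # 0.125
print(activation_sparsity(hard))  # 0.75
```

Tracking a number like this per layer, on easy versus hard prompts, would be one way to check whether the layers you plan to cut are the ones that go quiet.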

But the same corpus warns that 'knowledge' isn't one uniform thing spread evenly through the model. Representations are shallower for under-trained material: models do measurably worse on historical legal cases, which the note attributes to older precedent being thinly represented in training [Why do language models struggle with historical legal cases?]. If depth of representation varies by how well-trodden the knowledge is, pruning won't degrade retrieval uniformly: well-worn facts may survive heavy cuts while thin, rarely-reinforced knowledge collapses first. Pruning becomes a stress test that exposes which knowledge was robustly stored versus barely held.
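If you ran that stress test yourself, the useful output would be retrieval accuracy broken out by how well-trodden each fact is, before and after the prune. A minimal sketch, assuming you already have per-question correctness labels; the bucket names, the helpers `accuracy_by_bucket` and `retention`, and the toy data are hypothetical placeholders, not results from any paper.

```python
from collections import defaultdict


def accuracy_by_bucket(results):
    """results: iterable of (bucket, correct) pairs -> {bucket: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for bucket, correct in results:
        totals[bucket] += 1
        hits[bucket] += int(correct)
    return {b: hits[b] / totals[b] for b in totals}


def retention(before, after):
    """Per-bucket ratio of pruned accuracy to unpruned accuracy."""
    return {b: after[b] / before[b]
            for b in before if b in after and before[b] > 0}


# Invented placeholder labels, only to show the shape of the comparison.
full_model = [("common", True)] * 4 + [("rare", True)] * 3 + [("rare", False)]
pruned_model = [("common", True)] * 4 + [("rare", True)] + [("rare", False)] * 3

before = accuracy_by_bucket(full_model)
after = accuracy_by_bucket(pruned_model)
print(before)                    # per-bucket accuracy, unpruned
print(after)                     # per-bucket accuracy, pruned
print(retention(before, after))  # how much of each bucket survived
```

In this toy setup the common bucket is untouched while the rare bucket loses most of its accuracy, which is the uneven pattern the argument above predicts and which a pooled score would blur.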

There's also a sharper warning from the work on disconnected internal pathways. Models can explain a concept correctly yet fail to apply it, and even recognize their own failure, a pattern the note interprets as explanation and execution running on functionally separate circuits [Can LLMs understand concepts they cannot apply?]. If retrieval and application live in different parts of the network, a prune that leaves recall intact could quietly gut the ability to use what's recalled, so 'knowledge retrieval performance' alone is the wrong thing to measure. The linguistic blind-spot work makes the same point from another angle: surface pattern-matching and deep structural competence degrade differently as you push the model [Why do large language models fail at complex linguistic tasks?], so a single accuracy number can hide where the damage actually landed.
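One way to avoid the single-number trap is to score recall and application on the same items and report the gap between them. A minimal sketch, assuming each evaluation item carries two boolean labels; the field names `recalled` and `applied`, the function `recall_application_gap`, and the toy items are hypothetical.

```python
def recall_application_gap(items):
    """items: list of dicts with boolean 'recalled' and 'applied' fields.

    Returns recall rate, application rate, and the share of items the
    model recalled correctly but failed to apply.
    """
    n = len(items)
    if n == 0:
        raise ValueError("need at least one item")
    recall = sum(i["recalled"] for i in items) / n
    application = sum(i["applied"] for i in items) / n
    gap = sum(i["recalled"] and not i["applied"] for i in items) / n
    return {"recall": recall,
            "application": application,
            "recalled_not_applied": gap}


# Invented placeholder labels for a pruned model, for illustration only.
pruned_items = [
    {"recalled": True, "applied": True},
    {"recalled": True, "applied": False},
    {"recalled": True, "applied": False},
    {"recalled": False, "applied": False},
]

print(recall_application_gap(pruned_items))
# {'recall': 0.75, 'application': 0.25, 'recalled_not_applied': 0.5}
```

Comparing `recalled_not_applied` before and after a prune would show whether the cut left recall standing while removing the ability to use it.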

The less obvious takeaway: the interesting question isn't whether pruning hurts retrieval, but what pruning reveals about your model. Because activations are sparse and knowledge is unevenly deep and split across pathways, a prune that barely dents benchmark recall can still silently break reasoning, application, or the long tail of rarely-seen facts. That is also part of the appeal of architectures that move knowledge *out* of the weights, into retrieval or explicit graph structure, where it is less exposed to this kind of fragility [Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?].

Sources 5 notes

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Why do language models struggle with historical legal cases?

Supreme Court overruling benchmark (236 pairs) reveals era sensitivity: models perform worse on historical cases than modern ones. Root cause is training corpus over-representation of recent cases, creating shallower representations of older precedent.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

Why do large language models fail at complex linguistic tasks?

Top-tier LLMs like Llama3-70b consistently misidentify embedded clauses, verb phrases, and complex nominals. Performance degrades predictably as syntactic depth increases, revealing that statistical learning captures surface patterns but not deep grammatical rules.

Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?

Knowledge Graph of Thoughts (KGoT) achieves 29% improvement on GAIA Level 3 tasks using GPT-4o mini by externalizing reasoning into iteratively constructed KG triples. The approach improves transparency, reduces bias, and enables quality control over reasoning steps.
