SYNTHESIS NOTE
Model Architecture and Internals

Can neural networks learn compositional skills without symbolic mechanisms?

Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? This asks whether compositionality is an architectural feature or an emergent property of scale.

Synthesis note · 2026-02-23 · sourced from MechInterp

The question: do neural networks need explicit symbolic mechanisms to achieve compositionality, or does scaling suffice?

The answer: scaling data and model size leads to compositional generalization on standard MLPs, without architectural modifications — but with a critical condition: the training distribution must sufficiently cover the task space. Individual modules need not appear in isolation, but they must appear in enough combinations that the model can extract them.
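The coverage condition can be made concrete with a small sketch. It assumes a toy task space in which each task picks one module for each of a fixed number of slots; the function names and the hold-out rule are hypothetical, and the check is only a necessary proxy (every module seen at least once in every slot), not the quantitative notion of "sufficient" coverage that the claim relies on.

```python
from itertools import product


def covers_all_modules(train, num_modules, num_slots):
    """Return True if every module appears at least once in every slot of `train`."""
    for slot in range(num_slots):
        seen = {combo[slot] for combo in train}
        if seen != set(range(num_modules)):
            return False
    return True


def split_combinations(num_modules, num_slots, holdout_every=3):
    """Hold out every `holdout_every`-th combination; train on the rest."""
    combos = list(product(range(num_modules), repeat=num_slots))
    held_out = set(combos[::holdout_every])
    train = [c for c in combos if c not in held_out]
    return train, sorted(held_out)


# Toy task space (assumed sizes): 4 modules, 2 slots, 16 combinations in total.
train, held_out = split_combinations(num_modules=4, num_slots=2)
print(len(train), len(held_out))
print(covers_all_modules(train, 4, 2))

# Dropping every task that uses module 0 in slot 0 leaves that module unseen,
# so no amount of scale could recover it from this training set.
biased_train = [c for c in train if c[0] != 0]
print(covers_all_modules(biased_train, 4, 2))
```

In the first split no held-out combination appears in training, yet every module is still observed in combination with others. The second split violates the condition even though it removes fewer than half of the training tasks.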

Three key contributions:

  1. Proof of representational capacity. MLPs can approximate a general class of compositional task families (hyperteachers) to arbitrary precision using only a linear number of neurons relative to the number of task modules. Memorizing all tasks requires exponential capacity; the compositional solution is fundamentally more efficient.

  2. Linear decodability as a compositionality signature. When networks successfully compositionally generalize, the task constituents can be linearly decoded from hidden activations. This metric predicts failures in text-to-image models — when concepts cannot be linearly decoded, the model fails to compose them.

  3. Scaling limits. Despite progress, performance deteriorates as the number of composed concepts grows. Compositionality is multiplicative: the number of possible compositions grows exponentially with the number of concepts combined, so it eventually exceeds what any finite training distribution can cover, and even scaled models hit composition limits.
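The second contribution, linear decodability, can be illustrated with a toy probe. This is a minimal sketch under strong assumptions, not the probe used in the source: the "hidden activations" are synthetic (one direction per module plus Gaussian noise), the probe is a nearest-centroid readout (which is a linear decision rule), and all names and sizes are hypothetical. It checks whether constituents can be read out on combinations the probe never saw.

```python
import random

random.seed(0)
M = 4          # modules per slot (assumed toy size)
NOISE = 0.05   # std of Gaussian noise added to each activation
REPEATS = 20   # noisy samples per combination


def activation(i, j):
    """Synthetic stand-in for a hidden activation: one direction per module, plus noise."""
    h = [0.0] * (2 * M)
    h[i] = 1.0          # slot-A module i
    h[M + j] = 1.0      # slot-B module j
    return [x + random.gauss(0.0, NOISE) for x in h]


def fit_centroids(samples, labels):
    """Mean activation per label; the nearest-centroid rule is linear in h."""
    sums, counts = {}, {}
    for h, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(h))
        for k, x in enumerate(h):
            acc[k] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def decode(h, centroids):
    """Return the label whose centroid is closest to h."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(h, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hold out the diagonal combinations (i == j); fit the probe on the rest.
train_pairs = [(i, j) for i in range(M) for j in range(M) if i != j]
test_pairs = [(i, i) for i in range(M)]

train_h, train_a, train_b = [], [], []
for i, j in train_pairs:
    for _ in range(REPEATS):
        train_h.append(activation(i, j))
        train_a.append(i)
        train_b.append(j)

probe_a = fit_centroids(train_h, train_a)  # reads out the slot-A module
probe_b = fit_centroids(train_h, train_b)  # reads out the slot-B module

correct, total = 0, 0
for i, j in test_pairs:
    for _ in range(REPEATS):
        h = activation(i, j)
        correct += decode(h, probe_a) == i and decode(h, probe_b) == j
        total += 1

accuracy = correct / total
print(f"held-out decoding accuracy: {accuracy:.2f}")
```

With real models the activations would come from a trained network and the probe would typically be a fitted linear classifier. The point of the toy is only the evaluation protocol: fit the probe on seen combinations, then test whether each constituent is still recoverable on unseen ones.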

This directly addresses the related question "Why do neural networks fail at compositional generalization?": the binding problem is solvable through scaling when training covers the task space, but remains unsolved for arbitrary novel compositions. The failure mode is not an inability to learn compositional structure but insufficient exposure to the combinatorial space.

The practical implication for LLMs: compositional generalization in language (novel sentence structures, new concept combinations) should improve with scale — but the tails of the combinatorial space will always remain sparsely covered, predicting continued failures on truly novel compositions.
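The sparse-tail argument is simple arithmetic, sketched below under an assumed counting model: each composition fills a fixed number of slots, each with one of the available modules, so the task count is modules raised to the number of slots. The training budget and the function names are hypothetical.

```python
def num_compositions(num_modules, num_slots):
    """Distinct tasks when each of `num_slots` slots picks one of `num_modules` modules."""
    return num_modules ** num_slots


def max_coverage(train_budget, num_modules, num_slots):
    """Upper bound on the fraction of compositions that `train_budget` distinct tasks can cover."""
    return min(1.0, train_budget / num_compositions(num_modules, num_slots))


BUDGET = 10_000  # hypothetical number of distinct training compositions
MODULES = 10     # hypothetical number of modules available per slot

for slots in range(1, 9):
    modules_to_learn = MODULES * slots            # grows linearly with the slot count
    tasks = num_compositions(MODULES, slots)      # grows exponentially with the slot count
    coverage = max_coverage(BUDGET, MODULES, slots)
    print(slots, modules_to_learn, tasks, f"{coverage:.6f}")
```

Under these assumptions the budget covers every composition up to four slots, at most a tenth of them at five, and at most one in ten thousand at eight, while the number of modules to be learned only rises from 10 to 80.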

SKiC prompting: unlocking compositional generalization with few examples

Skills-in-Context (SKiC) prompting shows that compositional generalization can be unlocked with as few as two exemplars when the prompt structure explicitly grounds each reasoning step in foundational skills. The SKiC prompt has three blocks: (1) skills with instructions, (2) compositional examples showing how to combine skills, (3) the problem. This one-stage approach achieves near-perfect systematic generalization and is more general than decomposition-based methods (it handles complex computation graphs that cannot be linearly decomposed).

SKiC also unlocks "latent potential": pre-existing internal skills from pretraining that standard prompting fails to activate. This supports the training-coverage condition from a different angle: the model has compositional capacity from pretraining, but prompting must explicitly invoke the skill-grounding structure to surface it.

Source: Prompts Prompting.
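The three-block layout can be sketched as plain string assembly. This is an illustrative template only: the block labels, the `Skill <name>` notation, the helper name `build_skic_prompt`, and the last-letter task are assumptions made for the example, not the wording used in the SKiC source.

```python
def build_skic_prompt(skills, examples, problem):
    """Assemble a three-block prompt: skills, compositional examples, then the problem."""
    skill_block = "\n".join(
        f"Skill <{name}>: {instruction}" for name, instruction in skills.items()
    )
    example_block = "\n\n".join(
        f"Example {n}:\nQuestion: {question}\nAnswer: {answer}"
        for n, (question, answer) in enumerate(examples, start=1)
    )
    return (
        "Skills:\n" + skill_block
        + "\n\nCompositional examples:\n" + example_block
        + "\n\nProblem:\n" + f"Question: {problem}\nAnswer:"
    )


# Hypothetical skills and exemplars; each answer names the skill behind every step.
skills = {
    "last_letter": "Return the final character of a word.",
    "concat": "Join characters in order into one string.",
}
examples = [
    (
        "Concatenate the last letters of 'neural network'.",
        "Using Skill <last_letter>, 'neural' gives 'l' and 'network' gives 'k'. "
        "Using Skill <concat>, 'l' + 'k' = 'lk'. The answer is lk.",
    ),
    (
        "Concatenate the last letters of 'deep model'.",
        "Using Skill <last_letter>, 'deep' gives 'p' and 'model' gives 'l'. "
        "Using Skill <concat>, 'p' + 'l' = 'pl'. The answer is pl.",
    ),
]

prompt = build_skic_prompt(skills, examples, "Concatenate the last letters of 'linear probe'.")
print(prompt)
```

The two exemplars mirror the "as few as two" setting described above. The design choice worth noting is that each exemplar answer cites the skill it uses at every step, which is the grounding structure the paragraph describes.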

Original note title

compositional generalization emerges from scaling data and model size without explicit symbolic mechanisms