Why do neural networks fail at compositional generalization?
Exploring whether the binding problem from neuroscience explains neural networks' inability to systematically generalize. The binding problem has three aspects—segregation, representation, and composition—each creating distinct failure modes in how networks handle structured information.
Greff et al. (arXiv:2012.05208) argue that the binding problem, well studied in neuroscience and cognitive psychology, is the underlying cause of neural networks' failure to achieve human-level generalization. Its three aspects each create a distinct failure mode:
Segregation: forming meaningful entities from unstructured sensory inputs. Neural networks struggle to decompose inputs into discrete objects without architectural inductive biases such as slot attention or other object-centric representations. Without segregation, the network works with undifferentiated feature maps rather than structured entities.
Representation: maintaining separation of information at a representational level. Even when entities can be identified, distributed representations entangle them. A network may know that "red triangle" and "blue circle" are present, but fail to maintain the binding of red-to-triangle and blue-to-circle. This is the classic variable binding problem.
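The loss of binding described above can be made concrete with a toy example. The sketch below is illustrative only and is not taken from Greff et al.: the one-hot feature codes and both function names are hypothetical. It compares an additive "bag of features" code, which cannot tell the two scenes apart, with a sum of per-object outer products (a tensor-product style code, swapped in here as one well-known way to keep bindings).

```python
# Hypothetical one-hot feature codes, chosen for illustration only.
COLORS = {"red": [1, 0], "blue": [0, 1]}
SHAPES = {"triangle": [1, 0], "circle": [0, 1]}

def bag_of_features(scene):
    """Sum the concatenated color and shape codes of every object.

    The result records which features are present, not which belong together.
    """
    total = [0, 0, 0, 0]
    for color, shape in scene:
        vec = COLORS[color] + SHAPES[shape]  # list concatenation
        total = [t + v for t, v in zip(total, vec)]
    return total

def outer_product_binding(scene):
    """Sum one color-by-shape outer product per object.

    Each object activates the cell for its own (color, shape) pair,
    so the pairing survives the sum.
    """
    total = [[0, 0], [0, 0]]
    for color, shape in scene:
        for i, c in enumerate(COLORS[color]):
            for j, s in enumerate(SHAPES[shape]):
                total[i][j] += c * s
    return total

scene_a = [("red", "triangle"), ("blue", "circle")]
scene_b = [("blue", "triangle"), ("red", "circle")]

# Same features present, so the additive code is identical for both scenes.
print(bag_of_features(scene_a) == bag_of_features(scene_b))
# The outer-product code differs because it stores the pairings.
print(outer_product_binding(scene_a) == outer_product_binding(scene_b))
```

The outer-product code pays for binding with a representation whose size grows with the product of the feature dimensions, which is one reason explicit binding schemes are not a free fix.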
Composition: using entities to construct new inferences, predictions, and behaviors. Even with segregation and representation, composing entities into novel combinations (never seen during training) requires systematic reuse of learned structure. This is where the failure shows up in practice: agents trained with RL are fragile under distributional shift and require substantially more training data than humans.
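"Novel combinations never seen during training" is usually tested with a held-out-combination split. The following is a minimal sketch of such a split, assuming a toy color-and-shape domain; the function name `compositional_split` and the example values are hypothetical and do not come from any of the cited papers. The check inside the function enforces the condition that makes the test compositional: every primitive in a held-out pair must still appear in training, only the pairing is new.

```python
from itertools import product

def compositional_split(colors, shapes, held_out):
    """Split all (color, shape) pairs into train and held-out test sets.

    Raises ValueError if a held-out pair uses a color or shape that never
    appears in training, because that would test unseen primitives rather
    than unseen combinations.
    """
    held_out = set(held_out)
    pairs = list(product(colors, shapes))
    train = [p for p in pairs if p not in held_out]
    test = [p for p in pairs if p in held_out]

    train_colors = {c for c, _ in train}
    train_shapes = {s for _, s in train}
    for color, shape in test:
        if color not in train_colors or shape not in train_shapes:
            raise ValueError(
                f"{(color, shape)} contains a primitive never seen in training"
            )
    return train, test

train_pairs, test_pairs = compositional_split(
    ["red", "blue", "green"],
    ["triangle", "circle", "square"],
    held_out=[("red", "circle"), ("blue", "square")],
)
print(len(train_pairs), "training combinations")
print("held out:", test_pairs)
```

A model that scores well on `train_pairs` but poorly on `test_pairs` has learned the individual combinations rather than a reusable rule for combining colors with shapes.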
The deeper tension: connectionist representations are directly grounded in input data, unlike symbols, which require human interpretation for grounding (the symbol grounding problem). But this grounding advantage comes at the cost of compositional structure. Read alongside the note "Do large language models reason symbolically or semantically?", the binding problem may explain why semantic decoupling collapses reasoning: without compositional binding, removing semantic content removes the only glue holding the reasoning together.
Scaling can partially overcome the binding problem. The "Scaling can lead to compositional generalization" paper demonstrates that standard MLPs can compositionally generalize when data and model size are scaled sufficiently. The key theoretical result is that MLPs can approximate compositional task families using a number of neurons that grows only linearly with the number of task modules, so compositionality does not inherently require exponential capacity. Empirically, when models successfully generalize compositionally, task constituents can be linearly decoded from hidden activations, and this decodability metric correlates with failures of image generation models to compose known concepts.
This provides a partial counterpoint to the binding problem: while the fundamental challenge remains, scaling may create conditions where compositional representations emerge despite the lack of explicit binding mechanisms. The "Break It Down" paper provides structural evidence: models often implement solutions to subroutines via modular subnetworks, and pretraining encourages this structural compositionality. See "Can neural networks learn compositional skills without symbolic mechanisms?" and "Do neural networks naturally learn modular compositional structure?".
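To illustrate what "linearly decoded from hidden activations" means in practice, here is a minimal sketch of a linear probe evaluated on a held-out combination. Everything in it is an assumption made for illustration: the two hand-built codes stand in for the hidden activations of a trained network, and the nearest-centroid probe (a linear classifier with scores `mu_c . x - |mu_c|^2 / 2`) stands in for the logistic-regression probes typically used. It is not the procedure from the cited paper.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def factorized_code(a, b, n=3):
    """Stand-in activations with separate units for each constituent."""
    return one_hot(a, n) + one_hot(b, n)

def conjunctive_code(a, b, n=3):
    """Stand-in activations with one unit per (a, b) combination."""
    return one_hot(a * n + b, n * n)

def fit_centroid_probe(activations, labels, num_classes):
    """Fit a linear probe: one (weights, bias) pair per class centroid."""
    dim = len(activations[0])
    probe = []
    for c in range(num_classes):
        members = [x for x, y in zip(activations, labels) if y == c]
        mu = [sum(x[i] for x in members) / len(members) for i in range(dim)]
        bias = -0.5 * sum(m * m for m in mu)
        probe.append((mu, bias))
    return probe

def decode(probe, x):
    """Return the class whose linear score is highest."""
    scores = [sum(m * v for m, v in zip(mu, x)) + bias for mu, bias in probe]
    return scores.index(max(scores))

def held_out_probe_result(code, held_out=(0, 1), n=3):
    """Train a probe for constituent `a` on all pairs except `held_out`,
    then report whether `a` is decoded correctly on the held-out pair."""
    train = [(a, b) for a in range(n) for b in range(n) if (a, b) != held_out]
    probe = fit_centroid_probe(
        [code(a, b) for a, b in train], [a for a, _ in train], n
    )
    return decode(probe, code(*held_out)) == held_out[0]

print("factorized code decodes held-out pair:",
      held_out_probe_result(factorized_code))
print("conjunctive code decodes held-out pair:",
      held_out_probe_result(conjunctive_code))
```

In this toy setting the probe recovers the constituent from the factorized code but not from the conjunctive one, because the held-out combination activates a unit the probe never saw. That is the sense in which linear decodability of constituents can serve as a signal of compositional structure, subject to the caveat noted elsewhere in this note that decodability may mask deeper structural issues.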
Inquiring lines that use this note as a source 26
This note is a source for these synthesized inquiries.
- How do functional features differ from representational abstract features?
- Why do text-to-image models fail at composing multiple concepts together?
- What makes linear decodability a reliable signal of compositionality?
- Does scaling model size solve compositional generalization problems?
- What happens when you tightly couple two representations together?
- Does compositional generalization emerge suddenly or improve smoothly with scale?
- Can fractured representations explain why models fail at systematic generalization?
- How does explicit stack tracking solve the composition sub-problem in binding?
- What makes recursive structure different from other forms of compositional generalization?
- Do substitute networks converge differently than complement networks?
- Can scaling alone create compositional generalization without explicit binding mechanisms?
- How do neural networks decompose complex tasks into modular subnetworks?
- What are fractured entangled representations in neural networks?
- Why does weight sparsity reduce superposition and force disentangled representations?
- Why do cross-product features fail to generalize across unseen feature combinations?
- How does co-activation shape which memories become linked together?
- What role does query-level exposure play in enabling compositional generalization?
- How does the hippocampus bind disparate elements without storing everything itself?
- Why does scaling data and model size improve compositional generalization?
- How do neural networks decompose tasks into modular subnetworks that transfer?
- How do classical mechanics and statistical mechanics provide methodological templates for learning theory?
- Why does gradient descent discover compositional structure without explicit pressure?
- How can neural networks be interpretable by design rather than post-hoc?
- What makes recurrent depth enable compositional generalization across tasks?
- What makes a feature abstract versus concrete in neural network activations?
- Where do neural networks still fail at compositional generalization despite scaling?
Related concepts in this collection 7
- Do large language models reason symbolically or semantically?
  Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
  semantic dependence may be a consequence of the binding problem
- Do foundation models learn world models or task-specific shortcuts?
  When transformer models predict sequences accurately, are they building genuine world models that capture underlying physics and logic? Or are they exploiting narrow patterns that fail under distribution shift?
  heuristics bypass the binding problem by not requiring compositional structure
- Do LLMs generalize moral reasoning by meaning or surface form?
  When moral scenarios are reworded to reverse their meaning while keeping similar language, do LLMs recognize the semantic shift? This tests whether LLMs actually understand moral concepts or reproduce training distribution patterns.
  surface similarity as a binding substitute
- Can neural networks learn compositional skills without symbolic mechanisms?
  Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? This asks whether compositionality is an architectural feature or an emergent property of scale.
  partial resolution: scaling creates conditions for compositional representations without explicit binding, but linear decodability may mask deeper structural issues (see FER tension)
- Do neural networks naturally learn modular compositional structure?
  Explores whether neural networks decompose compositional tasks into distinct subroutines without explicit symbolic design. This challenges the longstanding view that neural networks are fundamentally non-compositional.
  structural evidence that pretraining encourages modular decomposition, partially addressing the composition aspect of the binding problem
- Can explicit stack tracking improve how transformers learn recursive syntax?
  Can adding an explicit stack tape to transformers help them track recursive structure more efficiently? This matters because standard transformers struggle with long-tail recursive patterns despite their size and data.
  directly addresses the composition sub-problem with explicit recursive state tracking via stack tape, providing the constituent structure mechanism that standard attention lacks
- Can recurrent hierarchies achieve reasoning that transformers cannot?
  Can a dual-timescale recurrent architecture escape the computational limitations of standard transformers and solve complex reasoning tasks without explicit chain-of-thought? This explores whether architectural design, not scale, enables true algorithmic reasoning.
  architectural evidence: HRM achieves near-perfect accuracy on Sudoku and maze tasks requiring compositional reasoning where standard transformers fail completely; hierarchical recurrence may provide the computational depth needed for the composition sub-problem without explicit symbolic binding
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- On the Binding Problem in Artificial Neural Networks
- Scaling can lead to compositional generalization
- Break It Down: Evidence for Structural Compositionality in Neural Networks
- From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
- Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
- Open Problems in Mechanistic Interpretability
- How do Transformers Learn Implicit Reasoning?
- Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
Original note title
the binding problem — segregation representation and composition — explains why neural networks fail at systematic generalization