SYNTHESIS NOTE

Can models be smart without organized internal structure?

Explores whether linear feature decodability proves genuine compositional reasoning or merely indicates that the right features are present but poorly organized. Critical for understanding what performance metrics actually certify.

Synthesis note · 2026-02-23 · sourced from MechInterp
What actually happens inside the minds of language models?

Two findings from mechanistic interpretability appear contradictory but operate at different levels of representational analysis:

Fractured Entangled Representations (FER): As the note "Can identical outputs hide broken internal representations?" describes, SGD-trained models fail catastrophically under perturbation or distribution shift in ways that well-organized representations would not. The pathology is invisible to standard evaluation.

Compositional generalization at scale: Scaling data and model size produces representations where compositional features are linearly decodable — separable task constituents can be independently identified and manipulated. This has been taken as evidence for genuine compositional understanding.

The resolution: Linear decodability tests for the presence of features, not their organization. A representation could expose every feature to a linear probe while remaining fractured in how those features relate to one another. The compositional parts are present but their composition is broken.
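A toy sketch can make the presence-versus-organization distinction concrete. Everything here is an illustrative assumption rather than a method from the source: two binary features `a` and `b`, a hypothetical `organized` encoding that gives each feature one fixed direction, a hypothetical `fractured` encoding that adds an entangled `a * b` coordinate, and a perceptron standing in for a linear probe. The `consistency` score (cosine between the "flip b" directions measured at `a = +1` and `a = -1`) is one possible proxy for organization, not an established metric.

```python
from itertools import product

def organized(a, b):
    # Each feature has one fixed direction, whatever the other feature is.
    return (a, b, 0)

def fractured(a, b):
    # Hypothetical fractured encoding: the extra a*b coordinate makes the
    # direction that carries b depend on the value of a.
    return (a, b, a * b)

def train_probe(points, labels, epochs=20):
    # Perceptron used as a simple linear probe (weights plus bias).
    w, bias = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + bias
            if y * score <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                bias += y
    return w, bias

def accuracy(probe, points, labels):
    w, bias = probe
    hits = 0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + bias
        hits += y * score > 0
    return hits / len(points)

def cosine(u, v):
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = sum(ui * ui for ui in u) ** 0.5
    norm_v = sum(vi * vi for vi in v) ** 0.5
    return dot / (norm_u * norm_v)

def flip_b_direction(encode, a):
    # Change in the representation when b goes from -1 to +1 with a held fixed.
    return [p - q for p, q in zip(encode(a, 1), encode(a, -1))]

combos = list(product([-1, 1], repeat=2))  # all (a, b) pairs
labels_a = [a for a, _ in combos]
labels_b = [b for _, b in combos]

report = {}
for name, encode in [("organized", organized), ("fractured", fractured)]:
    points = [encode(a, b) for a, b in combos]
    report[name] = {
        "probe_acc_a": accuracy(train_probe(points, labels_a), points, labels_a),
        "probe_acc_b": accuracy(train_probe(points, labels_b), points, labels_b),
        "consistency": cosine(flip_b_direction(encode, 1),
                              flip_b_direction(encode, -1)),
    }
    print(name, report[name])
```

In this toy setup both encodings are perfectly linearly decodable for `a` and for `b`, so a probe alone cannot tell them apart. Only the consistency score separates them: the direction that changes `b` is the same in every context for the organized encoding and orthogonal across contexts for the fractured one.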

This connects directly to the "imposter intelligence" post angle, covered in the notes "Can LLMs understand concepts they cannot apply?", "Does supervised fine-tuning actually improve reasoning quality?", and "Do foundation models learn world models or task-specific shortcuts?". All three describe the same meta-pattern: surface metrics certify a capability that analysis of internal structure would disqualify.

The practical implication for model evaluation: passing compositional generalization tests does not guarantee robust compositional reasoning. Evaluation under distribution shift, perturbation, and novel recombination is required to distinguish genuine compositionality from fractured representations that happen to contain the right features.
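One way to picture such an evaluation is a held-out-context check, sketched below under stated assumptions. The encodings `organized` and `fractured`, the perceptron probe, and the helper `context_shift_eval` are all hypothetical names invented for this illustration. The probe for feature `b` is fitted only in contexts where `a = +1`, then scored both there and in the unseen context `a = -1`, which stands in for a novel recombination of known features.

```python
def organized(a, b):
    # Hypothetical encoding: b always lives on the same coordinate.
    return (a, b, 0)

def fractured(a, b):
    # Hypothetical encoding: the a*b coordinate entangles b with a.
    return (a, b, a * b)

def probe_score(probe, x):
    w, bias = probe
    return sum(wi * xi for wi, xi in zip(w, x)) + bias

def train_probe(points, labels, epochs=20):
    # Perceptron used as a simple linear probe.
    probe = ([0.0] * len(points[0]), 0.0)
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * probe_score(probe, x) <= 0:
                w, bias = probe
                probe = ([wi + y * xi for wi, xi in zip(w, x)], bias + y)
    return probe

def accuracy(probe, points, labels):
    # A score of exactly zero counts as a miss.
    hits = sum(y * probe_score(probe, x) > 0 for x, y in zip(points, labels))
    return hits / len(points)

def context_shift_eval(encode, train_a=1, test_a=-1):
    # Fit a probe for b using only contexts where a == train_a, then score it
    # in that context and in the unseen context a == test_a.
    labels = [-1, 1]
    seen = [encode(train_a, b) for b in labels]
    unseen = [encode(test_a, b) for b in labels]
    probe = train_probe(seen, labels)
    return accuracy(probe, seen, labels), accuracy(probe, unseen, labels)

organized_result = context_shift_eval(organized)
fractured_result = context_shift_eval(fractured)
print("organized (in-context, held-out):", organized_result)
print("fractured (in-context, held-out):", fractured_result)
```

In this toy case both encodings pass the in-context check, while only the organized one keeps its accuracy in the unseen context. The gap between the two numbers is the signal that a standard in-distribution test would miss.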

Original note title

identical performance metrics can mask fundamentally different internal representations — feature linear decodability does not guarantee representational organization