Can we understand LLM mechanisms with only representational analysis?

Explores whether mapping what information a model encodes is sufficient for mechanistic understanding, or whether causal verification is also required before claiming a genuine mechanism.

Synthesis note · 2026-05-18 · sourced from Philosophy Subjectivity

The implementation-level argument in Levels of Analysis for LLMs is that representational analysis and causal analysis are partners, not alternatives. Representational analysis maps what information a model encodes: which features, circuits, and attention heads carry which signals. Causal analysis tests whether that encoded information actually drives behavior, using interventions such as ablations and activation patching. Either method alone produces an incomplete account: a representation that is encoded but causally inert is a curiosity, and a causal effect with no representational characterization is unexplained.

The pairing matters because each method, used alone, can mislead. Representational analysis can identify features that correlate with behavior without showing that they cause it, which is the classic correlation-versus-causation confound. Causal analysis can show that intervening on a component changes behavior without revealing what that component encodes: a lesion shows damage but not function. Mechanistic rather than merely descriptive claims come from the combination, in which representational analysis locates candidates and causal analysis tests their functional role.
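The gap between "encoded" and "causally used" can be made concrete with a toy sketch. Everything here is hypothetical and invented for illustration (the two-unit hidden layer, the `READOUT` weights, and the helper names `probe_score` and `ablation_effect`); it is not drawn from any real model or library. Both hidden units carry the signal, but only unit 0 feeds the output.

```python
import random

random.seed(0)
N = 1000

# Hypothetical binary signal the toy "model" is supposed to track.
signal = [random.choice([-1.0, 1.0]) for _ in range(N)]

# Both hidden units encode the signal plus a little noise.
hidden = [(s + random.gauss(0, 0.1), s + random.gauss(0, 0.1)) for s in signal]

# Assumption for illustration: the output reads unit 0 only.
READOUT = (1.0, 0.0)


def behavior(h):
    """Toy output: the sign of the weighted sum of hidden activations."""
    return 1 if h[0] * READOUT[0] + h[1] * READOUT[1] > 0 else -1


def probe_score(unit):
    """Representational check: |Pearson correlation| of a unit with the signal."""
    xs = [h[unit] for h in hidden]
    mean_x = sum(xs) / N
    mean_s = sum(signal) / N
    cov = sum((x - mean_x) * (s - mean_s) for x, s in zip(xs, signal))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_s = sum((s - mean_s) ** 2 for s in signal)
    return abs(cov) / (var_x * var_s) ** 0.5


def ablation_effect(unit):
    """Causal check: fraction of outputs that change under mean-ablation."""
    mean = sum(h[unit] for h in hidden) / N
    changed = 0
    for h in hidden:
        ablated = list(h)
        ablated[unit] = mean  # replace the unit's activation with its mean
        changed += behavior(ablated) != behavior(h)
    return changed / N


for unit in (0, 1):
    print(unit, round(probe_score(unit), 3), ablation_effect(unit))
```

In this sketch both units score high on the probe, yet ablating unit 1 changes no outputs at all, while ablating unit 0 flips a large share of them. A probe-only study would report two equally good "features"; only the intervention separates the one that matters.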

This has methodological consequences for interpretability research. Studies that report only feature visualizations or only activation-patching results contribute evidence, but they do not close the loop. Convergent evidence comes from pairs: locate a candidate feature representationally, then verify it causally; or identify a causal component, then map what it represents. The literature on attention circuits, induction heads, and feature dictionaries has been moving toward this pairing.
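The "locate, then verify" half of that pairing can be sketched with a minimal activation-patching loop. This is an illustrative toy under stated assumptions, not a real model: the hidden layer simply copies the inputs, the `W_OUT` weights are chosen by hand, and `run` and `patch_recovery` are hypothetical helper names.

```python
# Assumption for illustration: the output reads hidden unit 0 only.
W_OUT = (2.0, 0.0)


def run(x, patch=None):
    """Forward pass; `patch` optionally maps a hidden-unit index to a forced value."""
    hidden = list(x)  # identity "layer": each hidden unit copies one input
    for unit, value in (patch or {}).items():
        hidden[unit] = value
    output = sum(h * w for h, w in zip(hidden, W_OUT))
    return hidden, output


clean_input = (1.0, 1.0)
corrupted_input = (-1.0, -1.0)

clean_hidden, clean_out = run(clean_input)
_, corrupted_out = run(corrupted_input)


def patch_recovery(unit):
    """Fraction of the clean-vs-corrupted output gap restored by patching one unit."""
    _, patched_out = run(corrupted_input, patch={unit: clean_hidden[unit]})
    return (patched_out - corrupted_out) / (clean_out - corrupted_out)


# Step 1 (representational): both units differ between the clean and corrupted
# runs, so both are candidates. Step 2 (causal): patch each one in turn.
for unit in (0, 1):
    print(unit, patch_recovery(unit))
```

Here patching unit 0 restores the full output gap (recovery 1.0) and patching unit 1 restores none of it (recovery 0.0), even though both units distinguish the two inputs equally well. Real studies patch activations inside a trained network, but the logic of the comparison is the same.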

For LLM understanding specifically, this pairing requirement explains why some claimed "mechanisms" have not held up. They were representational without causal verification (a feature that looked like task encoding but did not drive task behavior) or causal without representational characterization (an intervention that changed behavior but left unexplained what the affected component encodes). The discipline imported from cognitive neuroscience is to demand both.
