Where does hierarchical structure in language models come from?

Do LLMs build hierarchical concept geometry through dedicated mechanisms, or does it emerge naturally from word co-occurrence patterns in training data? Understanding the source matters for interpreting what representations actually reveal about model computation.

Synthesis note · 2026-05-28 · sourced from MechInterp

A recurring interpretability finding is that LLM representations encode hypernymy — the is-a relation between general and specific concepts — geometrically, with broad categories and their sub-categories arranged in nested, near-orthogonal structure. The tempting reading is functional: the model built a hierarchy mechanism because hierarchy is useful. This paper argues the opposite. Starting from the empirically verified assumption that words closer on the WordNet hypernym graph co-occur more often, it characterizes the spectrum of the embedding Gram matrix and shows that, under mild positivity and decay conditions on the co-occurrence kernel, the leading eigenvectors reproduce the taxonomy. Hierarchical concept geometry emerges from the spectral structure of pairwise word statistics; no hierarchy-specific functional mechanism is required.

The explanatory payoff is that this account is more predictive than the functional one. Rather than postulating hierarchical orthogonality from functional desiderata, it derives that geometry from co-occurrence statistics, which yields two testable predictions: the same geometry should appear outside LLMs, in plain word2vec embeddings, and it should carry a specific coarse-to-fine spectral organization. Both predictions are confirmed.
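The spectral claim can be illustrated with a toy sketch. This is not the paper's construction: the balanced binary taxonomy, the exponential form of the kernel, and every name below (`tree_distance`, `cooccurrence_kernel`, `spectrum`, `decay`) are assumptions made only for illustration. The kernel matrix stands in for the embedding Gram matrix, and the only properties it is given are the two the note mentions: it is positive, and it decays with distance on the taxonomy tree.

```python
import numpy as np

DEPTH = 3            # assumed toy taxonomy: balanced binary tree
N = 2 ** DEPTH       # 8 leaf "words"


def tree_distance(i: int, j: int) -> int:
    """Path length between leaves i and j of the balanced binary tree."""
    steps = 0
    while i != j:    # climb both leaves until they meet at a common ancestor
        i //= 2
        j //= 2
        steps += 1
    return 2 * steps


def cooccurrence_kernel(decay: float = 0.5) -> np.ndarray:
    """Positive kernel decaying with tree distance (assumed form: exp(-decay * d))."""
    dist = np.array([[tree_distance(i, j) for j in range(N)] for i in range(N)])
    return np.exp(-decay * dist)


def spectrum(kernel: np.ndarray):
    """Eigenvalues and eigenvectors of a symmetric matrix, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(kernel)   # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


K = cooccurrence_kernel()
eigvals, eigvecs = spectrum(K)

# Print the four leading eigenpairs; signs of eigenvectors are arbitrary.
for rank in range(4):
    print(rank, np.round(eigvals[rank], 3), np.round(eigvecs[:, rank], 2))
```

In this toy setting the leading eigenvector is constant across all words, the second separates the two top-level branches, and the next two (a degenerate pair) vary only between sub-branches inside each half, so siblings still share a value. Finer splits appear only at smaller eigenvalues, which is the coarse-to-fine ordering described above. No hierarchy-specific step appears anywhere in the code; the structure comes entirely from the pairwise kernel.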

Why it matters: it reframes a class of interpretability results. Geometric structure that looks like the model "knowing" a taxonomy can be a downstream shadow of corpus statistics rather than evidence of a dedicated computation. The counterpoint the authors are careful to preserve: such organization may be useful for function — but it is not driven by it. This separates "the representation has structure" from "the model uses a structured mechanism," a distinction interpretability work often blurs.

Related concepts in this collection 4

Do embedding eigenvectors organize taxonomy from coarse to fine? Can we predict how embeddings encode taxonomic hierarchies by examining their spectral structure? This tests whether word co-occurrence statistics alone produce the observed hierarchical geometry in language models.
the specific spectral signature that this distributional mechanism predicts and produces
Do standard analysis methods hide nonlinear features in neural networks? Current representation analysis tools like PCA and linear probing may systematically miss complex nonlinear computations while over-reporting simple linear features. This raises questions about whether our interpretability methods are actually capturing what networks compute.
cautions that geometric structure detected by analysis methods need not be the computationally important structure — consonant with structure-without-mechanism
How do language models organize features across processing layers? Do neural networks arrange learned features into meaningful hierarchies as they process information? Understanding this structure could reveal how models build understanding from raw tokens to abstract concepts.
contrasts a mechanism-level account of feature hierarchy with this statistics-level account of concept geometry
Does word frequency correlate with semantic abstraction? Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings—and whether this shapes how they handle expert knowledge.
another WordNet-grounded result linking corpus statistics to the abstraction structure of representations
