Where does hierarchical structure in language models come from?
Do LLMs build hierarchical concept geometry through dedicated mechanisms, or does it emerge naturally from word co-occurrence patterns in training data? Understanding the source matters for interpreting what representations actually reveal about model computation.
A recurring interpretability finding is that LLM representations encode hypernymy — the is-a relation between general and specific concepts — geometrically, with broad categories and their sub-categories arranged in nested, near-orthogonal structure. The tempting reading is functional: the model built a hierarchy mechanism because hierarchy is useful. This paper argues the opposite. Starting from the empirically verified assumption that words closer on the WordNet hypernym graph co-occur more often, it characterizes the spectrum of the embedding Gram matrix and shows that, under mild positivity and decay conditions on the co-occurrence kernel, the leading eigenvectors reproduce the taxonomy. Hierarchical concept geometry emerges from the spectral structure of pairwise word statistics; no hierarchy-specific functional mechanism is required.
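The spectral claim can be illustrated on a toy case. The sketch below is not the paper's construction: it assumes a balanced binary tree as a stand-in for WordNet, and a hypothetical co-occurrence kernel that decays exponentially with tree distance between leaf words (the names `leaf_tree_distance`, `toy_cooccurrence_kernel`, and the `decay` value are illustrative choices). Under those assumptions, the leading eigenvectors of the kernel recover the tree's splits.

```python
import numpy as np


def leaf_tree_distance(i: int, j: int) -> int:
    """Path length between leaves i and j of a balanced binary tree.

    Leaves are numbered left to right, so the lowest common ancestor
    sits at height bit_length(i XOR j) above the leaves.
    """
    return 2 * (i ^ j).bit_length()


def toy_cooccurrence_kernel(depth: int = 3, decay: float = 0.5) -> np.ndarray:
    """Hypothetical kernel: co-occurrence decays with tree distance.

    ASSUMPTION: exponential decay (decay ** distance) is chosen only
    because it is positive and monotonically decreasing.
    """
    n = 2 ** depth
    dist = np.array(
        [[leaf_tree_distance(i, j) for j in range(n)] for i in range(n)]
    )
    return decay ** dist


def sorted_spectrum(kernel: np.ndarray):
    """Eigenvalues (descending) and matching eigenvectors (as columns)."""
    eigvals, eigvecs = np.linalg.eigh(kernel)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


K = toy_cooccurrence_kernel(depth=3, decay=0.5)
eigvals, eigvecs = sorted_spectrum(K)

# Sign pattern of the second eigenvector: one sign for leaves 0-3,
# the opposite sign for leaves 4-7 (the coarsest split of the tree).
print(np.sign(eigvecs[:, 1]).astype(int))
print(np.round(eigvals, 4))
```

In this toy setting the top eigenvector is constant, the second separates the two top-level branches, the next pair separates sub-branches within each half, and the remaining ones separate sibling leaves. Nothing in the code encodes "hierarchy" beyond the distance-dependent pairwise kernel, which is the point of the distributional account.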
The explanatory payoff is that this account is more predictive than the functional one. Rather than postulating hierarchical orthogonality from functional desiderata, it predicts that the same geometry should appear outside LLMs, in plain word2vec embeddings, and should carry a specific coarse-to-fine spectral organization. Both predictions are confirmed.
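A small worked case shows why the ordering is coarse to fine. This is an illustrative assumption, not the paper's general theorem: take four words in two categories, $\{a_1, a_2\}$ and $\{b_1, b_2\}$, with a hypothetical kernel value $s$ for words in the same category and $c$ for words in different categories.

```latex
K =
\begin{pmatrix}
1 & s & c & c \\
s & 1 & c & c \\
c & c & 1 & s \\
c & c & s & 1
\end{pmatrix},
\qquad 1 > s > c > 0 .

\begin{aligned}
\lambda_0 &= 1 + s + 2c, & v_0 &= (1,\, 1,\, 1,\, 1) \\
\lambda_1 &= 1 + s - 2c, & v_1 &= (1,\, 1,\, -1,\, -1) \\
\lambda_2 &= 1 - s,      & v_2 &= (1,\, -1,\, 0,\, 0), \quad v_2' = (0,\, 0,\, 1,\, -1)
\end{aligned}

\lambda_1 - \lambda_2 = 2\,(s - c) > 0 .
```

The category-level split $v_1$ outranks the within-category splits $v_2, v_2'$ exactly when same-category words co-occur more than cross-category words ($s > c$), and positivity ($c > 0$) keeps the constant direction $v_0$ on top.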
Why it matters: it reframes a class of interpretability results. Geometric structure that looks like the model "knowing" a taxonomy can be a downstream shadow of corpus statistics rather than evidence of a dedicated computation. The counterpoint the authors are careful to preserve: such organization may be useful for function — but it is not driven by it. This separates "the representation has structure" from "the model uses a structured mechanism," a distinction interpretability work often blurs.
Inquiring lines that use this note as a source (13)
This note is a source for these synthesized inquiries.
- Why are polysemantic features concentrated in early neural network layers?
- Do language models actually learn linguistic structure or just surface statistics?
- Why must world models be nested rather than flat and uniform?
- Why do leading embedding eigenvectors align with WordNet taxonomy structure?
- How do corpus statistics shape the abstraction hierarchy in language model representations?
- Can geometric structure in representations exist without supporting functional mechanisms?
- What spectral signatures distinguish hierarchy-driven geometry from corpus-driven geometry?
- Why do frequent words rank higher in taxonomic abstraction hierarchies?
- What geometric structure do language models actually use during inference?
- How do co-occurrence statistics alone produce hierarchical concept organization?
- Does Gemma's transformer explicitly exploit the inherited hierarchical geometry?
- What makes hierarchical reasoning effective for taxonomy induction?
- How do latents at the same hierarchy level become more correlated than tokens?
Related concepts in this collection (4)
- Do embedding eigenvectors organize taxonomy from coarse to fine?
  Can we predict how embeddings encode taxonomic hierarchies by examining their spectral structure? This tests whether word co-occurrence statistics alone produce the observed hierarchical geometry in language models.
  the specific spectral signature that this distributional mechanism predicts and produces
- Do standard analysis methods hide nonlinear features in neural networks?
  Current representation analysis tools like PCA and linear probing may systematically miss complex nonlinear computations while over-reporting simple linear features. This raises questions about whether our interpretability methods are actually capturing what networks compute.
  cautions that geometric structure detected by analysis methods need not be the computationally important structure; consonant with structure-without-mechanism
- How do language models organize features across processing layers?
  Do neural networks arrange learned features into meaningful hierarchies as they process information? Understanding this structure could reveal how models build understanding from raw tokens to abstract concepts.
  contrasts a mechanism-level account of feature hierarchy with this statistics-level account of concept geometry
- Does word frequency correlate with semantic abstraction?
  Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings, and whether this shapes how they handle expert knowledge.
  another WordNet-grounded result linking corpus statistics to the abstraction structure of representations
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
- Semantic Structure in Large Language Model Embeddings
- Large Concept Models: Language Modeling in a Sentence Representation Space
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- Learn from your own latents and not from tokens: A sample-complexity theory
- Do large language models resemble humans in language use?
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
Original note title
hierarchical concept geometry in llms needs no dedicated mechanism it emerges from co-occurrence spectral structure