Does Gemma's transformer explicitly exploit the inherited hierarchical geometry?
This explores whether Gemma actually *uses* the nested, taxonomy-like geometry found in its representations as a working mechanism — or whether that structure is just an inherited fingerprint of the text it was trained on that the model never deliberately puts to work.
The corpus leans hard toward the second answer: the hierarchy is a residue of training text, not a designed feature the model reaches for. The cleanest evidence is that Gemma 2B's unembeddings and word2vec embeddings share an *identical* coarse-to-fine spectral signature across WordNet taxonomies (see "Do language models use the hierarchical geometry they inherit?"). Those two models have completely different objectives and architectures, so the shared nested structure can't come from anything either one does functionally; it has to originate in the co-occurrence statistics of language itself. A companion result makes the mechanism explicit: spectral analysis of raw word co-occurrence matrices predicts and reproduces the same geometry, meaning no hierarchy-specific circuitry is required for it to appear (see "Where does hierarchical structure in language models come from?").
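The spectral claim can be made concrete with a toy example. The sketch below is not the analysis from the cited notes; it assumes a hypothetical 8-word vocabulary arranged in a three-level taxonomy, with made-up co-occurrence weights `a`, `b`, `c`, `d` that grow as two words share a deeper common ancestor. Under those assumptions, the eigenvectors of the matrix come out ordered from coarse splits to fine ones, with no hierarchy-specific mechanism involved.

```python
import numpy as np


def nested_cooccurrence(a=1.0, b=2.0, c=4.0, d=8.0):
    """Toy 8x8 co-occurrence matrix for a hypothetical 3-level taxonomy.

    Words 0-3 form one coarse category, words 4-7 the other; each coarse
    category splits into two fine pairs. The weights are illustrative
    assumptions, not measured statistics.
    """
    def ones(n):
        return np.ones((n, n))

    root = a * ones(8)                         # every pair shares the root
    coarse = b * np.kron(np.eye(2), ones(4))   # same coarse category
    fine = c * np.kron(np.eye(4), ones(2))     # same fine category
    leaf = d * np.eye(8)                       # a word with itself
    return root + coarse + fine + leaf


def sorted_spectrum(matrix):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


C = nested_cooccurrence()
vals, vecs = sorted_spectrum(C)

# The second eigenvector separates the two coarse categories:
# one sign for words 0-3, the opposite sign for words 4-7.
coarse_direction = vecs[:, 1]
print(np.round(vals, 6))
print(np.sign(coarse_direction))
```

With these weights the eigenvalues are 32, 24, 16, 16, 8, 8, 8, 8: one global direction, then the coarse split, then the two fine splits, then the four leaf-level directions. The nesting is a property of the matrix alone, which is the sense in which the geometry can be "inherited".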
So "inherited" is well supported. "Explicitly exploited" is where the corpus gets interesting, because the presence of structure and the *use* of structure are not the same thing. One striking finding is that a model can carry all the linearly decodable features a task needs while its internal organization is actually fractured and brittle, with perfect accuracy masking representations that fall apart under perturbation (see "Can models be smart without organized internal structure?"). That's a direct warning against assuming geometry-you-can-read is geometry-the-model-uses. Inherited structure can sit in the weights as a statistical shadow without being load-bearing for computation.
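The gap between readable and used structure can be shown in a few lines. This is a deliberately simple sketch, not a reproduction of the cited finding: it assumes synthetic 4-dimensional "hidden states" in which one coordinate carries a task feature and another carries a category feature, plus a hypothetical downstream `readout` that only looks at the task coordinate. A linear probe recovers the category perfectly, yet ablating the probe direction leaves the toy model's output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic features (assumptions for illustration only).
task = rng.choice([-1.0, 1.0], size=n)       # feature the toy model reads
category = rng.choice([-1.0, 1.0], size=n)   # "inherited" feature, never read
noise = 0.1 * rng.standard_normal((n, 2))

# Hidden states: dim 0 = task, dim 1 = category, dims 2-3 = noise.
H = np.column_stack([task, category, noise])

# Hypothetical downstream weights: only the task dimension matters.
readout = np.array([1.0, 0.0, 0.0, 0.0])


def model_output(hidden):
    """Toy model: a single linear readout of the hidden states."""
    return hidden @ readout


# Linear probe: least-squares direction that predicts the category.
w_probe, *_ = np.linalg.lstsq(H, category, rcond=None)
probe_acc = np.mean(np.sign(H @ w_probe) == category)

# Ablation: project the probe direction out of every hidden state.
u = w_probe / np.linalg.norm(w_probe)
H_ablated = H - np.outer(H @ u, u)
max_change = np.max(np.abs(model_output(H_ablated) - model_output(H)))

print(f"probe accuracy: {probe_acc:.2f}")
print(f"largest output change after ablation: {max_change:.2e}")
```

A probe answers "is the information there?" while an ablation answers "does the computation depend on it?". For the WordNet-style hierarchy, the notes summarized here only establish the first of those.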
There's a counterweight worth knowing about, though. Other work shows transformers do sometimes *recruit* geometric structure as a live mechanism. LLMs encode syntactic type and direction in a polar coordinate system: angle and distance both carry information, and using both nearly doubles probing accuracy over distance alone (see "How do language models encode syntactic relations geometrically?"). And multi-hop reasoning success correlates with entity representations clustering by cosine similarity, a geometric signature that tracks an actual capability rather than just a statistical leftover (see "How do transformers learn to reason across multiple steps?"). So geometry *can* be exploited; the question is whether the specific WordNet-style hierarchy is.
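To see why angle adds information that distance cannot, consider a two-dimensional toy case. This is a simplified illustration and not the Polar Probe itself: the points, the `polar_relation` helper, and the choice of a 2-D space are all assumptions made for the example. Swapping which word is the head leaves the distance untouched but rotates the angle by half a turn, so a distance-only probe cannot recover direction.

```python
import math


def polar_relation(head, dependent):
    """Distance and angle of the vector from head to dependent (2-D toy case)."""
    dx = dependent[0] - head[0]
    dy = dependent[1] - head[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


# Two hypothetical word positions in a probed 2-D subspace.
word_a = (0.0, 0.0)
word_b = (1.0, 1.0)

dist_ab, angle_ab = polar_relation(word_a, word_b)   # a is the head
dist_ba, angle_ba = polar_relation(word_b, word_a)   # b is the head

print(f"a -> b: distance={dist_ab:.3f}, angle={angle_ab:.3f} rad")
print(f"b -> a: distance={dist_ba:.3f}, angle={angle_ba:.3f} rad")
```

Both directions give the same distance, while the angles differ by π. That is the kind of extra signal a probe can pick up when it reads angle as well as distance.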
The honest synthesis: the hierarchical taxonomy geometry in Gemma looks far more like inheritance than exploitation. It emerges as a mathematical consequence of corpus statistics, shows up identically in models that share nothing functionally, and the field has explicit cautions that decodable structure ≠ used structure. This reframes a tempting assumption: that because we can find clean ontological trees inside an LLM, the model must be reasoning over them. More likely the trees are sediment left by language, and whatever reasoning happens flows through the residual stream as continuous activation rather than lookups against a stored hierarchy (see "Do transformer models store knowledge or generate it continuously?"). The thing you didn't know you wanted to know: a model can be shaped *by* a structure it never actually consults.
Sources (6 notes)
Word2vec embeddings and Gemma 2B unembeddings share identical coarse-to-fine spectral signatures across WordNet taxonomies. Since these models have entirely different objectives, the shared structure must originate from training text statistics rather than convergent functional needs.
LLM hierarchical representations arise as a direct mathematical consequence of corpus statistics, not from hierarchy-specific mechanisms. Spectral analysis of word co-occurrence matrices predicts and reproduces the same nested geometry found in trained embeddings and word2vec models.
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.
Controlled training reveals transformers learn multi-hop reasoning in three phases: memorization, in-distribution generalization, and cross-distribution reasoning. Successful reasoning correlates with cosine clustering of entity representations, and second-hop generalization requires explicit compositional exposure during training.
Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.