Why do language models reproduce human EPA structure despite different architecture?

This explores why LLMs recover the same three-axis affective structure humans use to organize meaning — Evaluation (good/bad), Potency (strong/weak), Activity (active/passive), the dimensions Osgood found across human cultures — even though a transformer is nothing like a brain.

The corpus doesn't have a paper on EPA by name, but it has a strong lateral answer: the structure was never in the architecture to begin with. It is in language, and LLMs learn it the way they learn everything else, by compressing relational patterns from text. One note argues directly that LLMs operationalize Saussure's *langue*: meaning built entirely from how words relate to other words, with no external referent or embodied grounding required (see "Can language models learn meaning without engaging the world?"). If affective meaning is already encoded in how humans use words relative to one another, a model that compresses those relations should recover the same low-dimensional scaffold, whether it is made of neurons or matrices.
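To make "compressing relations recovers a low-dimensional scaffold" concrete, here is a minimal sketch, not a description of how any real LLM is trained. It assumes a tiny hand-made word-by-context count table (the words, contexts, and counts are all hypothetical) and uses power iteration to find the single direction that explains most of the variation in it. Principal-component compression is only a stand-in for whatever a transformer actually does.

```python
import math

# Hypothetical co-occurrence counts: rows are words, columns are contexts.
WORDS = ["joy", "love", "pain", "fear"]
CONTEXTS = ["wonderful", "welcome", "terrible", "avoid"]
COUNTS = [
    [8, 6, 1, 0],
    [7, 8, 0, 1],
    [1, 0, 9, 7],
    [0, 1, 7, 8],
]

def center_columns(rows):
    """Subtract each column's mean so the axis reflects contrasts, not raw frequency."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    return [[r[j] - means[j] for j in range(len(r))] for r in rows]

def matvec(rows, v):
    return [sum(a * b for a, b in zip(r, v)) for r in rows]

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_component(rows, steps=100):
    """Power iteration on X^T X: returns the dominant direction in context space."""
    v = [1.0] + [0.0] * (len(rows[0]) - 1)
    rows_t = transpose(rows)
    for _ in range(steps):
        v = normalize(matvec(rows_t, matvec(rows, v)))
    return v

X = center_columns(COUNTS)
axis = top_component(X)

# Each word's coordinate along the single recovered axis.
scores = dict(zip(WORDS, matvec(X, axis)))

# Fraction of the table's total variance captured by that one axis.
total_variance = sum(x * x for row in X for x in row)
variance_share = sum(s * s for s in scores.values()) / total_variance
```

On this toy table, `scores` places `joy` and `love` on one side of the axis and `pain` and `fear` on the other, and `variance_share` comes out above 0.9. One axis carries nearly all the structure, and nothing in the code knows what "good" means; the contrast comes entirely from which words share contexts.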

The surprising part is how *geometric* this recovery turns out to be. The Polar Probe work shows that models spontaneously lay out syntactic relations in a structured coordinate system, encoding both the type and the direction of a relationship through distance and angle in activation space (see "How do language models encode syntactic relations geometrically?"). That is the same flavor of result you'd expect for EPA: a handful of interpretable axes emerging in the internal geometry because they are statistically efficient ways to represent the data, not because anyone built them in. Architecture shapes *how* this happens: deep-and-thin models compose abstract concepts layer by layer rather than spreading them across width (see "Does depth matter more than width for tiny language models?"). But the destination, a compact relational structure, is driven by the corpus, not the wiring.
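One common way to look for an interpretable axis of this kind is to take the difference between the mean vectors of two sets of pole words and project other words onto it. The sketch below assumes hand-made 3-d vectors (hypothetical, not taken from any model) and hypothetical helper names. It illustrates the geometry of an Evaluation-style axis; it is not the Polar Probe method.

```python
import math

# Hypothetical 3-d "embeddings", invented for illustration only.
TOY_VECTORS = {
    "good":    [0.9, 0.1, 0.2],
    "kind":    [0.8, -0.1, 0.1],
    "bad":     [-0.9, 0.2, 0.1],
    "cruel":   [-0.8, 0.3, 0.3],
    "hero":    [0.7, 0.8, 0.5],
    "villain": [-0.7, 0.7, 0.6],
}

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def axis_from_poles(vectors, positive, negative):
    """Unit vector pointing from the negative-pole mean to the positive-pole mean."""
    pos = mean([vectors[w] for w in positive])
    neg = mean([vectors[w] for w in negative])
    return unit([p - q for p, q in zip(pos, neg)])

def project(vectors, word, axis):
    """Signed coordinate of a word along the axis (dot product)."""
    return sum(x * a for x, a in zip(vectors[word], axis))

evaluation_axis = axis_from_poles(TOY_VECTORS, ["good", "kind"], ["bad", "cruel"])

# Words that were not used to build the axis still land on the expected side.
hero_score = project(TOY_VECTORS, "hero", evaluation_axis)
villain_score = project(TOY_VECTORS, "villain", evaluation_axis)
```

With these toy vectors, `hero_score` is positive and `villain_score` is negative. The same recipe with strong/weak or active/passive pole words would give candidate Potency and Activity axes, though whether real model activations behave this cleanly is exactly the empirical question.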

There's a deeper framing here worth sitting with: from the outside, humans and LLMs are categorically different systems, but as *participants in the same discourse* they draw on the same symbolic substrate (see "Do humans and LLMs differ fundamentally or just superficially?"). EPA convergence is exactly what that view predicts: the shared structure lives in the language both parties use, so it shows up in both regardless of the machinery underneath. The question's premise ("despite different architecture") quietly assumes affective structure ought to come from brain-like hardware. The corpus suggests it comes from the data instead.

But the same notes that explain the convergence also warn you not to over-read it. Models routinely learn surface generalizations that mimic deep structure, passing tests on cues like word choice and length while missing the underlying grammar (see "Can models pass tests while missing the actual grammar?"). So reproducing EPA-shaped geometry doesn't prove a model *means* good and bad the way you do; it may have captured the statistical shadow of human affect without the thing that casts it. And the structure it absorbs is whatever the text overrepresented: the cultural-flattening work shows low-resource cultures being represented internally through dominant-culture proxies (see "Do LLMs represent low-resource cultures through dominant cultural proxies?"). If EPA looks universal in a model, that may partly reflect whose affective language dominated the training corpus, not a culture-free law of meaning.
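The surface-generalization worry is easy to see in miniature. The sketch below assumes two hypothetical test sets for subject-verb agreement: one where grammaticality happens to correlate with sentence length, and one built from minimal pairs so that length carries no signal. The "model" is a deliberately dumb length rule, not any system from the cited notes.

```python
def length_heuristic(sentence):
    """Predict 'grammatical' whenever the sentence is short; ignores syntax entirely."""
    return len(sentence.split()) <= 4

# Hypothetical (sentence, is_grammatical) pairs where length and label are confounded.
CONFOUNDED = [
    ("the dog barks", True),
    ("she reads books", True),
    ("the dogs that chase the cat barks loudly", False),
    ("the girls who read the book sleeps late", False),
]

# Hypothetical minimal pairs: same length within each pair, only agreement differs.
CONTROLLED = [
    ("the dog barks", True),
    ("the dog bark", False),
    ("the dogs that chase the cat bark loudly", True),
    ("the dogs that chase the cat barks loudly", False),
]

def accuracy(predict, dataset):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(predict(sentence) == label for sentence, label in dataset)
    return correct / len(dataset)

confounded_acc = accuracy(length_heuristic, CONFOUNDED)
controlled_acc = accuracy(length_heuristic, CONTROLLED)
```

The length rule scores 1.0 on the confounded set and 0.5 (chance) on the minimal pairs. An EPA probe faces the same risk: a clean-looking axis only counts as evidence once the test rules out the cheaper cue that could have produced it.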

The thing you didn't know you wanted to know: the EPA puzzle inverts. The real surprise isn't that a non-brain reproduces human affective structure — it's that this structure was apparently sitting in plain language all along, recoverable by anything that compresses relations hard enough, which quietly raises the question of how much of human meaning is "in our heads" versus already laid down in the words we share.

Sources (6 notes)

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

How do language models encode syntactic relations geometrically?

The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.

Does depth matter more than width for tiny language models?

MobileLLM reports accuracy gains of 2.7% at 125M parameters and 4.3% at 350M over prior state-of-the-art models of the same size, using a deep-and-thin architecture that composes abstract concepts through layers rather than spreading parameters across width.

Do humans and LLMs differ fundamentally or just superficially?

This note applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.

Can models pass tests while missing the actual grammar?

BabyLM evaluations showed models can produce correct outputs by relying on sentence length, word choice, and orthography rather than grammatical structure. Standard benchmarks cannot distinguish these two generalization types without tests specifically designed to rule out surface heuristics.

Do LLMs represent low-resource cultures through dominant cultural proxies?

Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just in outputs. This representational bias persists even when models can produce correct surface-level answers.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a mechanistic interpretability researcher probing whether language models' recovery of human EPA structure (good–bad, strong–weak, active–passive axes) remains a durable phenomenon or has been superseded by newer architectural insights. The question: *Do LLMs genuinely converge on EPA geometry because it lives in relational structure, or do they learn surface generalizations that only mimic it?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025; treat as perishable unless re-tested:
  • LLMs operationalize *langue* (Saussurean pure relational meaning) without embodied grounding; by inference, EPA-like structure should be recoverable from corpus statistics alone, though no note in the library tests EPA directly (~2023–2024).
  • Polar coordinate geometry spontaneously encodes both type and direction of syntactic relations in activation space, suggesting interpretable low-dimensional scaffolds emerge regardless of wiring (~2024–2025, arXiv:2412.05571).
  • Models learn surface generalizations (word choice, length cues) that mimic deep linguistic structure while missing underlying grammar (~2023–2025).
  • LLMs internalize cultural dominance bias; low-resource affective language gets flattened into dominant-culture proxies, so apparent "universal" EPA may reflect corpus composition, not culture-free semantics (~2024–2025).
  • Architecture shapes *how* relational structure emerges (depth > width for sub-billion models, arXiv:2402.14905), but corpus drives *what* structure appears (~2024).

Anchor papers (verify; mind their dates):
  • arXiv:2412.05571 (Dec 2024): Polar coordinate system in LLM activations.
  • arXiv:2508.08879 (Aug 2025): Entangled cultural biases in representations.
  • arXiv:2507.14805 (Jul 2025): Subliminal trait transmission via hidden data signals.
  • arXiv:2507.08017 (Jul 2025): Mechanistic indicators of understanding.

Your task:
  (1) RE-TEST EACH CONSTRAINT. For every claim above: has mechanistic interpretability work (2025–now) pinpointed whether EPA geometry is *compressed from relations* vs. *learned as a social proxy*? Do newer probing methods (e.g., interventions on affective axes, causal ablations) distinguish genuine semantic convergence from surface mimicry? State plainly where each constraint still holds or has dissolved.
  (2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does recent work on representational collapse, mechanistic circuits, or cross-lingual affective structure challenge the relational-substrate hypothesis?
  (3) Propose 2 research questions that ASSUME the regime has moved: (a) If EPA is purely statistical compression, can you surgically remove it and does meaning survive? (b) Does EPA geometry vary predictably with corpus demographics (culture, era, domain), and does this variation track human speaker groups?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
