Why do different brain and AI systems appear similar when compared via RSA?
This explores why Representational Similarity Analysis (RSA) — which compares the *similarity structure* between representations rather than the representations themselves — so often makes brains and AI systems look alike, and whether that resemblance reflects shared mechanism or an artifact of what the method can and can't see.
The honest starting point is that this corpus doesn't tackle RSA head-on. It does, however, carry a sharp, repeated warning that bears directly on the question: matching outputs (or matching similarity structure) can coexist with radically different internal organization. The cleanest statement of this is the Fractured Entangled Representation hypothesis (see "Can AI pass every test while understanding nothing?"), which shows that SGD-trained networks can produce *identical outputs across every input* while their internal representations are organized in completely different ways. RSA reads off the geometry of internal responses, not the mechanism producing them, so the same lesson applies one level down: two systems can land on a similar similarity structure without sharing how that structure is built or used. Convergence in the measurement is not convergence in the machinery.
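A minimal sketch of that point, using plain Python and made-up toy data (the stimuli, the two "systems", and the helper names `rdm`, `ranks`, `pearson`, and `rsa` are all hypothetical). It assumes the common RSA recipe of Euclidean dissimilarities compared with a Spearman correlation; other variants exist. System Y is system X rotated by 90 degrees and scaled by 2, so no unit in Y encodes what the matching unit in X encodes, yet every pairwise distance is simply doubled and the rank-based RSA score comes out at 1.

```python
import math
from itertools import combinations


def rdm(reps):
    """Upper triangle of a Euclidean representational dissimilarity matrix."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
            for x, y in combinations(reps, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa(reps_a, reps_b):
    """Spearman correlation between two systems' RDMs over the same stimuli."""
    return pearson(ranks(rdm(reps_a)), ranks(rdm(reps_b)))


# Hypothetical responses of system X to four stimuli (two "units" each).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 4.0)]
# System Y: X rotated by 90 degrees and doubled, so its units mix X's differently.
Y = [(-2.0 * v, 2.0 * u) for u, v in X]

print("RSA(X, Y):", rsa(X, Y))  # identical distance ranks, so 1.0 up to rounding
# Unit-level agreement is poor: Y's first unit tracks X's second unit, negated.
print("unit-0 correlation:", pearson([x[0] for x in X], [y[0] for y in Y]))
```

Any rotation, reflection, or uniform scaling leaves the RSA score untouched, which is exactly why a perfect score says nothing about which units do what.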
There's a parallel caution about treating a metric's output as the underlying reality. The exploration–exploitation work (see "Is the exploration-exploitation trade-off actually fundamental?") found that a trade-off widely assumed to be fundamental largely *disappears* when you stop measuring at the token level and look at hidden-state effective rank instead, where the two show near-zero correlation: the apparent structure was an artifact of where you placed the ruler. RSA places a particular kind of ruler (pairwise distances over a stimulus set) on two systems and reports the correlation between the resulting distance patterns. The danger is the mirror image of that case: a token-level metric manufactured a trade-off that a different probe dissolved, and a similarity metric can likewise manufacture a clean correspondence that a different probe would not see.
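To make the ruler point concrete, here is a hedged toy in the same style (all data and names, including the `metric` switch, are hypothetical). System B is system A with its response to one stimulus amplified fourfold, as a gain change might do. Under cosine dissimilarity, which ignores response magnitude, the two RDMs are identical; under Euclidean dissimilarity, which does not, the RSA score drops sharply.

```python
import math
from itertools import combinations


def dissim(x, y, metric):
    """Pairwise dissimilarity under the chosen 'ruler'."""
    if metric == "euclidean":
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    if metric == "cosine":
        dot = sum(a * b for a, b in zip(x, y))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
        return 1.0 - dot / norm
    raise ValueError(f"unknown metric: {metric}")


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa(reps_a, reps_b, metric):
    """Spearman correlation of the two RDMs built with the same metric."""
    rdm_a = [dissim(x, y, metric) for x, y in combinations(reps_a, 2)]
    rdm_b = [dissim(x, y, metric) for x, y in combinations(reps_b, 2)]
    return pearson(ranks(rdm_a), ranks(rdm_b))


A = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 2.0)]
B = [(1.0, 0.0), (0.0, 4.0), (1.0, 1.0), (1.0, 2.0)]  # stimulus 1 amplified 4x

for metric in ("cosine", "euclidean"):
    print(metric, round(rsa(A, B, metric), 3))
```

Here the Euclidean score works out to roughly 0.34 while the cosine score is 1. Neither ruler is wrong; they answer different questions (does response magnitude carry information?), and the RSA number silently inherits that choice.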
What makes brain–AI RSA matches especially seductive is that the systems genuinely *differ* in operation underneath. Transformers integrate meaning by weighted parallel aggregation over tokens, whereas brains seem to do selective frame activation, suppressing irrelevant senses rather than averaging them (see "Why do AI systems miss jokes and wordplay so consistently?"). Two systems with different core operations can still produce similar response geometries on a benchmark stimulus set, which is exactly the regime where RSA over-reports kinship. The resemblance is real at the level RSA measures and misleading at the level you care about.
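A deliberately crude illustration, assuming 1-D token "embeddings" and hand-picked salience weights (all hypothetical, and far simpler than real attention or real frame selection): `aggregate` averages every token by weight, loosely like attention, while `select` keeps only the most salient token and suppresses the rest. On a benchmark where one token always dominates, the two produce the same rank order of pairwise distances, so RSA scores them as a perfect match; one ambiguous probe with split salience is enough to pull them apart.

```python
import math
from itertools import combinations


def aggregate(stimulus):
    """Toy weighted parallel aggregation: salience-weighted average of all tokens."""
    return (sum(value * weight for value, weight in stimulus),)


def select(stimulus):
    """Toy selective activation: keep the most salient token, drop the others."""
    value, _ = max(stimulus, key=lambda vw: vw[1])
    return (value,)


def rdm(reps):
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
            for x, y in combinations(reps, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa(reps_a, reps_b):
    return pearson(ranks(rdm(reps_a)), ranks(rdm(reps_b)))


# Each stimulus: (token_value, salience) pairs; one token dominates every time.
BENCHMARK = [
    [(0.0, 0.9), (1.0, 0.1)],
    [(1.0, 0.9), (0.0, 0.1)],
    [(3.0, 0.9), (1.0, 0.1)],
    [(7.0, 0.9), (0.0, 0.1)],
]
# An ambiguous probe: salience is split, so suppression and averaging disagree.
PROBE = [(0.0, 0.55), (8.0, 0.45)]

score = rsa([aggregate(s) for s in BENCHMARK], [select(s) for s in BENCHMARK])
print("benchmark RSA:", score)
print("probe outputs:", aggregate(PROBE), select(PROBE))
```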
The constructive thread in the corpus is that you escape this trap by measuring the *structure of a process*, not the similarity of snapshots. Reasoning-fidelity research (see "Can we measure reasoning quality beyond output plausibility?") proposes traceability, counterfactual adaptability, and motif compositionality as tests that distinguish genuine causal organization from coherent mimicry; the core question is whether the representation *changes the right way* when you intervene. Hidden-state topology work (see "Do reasoning cycles in hidden states reveal aha moments?") shows that real, mechanism-linked structure can be found, such as reasoning cycles that track accuracy, once you look at dynamics rather than static geometry. The takeaway you didn't know you wanted: brain–AI RSA similarity is most trustworthy as a *hypothesis generator* and least trustworthy as proof of shared computation, because the one thing this corpus keeps demonstrating is that surface behavior and similarity structure can both match while the underlying machinery does not.
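One way to picture "dynamics rather than static geometry" is a sketch like the following. The corpus note reports roughly 5 cycles per sample in distilled reasoning models versus near-zero in base models, but the exact graph construction is not spelled out here, so this uses a simplified, hypothetical proxy: quantize each hidden state to its nearest centroid (the centroids and trajectory are invented toy values), collapse consecutive repeats, and count how often the trajectory returns to a node it already visited. A static RDM over the same states would see only their pairwise distances, not the order in which they were visited.

```python
import math


def nearest_centroid(state, centroids):
    """Map a hidden-state vector to the index of its closest centroid (hypothetical quantizer)."""
    return min(range(len(centroids)), key=lambda k: math.dist(state, centroids[k]))


def count_cycles(node_sequence):
    """Count returns to an already-visited node, ignoring consecutive repeats."""
    collapsed = [n for i, n in enumerate(node_sequence)
                 if i == 0 or n != node_sequence[i - 1]]
    seen, cycles = set(), 0
    for node in collapsed:
        if node in seen:
            cycles += 1  # the trajectory closed a loop back to an earlier state
        seen.add(node)
    return cycles


CENTROIDS = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0)]
# Toy trajectory: advance to state 2, back off to state 1, return to the start.
TRAJECTORY = [(0.1, 0.0), (4.9, 0.2), (5.0, 4.8), (5.1, 0.1), (0.2, 0.1)]

nodes = [nearest_centroid(h, CENTROIDS) for h in TRAJECTORY]
print(nodes, "cycles:", count_cycles(nodes))
```

Revisit counting is only one of several ways to define a cycle in a state graph; counting independent cycles in the transition graph would be a reasonable alternative.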
Sources (5 notes)
The Fractured Entangled Representation hypothesis shows that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.
Hidden-state analysis using Effective Rank metrics shows near-zero correlation between exploration and exploitation, revealing that the trade-off emerges only at the token level. VERL demonstrates simultaneous enhancement of both, achieving 21.4% accuracy gains on Gaokao 2024.
Transformers integrate token information through weighted parallel aggregation rather than selective suppression of irrelevant words. This structural difference explains consistent failures with jokes, wordplay, and frame-dependent meaning—not knowledge gaps, but missing cognitive operations.
Research identifies traceability, counterfactual adaptability, and motif compositionality as testable measures of human-like reasoning. These structural properties reveal whether an agent genuinely reasons causally or merely mimics coherent speech.
Distilled reasoning models show ~5 cycles per sample versus near-zero in base models, and cyclicity correlates with accuracy. These cycles in hidden-state reasoning graphs directly map to RL-trained models' documented aha moments—moments when models reconsider intermediate answers.