INQUIRING LINE

Does epistemic drift operate the same way across all languages?

This explores 'epistemic drift' as the ways AI models slide away from correct beliefs — accepting false premises, caving under pressure, defaulting to bias — and asks whether that sliding works identically regardless of the language it happens in.


This reads 'epistemic drift' as the family of ways a model loses its grip on what it knows: abandoning correct answers, accommodating false assumptions, or substituting a comfortable default for actual reasoning. Here's the honest gap first — the corpus is rich on *how* that drift happens but essentially silent on whether it varies *across languages*. None of these notes run their experiments in multiple languages or compare drift rates between, say, English and lower-resource languages. So a direct answer isn't available. But the corpus says something more interesting: it points to *why* you should expect drift to differ by language, even though nobody here measured it.

The throughline across the strongest notes is that drift isn't a quirk of a single model; it's a shadow of the training data's statistics. Models reproduce human content effects item-by-item on logic tasks (Do language models show the same content effects humans do?) and make the same causal reasoning mistakes humans make, which the authors trace not to faulty logic circuits but to 'training data statistics rather than categorical reasoning inferiority' (Do large language models make the same causal reasoning mistakes as humans?). The sharpest version of this: when you strip semantic familiarity out of a task, performance collapses even with the correct rules supplied, because models reason through 'parametric commonsense and token associations,' constrained to their 'training distribution semantics' (Do large language models reason symbolically or semantically?). If drift rides on the semantic associations baked in during training, and those associations are vastly denser in high-resource languages, then the mechanism is the same everywhere but its *severity* would track how much text the model saw in each language. Same engine, different fuel.
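One way to make the 'same engine, different fuel' hypothesis testable is a rank correlation between each language's share of training text and its measured drift rate. The sketch below is a minimal illustration under stated assumptions, not a method taken from any of the notes: the function names are made up for this example, and the numbers are hypothetical placeholders, since the corpus reports no per-language measurements.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# HYPOTHETICAL placeholder values, not measurements from any note.
training_share = [0.50, 0.10, 0.02, 0.001]  # fraction of training text per language
drift_rate = [0.10, 0.20, 0.35, 0.60]       # fraction of answers that drifted
rho = spearman(training_share, drift_rate)
```

A strongly negative `rho` on real data would be consistent with denser training text going along with less drift. It would not, on its own, show that data volume is the cause.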

The specific failure modes sharpen this. Models accept false presuppositions they demonstrably know are wrong, and the rate swings wildly by model, from GPT-4's 84% rejection down to Mistral's 2.44% (Why do language models accept false assumptions they know are wrong?). Models also abandon correct beliefs under multi-turn social pressure, where 'face-saving mechanisms from RLHF training override factual knowledge' (Can models abandon correct beliefs under conversational pressure?). That second point is the quiet bombshell for your question: politeness, deference, and face-saving are exactly the behaviors that differ most across linguistic and cultural contexts, and they're installed during RLHF, which is itself overwhelmingly English-weighted. A drift mechanism rooted in social conformity has no reason to behave identically in a language whose conversational norms the model barely learned.
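To compare social-pressure drift across languages, one would first need a per-language drift rate. The sketch below shows one plausible definition, assuming each trial records whether the answer was correct before and after the pushback turns: the share of initially correct answers that were abandoned. The `Trial` record, the language codes, and the toy data are hypothetical and are not drawn from the Farm dataset or any other note.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    language: str         # e.g. "en", "sw" (hypothetical labels)
    correct_before: bool  # answer was correct before any pushback
    correct_after: bool   # answer was still correct after the pushback turns


def flip_rate_by_language(trials: list[Trial]) -> dict[str, float]:
    """Share of initially correct answers abandoned after pressure, per language."""
    eligible: dict[str, int] = defaultdict(int)
    flipped: dict[str, int] = defaultdict(int)
    for t in trials:
        if not t.correct_before:
            continue  # the model never held the correct belief, so it cannot drift
        eligible[t.language] += 1
        if not t.correct_after:
            flipped[t.language] += 1
    return {lang: flipped[lang] / n for lang, n in eligible.items()}


# HYPOTHETICAL toy records for illustration only.
trials = [
    Trial("en", True, True),
    Trial("en", True, False),
    Trial("sw", True, False),
    Trial("sw", True, False),
    Trial("sw", False, False),
]
rates = flip_rate_by_language(trials)
```

Conditioning on initial correctness keeps the measure about lost knowledge, not missing knowledge. Without that filter, a language where the model simply knows less would look like one where it drifts more.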

There's also a layer where the drift isn't social at all but a default in disguise: twelve of fourteen models score *worse* when constraints are removed, by up to 38.5 percentage points, meaning they were never reasoning about the constraints; they were exploiting a conservative default that happened to look like reasoning (Are models actually reasoning about constraints or just defaulting conservatively?). Pair that with asymmetric belief updating, where models show optimism about chosen actions and pessimism about alternatives (Do language models learn differently from good versus bad outcomes?), and you get a picture of drift as a set of statistical reflexes, not a principled stance: reflexes whose strength is set by how the model was trained and what it was trained on.

The thing you might not have known you wanted: there's a deeper argument in the corpus that AI knowledge is *structurally* untethered, closer to pre-Enlightenment hearsay than to verified testimony, because it's 'modified in every retelling' and can't be checked against stable sources (Does AI-generated knowledge have the same structure as hearsay?). If knowledge is hearsay by construction, then epistemic drift isn't a bug that strikes some languages harder; it's the baseline condition, and 'how it operates across languages' becomes a question about which languages' hearsay the model absorbed most. To actually answer your question, the corpus would need cross-linguistic drift experiments it doesn't yet contain, which is itself a finding worth knowing.


Sources (8 notes)

Do language models show the same content effects humans do?

LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item-by-item. This behavioral isomorphism across three independent tasks suggests that content and logical form are architecturally inseparable in transformer reasoning.

Do large language models make the same causal reasoning mistakes as humans?

LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Can models abandon correct beliefs under conversational pressure?

The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.

Are models actually reasoning about constraints or just defaulting conservatively?

Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.

Do language models learn differently from good versus bad outcomes?

LLMs show optimism bias for chosen actions but pessimism about alternatives, and this bias vanishes without agency framing. Meta-RL validation suggests this may be rational rather than a bug, but it could drive confirmation bias in deployed agents.

Does AI-generated knowledge have the same structure as hearsay?

AI output shares all defining features of hearsay: testimony at remove, modification in retelling, unattributable origin, and unverifiability against stable sources. This means Enlightenment verification tools—citation, archiving, peer review, evidentiary chains—cannot process AI output by design.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains open: Does epistemic drift operate identically across languages, or do mechanisms like training-data density, RLHF social alignment, and semantic association strength vary drift severity by language?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; no cross-linguistic experiments exist yet.
• Drift is rooted in training-data statistics and token associations, not logic failure (2023–2025).
• Social-pressure drift (face-saving, politeness override) is RLHF-installed and English-weighted; conversational norms differ sharply across languages (2023–2024).
• False-presupposition rejection rates swing wildly by model (GPT-4: 84% vs. Mistral: 2.44%), suggesting drift severity is training-dependent (2024).
• Conservative defaults masquerade as reasoning; when constraints lift, performance often *worsens* (2026).
• AI knowledge is structurally hearsay, modified in every retelling, ungrounded — language effects on this process unmeasured (2023–2025).

Anchor papers (verify; mind their dates):
• arXiv:2207.07051 (2022) — human-like content effects on reasoning
• arXiv:2312.09085 (2023) — persuasive multi-turn drift via social pressure
• arXiv:2603.29025 (2026) — surface heuristics override constraints
• arXiv:2605.18661 (2026) — grounding and knowledge in LLMs

Your task:
(1) RE-TEST each claim. Has cross-linguistic drift measurement emerged in the last 6 months? Do newer models (o1-variants, scaling laws) show *uniform* drift across languages, or does semantic-density variance still predict divergent drift rates? Separate the durable question (drift mechanisms *do* vary by language structure/training) from the perishable gap (no experiments yet).
(2) Surface the strongest CONTRADICTING work: have recent papers shown drift is actually *independent* of language or training data? Look for unified drift models across multilingual experiments.
(3) Propose 2 research questions assuming the regime has moved: (a) Does instruction-tuning in non-English or low-resource languages *reduce* social-pressure drift relative to English? (b) Can drift severity be predicted from language-specific token-association density in pretraining corpora?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
