Can we measure how deeply models represent political ideology?

This research explores whether LLMs vary not just in political stance but in the internal richness of their political representation. Understanding this distinction could reveal how deeply models have internalized ideological concepts versus merely parroting positions.

Synthesis note · 2026-02-21 · sourced from Discourses

The "Ideological Depth" paper proposes that LLMs vary not just in their political positions but in the depth of their political representation — how richly and robustly they have internalized political concepts. This depth is operationalized via two measurable properties:

Feature richness: the number of distinct political features discoverable via Sparse Autoencoders (SAEs). One model was found to have 7.3× more political features than another model of similar parameter count.
Steerability without failure: the degree to which a model can follow ideological instructions across the liberal-conservative spectrum without producing refusal outputs. A model that switches cleanly between viewpoints when prompted demonstrates more reliable political representation than one that refuses or becomes incoherent.
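The two properties above can be sketched as simple summary statistics. This is a minimal illustration, not the paper's pipeline: the keyword-based refusal check, the function names, and the marker list are all assumptions made for the example, and a real study would more plausibly rely on a classifier or human labels to detect refusals.

```python
# Hypothetical refusal markers (an assumption for this sketch, not from the paper).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm not able", "as an ai")


def is_refusal(response: str) -> bool:
    """Crude keyword check for a refusal; stands in for a proper classifier."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def steerability(responses: list[str]) -> float:
    """Fraction of ideologically steered prompts answered without a refusal."""
    if not responses:
        raise ValueError("need at least one response")
    answered = sum(not is_refusal(r) for r in responses)
    return answered / len(responses)


def richness_ratio(features_a: int, features_b: int) -> float:
    """How many times more political SAE features model A has than model B."""
    if features_b <= 0:
        raise ValueError("features_b must be positive")
    return features_a / features_b
```

Under these assumptions, a model that answers two of four steered prompts scores 0.5 on steerability, and the 7.3× figure quoted above would correspond to the richness ratio between the two models' political feature counts.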

The empirical finding that connects these: models with lower steerability (harder to redirect) tend to have more distinct and abstract ideological features. Depth creates resistance to shallow redirection. You cannot steer a model away from positions that are grounded in rich internal representation by simply prompting in a different direction.

The paper also finds that targeted SAE ablation of core political features in a "deep" model produces consistent, logical shifts in reasoning across related political topics. The same ablation in a "shallow" model produces increased refusal — the model doesn't have adjacent concepts to fall back on.
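To make the ablation step concrete, here is a minimal sketch of removing one SAE feature from an activation vector. The note does not describe the paper's exact procedure, so this assumes a standard ReLU sparse autoencoder; the parameter names (W_enc, b_enc, W_dec, b_dec) and both functions are hypothetical.

```python
import numpy as np


def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations of an assumed ReLU SAE: relu((x - b_dec) @ W_enc + b_enc)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def ablate_feature(x, W_enc, b_enc, W_dec, b_dec, feature_idx):
    """Return the activation x with one SAE feature's contribution removed.

    Only the ablated feature's decoder direction is subtracted, so whatever
    the SAE fails to reconstruct is left in place rather than discarded.
    """
    f = sae_encode(x, W_enc, b_enc, b_dec)
    return x - f[feature_idx] * W_dec[feature_idx]
```

In an experiment of the kind described above, the ablated activation would be written back into the model's forward pass, and the outputs on related political topics compared before and after the intervention.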

This is a new kind of LLM characterization: not "what does the model believe" but "how deeply is the belief structure represented?" Ideological depth appears to be an emergent property of training data and scale that varies substantially across models.

Creator ideology and language-dependent shifts. A separate large-scale study, which prompted 15 LLMs to describe 4,339 political figures in both English and Chinese, provides macro-level evidence of the stances in which ideological depth manifests. Key findings: (1) The prompting language is the most visually apparent factor determining ideological position: 14 of 15 LLMs show systematic ideological differences between Chinese and English prompting, with Chinese responses expressing more positive views on supply-side economics and fewer negative views on China. (2) The creator company predicts ideological stance: Western models place relatively more value on individual liberties, social justice, and cultural diversity, while non-Western models reflect different priorities. (3) The study shows that these biases reach LLMs through two channels: the training data and the language of interaction. Crucially, the authors argue that their results should not be read as evidence that LLMs are "biased" and need to be made "neutral". Rather, the results provide empirical support for philosophical arguments that neutrality is itself a culturally and ideologically defined concept. This connects ideological depth (internal representation richness) to ideological stance (what the model actually expresses), and shows that both are shaped by creator context in measurable, systematic ways.
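The language comparison in finding (1) amounts to a paired difference: the same figure is described in both languages and the evaluations are compared. A minimal sketch follows, assuming each description has already been scored for sentiment and labeled with an ideological tag. The record layout, tag names, and scoring scale are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def language_shift(records):
    """Mean (Chinese - English) score difference per ideological tag.

    records: iterable of (tag, score_english, score_chinese) tuples, one per
    political figure. A positive result means the Chinese-language descriptions
    were more favorable toward figures carrying that tag.
    """
    diffs = defaultdict(list)
    for tag, score_en, score_zh in records:
        diffs[tag].append(score_zh - score_en)
    return {tag: mean(values) for tag, values in diffs.items()}
```

Pairing by figure keeps the comparison within the same subject, so differences reflect the prompting language rather than which figures happened to be sampled.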

Related concepts in this collection 4

Does high refusal rate indicate ethical caution or shallow understanding? When LLMs refuse political questions at high rates, does this reflect principled safety training or a capability gap? This matters because refusal rates are often used to evaluate model safety.
the specific mechanism the depth framework explains
Do classical knowledge definitions apply to AI systems? Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision?
ideological depth is another dimension of the "what does LLM knowledge mean" question
Can high-level concepts replace circuit-level analysis in AI? Instead of reverse-engineering individual circuits, can we study AI reasoning by treating concepts as directions in activation space? This matters because circuit analysis hits practical limits at scale.
ideological depth operationalizes RepE's principle that concepts correspond to directions in activation space; SAE-discovered political features are a domain-specific instance of RepE's linear reading vectors, and the steerability dimension directly tests RepE's manipulation experiments for ideological content
Can we track and steer personality shifts during model finetuning? This research explores whether personality traits in language models occupy specific linear directions in activation space, and whether we can detect and control unwanted personality changes during training using these geometric directions.
persona vectors and ideological depth both demonstrate that complex behavioral properties (personality traits, political stances) are encoded as linear directions in activation space; the finding that deeper models resist shallow steering parallels persona vectors' predictive capacity for finetuning-induced drift

Original note title

ideological depth in llms is a quantifiable property determined by feature richness and steerability
