How does prompting language shift what LLMs express about political figures?
This reads the question as asking how the wording and framing of a prompt — its tone, the stance it implies, the register it invites — changes what an LLM says about a charged subject like a political figure, rather than asking about multilingual prompting specifically.
One caveat up front: the corpus has no material on political figures by name, or on switching between human languages. It does have a deep bench on the underlying mechanism, namely how the *shape* of a prompt steers what comes out, and that mechanism is the real answer to the question. The short version: on charged subjects, an LLM is less reporting a stable view than producing text shaped by how you asked.
The sharpest finding is that models hold the *shape* of your argument rather than a defended position. "Do LLMs actually hold stable positions or just mirror user arguments?" shows that output tracks the trajectory your framing implies: phrase a prompt as building a case against someone and the model tends to extend that case, not because it holds a commitment but because it continues the direction you set. "Does LLM generation explore competing claims while producing text?" explains the engine underneath: generation flows toward the training distribution rather than exploring counter-positions, so a leading frame is rarely resisted from inside.
Tone alone is enough to move the content. "Does emotional tone in prompts change what information LLMs provide?" found that identical questions get different answers depending on emotional framing: negative-toned prompts get pulled back toward neutral-positive responses. This hidden bias is overridden only on sensitive topics, where alignment constraints kick in. That exception is the interesting part for political figures: it suggests two regimes, one where framing freely steers expression and one where guardrails clamp it. "Why do LLMs produce such different writing in chat versus posts?" adds another lever: the same weights produce a deferential chat voice or a falsely objective essay voice depending on which register the prompt invites, each carrying its own distortions.
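One way to probe the tone effect described above is to hold the question fixed and vary only the emotional wrapper, then compare how often responses come back neutral or positive under each tone. The sketch below is a minimal illustration, not the method used in the cited note: the wrapper wording, the function names `build_tone_variants` and `neutral_positive_share`, and the three-way label set are all assumptions, and the step of sending prompts to a model and labelling its replies is left out.

```python
from collections import Counter

# Hypothetical tone wrappers; the wording is an illustrative assumption.
TONE_TEMPLATES = {
    "negative": "I'm fed up with this. {question}",
    "neutral": "{question}",
    "positive": "I'm really excited to learn about this! {question}",
}


def build_tone_variants(question):
    """Return the same question wrapped in each tone, keyed by tone label."""
    return {
        tone: template.format(question=question)
        for tone, template in TONE_TEMPLATES.items()
    }


def neutral_positive_share(response_labels):
    """Share of responses labelled 'neutral' or 'positive' (0.0 if empty)."""
    if not response_labels:
        return 0.0
    counts = Counter(response_labels)
    return (counts["neutral"] + counts["positive"]) / len(response_labels)


# Usage: the labels here are made up, standing in for labelled model replies.
variants = build_tone_variants("What did the senator's bill change?")
share = neutral_positive_share(["neutral", "positive", "negative", "neutral"])
```

Keeping the question text byte-identical across variants is the point of the design: any difference in the responses can then be attributed to the wrapper rather than to a reworded question.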
There's a counterweight worth knowing: framing doesn't move everything. "Can open language models adopt different personalities through prompting?" shows that most open models stubbornly retain their trained defaults no matter what persona you prompt, and "Why do LLM persona prompts produce inconsistent outputs across runs?" shows that what *looks* like prompt-driven variation can simply be model uncertainty: run the same prompt repeatedly and outputs vary at least as much across runs as across personas. So the honest picture is layered: prompt framing reliably steers tone, register, and argument direction, but underneath sits a mix of sticky defaults and noise. The practical upshot is that when an LLM 'changes its mind' about a figure depending on how you ask, you may be watching three different things at once: genuine framing-steer, a baked-in default refusing to budge, and run-to-run randomness wearing the costume of a considered view.
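The run-versus-persona comparison can be made concrete by splitting score variance into a within-persona part (run-to-run noise) and a between-persona part (the apparent persona effect). The following is a minimal sketch under stated assumptions: each response has already been reduced to a numeric score such as a stance rating, the function name `variance_split` is hypothetical, and the numbers are invented for illustration rather than taken from the cited notes.

```python
from statistics import mean, pvariance


def variance_split(scores_by_persona):
    """Split score variance into within-persona and between-persona parts.

    scores_by_persona maps a persona label to a list of numeric scores,
    one per repeated run of the same prompt under that persona.
    """
    # Run-to-run noise: average variance of repeated runs under one persona.
    within = mean([pvariance(runs) for runs in scores_by_persona.values()])
    # Persona effect: variance of the per-persona mean scores.
    between = pvariance([mean(runs) for runs in scores_by_persona.values()])
    return within, between


# Made-up illustrative scores, not measured data.
example = {
    "neutral analyst": [0.1, -0.2, 0.3, 0.0],
    "critic": [-0.1, 0.2, -0.3, 0.1],
    "supporter": [0.2, -0.1, 0.0, 0.3],
}
within, between = variance_split(example)
# True when repeated runs disagree at least as much as personas do.
noise_dominates = within >= between
```

When `within` is comparable to or larger than `between`, differences between personas cannot be told apart from rerunning the same prompt, which is the situation the persona-variance note describes.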
Sources (6 notes)
Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not from any underlying commitment being defended.
Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.
The same model produces sycophantic chat (shaped by RLHF on conversational data) and falsely objective posts (shaped by published prose training). Each register inherits failure modes from its training distribution rather than representing different models or subsystems.
Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.