SYNTHESIS NOTE

Does prompt politeness change how accurate language models are?

Earlier research suggested rude prompts hurt LLM accuracy, but a newer study on GPT-4o shows the opposite pattern. This raises questions about whether tone effects are real and reliable enough to guide prompting strategies.

Synthesis note · 2026-06-03 · sourced from Prompts Prompting

Prompt wording shifts LLM performance, but the role of politeness and tone has been under-studied, and the reported effects have been unstable. This short study rewrote 50 multiple-choice questions (math, science, history) into five tone variants (Very Polite, Polite, Neutral, Rude, Very Rude), yielding 250 prompts, and evaluated ChatGPT-4o with paired t-tests. Contrary to expectation, impolite prompts consistently outperformed polite ones: accuracy rose from 80.8% (Very Polite) to 84.8% (Very Rude), a gap of 4.0 percentage points that the study reports as statistically significant.
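The design described above (questions crossed with tone variants, then compared pairwise) can be sketched in a few lines. This is a minimal illustration, not the study's code: the tone prefixes, the function names `build_prompts` and `paired_t`, and the idea of feeding in matched per-run accuracy scores are all assumptions made for the example, and no real results are included.

```python
import math
import statistics

TONES = ["Very Polite", "Polite", "Neutral", "Rude", "Very Rude"]

# Hypothetical tone prefixes, invented for illustration only.
# The study's actual wording is not given in this note.
TONE_PREFIXES = {
    "Very Polite": "Would you be so kind as to answer this question? ",
    "Polite": "Please answer this question. ",
    "Neutral": "",
    "Rude": "Just answer this. ",
    "Very Rude": "Answer this, if you can even manage it. ",
}


def build_prompts(questions, prefixes):
    """Cross every base question with every tone, giving (tone, prompt) pairs."""
    return [(tone, prefixes[tone] + q) for q in questions for tone in TONES]


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched score lists.

    Each position in scores_a and scores_b must refer to the same unit
    (for example the same run or the same question) under two tones.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# 50 placeholder questions x 5 tones = 250 prompts, matching the study's count.
questions = [f"Question {i}" for i in range(1, 51)]
prompts = build_prompts(questions, TONE_PREFIXES)
print(len(prompts))
```

The pairing is the point of the design: because every tone variant is built from the same base questions, differences between tones are compared within matched units rather than across unrelated samples. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic and also returns a p-value.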

The takeaway is not "be rude to your model": the effect is small and the study is preliminary. What matters is that the direction reverses across model generations. Earlier work (Yin et al.) found very rude prompts elicited worse answers from ChatGPT-3.5 and Llama2-70B, with politeness-level effects that were non-monotonic on GPT-4. That a tonal effect can invert between model versions means pragmatic prompt features are real but not stable design levers. What helps one generation may hurt the next, so tone-based prompting advice cannot be assumed to transfer.

This extends the vault's prompt-pragmatics thread with a cautionary, social dimension. It rhymes with "Can emotional phrases in prompts improve language model performance?", which also finds that affective framing changes outputs, but adds the instability finding: the sign of the effect is version-dependent. That raises broader questions about the social dimensions of human–AI interaction that don't reduce to a fixed prompting recipe.


Original note title

prompt politeness affects accuracy and the direction has flipped — on GPT-4o rude prompts outperform polite ones reversing earlier model generations