Does model confidence predict robustness to prompt changes?

Explores whether a model's certainty about its answer determines how much it resists prompt rephrasing and semantic variation. This matters because it could explain why some tasks are harder to evaluate reliably.

Synthesis note · 2026-03-28 · sourced from Prompts Prompting

ProSA (2024) presents a systematic study of prompt sensitivity across multiple tasks and models. Its central result is that sensitivity is not random variation but tracks model confidence in a predictable way.

The core finding: when a model is highly confident in its output, that output is robust to prompt rephrasing, reordering, and semantic variation. When confidence is low, minor prompt changes cause significant output swings. Prompt sensitivity is therefore not a property of the prompt alone; it is a joint property of the prompt and the model's certainty about the underlying task.
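One way to make the core finding concrete is to score each instance on two axes: how often its correctness flips across prompt variants, and how confident the model was. The sketch below is a minimal illustration, assuming per-variant correctness (0 or 1) and a per-instance confidence value are already available. The function names, the pairwise-disagreement score, and the toy numbers are illustrative assumptions, not ProSA's exact metric or data.

```python
from itertools import combinations
from statistics import mean


def pairwise_sensitivity(correct_by_variant):
    """Mean absolute difference in correctness over all pairs of prompt variants.

    0 means every variant produced the same outcome; higher means more flips.
    """
    pairs = list(combinations(correct_by_variant, 2))
    if not pairs:
        raise ValueError("need at least two prompt variants")
    return mean(abs(a - b) for a, b in pairs)


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up for illustration): one confidence value per instance,
# plus correctness under four paraphrases of the same prompt.
instances = [
    {"confidence": 0.95, "correct": [1, 1, 1, 1]},
    {"confidence": 0.90, "correct": [1, 1, 1, 1]},
    {"confidence": 0.60, "correct": [1, 0, 1, 1]},
    {"confidence": 0.40, "correct": [1, 0, 0, 1]},
    {"confidence": 0.35, "correct": [0, 1, 0, 1]},
]

sensitivities = [pairwise_sensitivity(i["correct"]) for i in instances]
confidences = [i["confidence"] for i in instances]
r = pearson(confidences, sensitivities)

# A negative r is what the confidence-robustness claim would predict.
print(sensitivities, r)
```

On real data the confidence value would have to come from the model itself, for example from token probabilities of the answer; how best to extract it is left open here.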

Three moderating factors: (1) larger models are more robust, which is plausibly related to scale improving calibration; (2) few-shot examples reduce sensitivity by giving the model concrete anchors, so it relies less on the prompt's surface form; (3) subjective evaluations are particularly susceptible to prompt sensitivity, especially in complex, reasoning-oriented tasks where the model's confidence is naturally lower.

This connects to "Can models learn to ignore irrelevant prompt changes?": BCT/ACT train invariance by exposing models to perturbed prompts and requiring consistent outputs. The ProSA finding suggests why this works: consistency training may push models toward high-confidence response regions where robustness comes naturally, rather than teaching robustness as a separate skill.
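The link between consistency and confidence can be illustrated with a generic consistency penalty. The sketch below is not the BCT or ACT objective; it is an assumed stand-in that measures the KL divergence between the answer distributions for a clean and a perturbed prompt. All names and logit values are hypothetical. It applies the same logit shift to a peaked distribution and to a borderline one, and compares the resulting penalties.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_penalty(clean_logits, perturbed_logits):
    """Zero when both prompts yield the same answer distribution."""
    return kl_divergence(softmax(clean_logits), softmax(perturbed_logits))


# Hypothetical logits over three answer options. The perturbed prompt
# shifts the logits by the same amount (-0.5, +0.5, 0) in both cases.
confident_clean = [6.0, 0.0, 0.0]
confident_perturbed = [5.5, 0.5, 0.0]
borderline_clean = [0.6, 0.4, 0.0]
borderline_perturbed = [0.1, 0.9, 0.0]

confident_penalty = consistency_penalty(confident_clean, confident_perturbed)
borderline_penalty = consistency_penalty(borderline_clean, borderline_perturbed)

print(confident_penalty, borderline_penalty)
```

In this toy case the borderline distribution incurs the larger penalty and its top answer flips, while the confident one barely moves. That is consistent with the reading that an objective rewarding consistency is easier to satisfy in high-confidence regions, though a toy example cannot establish it.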

The finding also has implications for "Why do chain-of-thought examples fail across different conditions?": exemplar brittleness may be most severe on tasks where the model's confidence is borderline. On high-confidence tasks, exemplar ordering may matter less because the model "knows the answer" regardless.

For evaluation design: if prompt sensitivity is a confidence signal, benchmark results from a single prompt formulation may be misleading exactly where they matter most, on difficult tasks where model confidence is low and prompt variation would produce the largest swings.
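A practical consequence is to report results over several prompt formulations rather than one. The sketch below is a minimal illustration, assuming accuracy has already been measured per task and per prompt. The task names, prompt labels, scores, and the `SPREAD_THRESHOLD` cutoff are made-up placeholders, not values from ProSA.

```python
from statistics import mean, pstdev


def summarize_across_prompts(accuracy_by_prompt):
    """Aggregate one task's accuracy over several prompt formulations."""
    scores = list(accuracy_by_prompt.values())
    return {
        "mean": mean(scores),
        "spread": max(scores) - min(scores),  # best prompt minus worst prompt
        "std": pstdev(scores),
    }


# Hypothetical benchmark results: accuracy per task per prompt formulation.
results = {
    "easy_task": {"prompt_a": 0.91, "prompt_b": 0.90, "prompt_c": 0.92},
    "hard_task": {"prompt_a": 0.58, "prompt_b": 0.41, "prompt_c": 0.49},
}

# Arbitrary cutoff chosen for illustration only.
SPREAD_THRESHOLD = 0.05

report = {task: summarize_across_prompts(scores) for task, scores in results.items()}
flagged = [task for task, stats in report.items() if stats["spread"] > SPREAD_THRESHOLD]

for task, stats in report.items():
    print(task, stats)
print("single-prompt results unreliable for:", flagged)
```

Tasks that end up flagged are the ones where a single-prompt score says the least, so they are the natural candidates for multi-prompt reporting.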

Related concepts in this collection 3

Can models learn to ignore irrelevant prompt changes? Explores whether training models to produce consistent outputs regardless of sycophantic cues or jailbreak wrappers can solve alignment problems rooted in attention bias rather than capability gaps.
ProSA explains why consistency training works: it pushes toward high-confidence regions where robustness is natural.

Why do chain-of-thought examples fail across different conditions? Chain-of-thought exemplars show surprising sensitivity to order, complexity level, diversity, and annotator style. Understanding these brittleness dimensions could reveal what makes reasoning prompts robust or fragile.
Brittleness may correlate with low-confidence regions.

Do users worldwide trust confident AI outputs even when wrong? Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
The flip side: high confidence creates robustness but also overreliance risk.
