INQUIRING LINE

How do confidence signals differ between implicit feedback and explicit ratings?

This explores what 'confidence' means when it comes from behavior you can observe (clicks, watches, purchases) versus a number someone deliberately gives you (a star rating) — and why the two aren't interchangeable.


The cleanest answer in the corpus comes from classic recommender work: implicit feedback carries *two* magnitudes that an explicit rating squashes into one. When you watch, buy, or click something, that signal splits into **preference** (which direction you lean) and **confidence** (how sure we can be about that lean): a single click is weak evidence, while a hundred repeat plays are strong evidence for the same preference (see "Can implicit feedback reveal both preference and confidence?"). A five-star rating, by contrast, collapses both into one number and throws away the certainty dimension: it tells you the preference but not how much to trust it.
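The preference/confidence split can be made concrete with the formulation used in Hu, Koren, and Volinsky's implicit-feedback model, where a raw interaction count r gives a binary preference (1 if r > 0, else 0) and a confidence of 1 + alpha * r. The sketch below is a minimal illustration under that assumption; the function name and the default `alpha` value are illustrative choices, not settings taken from the notes.

```python
def preference_and_confidence(count: float, alpha: float = 40.0) -> tuple[int, float]:
    """Split one implicit interaction count into (preference, confidence).

    preference: 1 if the item was interacted with at all, else 0.
    confidence: grows linearly with the count, starting at 1 for no evidence.
    alpha is a tunable scaling factor; 40.0 is only an illustrative default.
    """
    preference = 1 if count > 0 else 0
    confidence = 1.0 + alpha * count
    return preference, confidence


single_click = preference_and_confidence(1)
repeat_plays = preference_and_confidence(100)
# Both tuples carry the same preference (1) but very different confidence values.
```

A star rating has no slot for the second element of that tuple, which is the information loss the paragraph above describes.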

That 'one signal is really two' pattern shows up again in a very different setting: agent feedback. Natural feedback decomposes into an *evaluative* part (how well something went) and a *directive* part (how it should change), and a flat scalar reward captures only the first while discarding the second (see "Can scalar rewards capture all the information in agent feedback?"). Same lesson under different vocabulary: the moment you compress a rich behavioral or natural signal down to a single rating, you lose a hidden second channel. Explicit ratings are exactly that kind of lossy compression.

There's a catch on the implicit side, though, that explicit ratings mostly dodge: behavioral signals are contaminated by *selection bias*. You only observe clicks on things the system already chose to show, in the positions it showed them. YouTube's ranking team found you have to model that bias explicitly, with a separate shallow position tower, or the system mistakes 'shown at the top' for 'preferred' and amplifies its own past decisions into a feedback loop (see "Why do ranking systems need to model selection bias explicitly?"). So implicit confidence is richer but dirtier; explicit confidence is thinner but cleaner, because the user chose to give it deliberately rather than having it inferred from a constrained menu.
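One common way to arrange a position tower is additively: the main model produces a relevance logit, a small side model produces a bias logit from the display position, the two are summed during training, and the position term is dropped at serving time. The sketch below illustrates only that arrangement. The position biases, the relevance value, and all names are hypothetical, and it does not reproduce YouTube's actual architecture.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def click_probability(relevance_logit: float, position_logit: float = 0.0) -> float:
    """Combine the main model's logit with a position-bias logit additively."""
    return sigmoid(relevance_logit + position_logit)


# Hypothetical learned position biases: top slots get clicked more regardless of content.
POSITION_LOGIT = {1: 1.5, 2: 0.5, 3: -0.5}

relevance = 0.2  # hypothetical main-model logit for one user-item pair

# Training: the position term absorbs "shown at the top", so the relevance logit
# does not have to explain clicks that came from placement.
p_train_top = click_probability(relevance, POSITION_LOGIT[1])
p_train_low = click_probability(relevance, POSITION_LOGIT[3])

# Serving: the position term is dropped, so ranking depends on relevance alone.
p_serve = click_probability(relevance)
```

The same item gets a different predicted click rate at different training positions, while the serving-time score is independent of where it was shown before.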

What the corpus also surfaces, and the thing you might not have known to ask, is that confidence as a *signal a human reads off a machine* has its own pathologies, and people track it badly. Users in every language tested follow an AI's expressed confidence rather than its actual accuracy, so a confidently stated wrong answer gets followed systematically (see "Do users worldwide trust confident AI outputs even when wrong?"). Those confident errors are precisely the ones that hide from aggregate metrics, because they concentrate in the rare high-harm cases (see "Why do confident wrong answers hide in standard accuracy metrics?"). The through-line: an *explicit* confidence statement is persuasive but easy to fake or miscalibrate, while *implicit* confidence (how consistently a behavior repeats, or how stable a model's outputs are) is harder to game because it is emitted rather than declared. That is also why model-internal confidence is increasingly mined as a reward signal in its own right (see "Can model confidence work as a reward signal for reasoning?").
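To see how confident errors can hide behind a healthy aggregate number, it helps to compute overall accuracy alongside a list of the wrong answers stated with high confidence. The sketch below uses hypothetical toy data and an arbitrary 0.9 threshold, both invented for illustration; none of it comes from the studies summarized in the notes.

```python
# Hypothetical toy predictions: (stated confidence, was the answer correct?)
predictions = [
    (0.95, True), (0.90, True), (0.92, True), (0.60, True),
    (0.55, False), (0.97, False), (0.88, True), (0.91, True),
    (0.65, True), (0.93, True),
]


def accuracy(rows):
    """Fraction of rows whose answer was correct."""
    return sum(correct for _, correct in rows) / len(rows)


def confident_errors(rows, threshold=0.9):
    """Return the wrong answers stated at or above the confidence threshold."""
    return [(conf, correct) for conf, correct in rows if conf >= threshold and not correct]


overall = accuracy(predictions)        # the aggregate metric
risky = confident_errors(predictions)  # the errors a user is most likely to follow
```

The aggregate figure says nothing about which errors came with high stated confidence, so the second list has to be inspected separately.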


Sources (6 notes)

Can implicit feedback reveal both preference and confidence?

Hu, Koren, and Volinsky show that implicit signals (watches, purchases, clicks) encode preference and confidence as two distinct dimensions. Explicit ratings collapse these into one number, losing information about certainty in the preference estimate.

Can scalar rewards capture all the information in agent feedback?

Natural feedback carries two orthogonal types of information: evaluative (how well an action performed) and directive (how it should change). Scalar rewards capture evaluation but discard directional specifics that token-level distillation can recover, making the two complementary rather than redundant.

Why do ranking systems need to model selection bias explicitly?

YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.

Do users worldwide trust confident AI outputs even when wrong?

Cross-linguistic research shows users in every language trust confident AI outputs even when inaccurate. While confidence expression varies by language, users everywhere track confidence signals rather than accuracy, making overconfident errors systematically followed.

Why do confident wrong answers hide in standard accuracy metrics?

Medical triage, legal interpretation, and financial planning show a consistent pattern: surface heuristics conflict with unstated constraints, producing fluent confident errors that concentrate in rare cases where harm occurs. Aggregate accuracy masks these failures because overall performance looks strong.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.
