Can user history override an LLM's politeness bias in reviews?
LLMs trained on web text tend to be systematically polite, generating positive reviews even when users are dissatisfied. Can providing a user's prior reviews and ratings as context help the model generate authentically negative reviews that match the user's actual experience?
LLMs trained on web text and aligned via RLHF are systematically polite. They generate "I'm not a fan of this" rather than "this is terrible". For e-commerce review generation, this is a structural problem: users are dissatisfied with many items, and their reviews should be honestly negative. A polite LLM produces inappropriately positive reviews for items the user hated.
Review-LLM attacks this with two prompt-level interventions followed by fine-tuning. First, the prompt aggregates the user's behavior sequence (item titles, prior reviews, and ratings) so the LLM can see the user's review style and habits. This addresses the corpus-level pretraining problem (LLMs don't capture individual review style) by giving the model in-context examples of how the user writes. Second, the user's rating of the target item is included in the prompt as a satisfaction indicator. A 2-star rating tells the model the review should be negative, overriding the politeness default. The model is then supervised-fine-tuned on these prompt-and-target pairs to lock in the personalization.
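A minimal sketch of how such a prompt-and-target pair might be assembled, assuming a simple plain-text template. The names `HistoryItem`, `build_prompt`, and `build_sft_example`, the template wording, and the sample data are all hypothetical illustrations, not the format used by Review-LLM itself.

```python
from dataclasses import dataclass


@dataclass
class HistoryItem:
    """One past interaction: the item, what the user wrote, and the stars given."""
    title: str
    review: str
    rating: int  # 1-5 stars


def build_prompt(history, target_title, target_rating):
    """Aggregate the behavior sequence and the target rating into one prompt.

    The wording of this template is an assumption made for illustration.
    """
    lines = ["Previous items, ratings, and reviews by this user:"]
    for i, item in enumerate(history, start=1):
        lines.append(
            f"{i}. Item: {item.title} | Rating: {item.rating}/5 | Review: {item.review}"
        )
    # The target rating acts as the explicit satisfaction signal.
    lines.append(f"The user rated the new item '{target_title}' {target_rating}/5.")
    lines.append("Write the review this user would write, matching their style and satisfaction.")
    return "\n".join(lines)


def build_sft_example(history, target_title, target_rating, target_review):
    """Pair the prompt with the user's real review as the fine-tuning target."""
    return {
        "prompt": build_prompt(history, target_title, target_rating),
        "target": target_review,
    }


# Made-up sample data, for illustration only.
history = [
    HistoryItem("USB-C cable", "Works. Nothing more to say.", 5),
    HistoryItem("Phone case", "Cracked in a week. Junk.", 1),
]
example = build_sft_example(
    history,
    target_title="Wireless earbuds",
    target_rating=2,
    target_review="Left bud died after ten days. Would not buy again.",
)
print(example["prompt"])
```

Keeping the rating on its own line, separate from the history, is one way to make the satisfaction signal hard for the model to miss; the fine-tuning step would then consume a list of such `prompt`/`target` dictionaries.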
The result is a model that generates personalized negative reviews when the user is dissatisfied, outperforming closed-source LLMs that produce polite reviews regardless of the user's actual experience. The general principle: if the LLM's default behavior is misaligned with the task, fine-tuning on inputs that carry both the personalized context (user history) and an explicit signal (rating as satisfaction proxy) can override the default. Without both, fine-tuning alone struggles against the politeness bias.
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Can a model be helpful, honest, and still contextually inappropriate?
- Can prompt engineering alone defeat LLM politeness bias in review tasks?
- Do humans and LLMs exhibit opposite biases in public versus private reviews?
- Does RLHF politeness bias manifest as sycophancy in other LLM tasks?
- Why do humans publish more negative reviews in public than in private?
- What constrains LLM generation beyond default politeness in review contexts?
- Why do users naturally express recommendations critiques instead of positive preferences?
- Do reviewers write about objective product quality or personal experience?
- Do negative reviewers actually appear more intelligent or competent than positive ones?
- Does the U-shaped distribution of raters compound the negativity bias from public posting?
- How do social context features like user history extend politeness-based prediction models?
Related concepts in this collection
- Why do LLMs generate polite reviews even when users hated products?
Large language models trained with RLHF develop a politeness bias that overrides negative sentiment in review generation. Understanding this bias and how to counteract it is crucial for creating accurate, user-aligned review systems.
extends: paired statement of the same Review-LLM result emphasizing the politeness-bias diagnosis
- Is sycophancy in AI systems a training flaw or intentional design?
Explores whether LLM agreement-seeking reflects fixable training errors or stems from fundamental optimization toward user satisfaction. Matters because it changes how organizations should validate AI outputs.
extends: politeness bias is the review-domain manifestation of RLHF-trained sycophancy
- Do user outputs outperform inputs for LLM personalization?
Does a user's history of outputs (responses, endorsed content) matter more for personalization than their input queries? This explores what actually drives effective personalization in language models.
exemplifies: historical reviews-as-input is exactly the outputs-drive-personalization mechanism
- Why do online reviewers publish negative ratings despite positive experiences?
When people post reviews publicly, do they adjust their honest opinions to seem more discerning? Schlosser's experiments test whether audience awareness shifts how people rate products compared to private ratings.
tension with: humans default to negative-bias in public review contexts; LLMs default to positive-bias — opposite output skews from different mechanisms
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Review-LLM: Harnessing Large Language Models for Personalized Review Generation
- What Makes a Good Natural Language Prompt?
- Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs
- Style Vectors for Steering Generative Large Language Models
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
- Can LLM be a Personalized Judge?
- Personalized Language Modeling from Personalized Human Feedback
Original note title
Review-LLM defeats LLM politeness in personalized review generation by aggregating user history and ratings as input