Why does chain-of-thought reasoning fail for personalization?

Standard reasoning traces produce logically sound but personally irrelevant answers. This note explores why generic thinking doesn't anchor to user preferences and what might fix it.

Synthesis note · 2026-02-23 · sourced from Personalization

PRIME documents a two-layer failure in applying reasoning to personalization:

Layer 1: Generic CoT fails. Enabling standard chain-of-thought often underperforms the non-thinking baseline for personalization tasks. The uncustomized reasoning trace "merely scratches the surface, seeking broad answers rather than to-the-point, user-specific responses." Generic reasoning explores the problem space without being anchored to the specific user's preferences, values, or communication style — producing reasoning that is logically sound but personally irrelevant.

Layer 2: Fine-tuning destroys thinking capacity. The "fast thinking" training paradigm (direct input→output mapping) turns fine-tuned LLMs into specialist models overfitted to the target space. They lose the generalist capability of generating meaningful intermediate thoughts when prompted. A common error is token repetition: the model has been trained to shortcut directly to outputs, so when asked to reason it can no longer produce coherent intermediate steps. This is more than a minor degradation. The fine-tuned model has effectively lost the ability to think when prompted.
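The token-repetition symptom described above can be checked for mechanically. The following is a minimal sketch, not a method taken from PRIME: the function names, the whitespace tokenization, the trigram window, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of n-grams that repeat an earlier n-gram (0.0 means no repetition).

    Assumption: whitespace tokenization stands in for a real tokenizer.
    """
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def looks_degenerate(trace: str, n: int = 3, threshold: float = 0.5) -> bool:
    """Flag a reasoning trace dominated by repeated n-grams.

    Assumption: the 0.5 threshold is arbitrary and would need tuning on real traces.
    """
    return repeated_ngram_ratio(trace, n) >= threshold
```

A check like this could be run over traces sampled from a fine-tuned model to estimate how often prompting for intermediate thoughts collapses into repetition.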

The fix: personalized self-distillation. The model generates its own personalized thinking traces (using its pre-fine-tuning generalist capability), then trains on those traces alongside the standard fine-tuning objective. This produces reasoning that is both user-specific (anchored to the individual's preferences) and deep (maintaining the capacity for intermediate thought). The self-distillation approach leverages the model's own capabilities rather than requiring external reasoning trace data.
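The data-construction step of this fix can be sketched as follows. This is an illustration under stated assumptions, not PRIME's implementation: `Example`, `build_thinking_prompt`, `build_distillation_set`, the prompt wording, the `<think>` delimiter, and the rule that skips empty traces are all hypothetical. The `generate_trace` callable stands in for the pre-fine-tuning generalist model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    user_profile: str  # text describing the user's preferences or history
    query: str         # the task input
    target: str        # the gold personalized output used for fine-tuning


def build_thinking_prompt(ex: Example) -> str:
    """Ask the generalist model for reasoning anchored to this user (hypothetical wording)."""
    return (
        f"User profile:\n{ex.user_profile}\n\n"
        f"Query:\n{ex.query}\n\n"
        "Before answering, reason about what this specific user would want."
    )


def build_distillation_set(
    examples: List[Example],
    generate_trace: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each fine-tuning target with a self-generated personalized trace."""
    records = []
    for ex in examples:
        trace = generate_trace(build_thinking_prompt(ex)).strip()
        if not trace:
            continue  # assumption: examples without a usable trace are dropped
        records.append({
            "input": f"{ex.user_profile}\n{ex.query}",
            # Assumption: the trace is wrapped in <think> tags ahead of the target.
            "output": f"<think>{trace}</think>\n{ex.target}",
        })
    return records
```

Training on `output` strings that contain both the trace and the target is one way to keep intermediate reasoning in the fine-tuning signal. How the trace loss and the standard output loss are weighted is left open here.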

This finding extends the reasoning/judgment split documented elsewhere. As argued in "When does explicit reasoning actually help model performance?", personalization is a clear case of "continuous nuanced judgment": matching preferences, style, and implicit expectations cannot be reduced to logical derivation steps. But PRIME shows the split is not absolute: personalized reasoning can help, provided the reasoning traces themselves are customized to the user.

The connection to "Why does asking models to think first hurt performance?" is structural: both findings show that thinking initially hurts but becomes helpful once the thinking process is adapted to the domain. In PRIME's case, self-distillation is the adaptation mechanism; in the TPO case, RL training is. The shared principle: raw thinking capability must be tuned to the domain before it adds value.

Related concepts in this collection 4

When does explicit reasoning actually help model performance?
Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
personalization as a specific instance of the judgment-degradation zone

Why does asking models to think first hurt performance?
Initial prompts to generate internal thoughts degrade instruction-following performance. What reverses this harm, and can thinking become useful beyond math and logic?
parallel: thinking hurts until adapted; self-distillation and RL are distinct adaptation mechanisms

Does reflection in reasoning models actually correct errors?
When reasoning models reflect on their answers, do they genuinely fix mistakes, or merely confirm what they already decided? Understanding this matters for designing better training and inference strategies.
personalized thinking is a case where reflection must be customized to add value

Can user preferences be learned from just ten questions?
Explores whether adaptive question selection can efficiently infer user-specific reward coefficients without historical data or fine-tuning. This matters for scaling personalization without per-user model updates.
PReF addresses the same "generic fails, personalized succeeds" pattern at the reward level: a single reward function underperforms because it flattens individual preferences; factored rewards capture user-specific dimensions just as personalized thinking traces capture user-specific reasoning patterns

Original note title

generic reasoning underperforms non-thinking for personalization tasks — personalized thinking via self-distillation is required because fast-thinking fine-tuning destroys generalist reasoning capability
