SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Psychology, Society, and Alignment · Training, RL, and Test-Time Scaling

Why does chain-of-thought reasoning fail for personalization?

Standard reasoning traces produce logically sound but personally irrelevant answers. This note explores why generic thinking doesn't anchor to user preferences and what might fix it.

Synthesis note · 2026-02-23 · sourced from Personalization

PRIME documents a two-layer failure in applying reasoning to personalization:

Layer 1: Generic CoT fails. Enabling standard chain-of-thought often underperforms the non-thinking baseline on personalization tasks. The uncustomized reasoning trace "merely scratches the surface, seeking broad answers rather than to-the-point, user-specific responses." Generic reasoning explores the problem space without being anchored to the specific user's preferences, values, or communication style, so the result is logically sound but personally irrelevant.

Layer 2: Fine-tuning destroys thinking capacity. The "fast thinking" training paradigm (direct input→output mapping) turns fine-tuned LLMs into specialist models overfitted to the target space. They lose the generalist capability of generating meaningful intermediate thoughts when prompted. A common error is token repetition: the model has been trained to shortcut directly to outputs and can no longer produce coherent intermediate reasoning. This is not a minor degradation; the ability to reason on request is effectively lost.
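The token-repetition failure described above can be checked for mechanically. What follows is a minimal sketch of one possible diagnostic, not a method taken from PRIME: the function names, the whitespace tokenization, the n-gram size, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import List


def repeated_ngram_ratio(tokens: List[str], n: int = 3) -> float:
    """Fraction of n-grams that duplicate an n-gram seen earlier in the same trace."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def looks_degenerate(trace: str, n: int = 3, threshold: float = 0.5) -> bool:
    """Flag a trace whose n-grams are mostly repeats (threshold is an assumed cutoff)."""
    # Whitespace splitting stands in for the model's real tokenizer.
    return repeated_ngram_ratio(trace.split(), n) >= threshold
```

Run over traces sampled from a fine-tuned model and from its base model, a check like this would give a rough signal of whether intermediate reasoning has collapsed into repetition. A sensible threshold would need to be calibrated on real outputs.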

The fix: personalized self-distillation. The model generates its own personalized thinking traces (using its pre-fine-tuning generalist capability), then trains on those traces alongside the standard fine-tuning objective. This produces reasoning that is both user-specific (anchored to the individual's preferences) and deep (maintaining the capacity for intermediate thought). The self-distillation approach leverages the model's own capabilities rather than requiring external reasoning trace data.
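The data-construction side of this idea can be sketched roughly as follows. This is an illustration under stated assumptions, not PRIME's implementation: the `Example` and `TrainingPair` types, the prompt wording, the `<think>` tag format, and the one-to-one mixing of direct and thinking examples are all hypothetical, and `stub_trace` stands in for a call to the pre-fine-tuning model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    profile: str   # what is known about the user (history, preferences)
    query: str     # the input the user is responding to
    response: str  # the user's actual response, used as the training target


@dataclass
class TrainingPair:
    prompt: str
    target: str


def build_prompt(ex: Example, think: bool) -> str:
    """Hypothetical prompt template; the wording is an assumption."""
    instruction = (
        "Think about what this specific user would want, then answer."
        if think
        else "Answer as this user would."
    )
    return f"User profile:\n{ex.profile}\n\nQuery:\n{ex.query}\n\n{instruction}"


def self_distill(
    examples: List[Example],
    generate_trace: Callable[[Example], str],
) -> List[TrainingPair]:
    """Build a training set mixing direct pairs with self-generated thinking pairs."""
    pairs: List[TrainingPair] = []
    for ex in examples:
        # Standard fine-tuning objective: direct input -> output mapping.
        pairs.append(TrainingPair(build_prompt(ex, think=False), ex.response))
        # Personalized trace produced by the model itself, before fine-tuning.
        trace = generate_trace(ex).strip()
        if not trace:
            continue  # skip empty traces instead of training on them
        target = f"<think>\n{trace}\n</think>\n{ex.response}"
        pairs.append(TrainingPair(build_prompt(ex, think=True), target))
    return pairs


def stub_trace(ex: Example) -> str:
    """Stand-in for the base model; a real pipeline would sample from the LLM here."""
    return f"The profile says: {ex.profile} So the reply should reflect that."
```

Calling `self_distill(examples, stub_trace)` yields two training pairs per example, one direct and one with a thinking trace prepended to the gold response. Keeping both kinds of pair in the same set is one way to read "alongside the standard fine-tuning objective"; the actual mixing ratio and any filtering of low-quality traces are left open here.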

This finding extends the reasoning/judgment split documented elsewhere. In the terms of "When does explicit reasoning actually help model performance?", personalization is a clear case of "continuous nuanced judgment": matching preferences, style, and implicit expectations cannot be reduced to logical derivation steps. But PRIME shows the split is not absolute: personalized reasoning can help, provided the reasoning traces themselves are customized to the user.

The connection to "Why does asking models to think first hurt performance?" is structural: both findings show that thinking initially hurts and becomes helpful only after the thinking process is adapted to the domain. In PRIME's case, self-distillation is the adaptation mechanism; in the TPO case, RL training is. The shared principle: raw thinking capability must be tuned to the domain before it adds value.

Original note title

generic reasoning underperforms non-thinking for personalization tasks — personalized thinking via self-distillation is required because fast-thinking fine-tuning destroys generalist reasoning capability