How much user interaction data is needed for effective AI personalization?

The question asks 'how much', but the corpus mostly answers a sharper one: effective personalization depends far less on the volume of interaction data than on which signals you keep and how you compress them.

The collection's most useful move is to reframe the question of how much interaction data personalization requires: several lines of work suggest the answer is 'surprisingly little, if you use the right data.' The most direct datapoint is a reward-factorization approach (PReF) that can pin down a user's personalized preferences from about ten well-chosen adaptive questions, by learning shared base reward functions first and then asking only the questions that most reduce uncertainty about that user's coefficients (see 'Can user preferences be learned from just ten questions?'). The lever there is informativeness rather than quantity: active questioning beats passive accumulation.
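The select-then-update loop described above can be sketched in a few lines. This is a minimal illustration and not the PReF implementation: it assumes a user's reward is a weighted sum of base reward scores, that each question is the difference in base-reward scores between two candidate responses, that answers follow a logistic (Bradley-Terry style) model, and that the posterior over the user's coefficients is approximated by a fixed set of candidate vectors. All names (`info_gain`, `update`, `elicit`, `particles`) are hypothetical, and the user's answers are simulated from a known `true_w`.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def binary_entropy(p):
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def info_gain(question, particles, weights):
    """Mutual information between the answer and the user's coefficients."""
    probs = [sigmoid(dot(w, question)) for w in particles]
    mean_p = sum(wt * p for wt, p in zip(weights, probs))
    expected_h = sum(wt * binary_entropy(p) for wt, p in zip(weights, probs))
    return binary_entropy(mean_p) - expected_h

def update(question, answer, particles, weights):
    """Bayes update of the candidate weights (answer=True: first option preferred)."""
    likes = []
    for w in particles:
        p = sigmoid(dot(w, question))
        likes.append(p if answer else 1.0 - p)
    unnorm = [wt * l for wt, l in zip(weights, likes)]
    total = sum(unnorm)
    return [x / total for x in unnorm]

def elicit(true_w, questions, particles, n_questions=10, rng=None):
    """Ask the most informative remaining question, update, repeat."""
    rng = rng or random.Random(0)
    weights = [1.0 / len(particles)] * len(particles)
    remaining = list(questions)
    for _ in range(min(n_questions, len(remaining))):
        q = max(remaining, key=lambda c: info_gain(c, particles, weights))
        remaining.remove(q)
        # Assumption: the user's answer is simulated from the true coefficients.
        answer = rng.random() < sigmoid(dot(true_w, q))
        weights = update(q, answer, particles, weights)
    # Posterior-mean estimate of the user's coefficients.
    return [sum(wt * w[i] for wt, w in zip(weights, particles))
            for i in range(len(true_w))]
```

The design point is that each question is scored by how much it is expected to shrink uncertainty before it is asked, so a question on which every candidate user would answer the same way is never selected, however many such questions are available.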

A second thread argues that even pre-collected data may be optional. A curiosity-reward method personalizes in real time by rewarding the agent for reducing its uncertainty about who it is talking to mid-conversation, so the interaction itself becomes the data source and no profile is required up front (see 'Can conversations themselves personalize without user profiles?'). Persona-based systems push the same idea: a structured persona can be refined at test time by simulating recent interactions against feedback, turning a handful of recent exchanges into a working model of the user (see 'Can personas evolve in real time to match what users actually want?').
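One way to picture the curiosity reward is as a bonus added to the task reward whenever a turn sharpens the agent's belief about the user. The sketch below is an assumption-laden illustration, not the method from the cited work: it treats the belief as a discrete distribution over a few user types, measures uncertainty with Shannon entropy, and mixes the bonus in with a hypothetical coefficient `beta`. The function names and the type labels are invented for the example.

```python
import math

def entropy(belief):
    """Shannon entropy (in nats) of a belief over user types."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def bayes_update(belief, likelihood):
    """Posterior over user types after observing one user turn.

    likelihood[t] is the assumed probability of that turn given type t.
    """
    unnorm = {t: belief[t] * likelihood.get(t, 0.0) for t in belief}
    total = sum(unnorm.values())
    if total == 0:
        return dict(belief)  # observation impossible under every type: keep prior
    return {t: v / total for t, v in unnorm.items()}

def shaped_reward(task_reward, belief_before, belief_after, beta=0.5):
    """Task reward plus a bonus for reducing uncertainty about the user."""
    curiosity = entropy(belief_before) - entropy(belief_after)
    return task_reward + beta * curiosity

# Example: a reply that is far more likely from a 'beginner' than an 'expert'.
prior = {"beginner": 0.5, "expert": 0.5}
posterior = bayes_update(prior, {"beginner": 0.9, "expert": 0.1})
reward = shaped_reward(1.0, prior, posterior)
```

Under this framing, a turn that reveals nothing about the user earns only the task reward, while a turn that disambiguates the user earns extra, which is what nudges the agent toward strategic information gathering.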

Where the corpus gets genuinely counterintuitive is on what kind of data earns its keep. Abstracted preference summaries consistently outperform retrieving piles of specific past interactions: semantic memory beats episodic recall, which means compressing history into 'what this person tends to prefer' is more valuable than hoarding the raw log (see 'Does abstract preference knowledge outperform specific interaction recall?'). Relatedly, profiles built only from a user's past outputs match or beat full profiles, while input-only profiles actually hurt, because personalization runs on style and preference rather than on the semantic content of every query (see 'Do user outputs outperform inputs for LLM personalization?'). So 'more data' can be worse than a smaller, better-curated slice.
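The contrast between the two memory styles, and the outputs-only restriction, can be made concrete with a toy data structure. This is only a sketch under stated assumptions: the `Interaction` record and both functions are hypothetical, and the word-count 'summary' is a crude stand-in for the LLM-written preference summaries the source notes describe.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Interaction:
    user_input: str   # what the user asked or was shown
    user_output: str  # what the user wrote or approved

def episodic_context(history, k=3):
    """Episodic memory: the k most recent raw interactions, verbatim."""
    return [f"{h.user_input} -> {h.user_output}" for h in history[-k:]]

def semantic_summary(history, top_n=3):
    """Semantic memory: a compact profile built from the user's outputs only."""
    terms = Counter()
    lengths = []
    for h in history:
        tokens = h.user_output.lower().split()
        lengths.append(len(tokens))
        # Keep longer words as a rough proxy for characteristic vocabulary.
        terms.update(t.strip(".,!?") for t in tokens if len(t) > 4)
    avg_len = sum(lengths) / len(lengths) if lengths else 0.0
    return {
        "avg_output_words": avg_len,
        "frequent_terms": [w for w, _ in terms.most_common(top_n)],
    }

history = [
    Interaction("Summarize the quarterly report", "Short and blunt."),
    Interaction("Draft a reply to the vendor", "Keep replies blunt please."),
]
profile = semantic_summary(history)
recent = episodic_context(history, k=1)
```

The summary stays the same size no matter how long the history grows and never looks at `user_input`, which mirrors the two findings above: compress rather than hoard, and model how the user writes rather than what they asked about.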

There's also a quality-of-signal dimension the volume framing misses entirely. Behavioral cues like gaze, hesitation, and typing speed can be read as a continuous signal of cognitive state, meaning a thin but rich real-time stream may carry more personalization value than a thick archive of clicks (see 'Can AI systems read cognitive state from interaction patterns alone?'). And at the discovery end, language models can mine activity logs to surface month-long 'interest journeys' that collaborative filtering misses, extracting durable intent from existing data rather than demanding new data (see 'Can language models discover what users actually want from activity logs?').

The thing you didn't know you wanted to know: across these papers the binding constraint on personalization isn't data scarcity, it's data selection and abstraction. Ten targeted questions, a user's outputs alone, a compressed preference summary, or even the live texture of a single conversation can outperform exhaustive logging — which flips the usual 'collect everything' instinct on its head, and incidentally lightens the privacy footprint at the same time.

Sources (7 notes)

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Users can be personalized via inference-time reward alignment without weight modification.

Can conversations themselves personalize without user profiles?

Adding an intrinsic motivation reward for reducing uncertainty about user type during conversation enables personalization without pre-collected profiles. Tested in education and fitness domains with 20 user attributes, the approach balances helpfulness with strategic information gathering.

Can personas evolve in real time to match what users actually want?

PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Do user outputs outperform inputs for LLM personalization?

Research shows that user profiles built from outputs alone match or exceed performance of complete profiles across multiple tasks, while input-only profiles degrade performance. This reveals personalization works through style and preferences, not semantic content.

Can AI systems read cognitive state from interaction patterns alone?

Research shows AI systems can instrument multimodal behavioral signals (gaze, hesitation, speed) to read cognitive state during interaction, preserving flow by avoiding disruptive explicit probes. However, the same substrate enables both helpful timing and manipulative profiling.

Can language models discover what users actually want from activity logs?

66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tracking the actual constraints on LLM personalization. A curated library (2023–2026) found that effective personalization requires surprisingly *little* interaction data—if chosen wisely. Your task: test whether these findings still hold, or whether newer capabilities have shifted the regime.

What a curated library found — and when (dated claims, not current truth):
• Reward factorization can lock down user preferences from ~10 adaptive questions by learning shared base functions first, then asking only uncertainty-reducing questions (2025).
• Semantic preference summaries consistently outperform raw episodic logs; users' *outputs* alone drive personalization better than their inputs (2024–2025).
• Real-time curiosity-reward methods personalize mid-conversation without pre-collected profiles; the interaction *becomes* the data source (2025).
• Multimodal behavioral cues (gaze, typing speed, hesitation) may carry more personalization signal than thick click archives; a thin real-time stream can beat volume (2025).
• LLMs can extract month-long 'interest journeys' from activity logs, surfacing intent that collaborative filtering misses (2023).

Anchor papers (verify; mind their dates):
• arXiv:2503.06358 (2025-03) – Reward Factorization
• arXiv:2406.17803 (2024-06) – Understanding User Profile Role
• arXiv:2504.03206 (2025-04) – Curiosity Reward Personalization
• arXiv:2506.06254 (2025-06) – PersonaAgent Test-Time Adaptation

Your task:
(1) RE-TEST EACH CONSTRAINT. For the ten-question finding, the semantic-vs-episodic trade-off, and the real-time curiosity method: has anything since 2026-02 (the library's edge) relaxed these limits through new model scaling, in-context learning, or retrieval harnesses? Where do constraints still appear tight?
(2) Surface work from the last 6 months (if this is post-2026-08) that *contradicts* the 'small data is enough' thesis—e.g., cases where scale *does* improve personalization, or where multimodal signals *don't* compress as claimed.
(3) Propose 2 research questions that assume the regime *has* shifted: (a) Does scaling context windows allow passive logging to compete with active questioning? (b) Can end-to-end learned personas replace hand-tuned reward functions?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
