Can abstract preference summaries substitute for specific user interaction history?
This explores whether a compressed picture of what a user likes — a summary of their tastes — can replace keeping and searching their actual past interactions, and where each approach wins.
The strongest answer in the corpus is a qualified yes: the PRIME framework finds that semantic memory (preference summaries and parametric encodings) consistently beats episodic memory (retrieving past interactions) for personalization across models (see "Does abstract preference knowledge outperform specific interaction recall?"). The interesting wrinkle is that when interaction recall *is* used, recency beats similarity, which suggests that what raw history mostly contributes is freshness, not depth, and that abstraction captures the rest.
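The recency-versus-similarity contrast can be made concrete with a toy sketch. Everything here is an assumption for illustration: the `Interaction` record, the integer timestamps, the hand-written embeddings, and the dot-product score are hypothetical and are not taken from PRIME.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    text: str
    timestamp: int      # hypothetical unit; larger means more recent
    embedding: tuple    # toy embedding, assumed precomputed elsewhere


def dot(a, b):
    """Plain dot product, used here as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


def recall_by_recency(history, k):
    """Return the k most recent interactions, ignoring the query."""
    return sorted(history, key=lambda i: i.timestamp, reverse=True)[:k]


def recall_by_similarity(history, query_embedding, k):
    """Return the k interactions whose embeddings score highest against the query."""
    return sorted(
        history,
        key=lambda i: dot(i.embedding, query_embedding),
        reverse=True,
    )[:k]


history = [
    Interaction("asked about tomato seedlings", 1, (1.0, 0.0)),
    Interaction("compared grow lights", 2, (0.0, 1.0)),
    Interaction("bought a nutrient pump", 3, (0.1, 0.9)),
]
recent = recall_by_recency(history, k=2)
similar = recall_by_similarity(history, query_embedding=(1.0, 0.0), k=1)
```

The two strategies can return disjoint sets for the same user, which is why the choice between them is an empirical question rather than a detail.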
The form of the summary matters as much as the choice to abstract. PLUS shows that text-based preference summaries condition reward models more effectively than embedding vectors and capture dimensions that zero-shot summaries miss; these summaries also stay interpretable and transfer to other models for zero-shot personalization (see "Can text summaries beat embeddings for personalized reward models?"). So a good substitute isn't just "less data"; it's a representation that distills the right axes of taste. PReF pushes this further: with a learned set of base reward functions, roughly ten well-chosen questions can pin down a user's personalized coefficients, so preferences can be inferred at inference time without storing or replaying a long history at all (see "Can user preferences be learned from just ten questions?").
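The idea of pinning down per-user coefficients over shared base reward functions can be sketched as follows. This is a minimal illustration, not PReF's actual procedure: it assumes the personalized reward is a linear combination of base rewards, fits the coefficients with a plain Bradley-Terry (logistic) update on pairwise answers, and omits the active selection of questions entirely. The function names and the sample answers are hypothetical.

```python
import math


def fit_coefficients(answers, n_bases, steps=500, lr=0.5):
    """Fit per-user coefficients from pairwise answers.

    answers: list of (chosen, rejected) pairs, where each element is a
    tuple of base-reward scores for one candidate response.
    """
    w = [0.0] * n_bases
    for _ in range(steps):
        grad = [0.0] * n_bases
        for chosen, rejected in answers:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wj * dj for wj, dj in zip(w, diff))
            p_chosen = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of the log-likelihood of the observed choice.
            for j in range(n_bases):
                grad[j] += (1.0 - p_chosen) * diff[j]
        w = [wj + lr * gj / len(answers) for wj, gj in zip(w, grad)]
    return w


def personalized_reward(w, base_rewards):
    """Score one response as a weighted sum of its base rewards."""
    return sum(wj * rj for wj, rj in zip(w, base_rewards))


# Hypothetical answers from a user who favors base 0 over base 1.
answers = [
    ((1.0, 0.0), (0.0, 1.0)),
    ((0.9, 0.2), (0.1, 0.8)),
    ((0.7, 0.1), (0.3, 0.6)),
]
w = fit_coefficients(answers, n_bases=2)
```

Only the small coefficient vector `w` has to be kept per user; the base reward functions are shared, which is what makes a handful of answers sufficient in this simplified setting.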
But substitution has limits, and the corpus marks them. A subtle finding is that not all history is equal: user *outputs* (what they wrote or produced) drive personalization far better than their *input queries*, and output-only profiles can match full profiles. The question, then, isn't summary-vs-history so much as which slice of history you abstract from (see "Do user outputs outperform inputs for LLM personalization?"). And some structure lives only in the granular record. LLM-discovered "interest journeys", specific pursuits lasting over a month such as "designing hydroponic systems for small spaces", surface from activity logs and would be flattened away by a generic taste summary (see "Can language models discover what users actually want from activity logs?").
There's also a recurring argument that abstraction and history are complements, not rivals. Conversational recommenders that rely only on the active dialogue lose collaborative-filtering signals; recovering good user modeling means combining current intent, historical dialogues, and look-alike users (see "Can conversational recommenders recover lost preference signals from history?"). When a user's record is sparse, retrieval over reviews and aspects fills the gap that summarization alone can't (see "Can retrieval enhancement fix explainable recommendations for sparse users?"). And agent memory architectures explicitly keep both: M3-Agent separates episodic events from semantic knowledge in an entity-centric graph rather than collapsing one into the other (see "Can agents learn preferences by watching rather than asking?").
The thing you might not have expected to learn: the debate isn't really "summary or history." It's about *what to abstract and when to keep the receipts* — abstraction wins for stable taste and cross-model transfer, raw history wins for recency, sparse cold-start users, and the long-tail specific pursuits that define a person. The best systems route between them deliberately.
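A deliberate routing policy of the kind described above might look like the sketch below. It is purely illustrative: the three signals, the `sparse_threshold` default, and the two-way `Source` choice are assumptions made for this example, and none of the cited systems is claimed to work this way.

```python
from enum import Enum


class Source(Enum):
    SUMMARY = "summary"
    RAW_HISTORY = "raw_history"


def route(n_interactions, needs_fresh_context, is_niche_pursuit,
          sparse_threshold=5):
    """Pick which memory to consult for one request.

    Raw history is used for sparse (cold-start) users, for requests that
    depend on recent activity, and for long-tail specific pursuits.
    The preference summary covers everything else (stable taste).
    """
    if n_interactions < sparse_threshold:
        return Source.RAW_HISTORY
    if needs_fresh_context or is_niche_pursuit:
        return Source.RAW_HISTORY
    return Source.SUMMARY


choice = route(n_interactions=120, needs_fresh_context=False,
               is_niche_pursuit=False)
```

In practice the inputs to such a router would themselves have to be estimated, and a system could blend both sources rather than choose one.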
Sources (8 notes)
PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.
PLUS trains summarizers and reward models jointly, learning that text-based preference summaries capture dimensions zero-shot summaries miss. These summaries transfer to GPT-4 for zero-shot personalization and remain interpretable to users.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Users can be personalized via inference-time reward alignment without weight modification.
Research shows that user profiles built from outputs alone match or exceed performance of complete profiles across multiple tasks, while input-only profiles degrade performance. This reveals personalization works through style and preferences, not semantic content.
66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.
Current CRS systems only use the active dialogue session to infer preferences, losing item-CF and user-CF signals proven valuable in traditional recommenders. Integrating current session, historical dialogues, and look-alike users—conditioned on current intent—recovers essential user representation structure.
ERRA combines model-agnostic review retrieval with personalized aspect selection to address data sparsity that embedded methods cannot solve. Retrieval augmentation provides richer signal when user history is sparse, while aspect personalization ensures explanations match user context rather than generic defaults.
M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.