How should aspect selection adapt across different item categories and users?
This explores how a recommender picks *which* aspects of an item to surface, explain, or weight, and how that choice should shift across item categories (a movie vs. a laptop) and across users, especially users with little history. The corpus points to one clear answer: aspect selection should never be a fixed, generic list. It should be conditioned on both the candidate item in front of the user and the user's own context, and it should fall back on borrowed signal when the user is data-sparse.
The most direct treatment is ERRA (*Can retrieval enhancement fix explainable recommendations for sparse users?*), which pairs *personalized* aspect selection with retrieval of other users' reviews. Its key move is recognizing that the problem is hardest exactly when a user has thin history, so it pulls richer aspect signal from a retrieved corpus rather than defaulting to generic features, while still tuning which aspects matter to the individual. That is the template: personalize when you can, retrieve when you can't.
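A minimal sketch of the "personalize when you can, retrieve when you can't" template, assuming aspects have already been extracted as strings from the user's own reviews and from retrieved reviews. This is not ERRA's actual model: the blend below is plain shrinkage, where the weight on the user's personal signal grows with how much history they have. The function name `select_aspects` and the `prior_strength` constant are hypothetical.

```python
from collections import Counter

def select_aspects(user_aspects, retrieved_aspects, k=3, prior_strength=5.0):
    """Rank aspects by blending a user's own mentions with retrieved-review mentions.

    user_aspects: aspect strings from the user's own history (may be empty)
    retrieved_aspects: aspect strings mined from retrieved reviews
    prior_strength: hypothetical constant; higher means more user history is
        needed before personal signal outweighs the retrieved signal
    """
    user_counts = Counter(user_aspects)
    retrieved_counts = Counter(retrieved_aspects)
    n_user = sum(user_counts.values())
    n_retr = sum(retrieved_counts.values())
    # Weight on personal signal: 0 for a cold user, approaching 1 as history grows
    alpha = n_user / (n_user + prior_strength)
    scores = {}
    for aspect in set(user_counts) | set(retrieved_counts):
        p_user = user_counts[aspect] / n_user if n_user else 0.0
        p_retr = retrieved_counts[aspect] / n_retr if n_retr else 0.0
        scores[aspect] = alpha * p_user + (1 - alpha) * p_retr
    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return ranked[:k]

retrieved = ["battery", "battery", "screen", "price", "weight"]
print(select_aspects([], retrieved, k=2))          # ['battery', 'price']
heavy_user = ["keyboard"] * 20 + ["screen"] * 10
print(select_aspects(heavy_user, retrieved, k=2))  # ['keyboard', 'screen']
```

With no history, the ranking comes entirely from retrieved reviews; with thirty personal mentions, the user's own aspects dominate. A learned model would replace raw counts with scores, but the fallback shape would be similar.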
A strong cross-domain echo comes from candidate-conditional attention. The Deep Interest Network (*How can user vectors capture diverse interests without exploding in size?*) and AMP-CF (*Can attention mechanisms reveal which user taste explains each recommendation?*) both argue that a user is not one fixed vector but a set of interests or personas, and which ones "light up" depends on the candidate item being scored. Translated to aspects: the dimension worth explaining for a running shoe (cushioning, fit) is not the one worth explaining for a thriller novel (pacing, ending). AMP-CF goes further by letting the activated persona double as the explanation, so the aspect and the reason become the same object.
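A toy sketch of candidate-conditional attention applied to aspects, assuming each user persona is a small embedding vector tagged with the aspects it cares about. The softmax over plain dot products is a simplifying assumption: DIN itself scores past behaviors with a learned activation unit, and AMP-CF's exact formulation may differ. The persona names, vectors, and aspect lists are hypothetical.

```python
import math

def persona_weights(personas, item_vec):
    """Softmax over persona-item dot products: which personas 'light up' for this candidate."""
    logits = {name: sum(p * q for p, q in zip(vec, item_vec)) for name, vec in personas.items()}
    top_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(score - top_logit) for name, score in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def explain_candidate(personas, persona_aspects, item_vec):
    """Return the most activated persona and its aspects, so the explanation is the attention itself."""
    weights = persona_weights(personas, item_vec)
    top = max(weights, key=weights.get)
    return top, persona_aspects[top], weights

# Hypothetical user with two personas in a 2-d embedding space
personas = {"runner": [1.0, 0.0], "reader": [0.0, 1.0]}
persona_aspects = {"runner": ["cushioning", "fit"], "reader": ["pacing", "ending"]}

running_shoe = [2.0, 0.1]
thriller_novel = [0.1, 2.0]
print(explain_candidate(personas, persona_aspects, running_shoe)[:2])    # ('runner', ['cushioning', 'fit'])
print(explain_candidate(personas, persona_aspects, thriller_novel)[:2])  # ('reader', ['pacing', 'ending'])
```

The same user gets different aspects for different candidates without any hand-written per-category rule, which is the point of conditioning on the item.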
That item-category dependence is not just intuition, though the supporting evidence comes by analogy from LLM research rather than from recommenders directly. Preference tuning research (*Does preference tuning always reduce diversity the same way?*) shows the *same* optimization pushing in opposite directions across domains (convergence in code, divergence in creative writing) because each domain rewards different things. Aspect selection plausibly inherits this: categories where correctness dominates likely call for different aspects than categories where taste and style dominate. The prompt-tiering work (*Do prompt techniques work the same across all LLM tiers?*) makes the same structural point: task structure, not a universal best practice, decides what helps. There is no category-blind default.
For the *user* side, the corpus suggests aspect selection should lean on signals that reveal preference cheaply. Personalization works better from a user's outputs than from their inputs (*Do user outputs outperform inputs for LLM personalization?*), and abstracted preference summaries beat replaying raw past interactions (*Does abstract preference knowledge outperform specific interaction recall?*), so adapted aspects should be inferred from distilled preference signal, not stitched together from literal history. Most striking, reward factorization (*Can user preferences be learned from just ten questions?*) shows that ten well-chosen adaptive questions can pin down a user's preference weights. That implies aspect adaptation could be driven by a handful of maximally informative probes rather than waiting for dense history to accumulate.

The thread across all of it: adapt aspects *jointly* on item and user, and when the user is unknown, buy the signal with retrieval or a few sharp questions rather than retreating to a generic list.
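The adaptive-questioning idea behind reward factorization (PReF) can be sketched as follows, assuming a user's reward is a weighted sum of K fixed base rewards and each question asks the user to choose between two options whose base-reward difference is a known vector `d`. This is not PReF's published procedure: it swaps in a linear-Gaussian surrogate for preference answers and picks the question with the largest posterior variance along `d` (plain uncertainty sampling). All function names and the two-dimensional example are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for small list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pick_question(Sigma, candidates):
    """Index of the candidate whose difference vector d has the largest posterior variance d^T Sigma d."""
    return max(range(len(candidates)), key=lambda i: dot(candidates[i], matvec(Sigma, candidates[i])))

def update(mu, Sigma, d, y, noise=1.0):
    """Rank-one Bayesian linear-regression update after answer y (+1 prefers A, -1 prefers B)."""
    Sd = matvec(Sigma, d)     # Sigma is symmetric, so Sigma d d^T Sigma = Sd Sd^T
    s = noise + dot(d, Sd)    # predictive variance of the answer
    resid = y - dot(d, mu)    # surprise relative to the current estimate
    mu_new = [m + (g / s) * resid for m, g in zip(mu, Sd)]
    k = len(d)
    Sigma_new = [[Sigma[i][j] - Sd[i] * Sd[j] / s for j in range(k)] for i in range(k)]
    return mu_new, Sigma_new

def elicit(candidates, answer, n_questions=10, noise=1.0):
    """Ask up to n_questions adaptive questions; return the posterior mean and covariance of the weights."""
    k = len(candidates[0])
    mu = [0.0] * k
    Sigma = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # unit Gaussian prior
    remaining = list(candidates)
    for _ in range(min(n_questions, len(remaining))):
        d = remaining.pop(pick_question(Sigma, remaining))  # never ask the same question twice
        mu, Sigma = update(mu, Sigma, d, answer(d), noise)
    return mu, Sigma

# Hypothetical user who values base reward 0 and dislikes base reward 1
true_weights = [2.0, -1.0]

def simulated_answer(d):
    return 1.0 if dot(d, true_weights) > 0 else -1.0

questions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
mu, Sigma = elicit(questions, simulated_answer)
print([round(m, 3) for m in mu])  # [0.75, -0.25]
```

Each answer shrinks uncertainty along the direction it probed, so the next question moves to the least-explored direction: after the `[1, 1]` question, the rule switches to `[1, -1]`. In the aspect setting, the base rewards would plausibly correspond to aspects, and the recovered weights would rank which aspects to surface for a user with no history.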
Sources (8 notes)
ERRA combines model-agnostic review retrieval with personalized aspect selection to address data sparsity that embedded methods cannot solve. Retrieval augmentation provides richer signal when user history is sparse, while aspect personalization ensures explanations match user context rather than generic defaults.
Deep Interest Network weights historical behaviors against each candidate ad, activating only relevant interests dynamically. This preserves dimension efficiency while expressing diverse tastes without lossy compression.
AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.
RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.
A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.
Research shows that user profiles built from outputs alone match or exceed the performance of complete profiles across multiple tasks, while input-only profiles degrade performance. This suggests personalization works through style and preferences, not semantic content.
PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Users can be personalized via inference-time reward alignment without weight modification.