How does sequential modeling within a session differ from modeling historical purchase sequences?

This explores the difference between modeling the order of actions inside a single session (short-term, in-the-moment intent) and modeling a user's long history of purchases (long-term, persistent preference), and what the corpus says each one actually captures.

This question is really about two different time horizons for the same user, and the corpus suggests they aren't just shorter and longer versions of the same thing: they reward different modeling choices. Within-session sequential modeling is about the *order* of recent actions: what you clicked just now shapes what you want next. But that order is fragile. Language models used as recommenders turn out to disregard temporal order by default. They will read your interaction history as an unordered bag of items unless you explicitly prompt them to weight recent actions or give them in-context examples that do, at which point latent order-sensitivity reappears without any retraining (see 'Why do language models ignore temporal order in ranking?'). So 'within-session sequence' is less a free property of the model and more something you have to deliberately surface.
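To make the recency-focused prompting described above concrete, here is a minimal sketch of a prompt builder. It is an illustration only, not the prompt used in the cited work: the function name `build_ranking_prompt`, its parameters, and the exact wording of the instructions are all assumptions, and no model is actually called.

```python
from typing import Sequence


def build_ranking_prompt(
    history: Sequence[str],
    candidates: Sequence[str],
    recency_focused: bool = True,
) -> str:
    """Build a plain-text ranking prompt (hypothetical format).

    `history` is assumed to be in chronological order, oldest first.
    """
    if not history:
        raise ValueError("history must contain at least one interaction")

    lines = []
    if recency_focused:
        # Surface the order explicitly: number the items and name the latest one.
        lines.append("The user's interactions are listed in chronological order, oldest first.")
        for position, item in enumerate(history, start=1):
            lines.append(f"{position}. {item}")
        lines.append(f"The most recent interaction is: {history[-1]}.")
        lines.append("Give more weight to recent interactions when ranking.")
    else:
        # Baseline: the same items with no cue that order matters.
        lines.append("The user has interacted with these items:")
        for item in history:
            lines.append(f"- {item}")

    lines.append("Rank the following candidates for this user:")
    for candidate in candidates:
        lines.append(f"- {candidate}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_ranking_prompt(["tent", "stove", "headlamp"], ["lantern", "novel"]))
```

The two branches carry the same items; the only difference is whether the prompt tells the model that position in the list is meaningful, which is the lever the note describes.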

Historical purchase sequences pull in the opposite direction. Here the interesting finding is that retrieving specific past interactions (episodic memory) is *worse* than compressing history into abstract preference summaries (semantic memory). Among recall strategies, recency-based recall beats similarity-based retrieval, but a learned summary of 'what this person tends to like' still outperforms replaying the literal log of what they bought (see 'Does abstract preference knowledge outperform specific interaction recall?'). The lesson cuts against intuition: long histories are most useful when you throw most of the sequence away and keep the distilled preference.
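The contrast between the two kinds of memory can be sketched in a few lines. This is a toy illustration, not the PRIME framework's implementation: the `Purchase` record, the function names, and the choice of category frequencies as the 'summary' are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Purchase:
    item: str
    category: str
    timestamp: int  # hypothetical: any monotonically increasing time value


def recall_recent(log: Sequence[Purchase], k: int) -> List[Purchase]:
    """Episodic memory with recency-based recall: return the k latest purchases."""
    return sorted(log, key=lambda p: p.timestamp, reverse=True)[:k]


def summarize_preferences(log: Sequence[Purchase], top_n: int = 2) -> Dict[str, float]:
    """Semantic memory (toy version): share of purchases per category.

    The order of purchases is discarded; only the distilled preference remains.
    """
    counts = Counter(p.category for p in log)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.most_common(top_n)}


if __name__ == "__main__":
    log = [
        Purchase("tent", "camping", 1),
        Purchase("novel", "books", 2),
        Purchase("stove", "camping", 3),
        Purchase("headlamp", "camping", 4),
    ]
    print([p.item for p in recall_recent(log, k=2)])
    print(summarize_preferences(log))
```

The summary is much smaller than the log and does not grow with history length, which is one practical reason a distilled preference can be easier to use than a replayed sequence.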

The sharpest reframing comes from work showing that neither raw sessions nor raw purchase logs capture what users are actually doing. About two-thirds of users (66%) are pursuing valued 'interest journeys' that last more than a month. These are specific, nameable pursuits like 'designing hydroponic systems for small spaces', and classic collaborative filtering cannot reach them because it operates on item co-occurrence, not user-level meaning (see 'Can language models discover what users actually want from activity logs?'). This sits *between* the session and the lifetime history: longer than a session, more coherent than a scatter of purchases. It suggests the real distinction isn't session vs. history but short-term intent vs. persistent goal, and that the most valuable signal lives at a granularity neither traditional approach was built to see.

There's also an architectural angle the corpus surfaces obliquely. The session/history split is partly a stability-vs-plasticity problem: you want to absorb new behavior fast (this session) without forgetting old patterns (the history). Streaming-recommendation work (DEGC) handles exactly this tension by isolating new parameters for emerging preferences while preserving older ones exactly, rather than letting fresh data overwrite the past (see 'Can model isolation solve streaming recommendation better than replay?'). Read against the personalization findings, this hints that 'session vs. history' is best treated as two compartments with different update rules: fast and overwriteable for the session, slow and protected for the long-term preference.
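The 'two compartments with different update rules' idea can be sketched as follows. This is a deliberately small toy, not DEGC or any published method: the class name `IsolatedPreferenceModel`, the moving-average update, and the dot-product score are assumptions chosen only to show older parameters being preserved exactly while a fresh block absorbs new behavior.

```python
from typing import List, Sequence, Tuple


class IsolatedPreferenceModel:
    """Toy parameter isolation: frozen blocks are protected, one block is plastic."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.frozen: List[Tuple[float, ...]] = []  # older blocks, never modified
        self.active: List[float] = [0.0] * dim     # fast, overwriteable block

    def _check(self, item_vector: Sequence[float]) -> None:
        if len(item_vector) != self.dim:
            raise ValueError(f"expected a vector of length {self.dim}")

    def update(self, item_vector: Sequence[float], learning_rate: float = 0.5) -> None:
        """Move only the active block toward the observed item."""
        self._check(item_vector)
        self.active = [
            w + learning_rate * (x - w) for w, x in zip(self.active, item_vector)
        ]

    def start_new_period(self) -> None:
        """Freeze the active block as an immutable tuple and open a fresh one."""
        self.frozen.append(tuple(self.active))
        self.active = [0.0] * self.dim

    def score(self, item_vector: Sequence[float]) -> float:
        """Sum of dot products between the item and every block, old and new."""
        self._check(item_vector)
        blocks = self.frozen + [tuple(self.active)]
        return sum(sum(w * x for w, x in zip(block, item_vector)) for block in blocks)


if __name__ == "__main__":
    model = IsolatedPreferenceModel(dim=2)
    model.update([1.0, 0.0])   # long-term behavior
    model.start_new_period()   # protect it
    model.update([0.0, 1.0])   # new behavior lands only in the active block
    print(model.frozen, model.active, model.score([1.0, 1.0]))
```

Because frozen blocks are stored as tuples and never rewritten, new updates cannot overwrite them; the cost is that the number of blocks grows with every period, a trade-off this sketch makes no attempt to manage.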

The thing you might not have expected to learn: taken together, these notes suggest that order matters most at the short horizon and matters *least* at the long one, where abstraction wins over sequence. The longer the history, the more you should be modeling a person's stable goals rather than the literal chain of what they did.

Sources (4 notes)

Why do language models ignore temporal order in ranking?

LLMs can extract preferences from interaction histories but disregard temporal order by default. Recency-focused prompts and in-context examples activate latent order-sensitivity, improving ranking without retraining.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can language models discover what users actually want from activity logs?

66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.

Can model isolation solve streaming recommendation better than replay?

DEGC uses per-task parameter isolation to handle streaming recommendation, providing explicit stability-plasticity trade-offs that experience replay and knowledge distillation methods cannot match. This approach preserves older patterns exactly while allowing new parameters to capture emerging preferences.
