Can routing mask future experts to prevent knowledge leakage?

Can models be built so that they respect query timestamps by selectively silencing experts trained on future data? This explores whether temporal causality can be enforced through architecture rather than external retrieval.

Synthesis note · 2026-06-03 · sourced from Test Time Compute

LLMs trained on a fixed web snapshot go stale and, worse, risk temporal leakage: answering as if they know information that postdates a query. Standard pretraining merges all time periods indiscriminately, so the model has no principled way to respect a query's timestamp. TiMoE makes temporal grounding architectural: pre-train a set of GPT-style experts on disjoint two-year slices of a 2013–2024 corpus, then at inference mask every expert whose training window ends after the query timestamp and merge the remaining experts' log-probabilities in a shared space. Because no parameters trained on post-query data contribute to the prediction, this guarantees strict causal validity while retaining multi-period breadth.
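The mask-then-merge step can be sketched in a few lines of Python. This is a minimal illustration, not the TiMoE implementation: the `Expert` class, the function names, and the toy two-token vocabulary are hypothetical, and the merge rule (a uniform mixture taken in probability space) is an assumption, since the note only says the log-probabilities are merged in a shared space.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Expert:
    """Hypothetical stand-in for one time-sliced expert."""
    start_year: int          # first year of the training window (inclusive)
    end_year: int            # last year of the training window (inclusive)
    log_probs: List[float]   # next-token log-probabilities over a shared vocabulary


def eligible_experts(experts: List[Expert], query_year: int) -> List[Expert]:
    """Mask every expert whose training window ends after the query timestamp."""
    return [e for e in experts if e.end_year <= query_year]


def merge_log_probs(experts: List[Expert]) -> List[float]:
    """Assumed merge rule: log of the uniform average of expert probabilities.

    Uses the log-sum-exp trick per vocabulary entry; inputs must be finite.
    """
    if not experts:
        raise ValueError("no expert was trained on data at or before the query year")
    n = len(experts)
    merged = []
    for v in range(len(experts[0].log_probs)):
        column = [e.log_probs[v] for e in experts]
        m = max(column)
        merged.append(m + math.log(sum(math.exp(x - m) for x in column)) - math.log(n))
    return merged


def causal_next_token_log_probs(experts: List[Expert], query_year: int) -> List[float]:
    """Route only over experts that are causally valid for the query."""
    return merge_log_probs(eligible_experts(experts, query_year))


# Six toy experts on two-year slices (2013-2014 ... 2023-2024); values are made up.
toy_experts = [
    Expert(start, start + 1, [math.log(p), math.log(1.0 - p)])
    for start, p in zip(range(2013, 2025, 2), [0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
]
merged_2018 = causal_next_token_log_probs(toy_experts, query_year=2018)
```

With a query year of 2018, only the experts ending in 2014, 2016, and 2018 contribute, so the later slices cannot leak into the prediction. A query dated before the first slice ends leaves no eligible expert, which this sketch surfaces as an error rather than silently falling back.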

The result quantifies the trade-off: on the new 10k-question TSQA benchmark (alternatives labelled past/future/irrelevant), TiMoE cuts future-knowledge errors by up to roughly 15% and delivers steadier accuracy across years, at a "manageable cost of time-awareness", namely a slight underperformance on eight standard NLP tasks rather than a fundamental barrier. The takeaway is the design principle: temporal causality can be enforced by routing over time-partitioned parameters, not only by external retrieval or post-hoc verification.

This sits alongside retrieval-time and prompt-time temporal fixes as the parametric option. It complements "Does AI text generation unfold through temporal reflection?" (the RAG route to temporal grounding) by pushing the same concern into the model's own expert structure, trading some general accuracy for guaranteed causal validity.

Related concepts in this collection

Does AI text generation unfold through temporal reflection? Explores whether the sequential ordering of tokens in LLM generation constitutes genuine temporal thought or merely probabilistic computation without reflective duration.
the retrieval-time route to temporal grounding; TiMoE is the parametric/architectural route
Can brain structure guide how we design intelligent agents? Does mapping agent capabilities onto human brain functions provide a useful organizing framework for understanding and comparing different agent architectures? This matters because agents need a shared vocabulary to advance beyond one-off designs.
both modularize capability; TiMoE modularizes by time slice with causal routing
