Can routing mask future experts to prevent knowledge leakage?
Can models be built so that they respect query timestamps by selectively silencing experts trained on future data? This explores whether temporal causality can be enforced through architecture rather than external retrieval.
LLMs trained on a fixed web snapshot go stale and, worse, risk temporal leakage: answering as if they know information that postdates a query. Standard pretraining merges all time periods indiscriminately, so the model has no principled way to respect a query's timestamp. TiMoE makes temporal grounding architectural. It pre-trains a set of GPT-style experts on disjoint two-year slices of a 2013–2024 corpus; at inference it masks every expert whose training window ends after the query timestamp and merges the remaining experts' log-probabilities in a shared space. Because no masked expert contributes to the output, the prediction is causally valid by construction, while the surviving experts retain multi-period breadth.
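The mask-then-merge step can be illustrated with a small sketch. This is not the TiMoE implementation: the `Expert` class, the toy three-token vocabulary, the probabilities, and the uniform-mixture merge rule are all assumptions made for illustration, since the note only says that log-probabilities are merged in a shared space.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Expert:
    # Hypothetical stand-in for one time-sliced language model.
    name: str
    window_end: int              # last year of the expert's training slice
    log_probs: Dict[str, float]  # toy next-token log-probabilities, shared vocabulary


def visible_experts(experts: List[Expert], query_year: int) -> List[Expert]:
    """Mask every expert whose training window ends after the query year."""
    return [e for e in experts if e.window_end <= query_year]


def merge_log_probs(experts: List[Expert]) -> Dict[str, float]:
    """Assumed merge rule: uniform mixture in probability space, returned as log-probs."""
    if not experts:
        raise ValueError("no expert is causally valid for this query year")
    log_n = math.log(len(experts))
    merged = {}
    for tok in experts[0].log_probs:
        vals = [e.log_probs[tok] for e in experts]
        m = max(vals)  # subtract the max for a numerically stable log-sum-exp
        merged[tok] = m + math.log(sum(math.exp(v - m) for v in vals)) - log_n
    return merged


def toy_expert(name: str, window_end: int, probs: Dict[str, float]) -> Expert:
    return Expert(name, window_end, {t: math.log(p) for t, p in probs.items()})


# Made-up distributions: only the most recent expert favours token "C".
EXPERTS = [
    toy_expert("2013-2014", 2014, {"A": 0.5, "B": 0.3, "C": 0.2}),
    toy_expert("2015-2016", 2016, {"A": 0.5, "B": 0.4, "C": 0.1}),
    toy_expert("2023-2024", 2024, {"A": 0.05, "B": 0.05, "C": 0.9}),
]


def predict(query_year: int) -> str:
    """Return the most likely token using only causally valid experts."""
    merged = merge_log_probs(visible_experts(EXPERTS, query_year))
    return max(merged, key=merged.get)
```

With these toy numbers, `predict(2016)` returns `"A"` because the 2023-2024 expert is masked, while `predict(2024)` returns `"C"` once that expert is allowed to contribute. A query year earlier than every training window raises an error here; how the real system handles that case is not stated in the note.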
The result quantifies the trade-off. On the new 10k-question TSQA benchmark, whose alternatives are labelled past, future, or irrelevant, TiMoE cuts future-knowledge errors by up to ~15% and delivers steadier accuracy across years. This comes at a "manageable cost of time-awareness": a slight underperformance on eight standard NLP tasks rather than a fundamental barrier. The lasting takeaway is the design principle: temporal causality can be enforced by routing over time-partitioned parameters, not only by external retrieval or post-hoc verification.
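To make the notion of a future-knowledge error concrete, here is a minimal sketch of how a per-label error breakdown could be tallied on a benchmark of this shape. The function name, the extra `"correct"` label, and the sample picks are hypothetical; they are not TSQA's scoring code or its data.

```python
from collections import Counter
from typing import Dict, List

# Assumed label set: the three distractor types named in the note, plus "correct".
LABELS = ("correct", "past", "future", "irrelevant")


def error_breakdown(picked_labels: List[str]) -> Dict[str, float]:
    """Fraction of questions on which the model picked each kind of alternative."""
    if not picked_labels:
        raise ValueError("need at least one answered question")
    counts = Counter(picked_labels)
    total = len(picked_labels)
    return {label: counts.get(label, 0) / total for label in LABELS}


# Made-up picks for eight questions, for illustration only.
PICKS = ["correct", "future", "past", "correct",
         "irrelevant", "future", "correct", "correct"]

rates = error_breakdown(PICKS)
future_error_rate = rates["future"]  # share of answers that leaked future knowledge
```

Comparing `future_error_rate` between a masked and an unmasked model is one way to express the kind of reduction the note reports.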
This sits alongside retrieval-time and prompt-time temporal fixes as the parametric option. It complements "Does AI text generation unfold through temporal reflection?" (the RAG route to temporal grounding) by pushing the same concern into the model's own expert structure, trading some general accuracy for guaranteed causal validity.
Inquiring lines that use this note as a source (9)
This note is a source for these synthesized inquiries.
- Can differential privacy during generation eliminate leakage at scale?
- What temporal and spatial constraints does Space-Time U-Net solve?
- What privacy-preserving evaluation methods best capture real-world forecasting ability?
- Can time-awareness live in model parameters instead of retrieval?
- How does time-partitioned routing compare to retrieval-augmented temporal grounding?
- What is the accuracy cost of enforcing temporal causality inside model parameters?
- Can modular expert decomposition extend beyond time into other causal dimensions?
- Why does masking future experts guarantee causal validity without external verification?
- Why does Branch-Train-Merge fail without learned routing between experts?
Related concepts in this collection (2)
- Does AI text generation unfold through temporal reflection?
  Explores whether the sequential ordering of tokens in LLM generation constitutes genuine temporal thought or merely probabilistic computation without reflective duration.
  The retrieval-time route to temporal grounding; TiMoE is the parametric/architectural route.
- Can brain structure guide how we design intelligent agents?
  Does mapping agent capabilities onto human brain functions provide a useful organizing framework for understanding and comparing different agent architectures? This matters because agents need a shared vocabulary to advance beyond one-off designs.
  Both modularize capability; TiMoE modularizes by time slice with causal routing.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- TiMoE: Time-Aware Mixture of Language Experts
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
- LLM Reasoning Is Latent, Not the Chain of Thought
- Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
- Causal Reflection with Language Models
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
Original note title
temporal grounding can be architectural — time-sliced experts with causal routing that masks future experts eliminate future-knowledge leakage