Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Paper · arXiv 2606.03979 · Published June 2, 2026
LLM Memory

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a “Sleep” paradigm that allows models to continually learn, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a “Dreaming” process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, in which the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase in which the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
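To make the consolidation stage more concrete, the following is a minimal toy sketch of distillation from a "teacher" distribution into a "student". It is not the paper's Generalized Distillation: it uses plain forward-KL distillation on a single next-token distribution, with no on-policy sampling and no RL component. The function names (`softmax`, `kl_divergence`, `distill_step`, `consolidate`), the logits, the learning rate, and the step count are all hypothetical and chosen only for illustration. In the paper's setting the teacher would be the smaller self and the student the larger network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient-descent step on KL(teacher || student).

    The gradient of KL(p || softmax(z)) with respect to z is softmax(z) - p.
    """
    student_probs = softmax(student_logits)
    return [
        z - lr * (s - t)
        for z, s, t in zip(student_logits, student_probs, teacher_probs)
    ]

def consolidate(teacher_logits, student_logits, steps=200, lr=0.5):
    """Toy stand-in for consolidation: pull the student toward the teacher.

    Returns the final student logits and the KL value recorded before each
    step plus once after the last step.
    """
    teacher_probs = softmax(teacher_logits)
    history = []
    for _ in range(steps):
        history.append(kl_divergence(teacher_probs, softmax(student_logits)))
        student_logits = distill_step(student_logits, teacher_probs, lr)
    history.append(kl_divergence(teacher_probs, softmax(student_logits)))
    return student_logits, history

if __name__ == "__main__":
    # Hypothetical logits: the teacher holds the knowledge, the student starts uniform.
    student, history = consolidate([2.0, 0.5, -1.0], [0.0, 0.0, 0.0])
    print(f"KL before: {history[0]:.4f}, KL after: {history[-1]:.2e}")
```

In this sketch the KL divergence shrinks toward zero as the student's distribution approaches the teacher's. A real implementation would operate on full model outputs over sequences rather than a single three-way distribution.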

Introduction. The development of Large Language Models (LLMs) marks a pivotal milestone in machine learning research: a paradigm shift from task-specific models to more general-purpose systems with various emergent capabilities (Brown et al. 2020; Schaeffer et al. 2023). Despite LLMs' remarkable capabilities in diverse sets of tasks (Nijkamp et al. 2023; Wang et al. 2023; Comanici et al. 2025), they are largely static after their initial deployment, meaning that they successfully perform tasks learned during pre- or post-training, but are unable to continually acquire new capabilities beyond their immediate context. This inherent static nature creates a crucial vulnerability: The model's knowledge and skills become progressively stale, operating with a fixed "knowledge cutoff" date beyond which it is unaware of new facts, events, and evolving information (Cheng et al. 2024).

Discussion / Conclusion. In this work, we introduced the Sleep paradigm for Large Language Models, which consists of: (i) knowledge seeding, an upward distillation that transfers short-term, in-context knowledge into lower-frequency, long-term parameters, and (ii) dreaming, self-generated training that improves capabilities while controlling interference. In our experiments across long-context understanding, knowledge incorporation, few-shot reasoning, and continual learning, Sleep yields consistent gains.