Why do large language models explore less effectively than humans?
This research investigates why LLMs make decisions too quickly during open-ended exploration tasks. It examines whether the problem lies in training data, in prompt engineering, or in something deeper: the order in which a transformer's layers process information.
"Large Language Models Think Too Fast To Explore Effectively" uses Little Alchemy 2 as an open-ended exploration benchmark. Most LLMs underperform humans — they rely heavily on uncertainty-driven strategies (reducing ambiguity, exploiting known information) while humans balance uncertainty with empowerment (maximizing future possibilities, intrinsic discovery).
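To make the two strategies concrete, here is a minimal sketch of a Little Alchemy-style choice. Everything in it is an illustrative assumption, not the paper's setup: the toy recipe book `RECIPES`, the hypothetical `FUTURE_USES` estimates, the count-based uncertainty bonus, and the linear weighting in `choose`. The paper's own value estimates may be defined differently. An agent that weights only uncertainty picks the least-used elements; one that also weights empowerment picks the combination that opens the most future recipes.

```python
import math

# Hypothetical toy recipe book (not real Little Alchemy 2 data):
# an unordered pair of elements -> the element it creates.
RECIPES = {
    frozenset({"water", "fire"}): "steam",
    frozenset({"earth", "water"}): "mud",
    frozenset({"earth", "fire"}): "lava",
    frozenset({"air", "fire"}): "energy",
}

# Hypothetical estimates of how many further recipes each result feeds into.
FUTURE_USES = {"steam": 2, "mud": 1, "lava": 5, "energy": 9}
MAX_USES = max(FUTURE_USES.values())


def uncertainty(pair, times_used):
    """Count-based bonus: elements the agent has rarely used are more uncertain."""
    return sum(1.0 / math.sqrt(1 + times_used.get(e, 0)) for e in pair) / len(pair)


def empowerment(pair):
    """Future possibilities opened by the pair's result, scaled to [0, 1]."""
    return FUTURE_USES[RECIPES[pair]] / MAX_USES


def choose(candidates, times_used, w_unc, w_emp):
    """Greedy choice under a weighted mix of uncertainty and empowerment values."""
    return max(candidates, key=lambda p: w_unc * uncertainty(p, times_used) + w_emp * empowerment(p))


candidates = list(RECIPES)
times_used = {"fire": 3, "air": 8}  # water and earth have never been used

unc_only = choose(candidates, times_used, w_unc=1.0, w_emp=0.0)
balanced = choose(candidates, times_used, w_unc=1.0, w_emp=1.0)
print(sorted(unc_only), "->", RECIPES[unc_only])  # ['earth', 'water'] -> mud
print(sorted(balanced), "->", RECIPES[balanced])  # ['air', 'fire'] -> energy
```

The uncertainty-only agent reduces ambiguity about unused elements but ends up with a low-value result, while the balanced agent accepts familiar elements in exchange for more future possibilities.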
The mechanistic explanation comes from Sparse Auto-Encoder (SAE) decomposition of the model's internal activations. Uncertainty values dominate early transformer blocks, and choices correlated with immediate outcomes are also represented early. Empowerment values, which represent the potential for future discovery, emerge only in middle blocks. This ordering mismatch (in layer depth, not wall-clock time) means the model has already committed to a decision based on uncertainty before the empowerment signal is available to inform it.
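The layer-ordering claim rests on asking, block by block, whether some SAE latent tracks a behavioral value. Below is a minimal sketch of that kind of analysis, assuming per-block SAE latent activations are already available and using a simple Pearson-correlation threshold; the paper's actual regression procedure may differ. The data are synthetic: the onset layers (1 for uncertainty, 4 for empowerment), the 0.5 threshold, and the helper `earliest_layer` are illustrative assumptions, not results from the paper.

```python
import numpy as np


def earliest_layer(latents_by_layer, target, threshold=0.5):
    """First layer where some SAE latent has |Pearson r| >= threshold with `target`, else None.

    latents_by_layer: list of (n_trials, n_latents) arrays, one per transformer block.
    target: (n_trials,) array, e.g. a per-trial uncertainty or empowerment value.
    """
    t = target - target.mean()
    for layer, z in enumerate(latents_by_layer):
        zc = z - z.mean(axis=0)
        denom = np.linalg.norm(zc, axis=0) * np.linalg.norm(t)
        # Correlation of every latent with the target; constant latents get r = 0.
        r = np.divide(zc.T @ t, denom, out=np.zeros(z.shape[1]), where=denom > 0)
        if np.max(np.abs(r)) >= threshold:
            return layer
    return None


# Synthetic stand-in for real SAE activations (assumed onsets, not measured ones).
rng = np.random.default_rng(0)
n_trials, n_latents, n_layers = 500, 64, 8
uncertainty = rng.uniform(0.0, 1.0, n_trials)
empowerment = rng.uniform(0.0, 1.0, n_trials)

latents = []
for layer in range(n_layers):
    z = np.maximum(0.0, rng.normal(0.0, 1.0, (n_trials, n_latents)))  # non-negative, like ReLU SAE latents
    if layer >= 1:  # uncertainty feature present from an early block
        z[:, 0] = np.maximum(0.0, uncertainty + rng.normal(0.0, 0.1, n_trials))
    if layer >= 4:  # empowerment feature only from a middle block
        z[:, 1] = np.maximum(0.0, empowerment + rng.normal(0.0, 0.1, n_trials))
    latents.append(z)

print(earliest_layer(latents, uncertainty))  # 1
print(earliest_layer(latents, empowerment))  # 4
```

On real data, a gap between the two earliest layers is what "uncertainty early, empowerment in the middle" would look like under this kind of readout.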
The result is "thinking too fast": premature decisions that prioritize short-term utility over deeper exploration. This does not appear to be a training-data issue: neither prompt engineering nor activation intervention improved the performance of traditional LLMs. The architecture processes short-term signals before long-term ones, and decisions are made on whichever signal arrives first.
The o1 exception is revealing. OpenAI's o1 surpasses human performance on this task. This suggests that reasoning training — specifically the extended chain-of-thought processing — creates enough computational delay for empowerment signals to influence decisions. The model isn't given new exploration capability; it is given more processing time for the empowerment representations to participate in the decision.
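The "whichever signal arrives first" argument, and why extra processing could help, can be illustrated with a toy accumulate-then-commit model. This is not the paper's model and not how o1 works internally: the per-layer evidence values, the `emp_onset` layer, the `margin` commitment rule, and the `min_depth` budget (standing in for extended chain-of-thought) are all hypothetical. Evidence for each option accumulates layer by layer; uncertainty contributes from the first layer, empowerment only from the middle.

```python
import numpy as np


def layer_contributions(unc, emp, n_layers=12, emp_onset=6):
    """Per-layer evidence per option: uncertainty from layer 0, empowerment only from emp_onset on."""
    unc = np.asarray(unc, dtype=float)
    emp = np.asarray(emp, dtype=float)
    return [unc + emp if layer >= emp_onset else unc.copy() for layer in range(n_layers)]


def decide(contribs, margin, min_depth=0):
    """Commit to the leading option once its lead over the runner-up reaches `margin`,
    but not before `min_depth` layers of evidence have been accumulated."""
    total = np.zeros_like(contribs[0])
    for depth, c in enumerate(contribs, start=1):
        total += c
        runner_up, leader = np.sort(total)[-2:]
        if depth >= min_depth and leader - runner_up >= margin:
            return int(np.argmax(total)), depth
    return int(np.argmax(total)), len(contribs)  # no early commitment: use the final evidence


# Option 0 is favored by uncertainty; option 1 opens more future possibilities.
contribs = layer_contributions(unc=[1.0, 0.6], emp=[0.0, 1.5])

print(decide(contribs, margin=1.0))                # (0, 3): commits before empowerment arrives
print(decide(contribs, margin=1.0, min_depth=12))  # (1, 12): extra depth lets empowerment decide
```

Nothing about the option values changes between the two calls; only the amount of processing before commitment does, which is the sense in which extra time lets an existing signal participate.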
This connects to the note "Does transformer attention architecture inherently favor repeated content?". Both findings locate behavioral failures in the architecture rather than in training data. That note treats sycophancy as partly an attention-weighting problem; in the same way, premature exploration decisions are partly a block-ordering problem. Both suggest that some behavioral deficits require architectural solutions, not just better training.
The connection to "Do base models already contain hidden reasoning ability?" adds nuance: empowerment representations exist in the model (in the middle blocks). They are not absent; they are outpaced. Reasoning training doesn't add exploration capability; it gives existing capability time to participate.
Inquiring lines that use this note as a source
- Could probing methods miss computationally important features in neural networks?
- Why do language models tend to elaborate and expand rather than compress information?
- Why do LLMs plateau on creativity tasks while humans reach further?
- Can external summarization solve exploration problems in complex real-world environments?
- Do LLMs fail exploration because of context integration or computational limitations?
- Why does exploration quality matter more than learner network depth?
- Why should bandit algorithms condition exploration on time-of-period as well as user state?
- Why do longer reasoning chains explore like tourists instead of scientists?
- Is premature decision-making a form of underthinking in transformer models?
- Why does the pretrained prior determine the exploration ceiling?
- Why do LLMs degrade on long inputs before hitting context limits?
Related concepts in this collection
- Does transformer attention architecture inherently favor repeated content?
Explores whether soft attention's tendency to over-weight repeated and prominent tokens explains sycophancy independent of training. Questions whether architectural bias precedes and enables RLHF effects.
both locate behavioral failures in architecture not training
- Do base models already contain hidden reasoning ability?
Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
empowerment capability exists but is outpaced; reasoning training gives it time
- Does RL teach reasoning or just when to use it?
Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges.
o1's exploration superiority may be another instance of RL teaching timing not capability
- Do reasoning models switch between ideas too frequently?
Research explores whether o1-like models abandon promising reasoning paths prematurely by switching to different approaches without sufficient depth, and whether penalizing such transitions could improve accuracy.
behavioral manifestation of the same architectural problem: "thinking too fast" at the block level (uncertainty dominates before empowerment arrives) produces premature thought switching at the decoding level (the model abandons promising paths before reaching sufficient depth). The success of TIP, a decoding-time penalty on thought switching, suggests that decoding-time intervention can partially compensate for the architecture's processing-order bias
- Why do reasoning LLMs fail at deeper problem solving?
Explores whether current reasoning models systematically search solution spaces or merely wander through them, and how this affects their ability to solve increasingly complex problems.
connects: wandering is the exploration-level consequence of premature decisions. If the model commits to a direction before empowerment signals can evaluate its long-term potential, it will explore unsystematically. The o1 exception supports this: o1 both explores more systematically (an exception to the wandering pattern) and gives empowerment signals time to inform its decisions (this note)
- Why do LLMs struggle with exploration in simple decision tasks?
This explores why large language models fail at exploration—a core decision-making capability—even when they excel at other tasks, and what specific conditions might help them succeed.
behavioral evidence for the same exploration deficit: even with explicit hints, LLMs fail to explore in bandit environments without external history summarization; the empowerment-timing mechanism explains why — the model commits to exploitation before the exploration signal is processed, and external summarization bypasses this by converting the exploration problem into a structured decision that doesn't require empowerment-level processing
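External history summarization, as described in the annotation above, replaces a raw interaction log with per-arm statistics the model can read directly. A minimal sketch, assuming a simple (arm, reward) log; the function `summarize_history`, the arm names, and the output format are hypothetical and are not the prompt format of any particular paper.

```python
from collections import defaultdict


def summarize_history(history, arms):
    """Collapse a raw (arm, reward) log into one line of sufficient statistics per arm."""
    pulls = defaultdict(int)
    reward_sum = defaultdict(float)
    for arm, reward in history:
        pulls[arm] += 1
        reward_sum[arm] += reward
    lines = []
    for arm in arms:
        n = pulls[arm]
        if n == 0:
            lines.append(f"{arm}: never pulled")  # flags unexplored arms explicitly
        else:
            lines.append(f"{arm}: pulls={n}, mean reward={reward_sum[arm] / n:.2f}")
    return "\n".join(lines)


history = [("blue", 1), ("blue", 0), ("green", 1)]
print(summarize_history(history, ["blue", "green", "red"]))
# blue: pulls=2, mean reward=0.50
# green: pulls=1, mean reward=1.00
# red: never pulled
```

Listing never-pulled arms explicitly is a design choice here: it surfaces exploration candidates that a raw log would only imply by their absence.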
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Large Language Models Think Too Fast To Explore Effectively
- Teaching Large Language Models to Reason with Reinforcement Learning
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
- Can large language models explore in-context?
- From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR
- A Mechanistic Analysis of Looped Reasoning Language Models
Original note title
traditional llms lack empowerment-driven exploration because uncertainty values dominate early transformer blocks causing premature decisions