Can modular cognitive tools unlock reasoning without training?

Can reasoning capabilities be elicited by structuring LLM calls as isolated cognitive operations—understanding, recalling, examining, and backtracking—rather than through reinforcement learning?

Synthesis note · 2026-02-22 · sourced from Reasoning Architectures

Cognitive architectures in psychology posit that reasoning arises from the orchestrated, sequential execution of modular, predetermined cognitive operations. The Cognitive Tools paper instantiates this in a modern tool-calling framework: four cognitive tools are implemented as discrete functions, each executed by the same LLM in a sandboxed context.

The four cognitive tools:

Understand question: Breaks down the problem by identifying main concepts, extracting relevant information, highlighting properties/theorems/techniques that might help
Recall related: Retrieves related knowledge of similar questions the model knows how to answer — guides reasoning through analogous examples
Examine answer: Self-evaluation of a generated answer
Backtracking: Returns to a prior reasoning state when a path appears unproductive

Unlike standard agentic tools (external APIs, calculators), cognitive tools encapsulate reasoning operations within the LLM itself. Each tool's schema includes a prompt template that isolates a specific cognitive operation; the LLM executes it in sandboxed context and feeds the structured result back into the main reasoning loop.

Results: GPT-4.1 on AIME2024 improves from 26.7% to 43.3% pass@1 — approaching o1-preview performance without any RL training. Similar gains across closed and open-weight models.

The key insight: modularity reduces interference between operations. Cognitive prompting (monolithic structured prompts) improves reasoning but lacks the isolation that makes modular cognitive architectures powerful. A tool-calling implementation enforces the sandboxed execution that pure prompting cannot guarantee.

This provides direct evidence for Do base models already contain hidden reasoning ability? — cognitive tools elicit pre-existing latent capability through structured invocation, not through training. The tool-calling framework is the elicitation mechanism.

The connection to Can structured argument prompts make LLM reasoning more rigorous?: both use structured decomposition of reasoning requirements to improve performance. Cognitive tools generalize this from argumentation-specific structure to domain-general cognitive operations.

Self-Discover as predecessor: Self-Discover (Zhou et al., 2024) is the clearest precursor to cognitive tools. It implements a two-stage process: (1) SELECT relevant atomic reasoning modules from a predefined set (critical thinking, step-by-step thinking, decomposition, etc.), (2) ADAPT selected modules to the specific task, (3) IMPLEMENT as a structured reasoning plan. The key difference from cognitive tools: Self-Discover composes a task-specific plan at inference time with only 3 extra inference steps — cheaper than the tool-calling loop but less modular. Self-Discover is more efficient (no sandboxed execution overhead) while cognitive tools provide stronger isolation between operations.

Inquiring lines that use this note as a source 122

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 5

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map

19 direct connections · 200 in 2-hop network ·dense cluster Open in graph ↗

Can modular cognitive tools unlock reasoning wit… Do base models already contain hidden reasoning ab… Can structured argument prompts make LLM reasoning… Does RL teach reasoning or just when to use it? Can reasoning and tool execution be truly decouple… Can we automatically optimize both prompts and age…

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Do base models already contain hidden reasoning ability? Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
cognitive tools elicit pre-existing capability without training
Can structured argument prompts make LLM reasoning more rigorous? Does requiring language models to explicitly check warrants, backing, and rebuttals—rather than reasoning freely—improve reasoning quality and catch failures that standard step-by-step prompting misses?
same principle: structured reasoning decomposition improves performance
Does RL teach reasoning or just when to use it? Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges.
cognitive tools is an alternative to RL as the elicitation mechanism
Can reasoning and tool execution be truly decoupled? Can LLM reasoning be separated from tool observations to eliminate redundant re-prompting and enable parallel execution? Two recent architectures suggest yes, but what are the tradeoffs?
both use tool-calling architecture for reasoning; cognitive tools targets internal operations, CoA/ReWOO target external calls
Can we automatically optimize both prompts and agent coordination? This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
cognitive tools are node-level operations within the computational graph framework: understand, recall, examine, and backtrack are function nodes whose composition forms an agent-level reasoning graph; the graph framework suggests these cognitive operations could be automatically optimized and recombined

Can modular cognitive tools unlock reasoning without training?

Related concepts in this collection 5

Related papers in this collection 8

Search by related questions 4