SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation Agentic Systems and Tool Use

Can modular cognitive tools unlock reasoning without training?

Can reasoning capabilities be elicited by structuring LLM calls as isolated cognitive operations—understanding, recalling, examining, and backtracking—rather than through reinforcement learning?

Synthesis note · 2026-02-22 · sourced from Reasoning Architectures

Cognitive architectures in psychology posit that reasoning arises from the orchestrated, sequential execution of modular, predetermined cognitive operations. The Cognitive Tools paper instantiates this in a modern tool-calling framework: four cognitive tools are implemented as discrete functions, each executed by the same LLM in a sandboxed context.

The four cognitive tools:

  1. Understand question: Breaks down the problem by identifying main concepts, extracting relevant information, highlighting properties/theorems/techniques that might help
  2. Recall related: Retrieves related knowledge of similar questions the model knows how to answer — guides reasoning through analogous examples
  3. Examine answer: Self-evaluation of a generated answer
  4. Backtracking: Returns to a prior reasoning state when a path appears unproductive

Unlike standard agentic tools (external APIs, calculators), cognitive tools encapsulate reasoning operations within the LLM itself. Each tool's schema includes a prompt template that isolates a specific cognitive operation; the LLM executes it in sandboxed context and feeds the structured result back into the main reasoning loop.

Results: GPT-4.1 on AIME2024 improves from 26.7% to 43.3% pass@1 — approaching o1-preview performance without any RL training. Similar gains across closed and open-weight models.

The key insight: modularity reduces interference between operations. Cognitive prompting (monolithic structured prompts) improves reasoning but lacks the isolation that makes modular cognitive architectures powerful. A tool-calling implementation enforces the sandboxed execution that pure prompting cannot guarantee.

This provides direct evidence for Do base models already contain hidden reasoning ability? — cognitive tools elicit pre-existing latent capability through structured invocation, not through training. The tool-calling framework is the elicitation mechanism.

The connection to Can structured argument prompts make LLM reasoning more rigorous?: both use structured decomposition of reasoning requirements to improve performance. Cognitive tools generalize this from argumentation-specific structure to domain-general cognitive operations.

Self-Discover as predecessor: Self-Discover (Zhou et al., 2024) is the clearest precursor to cognitive tools. It implements a two-stage process: (1) SELECT relevant atomic reasoning modules from a predefined set (critical thinking, step-by-step thinking, decomposition, etc.), (2) ADAPT selected modules to the specific task, (3) IMPLEMENT as a structured reasoning plan. The key difference from cognitive tools: Self-Discover composes a task-specific plan at inference time with only 3 extra inference steps — cheaper than the tool-calling loop but less modular. Self-Discover is more efficient (no sandboxed execution overhead) while cognitive tools provide stronger isolation between operations.

Inquiring lines that use this note as a source 122

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 5

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map
19 direct connections · 200 in 2-hop network ·dense cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

cognitive tools implement reasoning operations as modular agentic tool calls that elicit reasoning without rl training