Eliciting Reasoning in Language Models with Cognitive Tools
- URL: http://arxiv.org/abs/2506.12115v1
- Date: Fri, 13 Jun 2025 13:56:52 GMT
- Title: Eliciting Reasoning in Language Models with Cognitive Tools
- Authors: Brown Ebouky, Andrea Bartezzaghi, Mattia Rigotti
- Abstract summary: We build on the long-standing literature in cognitive psychology and cognitive architectures. We endow an LLM with a small set of "cognitive tools" encapsulating specific reasoning operations. Surprisingly, this simple strategy results in considerable gains in performance on standard mathematical reasoning benchmarks.
- Score: 9.68459632251626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advent of reasoning models like OpenAI's o1 was met with excited speculation by the AI community about the mechanisms underlying these capabilities in closed models, followed by a rush of replication efforts, particularly from the open source community. These speculations were largely settled by the demonstration from DeepSeek-R1 that chains-of-thought and reinforcement learning (RL) can effectively replicate reasoning on top of base LLMs. However, it remains valuable to explore alternative methods for theoretically eliciting reasoning that could help elucidate the underlying mechanisms, as well as to provide additional methods that may offer complementary benefits. Here, we build on the long-standing literature in cognitive psychology and cognitive architectures, which postulates that reasoning arises from the orchestrated, sequential execution of a set of modular, predetermined cognitive operations. Crucially, we implement this key idea within a modern agentic tool-calling framework. In particular, we endow an LLM with a small set of "cognitive tools" encapsulating specific reasoning operations, each executed by the LLM itself. Surprisingly, this simple strategy results in considerable gains in performance on standard mathematical reasoning benchmarks compared to base LLMs, for both closed and open-weight models. For instance, providing our "cognitive tools" to GPT-4.1 increases its pass@1 performance on AIME2024 from 26.7% to 43.3%, bringing it very close to the performance of o1-preview. In addition to its practical implications, this demonstration contributes to the debate regarding the role of post-training methods in eliciting reasoning in LLMs versus the role of inherent capabilities acquired during pre-training, and whether post-training merely uncovers these latent abilities.
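To make the abstract's mechanism concrete, below is a minimal sketch of the cognitive-tools pattern inside an agentic tool-calling loop. It is illustrative rather than the authors' implementation: the tool names and prompt templates are assumptions based on the abstract's description of modular reasoning operations, and `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
# Minimal sketch of the "cognitive tools" pattern described in the abstract.
# Each tool is a prompt template executed by the LLM itself; an orchestrating
# call decides which tool to run next. Tool names and prompts are assumptions;
# `call_llm` is a hypothetical stand-in for any chat-completion client.

COGNITIVE_TOOLS = {
    "understand_question": "Identify the core concepts and restate the problem:\n{context}",
    "recall_related": "Recall analogous solved problems relevant to:\n{context}",
    "examine_answer": "Check the current candidate answer for errors:\n{context}",
    "backtracking": "The current approach seems stuck; propose an alternative:\n{context}",
}

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client here")

def run_tool(name: str, context: str) -> str:
    # A "cognitive tool" is the same LLM run on a specialized prompt.
    return call_llm(COGNITIVE_TOOLS[name].format(context=context))

def solve(question: str, max_steps: int = 8) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        # The orchestrating LLM picks the next reasoning operation, or answers.
        choice = call_llm(
            "Available tools: " + ", ".join(COGNITIVE_TOOLS)
            + ". Reply with one tool name, or 'ANSWER: <final answer>'.\n"
            + transcript
        ).strip()
        if choice.startswith("ANSWER:"):
            return choice.removeprefix("ANSWER:").strip()
        if choice in COGNITIVE_TOOLS:
            transcript += f"\n[{choice}] {run_tool(choice, transcript)}"
    return call_llm("Give the final answer.\n" + transcript)
```

The design point, per the abstract, is that every tool is executed by the same LLM, so any gains come from orchestrating prompts rather than from external solvers.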
Related papers
- Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). We present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle. We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
arXiv Detail & Related papers (2025-07-24T13:14:25Z)
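For readers unfamiliar with the IB principle the entry above builds on, the classical objective trades off compressing the input against preserving information about the target; read for reasoning, X is the prompt, T the reasoning trajectory, and Y the correct answer. This is the textbook formulation, not necessarily IBRO's exact objective.

```latex
% Classical information bottleneck objective (illustrative reading for
% reasoning: X = prompt, T = reasoning trajectory, Y = correct answer).
% IBRO adapts this trade-off; the paper's exact objective may differ.
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y), \qquad \beta > 0
```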
- ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks [61.06621533874629]
In-context learning (ICL) has demonstrated remarkable success in large language models (LLMs). In this paper, we propose, for the first time, the dual-learning hypothesis, which posits that LLMs simultaneously learn both the task-relevant latent concepts and backdoor latent concepts. Motivated by these findings, we propose ICLShield, a defense mechanism that dynamically adjusts the concept preference ratio.
arXiv Detail & Related papers (2025-07-02T03:09:20Z)
- Who Reasons in the Large Language Models? [18.521142439429635]
We show that reasoning capabilities in well-trained large language models are primarily attributed to the output projection module (oproj) in the Transformer's multi-head self-attention mechanism. We provide both circumstantial and empirical evidence suggesting that oproj plays a central role in enabling reasoning, whereas other modules contribute more to fluent dialogue.
arXiv Detail & Related papers (2025-05-27T10:26:47Z)
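To locate the module the entry above refers to, here is a standard multi-head self-attention block with the output projection marked. This is textbook Transformer anatomy, not the paper's code; the name `o_proj` follows common LLM codebase conventions, and the causal mask is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Standard multi-head self-attention (causal mask omitted for brevity).
    `o_proj` is the output projection applied after the heads are concatenated,
    i.e., the "oproj" module the paper attributes reasoning ability to."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)  # the module in question

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        split = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        out = (att @ v).transpose(1, 2).reshape(b, t, d)  # concatenate heads
        return self.o_proj(out)  # output projection: the claimed locus of reasoning
```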
- RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs [14.78605805191225]
Reinforcement learning-based post-training of large language models (LLMs) has recently gained attention. We critically examine the formulation and assumptions underlying these methods.
arXiv Detail & Related papers (2025-05-19T19:57:15Z)
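The structural assumptions in question typically concern the degenerate token-level MDP used for RL post-training, sketched below in standard notation (ours, not necessarily the paper's): states deterministically append the chosen token, and reward arrives only at the end of the sequence.

```latex
% Token-level MDP commonly assumed in RL post-training of LLMs
% (standard formulation; notation is ours, not necessarily the paper's):
% the state is the prompt x plus tokens emitted so far, the action is the
% next token, transitions deterministically append, and reward is terminal.
s_t = (x, y_{1:t-1}), \quad a_t = y_t, \quad s_{t+1} = (x, y_{1:t}), \quad
r(s_t, a_t) = \begin{cases} R(x, y_{1:T}) & \text{if } t = T, \\ 0 & \text{otherwise.} \end{cases}
```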
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders [8.1201445044499]
Large Language Models (LLMs) have achieved remarkable success in natural language processing. Recent advances have led to the development of a new class of reasoning LLMs. Open-source DeepSeek-R1 has achieved state-of-the-art performance by integrating deep thinking and complex reasoning.
arXiv Detail & Related papers (2025-03-24T16:54:26Z)
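As a reference for the method the entry above applies, a minimal sparse autoencoder over model activations looks as follows; the dimensions and L1 coefficient are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder of the kind used to interpret model
    activations: an overcomplete dictionary (d_dict > d_model) trained
    with an L1 sparsity penalty. Hyperparameters are illustrative."""

    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)             # reconstructed activations
        return recon, features

def sae_loss(acts, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction fidelity plus sparsity: most features stay silent, so
    # the few that fire can be inspected as candidate "reasoning features".
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
```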
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
- Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models [42.70951894754312]
Integration of slow-thinking mechanisms into large language models offers a promising way toward Level 2 AGI Reasoners. We propose a self-backtracking mechanism that equips LLMs with the ability to backtrack during both training and inference. This mechanism not only enhances reasoning ability but also improves efficiency by transforming slow-thinking processes into fast-thinking through self-improvement.
arXiv Detail & Related papers (2025-02-06T08:52:43Z)
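A rough sketch of how a backtracking decode loop can work, assuming a special backtrack token as the trigger; `step_fn` and the token name are hypothetical, and the paper's actual training-time mechanism is more involved than this inference-time loop.

```python
def generate_with_backtracking(step_fn, question: str, max_steps: int = 64,
                               backtrack_token: str = "<backtrack>") -> str:
    """Illustrative decode loop for a self-backtracking mechanism: the model
    may emit a special token to discard its latest reasoning step and retry
    from the previous state. `step_fn` is a hypothetical stand-in mapping a
    partial solution to the next proposed step (or the backtrack token)."""
    history = [question]           # stack of partial solutions
    for _ in range(max_steps):
        step = step_fn(history[-1])
        if step == backtrack_token:
            if len(history) > 1:
                history.pop()      # undo the most recent step and retry
            continue
        history.append(history[-1] + step)
        if step.endswith("<eos>"):
            break
    return history[-1]
```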
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities. We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
- Can formal argumentative reasoning enhance LLMs performances? [0.3659498819753633]
We present a pipeline (MQArgEng) to evaluate the effect of introducing computational argumentation semantics on the performance of Large Language Models (LLMs).
Exploratory results indicate that MQArgEng provides a moderate performance gain in most of the examined topical categories and, as such, show promise and warrant further research.
arXiv Detail & Related papers (2024-05-16T22:09:31Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [77.72128397088409]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question. We also propose a novel reinforcement learning paradigm to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework divides the ICL process into two distinct stages: the Deep-Thinking stage and the test stage.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)