Related papers: Meta-R1: Empowering Large Reasoning Models with Metacognition

URL: http://arxiv.org/abs/2508.17291v1
Date: Sun, 24 Aug 2025 10:36:36 GMT
Title: Meta-R1: Empowering Large Reasoning Models with Metacognition
Authors: Haonan Dong, Haoran Ye, Wenhao Zhu, Kehan Jiang, Guojie Song
Abstract summary: Large Reasoning Models (LRMs) demonstrate remarkable capabilities on complex tasks, exhibiting emergent, human-like thinking patterns. Current LRMs lack a dedicated meta-level cognitive system that enables "thinking about thinking". We introduce Meta-R1, a systematic and generic framework that endows LRMs with explicit metacognitive capabilities.
Score: 26.882951068900496
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Reasoning Models (LRMs) demonstrate remarkable capabilities on complex tasks, exhibiting emergent, human-like thinking patterns. Despite their advances, we identify a fundamental limitation: current LRMs lack a dedicated meta-level cognitive system, an essential faculty in human cognition that enables "thinking about thinking". This absence leaves their emergent abilities uncontrollable (non-adaptive reasoning), unreliable (intermediate error), and inflexible (lack of a clear methodology). To address this gap, we introduce Meta-R1, a systematic and generic framework that endows LRMs with explicit metacognitive capabilities. Drawing on principles from cognitive science, Meta-R1 decomposes the reasoning process into distinct object-level and meta-level components, orchestrating proactive planning, online regulation, and adaptive early stopping within a cascaded framework. Experiments on three challenging benchmarks and against eight competitive baselines demonstrate that Meta-R1 is: (I) high-performing, surpassing state-of-the-art methods by up to 27.3%; (II) token-efficient, reducing token consumption to 15.7% ~ 32.7% and improving efficiency by up to 14.8% when compared to its vanilla counterparts; and (III) transferable, maintaining robust performance across datasets and model backbones.

Related papers

Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models [37.387637955634304]
Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks. We propose Metacognitive Behavioral Tuning (MBT), a framework that explicitly injects metacognitive behaviors into the model's thought process.
arXiv Detail & Related papers (2026-02-26T00:56:15Z)
Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents [49.119608399413806]
Large language models (LLMs) are increasingly deployed as autonomous agents for multi-turn decision-making tasks. This paper introduces Cog, a framework that trains agents to dynamically adapt cognitive depth at each step. Experiments on ALFWorld and ScienceWorld demonstrate that Cog achieves state-of-the-art performance with superior efficiency.
arXiv Detail & Related papers (2026-02-13T06:52:09Z)
Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models [15.849480549367684]
We propose DAGVul, a novel framework that models vulnerability reasoning as a Directed Acyclic Graph (DAG) generation task. By further introducing Reinforcement Learning with Verifiable Rewards (RLVR), we align model reasoning trace with program-intrinsic logic. Our framework improves the reasoning F1-score by an average of 18.9% over all the baselines.
arXiv Detail & Related papers (2026-02-06T13:19:45Z)
Cognitive Foundations for Reasoning and Their Manifestation in LLMs [63.12951576410617]
Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning reasoning invariants, meta-cognitive controls, representations for organizing reasoning & knowledge, and transformation operations. We develop test-time reasoning guidance that automatically scaffolds successful structures, improving performance by up to 66.7% on complex problems.
arXiv Detail & Related papers (2025-11-20T18:59:00Z)
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model [100.86587937568832]
Ring-1T is the first open-source, state-of-the-art thinking model with trillion-scale parameters. It features 1 trillion total parameters and activates approximately 50 billion per token.
arXiv Detail & Related papers (2025-10-21T17:46:14Z)
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents [31.726927520069616]
Self-Evolving Embodied Agents-R1, or SEEA-R1, is the first reinforcement fine-tuning framework designed for self-evolving embodied agents. It converts sparse delayed rewards into denser intermediate signals that improve multi-step reasoning. It generalizes reward estimation across tasks and scenes, supporting autonomous adaptation and reward-driven self-evolution.
arXiv Detail & Related papers (2025-06-26T18:00:07Z)
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback [59.078756231841574]
Critique-GRPO is an online RL framework that integrates both natural language and numerical feedback for effective policy optimization. We show Critique-GRPO consistently outperforms supervised learning and RL-based fine-tuning methods across eight challenging mathematical, STEM, and general reasoning tasks.
arXiv Detail & Related papers (2025-06-03T17:39:02Z)
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities [101.77467538102924]
Recent advancements in Large Reasoning Models (LRMs) have demonstrated remarkable performance in specialized reasoning tasks. We show that acquiring deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs. We demonstrate that adaptive reasoning -- employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking -- can effectively alleviate these drawbacks.
arXiv Detail & Related papers (2025-03-23T08:18:51Z)
Reflection-Bench: Evaluating Epistemic Agency in Large Language Models [10.801745760525838]
Epistemic agency is the ability to flexibly construct, adapt, and monitor beliefs about dynamic environments. We propose Reflection-Bench, a benchmark consisting of seven tasks with long-term relevance and minimization of data leakage. Our findings suggest several promising research directions, including enhancing core cognitive functions, improving cross-functional coordination, and developing adaptive processing mechanisms.
arXiv Detail & Related papers (2024-10-21T17:59:50Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task. LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced. Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.