Related papers: Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning

URL: http://arxiv.org/abs/2512.15662v1
Date: Wed, 17 Dec 2025 18:15:17 GMT
Title: Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning
Authors: Jiaqi Xu, Cuiling Lan, Xuejin Chen, Yan LU
Abstract summary: We propose Stepwise Think-Critique (STC), a unified framework that interleaves reasoning and self-critique at each step within a single model. STC is trained with a hybrid reinforcement learning objective combining reasoning rewards and critique-consistency rewards to jointly optimize reasoning quality and self-evaluation.
Score: 47.867294403474176
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human beings solve complex problems through critical thinking, where reasoning and evaluation are intertwined to converge toward correct solutions. However, most existing large language models (LLMs) decouple reasoning from verification: they either generate reasoning without explicit self-checking or rely on external verifiers to detect errors post hoc. The former lacks immediate feedback, while the latter increases system complexity and hinders synchronized learning. Motivated by human critical thinking, we propose Stepwise Think-Critique (STC), a unified framework that interleaves reasoning and self-critique at each step within a single model. STC is trained with a hybrid reinforcement learning objective combining reasoning rewards and critique-consistency rewards to jointly optimize reasoning quality and self-evaluation. Experiments on mathematical reasoning benchmarks show that STC demonstrates strong critic-thinking capabilities and produces more interpretable reasoning traces, representing a step toward LLMs with built-in critical thinking.

Related papers

ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation [4.265094703231012]
We introduce ALIVE (Adversarial Learning with Instructive Verbal Evaluation), a hands-free alignment framework. By coupling adversarial learning with instructive verbal feedback, ALIVE enables models to internalize evaluative criteria directly from raw corpora. With identical data and compute, ALIVE achieves markedly improved cross-domain generalization and higher self-correction rates.
arXiv Detail & Related papers (2026-02-05T09:20:23Z)
CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction [50.67483317563736]
This paper explores a system that can think step by step, look up information if needed, generate results, evaluate its own results, and refine them. We introduce CoT-Seg, a training-free framework that rethinks reasoning segmentation by combining chain-of-thought reasoning with self-correction.
arXiv Detail & Related papers (2026-01-24T11:41:54Z)
Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models [72.4149653187766]
We propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier reason over retrieved evidence and critique each other's logic while being guided by process-aware advantage. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2026-01-08T06:57:03Z)
STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models [12.745473719032026]
We present STaR (slow-thinking for table reasoning), a new framework achieving cognitive table reasoning. STaR explicitly models step-by-step thinking and uncertainty-aware inference. Experiments on benchmarks demonstrate that STaR achieves superior performance and enhanced reasoning stability.
arXiv Detail & Related papers (2025-11-14T12:34:17Z)
Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models [28.756240721942138]
Reasoning large language models (RLLMs) have recently demonstrated remarkable capabilities through structured and multi-step reasoning. We propose Thinking with Nothinking (JointThinking), a new ICL paradigm that prompts the model to generate two answers in parallel. JointThinking significantly outperforms few-shot chain-of-thought (CoT), thinking twice, and majority voting.
arXiv Detail & Related papers (2025-08-05T12:09:55Z)
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic [48.94340387130627]
Critic-CoT is a framework that pushes LLMs toward System-2-like critic capability. It uses a CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation. Experiments on GSM8K and MATH demonstrate that our enhanced model significantly boosts task-solving performance.
arXiv Detail & Related papers (2024-08-29T08:02:09Z)
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner [30.203952806009717]
Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. We introduce an intrinsic self-correcting reasoning framework for LLMs that eliminates the need for human feedback, external tools, and handcrafted prompts.
arXiv Detail & Related papers (2024-03-28T02:12:49Z)
From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions. Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z)
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [55.66353783572259]
Causal-Consistency Chain-of-Thought harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
arXiv Detail & Related papers (2023-08-23T04:59:21Z)