Related papers: SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Related papers

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach.<n>REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful.<n>Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z)
Beyond Correctness: Learning Robust Reasoning via Transfer [51.403609251508904]
We adopt a simple philosophical view, robust reasoning should remain useful beyond the mind that produced it.<n>We introduce Reinforcement Learning with Transferable Reward, which operationalizes robustness via transfer reward.<n>Our approach improves sampling consistency while improving final answer accuracy, and it reaches comparable performance in substantially fewer training steps.
arXiv Detail & Related papers (2026-02-09T10:41:44Z)
Internalizing LLM Reasoning via Discovery and Replay of Latent Actions [4.830503861275364]
Internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute.<n>We propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem.
arXiv Detail & Related papers (2026-02-04T08:44:57Z)
Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering [50.63386303357225]
We propose AdaRAS, a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations.<n>AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference.<n> Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including over 13% gains on AIME-24 and AIME-25.
arXiv Detail & Related papers (2026-01-27T17:53:01Z)
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs [9.412828452977553]
Existing approaches reinforce successful reasoning paths, incurring a substantial calibration cost.<n>This failure has been characterized as a form of model collapse in alignment.<n>We proposeEpiCaR as a training objective that jointly optimize reasoning performance and calibration.
arXiv Detail & Related papers (2026-01-11T06:21:13Z)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths.<n>GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
arXiv Detail & Related papers (2026-01-07T23:05:17Z)
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time [22.9491443902816]
We study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors.<n>We propose CREST, a training-free method for Cognitive REasoning Steering at Test-time.<n>CREST adaptively suppresses unproductive reasoning behaviors, yielding both higher accuracy and lower computational cost.
arXiv Detail & Related papers (2025-12-31T02:46:04Z)
When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents [2.689316553293938]
Supervised fine-tuning (SFT) has emerged as one of the most effective ways to improve the performance of large language models (LLMs) in downstream tasks.<n>We propose a pipeline in which LLMs generate reasoning steps that guide both the invocation of tools and the final answer generation for conversational agents.
arXiv Detail & Related papers (2025-12-12T04:44:40Z)
Efficient Thought Space Exploration through Strategic Intervention [54.35208611253168]
We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components.<n>The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points.<n> Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
arXiv Detail & Related papers (2025-11-13T07:26:01Z)
Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression [30.653381666162275]
Certainty-Guided Reflection Suppression (CGRS) is a novel method that mitigates overthinking in Large Reasoning Language Models (LRLMs)<n>CGRS operates by dynamically suppressing the model's generation of reflection triggers when it exhibits high confidence in its current response.<n>Our approach is model-agnostic, requires no retraining or architectural modifications, and can be integrated seamlessly with existing autoregressive generation pipelines.
arXiv Detail & Related papers (2025-08-07T12:38:22Z)
Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.<n>TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.<n>We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [53.149817480019834]
Recent advancements in large reasoning models (LRMs) have achieved notable performance enhancements on complex reasoning tasks by scaling up the generation length by Chain-of-Thought (CoT)<n>We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely by injecting the textual hint during the token generation of the reasoning process.<n>Experiments on the state-of-the-art LRMs, including DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning processes while maintaining performance well.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
Excessive Reasoning Attack on Reasoning LLMs [26.52688123765127]
In this work, we expose a novel threat: adversarial inputs can be crafted to exploit excessive reasoning behaviors.<n>Our results demonstrate a 3x to 9x increase in reasoning length with comparable utility performance.<n>Our crafted adversarial inputs exhibit transferability, inducing computational overhead in o3-mini, o1-mini, DeepSeek-R1, and QWQ models.
arXiv Detail & Related papers (2025-06-17T10:16:52Z)
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision [8.025866693669622]
Large language models (LLMs) have made remarkable strides in various natural language processing tasks, but their performance on complex reasoning problems remains hindered by a lack of explainability and trustworthiness.<n>We propose Knowledge Graph-constrained Trajectory Reasoning Attribution and Chain Explanation Supervision (KG-TRACES) to enhance the reasoning ability of LLMs.<n> KG-TRACES jointly supervises the model to: (1) predict symbolic relation paths, (2) predict full triple-level reasoning paths, and (3) generate attribution-aware reasoning processes grounded in the reasoning paths.
arXiv Detail & Related papers (2025-06-01T02:20:45Z)
Steering LLM Reasoning Through Bias-Only Adaptation [4.486093197820339]
Reinforcement-learning finetuning does not create new capabilities but strengthens reasoning patterns already latent in the pretrained network.<n>We test this claim by training steering vectors: layer-wise biases that additively amplify selected hidden features.<n>Experiments on four base models across the GSM8K and MATH benchmarks show that steering vectors recover, and in several cases exceed, the accuracy of fully-tuned counterparts.
arXiv Detail & Related papers (2025-05-24T13:55:38Z)
Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.<n>We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
Do Reasoning Models Show Better Verbalized Calibration? [19.776645881640178]
We investigate the calibration properties of LRMs trained via supervised fine-tuning distillation on long reasoning traces.<n>Our findings reveal that LRMs significantly outperform instruction-tuned models on complex reasoning tasks in both accuracy and confidence calibration.<n>Our results provide evidence for a potentially critical role of reasoning-oriented RL training in improving LLMs' capacity for generating trustworthy, self-aware outputs.
arXiv Detail & Related papers (2025-04-09T03:58:19Z)
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation [38.64751082999587]
Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy.<n>We propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations.<n>Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG)
arXiv Detail & Related papers (2025-03-27T17:44:18Z)
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training [62.536191233049614]
Reinforcement learning with verifiable outcome rewards (RLVR) has effectively scaled up chain-of-thought (CoT) reasoning in large language models (LLMs)<n>This work investigates this problem through extensive experiments on complex card games, such as 24 points, and embodied tasks from ALFWorld.<n>We find that when rewards are based solely on action outcomes, RL fails to incentivize CoT reasoning in VLMs, instead leading to a phenomenon we termed thought collapse.
arXiv Detail & Related papers (2025-03-11T15:17:02Z)
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model [70.77691645678804]
We present the first successful replication of emergent characteristics for multimodal reasoning on only a non-SFT 2B model.<n>Our model achieves 59.47% accuracy on CVBench, outperforming the base model by approximately 30% and exceeding both SFT setting by 2%.<n>In addition, we share our failed attempts and insights in attempting to achieve R1-like reasoning using RL with instruct models.
arXiv Detail & Related papers (2025-03-07T04:21:47Z)
Quantifying Logical Consistency in Transformers via Query-Key Alignment [20.636818928993684]
We propose a novel, lightweight evaluation strategy for logical reasoning.<n>By computing a single forward pass and extracting a "QK-score" from carefully chosen heads, our method reveals latent representations that reliably separate valid from invalid inferences.
arXiv Detail & Related papers (2025-02-24T10:02:50Z)
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps [3.8936716676293917]
This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data.<n>We identify a critical parameter threshold (1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning.
arXiv Detail & Related papers (2025-02-21T00:48:32Z)
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning [23.99454995087634]
We explore the potential of rule-based reinforcement learning in large reasoning models.<n>We use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification.<n>Our 7B model develops advanced reasoning skills-such as reflection, verification, and summarization-that are absent from the logic corpus.
arXiv Detail & Related papers (2025-02-20T17:49:26Z)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning [65.2421542320293]
Reasoning abilities are crucial components of general intelligence.<n>Recent advances by proprietary companies, such as o-series models of OpenAI, have made remarkable progress on reasoning tasks.<n>This paper proposes a new RL framework, termed OREAL, to pursue the performance limit that can be achieved through textbfOutcome textbfREwtextbfArd-based reinforcement textbfLearning for mathematical reasoning tasks.
arXiv Detail & Related papers (2025-02-10T18:57:29Z)
Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task. LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced. Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Model (LLM) We propose a decoding algorithm integrating the self-evaluation guidance via beam search. Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34%$, $9.56%$, and $5.46%$ on the GSM8K, AQuA, and StrategyQA.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.