Related papers: Constraint-Rectified Training for Efficient Chain-of-Thought

Constraint-Rectified Training for Efficient Chain-of-Thought

URL: http://arxiv.org/abs/2602.12526v1
Date: Fri, 13 Feb 2026 02:13:45 GMT
Title: Constraint-Rectified Training for Efficient Chain-of-Thought
Authors: Qinhang Wu, Sen Lin, Ming Zhang, Yingbin Liang, Ness B. Shroff,
Abstract summary: Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs)<n>While longer reasoning traces can improve answer quality and unlock abilities such as self-correction, they also incur high inference costs and often introduce redundant steps, known as overthinking.<n>Recent research seeks to develop efficient reasoning strategies that balance reasoning length and accuracy.
Score: 60.52883907721588
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs), especially when combined with reinforcement learning (RL) based post-training methods. While longer reasoning traces can improve answer quality and unlock abilities such as self-correction, they also incur high inference costs and often introduce redundant steps, known as overthinking. Recent research seeks to develop efficient reasoning strategies that balance reasoning length and accuracy, either through length-aware reward design or prompt-based calibration. However, these heuristic-based approaches may suffer from severe accuracy drop and be very sensitive to hyperparameters. To address these problems, we introduce CRT (Constraint-Rectified Training), a principled post-training framework based on reference-guarded constrained optimization, yielding a more stable and interpretable formulation for efficient reasoning. CRT alternates between minimizing reasoning length and rectifying accuracy only when performance falls below the reference, enabling stable and effective pruning of redundant reasoning. We further extend CRT with a two-stage training scheme that first discovers the shortest reliable reasoning patterns and then refines accuracy under a learnt length budget, preventing the re-emergence of verbose CoT. Our comprehensive evaluation shows that this framework consistently reduces token usage while maintaining answer quality at a robust and reliable level. Further analysis reveals that CRT improves reasoning efficiency not only by shortening responses but also by reducing internal language redundancy, leading to a new evaluation metric. Moreover, CRT-based training naturally yields a sequence of intermediate checkpoints that span a spectrum of explanation lengths while preserving correctness, enabling fine-grained control over reasoning verbosity without retraining.

Related papers

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning [66.22060690012512]
Large reasoning models improve with more test-time computation, but often overthink, producing unnecessarily long chains-of-thought that raise cost without improving accuracy.<n>We propose Step-wise Adaptive Penalization (SWAP), a fine-grained framework that allocates length reduction across steps based on intrinsic contribution.
arXiv Detail & Related papers (2026-02-27T20:23:59Z)
Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning [58.331709210563616]
Thinking by Subtraction is a confidence-driven contrastive decoding approach.<n>A small subset of low-confidence tokens disproportionately contributes to reasoning errors and unnecessary output expansion.<n>Our method, Confidence-Driven Contrastive Decoding, detects low-confidence tokens during decoding and intervenes at these positions.
arXiv Detail & Related papers (2026-02-20T14:13:22Z)
Structured Reasoning for Large Language Models [59.215789462977206]
We propose Structured Reasoning (SCR), a framework that decouples reasoning trajectories into explicit, evaluable, and trainable components.<n>SCR substantially improves reasoning efficiency and self-verification.<n>Compared with existing reasoning paradigms, it reduces output token length by up to 50%.
arXiv Detail & Related papers (2026-01-12T04:04:01Z)
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time [22.9491443902816]
We study the structure of reasoning trajectories and uncover specialized attention heads that correlate with distinct cognitive behaviors.<n>We propose CREST, a training-free method for Cognitive REasoning Steering at Test-time.<n>CREST adaptively suppresses unproductive reasoning behaviors, yielding both higher accuracy and lower computational cost.
arXiv Detail & Related papers (2025-12-31T02:46:04Z)
Large Reasoning Models Learn Better Alignment from Flawed Thinking [56.08883934423522]
Large reasoning models (LRMs) "think" by generating structured chain-of-thought (CoT) before producing a final answer.<n>We propose RECAP, a principled reinforcement learning (RL) method for post-training that explicitly teaches models to override flawed reasoning trajectories.
arXiv Detail & Related papers (2025-10-01T14:15:43Z)
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning [80.104336426172]
Chain-of-thought (CoT) enhances problem-solving ability of large language models.<n>CoT incurs substantial inference cost due to long autoregressive trajectories.<n>We introduce R-Stitch, a training-free hybrid decoding framework.
arXiv Detail & Related papers (2025-07-23T08:14:36Z)
SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control [5.224609066309358]
Large reasoning models (LRMs) have exhibited remarkable reasoning capabilities through inference-time scaling.<n>Previous work has attempted to mitigate this issue by penalizing the overall length of generated samples during reinforcement learning.<n>We propose SmartThinker, a two-stage learnable framework designed to enable fine-grained control over the length of reasoning chains.
arXiv Detail & Related papers (2025-07-06T11:21:47Z)
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control [18.273777938294327]
Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chain-of-thoughts.<n>We introduce AALC, a lightweight, accuracy-aware length reward integrated into reinforcement learning.<n>We show that our approach reduces response length by over 50% while maintaining or even improving the original accuracy.
arXiv Detail & Related papers (2025-06-25T06:29:18Z)
SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought [37.53215651690168]
Chain of Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step by step thinking.<n>While promising, CoT-based approaches often require costly pretraining and lack a principled framework for how reasoning should evolve.<n>We propose SCOUT, a lightweight fine tuning framework that enables Flow CoT style reasoning without the need for pretraining.
arXiv Detail & Related papers (2025-05-30T03:43:24Z)
Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning [15.137717200618454]
Prune-on-Logic is a framework that transforms Long-CoT into logic graphs and selectively prunes low-utility reasoning steps.<n>We find that verification pruning consistently improves accuracy while reducing token usage, whereas reasoning or indiscriminate pruning degrades performance.
arXiv Detail & Related papers (2025-05-20T16:38:32Z)
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [64.93140713419561]
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs.<n>Existing fine-tuning-based compression methods either operate post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection.<n>We introduce ConCISE, a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence, and Early Stopping to terminate reasoning when confidence is sufficient.
arXiv Detail & Related papers (2025-05-08T01:40:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.