Related papers: Efficient RL Training for Reasoning Models via Length-Aware Optimization

Related papers

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning [66.22060690012512]
Large reasoning models improve with more test-time computation, but often overthink, producing unnecessarily long chains-of-thought that raise cost without improving accuracy.<n>We propose Step-wise Adaptive Penalization (SWAP), a fine-grained framework that allocates length reduction across steps based on intrinsic contribution.
arXiv Detail & Related papers (2026-02-27T20:23:59Z)
Towards Flash Thinking via Decoupled Advantage Policy Optimization [11.025775055262569]
Large Reasoning Models (LRMs) have achieved remarkable performance in solving complex problems via supervised fine-tuning (SFT) and reinforcement learning (RL)<n>Existing RL algorithms suffer from excessively lengthy responses and overthinking issues, resulting in increased inference latency and computational consumption.<n>We propose a novel RL framework, DEPO, to reduce inefficient reasoning for models.
arXiv Detail & Related papers (2025-10-17T07:19:20Z)
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression [48.04180854972225]
We introduce SIRI, Scaling Iterative Reinforcement Learning with Interleaved Compression, a simple yet effective RL approach for Large Reasoning Models (LRMs)<n>We show that this trade-off can be overcome through a training regime that iteratively alternates between compressing and expanding the reasoning budget.<n>Remarkably, we find that after each compression-expansion cycle, the model's performance improves even as its output length decreases.
arXiv Detail & Related papers (2025-09-29T17:59:08Z)
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking [50.97239453902612]
Large Reasoning Models (LRMs) have achieved impressive performance on challenging tasks, yet their deep reasoning often incurs substantial computational costs.<n>Inspired by Evidence Accumulation Models, we find that LRMs have accumulated sufficient information early in reasoning, making further reasoning steps redundant.<n>We propose Just-Enough Thinking (JET), which trains models to proactively terminate unnecessary reasoning.
arXiv Detail & Related papers (2025-09-27T16:25:06Z)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation [23.385267989706]
Reinforcement learning (RL) has become a key component in training large language reasoning models (LLMs)<n>We propose a simple yet effective strategy via Question Augmentation to reduce problem difficulty and provide more informative learning signals.<n>Our method, QuestA, when applied during RL training on math reasoning tasks, not only improves pass@1 but also pass@k-particularly on problems where standard RL struggles to make progress.
arXiv Detail & Related papers (2025-07-17T16:21:47Z)
Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost.<n>We propose two lightweight methods to enhance LRM efficiency.<n>First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction.<n>Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [68.96619605651155]
Large reasoning models (LRMs) may drastically increase the output length due to overthinking.<n>We propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns.<n>Our method achieves up to a 12% accuracy improvement and reducing token usage from approximately 5,000 to 3,000 tokens.
arXiv Detail & Related papers (2025-05-27T20:59:29Z)
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens [51.90059610606049]
This paper revisits the efficiency of such reasoning processes through an information-theoretic lens.<n>We propose two metrics, InfoBias and InfoGain, to quantify divergence from ideal reasoning paths and stepwise information contribution.<n>Motivated by these findings, we introduce an entropy-based Adaptive Think strategy that dynamically halts reasoning once confidence is sufficiently high.
arXiv Detail & Related papers (2025-05-23T13:38:56Z)
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning [50.02117478165099]
We show that large-scale reinforcement learning can significantly enhance the reasoning capabilities of strong, small- and mid-sized models.<n>We propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts.
arXiv Detail & Related papers (2025-05-22T08:50:47Z)
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping [23.626013831589212]
Large Reasoning Models (LRMs) have shown remarkable capabilities in solving complex problems through reinforcement learning (RL)<n>We present a unified framework that formulates various efficient reasoning methods through the lens of length-based reward shaping.<n>Experiments on DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, and DeepSeek-R1-Distill-Qwen-32B show that our approach significantly enhances both reasoning performance and response length efficiency.
arXiv Detail & Related papers (2025-05-21T15:03:26Z)
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization [86.56120216550232]
We propose a novel two-stage framework for adaptive and efficient reasoning.<n>First, we construct a hybrid reasoning model by merging long and short CoT models.<n>Second, we apply bi-level preference training to guide the model to select suitable reasoning styles.
arXiv Detail & Related papers (2025-04-30T14:01:45Z)
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning [1.0416697066889342]
We propose a simple yet effective reinforcement learning method that enables reasoning models to learn their own optimal CoT lengths without manual supervision.<n>ShorterBetter achieves 50%-80% reduction in output lengths in both in-domain and out-of-domain reasoning tasks.<n>Our reasoning trace analysis shows that ShorterBetter refines the structure of the reasoning traces by reducing unnecessary repetition, excessive self-verification, and over-exploration of alternatives.
arXiv Detail & Related papers (2025-04-30T07:04:19Z)
Concise Reasoning via Reinforcement Learning [13.657506042120167]
We revisit the core principles of reinforcement learning (RL)<n>We uncover a natural correlation between conciseness and accuracy that has been largely overlooked.<n>We show that introducing a secondary phase of RL training, using a very small set of problems, can significantly reduce chains of thought.
arXiv Detail & Related papers (2025-04-07T15:35:54Z)
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models [28.351652568849286]
This paper investigates how the model's context length and the complexity of the training dataset influence the training process of R1-like models.<n>We propose FastCuRL, a curriculum reinforcement learning framework with the progressive context extension strategy.
arXiv Detail & Related papers (2025-03-21T16:35:31Z)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks.<n>Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [98.3430004984531]
We propose Length-Harmonizing Fine-Tuning (O1-Pruner) to minimize reasoning overhead while maintaining accuracy.<n>Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner.
arXiv Detail & Related papers (2025-01-22T01:35:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.