ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
- URL: http://arxiv.org/abs/2504.01296v1
- Date: Wed, 02 Apr 2025 01:59:26 GMT
- Title: ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
- Authors: Bairu Hou, Yang Zhang, Jiabao Ji, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang
- Abstract summary: We present ThinkPrune, a simple yet effective method for pruning the thinking length for long-thinking LLMs. We show that ThinkPrune results in a remarkable performance-length tradeoff -- on the AIME24 dataset, the reasoning length of DeepSeek-R1-Distill-Qwen-1.5B can be reduced by half with only a 2% drop in performance.
- Score: 68.02825465552779
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ThinkPrune, a simple yet effective method for pruning the thinking length for long-thinking LLMs, which have been found to often produce inefficient and redundant thinking processes. Existing preliminary explorations of reducing thinking length primarily focus on forcing the thinking process to exit early, rather than adapting the LLM to optimize and consolidate the thinking process, and therefore the length-performance tradeoff observed so far is sub-optimal. To fill this gap, ThinkPrune offers a simple solution that continues training the long-thinking LLMs via reinforcement learning (RL) with an added token limit, beyond which any unfinished thoughts and answers are discarded, resulting in a zero reward. To further preserve model performance, we introduce an iterative length pruning approach, where multiple rounds of RL are conducted, each with an increasingly stringent token limit. We observed that ThinkPrune results in a remarkable performance-length tradeoff -- on the AIME24 dataset, the reasoning length of DeepSeek-R1-Distill-Qwen-1.5B can be reduced by half with only a 2% drop in performance. We also observed that after pruning, the LLMs can bypass unnecessary steps while keeping the core reasoning process complete. Code is available at https://github.com/UCSB-NLP-Chang/ThinkPrune.
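The length-limited reward and the iterative schedule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `ANSWER:` marker, the binary 1.0 reward for a correct answer, and the `run_rl_round` callback are all assumptions made for the example. The abstract only states that output beyond the token limit is discarded and earns zero reward, and that successive RL rounds use tighter limits.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Policy = TypeVar("Policy")


def extract_after_marker(tokens: Sequence[str], marker: str = "ANSWER:") -> Optional[str]:
    """Hypothetical answer parser: return the token that follows `marker`, if any."""
    for i, tok in enumerate(tokens):
        if tok == marker and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def length_clipped_reward(
    response_tokens: Sequence[str],
    token_limit: int,
    reference_answer: str,
) -> float:
    """Score a response after discarding everything past `token_limit`.

    Assumption: a binary outcome reward (1.0 correct, 0.0 otherwise).
    A response whose answer falls beyond the limit gets 0.0.
    """
    kept = response_tokens[:token_limit]  # tokens past the limit are dropped
    answer = extract_after_marker(kept)
    if answer is None:
        return 0.0  # unfinished thought or missing answer within the limit
    return 1.0 if answer == reference_answer else 0.0


def iterative_prune(
    policy: Policy,
    token_limits: List[int],
    run_rl_round: Callable[[Policy, int], Policy],
) -> Policy:
    """Run one RL round per limit, requiring each limit to be stricter than the last.

    `run_rl_round` is a placeholder for an actual RL training loop.
    """
    for prev, cur in zip(token_limits, token_limits[1:]):
        if cur >= prev:
            raise ValueError("token limits must be strictly decreasing")
    for limit in token_limits:
        policy = run_rl_round(policy, limit)
    return policy
```

For instance, with the four-token response `["think", "think", "ANSWER:", "42"]`, a limit of 4 keeps the answer and a limit of 3 cuts it off, so the same response scores differently under the two limits. The specific limits passed to `iterative_prune` are left to the caller, since the abstract does not give them.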
Related papers
- Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs [52.663816303997194]
A key factor influencing answer quality is the length of the thinking stage. This paper explores and exploits the mechanisms by which LLMs understand and regulate the length of their reasoning. Our results demonstrate that this "overclocking" method mitigates overthinking, improves answer accuracy, and reduces inference latency.
arXiv Detail & Related papers (2025-06-08T17:54:33Z) - Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [52.32193550674408]
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). We propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B).
arXiv Detail & Related papers (2025-06-07T02:41:54Z) - Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning [45.807019099421225]
Reasoning large language models (LLMs) rely on scaling test-time compute to perform complex reasoning tasks. We demonstrate that shorter reasoning chains within individual questions are significantly more likely to yield correct answers. We then observe that training on the shorter ones leads to better performance.
arXiv Detail & Related papers (2025-05-23T12:29:06Z) - Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately [29.018731931275138]
Large Language Models (LLMs) can gain better capabilities by generating Chain-of-Thought reasoning to respond to a given request. However, when incorporating the two scaling dimensions, the system efficiency is dampened significantly for two reasons. We present SART, a serving framework for efficient and accurate LLM reasoning.
arXiv Detail & Related papers (2025-05-19T16:34:56Z) - Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning [12.830215971176806]
Long chain-of-thought (CoT) is an emerging strategy to improve the reasoning efficiency of large language models (LLMs). We propose a theoretically bounded metric to measure the effectiveness and efficiency of different thoughts. We then propose Long$\otimes$Short, an efficient reasoning framework that enables two LLMs to collaboratively solve the problem.
arXiv Detail & Related papers (2025-05-17T04:26:39Z) - Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs [52.405085773954596]
We find that large language models (LLMs) tend to overthink simple problems, generating unnecessarily long outputs, and underthink harder ones.
This indicates that models might misjudge problem difficulty and fail to calibrate their response length appropriately.
Experiments show that the generation length can be significantly reduced while maintaining acceptable accuracy.
arXiv Detail & Related papers (2025-04-30T18:48:06Z) - Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning [13.865037985388575]
We introduce a new process prejudge strategy in LLM reasoning.
We define a prejudge node in the rationale, which represents a reasoning step.
We present an automated reasoning framework with a dynamic tree-searching strategy.
arXiv Detail & Related papers (2025-04-18T06:42:30Z) - Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [60.67176246634741]
We formalize the problem of optimizing test-time compute as a meta-reinforcement learning (RL) problem. We show that state-of-the-art models do not minimize regret, but one can do so by maximizing a dense reward bonus in conjunction with the outcome 0/1 reward RL.
arXiv Detail & Related papers (2025-03-10T17:40:43Z) - SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z) - O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [98.3430004984531]
We propose Length-Harmonizing Fine-Tuning (O1-Pruner) to minimize reasoning overhead while maintaining accuracy. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner.
arXiv Detail & Related papers (2025-01-22T01:35:11Z) - LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z) - Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)