Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models
- URL: http://arxiv.org/abs/2511.18864v1
- Date: Mon, 24 Nov 2025 08:08:19 GMT
- Title: Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models
- Authors: Yang Xiang, Yixin Ji, Juntao Li, Min Zhang,
- Abstract summary: We show that using self-generated reasoning data for calibration can substantially improve pruning performance.<n>Our analysis reveals that challenging and moderately long self-generated reasoning data serve as ideal calibration data.
- Score: 48.973207827896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning benchmarks. However, their long chain-of-thought reasoning processes incur significant inference overhead. Pruning has emerged as a promising approach to reducing computational costs. However, existing efforts have primarily focused on large language models (LLMs), while pruning LRMs remains unexplored. In this work, we conduct the first empirical study on pruning LRMs and show that directly applying existing pruning techniques fails to yield satisfactory results. Our findings indicate that using self-generated reasoning data for calibration can substantially improve pruning performance. We further investigate how the difficulty and length of reasoning data affect pruning outcomes. Our analysis reveals that challenging and moderately long self-generated reasoning data serve as ideal calibration data. Based on these insights, we propose a Selective Self-Generated Reasoning (SSGR) data construction strategy to provide effective calibration data for pruning LRMs. Experimental results on the DeepSeek-R1-Distill model series validate that our strategy improves the reasoning ability of pruned LRMs by 10%-13% compared to general pruning methods.
Related papers
- Observationally Informed Adaptive Causal Experimental Design [55.998153710215654]
We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior.<n>This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias.<n> Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
arXiv Detail & Related papers (2026-03-04T06:52:37Z) - Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning [59.76691952347156]
Reinforcement learning (RL) has emerged as a powerful framework for improving the reasoning capabilities of large language models (LLMs)<n>Most existing RL approaches rely on sparse outcome rewards, which fail to credit correct intermediate steps in partially successful solutions.<n>We propose Verifiable Prefix Policy Optimization (VPPO), which uses PRMs only to localize the first error during RL.
arXiv Detail & Related papers (2026-01-26T21:38:20Z) - From LLMs to LRMs: Rethinking Pruning for Reasoning-Centric Models [17.998434546981738]
Large language models (LLMs) are increasingly costly to deploy, motivating extensive research on model pruning.<n>We conduct a controlled study of pruning for both instruction-following ($textbfLLM-instruct$) and reasoning-augmented ($textbfLLM-think$) models.<n>We evaluate static depth pruning, static width pruning, and dynamic pruning across 17 tasks spanning classification, generation, and reasoning.
arXiv Detail & Related papers (2026-01-26T03:01:39Z) - Learning to Reason in LLMs by Expectation Maximization [55.721496945401846]
We formalize reasoning as a latent variable model and derive an expectation-maximization objective for learning to reason.<n>This view connects EM and modern reward-based optimization, and shows that the main challenge lies in designing a sampling distribution that generates rationales that justify correct answers.
arXiv Detail & Related papers (2025-12-23T08:56:49Z) - Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models [31.422773877490613]
Reasoning LLMs (RLMs) deliver strong multi-step reasoning through chain-of-thought generation.<n>RLMs' large model sizes and lengthy decode-time outputs make them costly to deploy and unsuitable for resource-constrained settings.<n>We introduce RESP, a structured pruning framework that aligns pruning decisions with the model's reasoning dynamics.
arXiv Detail & Related papers (2025-12-01T20:27:05Z) - Efficient Reasoning via Reward Model [24.105621725286497]
Reinforcement learning with verifiable rewards (RLVR) has been shown to enhance the reasoning capabilities of large language models (LLMs)<n>LRMs such as DeepSeek-R1 and OpenAI o1 often generate verbose responses containing redundant or irrelevant reasoning step-a phenomenon known as overthinking.<n>We introduce a novel reward formulation named Conciseness Reward Function (CRF) with explicit dependency between the outcome reward and conciseness score.
arXiv Detail & Related papers (2025-11-12T09:51:07Z) - LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling [39.61818305829112]
We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step.<n>PIR identifies and selectively prunes only low-importance functional steps while preserving progressive reasoning components.<n>Our approach demonstrates strong generalizability across different model sizes, data sources, and token budgets.
arXiv Detail & Related papers (2025-05-25T15:17:57Z) - Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning [49.21525229904197]
We propose a novel data annotation method for PRMs specifically designed to score the long CoT reasoning process.<n>We introduce the concepts of Error Propagation and Error Cessation, enhancing PRMs' ability to identify both effective self-correction behaviors and reasoning based on erroneous steps.<n>Our PRM achieves superior performance across various metrics, including search guidance, BoN, and F1 scores.
arXiv Detail & Related papers (2025-05-20T14:12:05Z) - R-PRM: Reasoning-Driven Process Reward Modeling [53.06844294668382]
Process Reward Models (PRMs) have emerged as a promising solution by evaluating each reasoning step.<n>Existing PRMs typically output evaluation scores directly, limiting both learning efficiency and evaluation accuracy.<n>We propose Reasoning-Driven Process Reward Modeling (R-PRM)<n>R-PRM generates seed data from limited annotations, effectively bootstrapping our model's reasoning capabilities.
arXiv Detail & Related papers (2025-03-27T09:23:08Z) - Sample-aware Adaptive Structured Pruning for Large Language Models [14.605017410864583]
This study introduces AdaPruner, a sample-aware adaptive structured pruning framework for large language models (LLMs)<n>Specifically, AdaPruner effectively removes redundant parameters from LLMs by constructing a structured pruning solution space.<n>At a 20% pruning ratio, the model pruned with AdaPruner maintains 97% of the performance of the unpruned model.
arXiv Detail & Related papers (2025-03-08T12:00:21Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.<n>We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.