Frustratingly Easy Task-aware Pruning for Large Language Models
- URL: http://arxiv.org/abs/2510.22489v1
- Date: Sun, 26 Oct 2025 02:09:22 GMT
- Title: Frustratingly Easy Task-aware Pruning for Large Language Models
- Authors: Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song
- Abstract summary: We propose a simple yet effective pruning approach for large language models (LLMs). Our framework computes separate importance scores using both general and task-specific calibration data. Experiments on widely used benchmarks demonstrate that our approach is effective and consistently outperforms the baselines.
- Score: 33.84349099489764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning provides a practical solution to reduce the resources required to run large language models (LLMs) to benefit from their effective capabilities as well as control their cost for training and inference. Research on LLM pruning often ranks the importance of LLM parameters using their magnitudes and calibration-data activations and removes (or masks) the less important ones, accordingly reducing LLMs' size. However, these approaches primarily focus on preserving the LLM's ability to generate fluent sentences, while neglecting performance on specific domains and tasks. In this paper, we propose a simple yet effective pruning approach for LLMs that preserves task-specific capabilities while shrinking their parameter space. We first analyze how conventional pruning minimizes loss perturbation under general-domain calibration and extend this formulation by incorporating task-specific feature distributions into the importance computation of existing pruning algorithms. Thus, our framework computes separate importance scores using both general and task-specific calibration data, partitions parameters into shared and exclusive groups based on activation-norm differences, and then fuses their scores to guide the pruning process. This design enables our method to integrate seamlessly with various foundation pruning techniques and preserve the LLM's specialized abilities under compression. Experiments on widely used benchmarks demonstrate that our approach is effective and consistently outperforms the baselines with identical pruning ratios and different settings.
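The abstract describes the pipeline only at a high level. Below is a minimal sketch of how the general/task-specific score fusion could be layered on top of a Wanda-style importance score (|weight| times input-activation norm), which is one of the "existing pruning algorithms" the abstract alludes to. The threshold `tau`, the fusion weight `alpha`, and the specific partition and fusion rules are illustrative assumptions, not details taken from the paper.

```python
import torch

def wanda_importance(weight: torch.Tensor, act_norm: torch.Tensor) -> torch.Tensor:
    """Wanda-style importance for a linear layer: |W_ij| * ||X_j||_2 per weight."""
    return weight.abs() * act_norm.unsqueeze(0)  # (out, in) * (1, in) -> (out, in)

def task_aware_scores(weight, gen_act_norm, task_act_norm, tau=0.5, alpha=0.7):
    """Fuse general and task-specific importance scores (illustrative rule, not the
    paper's exact formulation).

    gen_act_norm / task_act_norm: per-input-channel activation norms collected from
    general-domain and task-specific calibration batches, respectively.
    """
    s_gen = wanda_importance(weight, gen_act_norm)
    s_task = wanda_importance(weight, task_act_norm)

    # Partition input channels by how strongly their activation norms differ between
    # the two calibration sets: small difference -> "shared", large -> "task-exclusive".
    rel_diff = (task_act_norm - gen_act_norm).abs() / (gen_act_norm + 1e-8)
    exclusive = rel_diff > tau  # boolean mask over input channels

    # Shared channels average both scores; exclusive channels lean on the task score.
    fused = torch.where(
        exclusive.unsqueeze(0),
        alpha * s_task + (1.0 - alpha) * s_gen,
        0.5 * (s_task + s_gen),
    )
    return fused

def prune_by_score(weight, scores, sparsity=0.5):
    """Zero out the lowest-scoring weights at the requested unstructured sparsity ratio."""
    k = max(1, int(weight.numel() * sparsity))
    threshold = scores.flatten().kthvalue(k).values
    return weight * (scores > threshold).to(weight.dtype)
```

In a real run, `gen_act_norm` and `task_act_norm` would come from forwarding general-domain and task-specific calibration batches through the model and recording per-input-channel L2 norms of each linear layer's inputs.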
Related papers
- SlimLLM: Accurate Structured Pruning for Large Language Models [36.84275777364218]
Structured pruning is an effective solution to compress the parameters of large language models.
We propose an effective and fast structured pruning method named SlimLLM for large language models.
arXiv Detail & Related papers (2025-05-28T03:01:28Z)
- How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve [2.33361323991006]
Large Language Models (LLMs) are increasingly deployed for narrow tasks in resource-constrained settings.
We present LLM-Sieve, a framework that prunes LLMs down to the minimal parameter subset needed to preserve task performance.
arXiv Detail & Related papers (2025-05-23T20:17:20Z)
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model (a hedged sketch of this weighted-penalty idea follows this entry).
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
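As referenced above, the weighted-penalty step of LLM-Lasso can be illustrated with a short sketch. The mapping from LLM relevance judgments to penalty factors and the example values below are assumptions for illustration; only the standard reduction of a weighted Lasso to a rescaled ordinary Lasso is shown, not the paper's full pipeline.

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(X, y, penalty_factors, alpha=0.1):
    """Weighted Lasso via feature rescaling: min ||y - Xb||^2 + alpha * sum_j w_j |b_j|.

    penalty_factors: per-feature weights w_j (e.g. derived from LLM relevance scores,
    with more relevant features mapped to smaller w_j).
    """
    w = np.asarray(penalty_factors, dtype=float)
    X_scaled = X / w  # scaling column j by 1/w_j turns the weighted penalty into a plain one
    model = Lasso(alpha=alpha, fit_intercept=True).fit(X_scaled, y)
    beta = model.coef_ / w  # map coefficients back to the original feature scale
    return beta, model.intercept_

# Illustrative use: features the LLM deems relevant get penalty factor 0.2, others 1.0
# (the 0.2/1.0 values are assumptions for this example, not values from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
beta, intercept = weighted_lasso(X, y, penalty_factors=[0.2, 0.2, 1.0, 1.0, 1.0])
print(beta)
```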
- Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [65.23593936798662]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation.
This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- Enhancing LLMs with Smart Preprocessing for EHR Analysis [3.5839042822277585]
Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing.
This paper introduces a compact LLM framework optimized for local deployment in environments with stringent privacy requirements.
arXiv Detail & Related papers (2024-12-03T22:06:55Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- Bridging LLMs and KGs without Fine-Tuning: Intermediate Probing Meets Subgraph-Aware Entity Descriptions [49.36683223327633]
Large Language Models (LLMs) encapsulate extensive world knowledge and exhibit powerful context modeling capabilities.
We propose a novel framework that synergizes the strengths of LLMs with robust knowledge representation to enable effective and efficient KGC.
We achieve a 47% relative improvement over previous methods based on non-fine-tuned LLMs and, to our knowledge, are the first to achieve classification performance comparable to fine-tuned LLMs.
arXiv Detail & Related papers (2024-08-13T10:15:55Z)
- Beyond KV Caching: Shared Attention for Efficient LLMs [5.801044612920816]
This paper introduces a novel Shared Attention (SA) mechanism to enhance the efficiency of large language models (LLMs).
Our approach utilizes the isotropic tendencies of attention distributions observed in advanced LLMs post-pretraining to reduce both the computational flops and the size of the KV cache required during inference.
Our findings suggest that SA not only conserves computational resources but also maintains robust model performance, thereby facilitating the deployment of more efficient LLMs in resource-constrained environments (a minimal sketch of the attention-sharing idea follows this entry).
arXiv Detail & Related papers (2024-07-13T07:23:07Z)
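As referenced above, the core reuse idea behind Shared Attention can be illustrated with a small sketch: once one layer has computed its attention weights, later layers can apply those same weights to their own values instead of recomputing them. The function name, the single-head formulation, and the sharing policy below are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def attention_with_optional_sharing(q, k, v, shared_probs=None):
    """Single-head scaled dot-product attention that either computes its own attention
    weights or reuses weights handed down from an earlier layer (illustrative sketch)."""
    if shared_probs is None:
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        shared_probs = F.softmax(scores, dim=-1)  # computed once, reusable by later layers
    return shared_probs @ v, shared_probs

# Toy usage: layer 1 computes the attention map, layer 2 reuses it on its own values.
q1, k1, v1 = (torch.randn(4, 16) for _ in range(3))
out1, probs = attention_with_optional_sharing(q1, k1, v1)
v2 = torch.randn(4, 16)
out2, _ = attention_with_optional_sharing(None, None, v2, shared_probs=probs)
```

Reusing the map this way skips the QK^T computation and the key storage for the sharing layers, which is broadly how the flop and KV-cache savings described in the entry would arise.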
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Pruning as a Domain-specific LLM Extractor [44.81262364608468]
Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks.
Few efforts have explored model pruning techniques to reduce the size of LLMs.
This work introduces an innovative unstructured dual-pruning methodology, D-Pruner, for domain-specific compression of LLMs.
arXiv Detail & Related papers (2024-05-10T07:05:02Z)
- One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models [42.95555008229016]
We propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need for any retraining.
The advantages of the proposed method become even more pronounced when the sparsity is extremely high (a hedged sketch of sensitivity-aware sparsity allocation follows this entry).
arXiv Detail & Related papers (2023-10-14T05:43:09Z)
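As referenced above, one way to realize "sensitivity-aware mixed sparsity" is to score each layer with an OBD-style saliency built from a diagonal Hessian estimate and assign less sensitive layers higher sparsity. The saliency formula, the allocation rule, and all names below are assumptions for illustration; the paper's actual Hessian estimation and sparsity allocation are not reproduced here.

```python
import torch

def layer_sensitivity(weight: torch.Tensor, hessian_diag: torch.Tensor) -> float:
    """OBD-style saliency proxy: 0.5 * sum_i w_i^2 * H_ii over a layer's weights,
    using a diagonal Hessian (or Fisher) estimate with the same shape as the weights."""
    return 0.5 * (weight.flatten() ** 2 * hessian_diag.flatten()).sum().item()

def allocate_mixed_sparsity(sensitivities, target=0.5, spread=0.2):
    """Map per-layer sensitivities to per-layer sparsity ratios around a global target:
    more sensitive layers are pruned less, less sensitive layers more (illustrative rule)."""
    s = torch.tensor(sensitivities, dtype=torch.float32)
    z = (s - s.mean()) / (s.std() + 1e-8)       # standardized sensitivity per layer
    ratios = (target - spread * z).clamp(0.05, 0.95)
    ratios = ratios * (target / ratios.mean())  # keep the average near the global target
    return ratios.clamp(0.05, 0.95).tolist()

# Toy usage with three layers and made-up sensitivities (assumed values, not measured ones).
print(allocate_mixed_sparsity([1.0, 5.0, 2.0], target=0.5))
```

A real one-shot implementation would follow this allocation with the per-layer masking and any weight compensation step, which this sketch omits.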