DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
- URL: http://arxiv.org/abs/2411.14055v1
- Date: Thu, 21 Nov 2024 12:02:39 GMT
- Title: DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
- Authors: Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu
- Abstract summary: Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs.
We propose DRPruning, which incorporates distributionally robust optimization to restore balanced performance across domains.
- Score: 61.492590008258986
- Abstract: Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs. Structured pruning reduces model size and speeds up inference but often causes uneven degradation across domains, leading to biased performance. To address this, we propose DRPruning, which incorporates distributionally robust optimization to restore balanced performance across domains, along with further improvements to enhance robustness. Experiments in monolingual and multilingual settings show that our method surpasses similarly sized models in pruning and continued pretraining over perplexity, downstream tasks, and instruction tuning. We further provide analysis demonstrating the robustness of our method towards various domains and distribution shifts. Furthermore, our method automatically determines optimal reference losses and data ratios, suggesting potential for broader applications. Our code is available at https://github.com/hexuandeng/DRPruning.
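As a rough illustration of the idea of distributionally robust optimization over domains, the sketch below shows one generic exponentiated-gradient update of per-domain weights, in which domains whose loss exceeds a reference loss receive more weight. This is a minimal sketch under stated assumptions, not the DRPruning implementation: the function names, the clipping of excess loss at zero, and the step size are all hypothetical choices made for illustration.

```python
import math


def update_domain_weights(weights, losses, reference_losses, step_size=0.1):
    """One exponentiated-gradient step on per-domain weights.

    Hypothetical helper: a generic group-DRO-style update, not the
    authors' exact procedure.
    """
    # Excess loss: how far each domain sits above its reference loss (clipped at 0).
    excess = [max(loss - ref, 0.0) for loss, ref in zip(losses, reference_losses)]
    # Domains with larger excess loss are scaled up multiplicatively.
    scaled = [w * math.exp(step_size * e) for w, e in zip(weights, excess)]
    # Renormalize so the weights remain a probability distribution.
    total = sum(scaled)
    return [s / total for s in scaled]


def robust_loss(weights, losses):
    """Weighted training objective under the current domain weights."""
    return sum(w * loss for w, loss in zip(weights, losses))


# Two domains: the second is one unit above its (assumed) reference loss.
weights = update_domain_weights([0.5, 0.5], [2.0, 3.0], [2.0, 2.0])
objective = robust_loss(weights, [2.0, 3.0])
```

After the update, the second domain carries the larger weight, so the weighted objective leans toward the domain that has degraded most relative to its reference.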
Related papers
- OPTISHEAR: Towards Efficient and Adaptive Pruning of Large Language Models via Evolutionary Optimization [18.57876883968734]
We introduce OptiShear, an efficient evolutionary optimization framework for adaptive LLM pruning.
Our framework features two key innovations: an effective search space built on our Meta pruning metric, and a model-wise reconstruction error for rapid evaluation.
arXiv Detail & Related papers (2025-02-15T09:17:38Z) - Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model.
In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction.
Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z) - Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z) - FTP: A Fine-grained Token-wise Pruner for Large Language Models via Token Routing [17.01412432658081]
Large language models (LLMs) have demonstrated superior performance across various tasks by adhering to scaling laws.
We propose a fine-grained token-wise pruning approach for the LLMs, which presents a learnable router to adaptively identify the less important tokens.
Our approach achieves state-of-the-art (SOTA) pruning results, surpassing other existing pruning methods.
arXiv Detail & Related papers (2024-12-16T07:09:46Z) - Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval [32.104911827710936]
We propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for Large Language Model-based Dense Retrieval fine-tuning.
The tDRO parameterizes the domain weights and updates them with scaled domain gradients.
Experiments show improvements on large-scale retrieval benchmarks while reducing dataset usage by up to 30%.
arXiv Detail & Related papers (2024-08-20T07:48:19Z) - Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z) - Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively [32.001304911395756]
We propose a Dynamic Parameter Selection (DPS) algorithm for large-scale pre-trained models during fine-tuning.
Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability.
arXiv Detail & Related papers (2022-11-03T08:32:12Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)