Pruning Pre-trained Language Models with Principled Importance and
Self-regularization
- URL: http://arxiv.org/abs/2305.12394v1
- Date: Sun, 21 May 2023 08:15:12 GMT
- Title: Pruning Pre-trained Language Models with Principled Importance and
Self-regularization
- Authors: Siyu Ren, Kenny Q. Zhu
- Abstract summary: Iterative pruning is one of the most effective compression methods for pre-trained language models.
We propose a self-regularization scheme where model prediction is regularized by the latest checkpoint with increasing sparsity throughout pruning.
Our experiments on natural language understanding, question-answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs show the effectiveness of the approach at various sparsity levels.
- Score: 18.088550230146247
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Iterative pruning is one of the most effective compression methods for
pre-trained language models. We discovered that finding the optimal pruning
decision is an equality-constrained 0-1 Integer Linear Programming problem. The
solution to this optimization problem leads to a principled importance
criterion which we use to rank parameters during iterative model pruning. To
mitigate the poor generalization at high sparsity levels, we propose a
self-regularization scheme where model prediction is regularized by the latest
checkpoint with increasing sparsity throughout pruning. Our experiments on
natural language understanding, question-answering, named entity recognition,
and data-to-text generation with various Transformer-based PLMs show the
effectiveness of the approach at various sparsity levels.
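
As a hedged illustration of why an equality-constrained 0-1 ILP can lead to a ranking criterion: when the objective is linear in the binary prune decisions and the only constraint fixes how many parameters are pruned, the optimum is the k entries with the smallest cost. The sketch below checks this on a toy example against brute-force enumeration. The cost vector `costs`, the budget `k`, and both function names are hypothetical; the abstract does not state the paper's actual importance formula, so this is not the authors' criterion.

```python
from itertools import combinations


def solve_by_ranking(costs, k):
    """Return the k indices with the smallest cost (ties broken by index)."""
    order = sorted(range(len(costs)), key=lambda i: (costs[i], i))
    return sorted(order[:k])


def solve_by_brute_force(costs, k):
    """Enumerate every 0-1 assignment with exactly k ones; keep the cheapest."""
    best = min(
        combinations(range(len(costs)), k),
        key=lambda idx: sum(costs[i] for i in idx),
    )
    return sorted(best)


# Hypothetical per-parameter cost of pruning (lower = safer to remove).
costs = [0.9, 0.1, 0.4, 0.05, 0.7]
k = 2  # equality constraint: prune exactly k parameters

pruned = solve_by_ranking(costs, k)
assert pruned == solve_by_brute_force(costs, k)
print(pruned)
```

In an iterative schedule, `k` would grow between rounds while the costs are recomputed on the current model; how the costs are estimated is exactly what the paper's criterion addresses and is left open here.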
Related papers
- Sparse Bayesian Generative Modeling for Compressive Sensing [8.666730973498625]
This work addresses the fundamental linear inverse problem in compressive sensing (CS) by introducing a new type of regularizing generative prior.
We support our approach theoretically through the concept of variational inference and validate it empirically using different types of compressible signals.
arXiv Detail & Related papers (2024-11-14T14:37:47Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Boosting Fair Classifier Generalization through Adaptive Priority Reweighing [59.801444556074394]
A performance-promising fair algorithm with better generalizability is needed.
This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability.
arXiv Detail & Related papers (2023-09-15T13:04:55Z)
- Deep-learning-based Early Fixing for Gas-lifted Oil Production Optimization: Supervised and Weakly-supervised Approaches [7.676408770854476]
Mixed-Integer Linear Programs (MILPs) are used to maximize oil production from gas-lifted oil wells.
We propose a tailor-made solution based on deep learning models trained to provide values to all integer variables.
arXiv Detail & Related papers (2023-09-01T01:23:28Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations.
We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z)
- Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task.
We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model.
Results suggest that our method can prune up to 50% of the weights in BERT while yielding performance similar to that of the fine-tuned full model.
arXiv Detail & Related papers (2021-05-07T15:51:08Z)
- Integrated Optimization of Predictive and Prescriptive Tasks [0.0]
We propose a new framework directly integrating predictive tasks under prescriptive tasks.
We train the parameters of the predictive algorithm within a prescription problem via bilevel optimization techniques.
arXiv Detail & Related papers (2021-01-02T02:43:10Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.