Enabling Lightweight Fine-tuning for Pre-trained Language Model
Compression based on Matrix Product Operators
- URL: http://arxiv.org/abs/2106.02205v1
- Date: Fri, 4 Jun 2021 01:50:15 GMT
- Title: Enabling Lightweight Fine-tuning for Pre-trained Language Model
Compression based on Matrix Product Operators
- Authors: Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Z.Y. Xie, Zhong-Yi Lu, Ji-Rong
Wen
- Abstract summary: We present a novel pre-trained language model (PLM) compression approach based on the matrix product operator (MPO) from quantum many-body physics.
Our approach can be applied to either the original or an already-compressed PLM, deriving a lighter network and significantly reducing the parameters to be fine-tuned.
- Score: 31.461762905053426
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper presents a novel pre-trained language model (PLM) compression
approach based on the matrix product operator (MPO for short) from quantum
many-body physics. It can decompose an original matrix into central tensors
(containing the core information) and auxiliary tensors (with only a small
proportion of parameters). With the decomposed MPO structure, we propose a
novel fine-tuning strategy by only updating the parameters from the auxiliary
tensors, and design an optimization algorithm for MPO-based approximation over
stacked network architectures. Our approach can be applied to either the
original or an already-compressed PLM, deriving a lighter network and
significantly reducing the parameters to be fine-tuned. Extensive experiments
have demonstrated the effectiveness of the proposed approach in model
compression, especially the reduction in fine-tuning parameters (91% reduction
on average).
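To make the decomposition concrete, here is a minimal NumPy sketch of an MPO-style (tensor-train) factorization of a weight matrix via sequential SVDs. It is a hedged illustration of the general technique, not the authors' implementation; the dimension choices and the "largest core = central tensor" heuristic are assumptions.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims):
    """Split W (prod(in_dims) x prod(out_dims)) into a chain of 4-way cores
    T_k[r_{k-1}, i_k, j_k, r_k] using sequential (here: exact) SVDs."""
    n = len(in_dims)
    T = W.reshape(*in_dims, *out_dims)
    # Interleave input/output axes so each core owns one (i_k, j_k) pair.
    T = T.transpose([a for k in range(n) for a in (k, n + k)])
    cores, rank = [], 1
    for k in range(n - 1):
        T = T.reshape(rank * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = len(S)  # truncating S here would trade accuracy for compression
        cores.append(U[:, :r].reshape(rank, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]
        rank = r
    cores.append(T.reshape(rank, in_dims[-1], out_dims[-1], 1))
    return cores

# Toy example: a 12x8 "weight matrix" split into three local tensors.
W = np.random.randn(12, 8)
cores = mpo_decompose(W, in_dims=(2, 2, 3), out_dims=(2, 2, 2))
print([c.shape for c in cores])  # (1,2,2,4), (4,2,2,6), (6,3,2,1)
# Fine-tuning strategy from the paper: freeze the (large) central tensor
# and update only the small auxiliary tensors at the ends of the chain.
central = max(cores, key=lambda c: c.size)
auxiliary = [c for c in cores if c is not central]
```

Compression comes from truncating the SVD ranks; the lightweight fine-tuning strategy follows because the auxiliary cores at the ends of the chain hold only a small fraction of the total parameters.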
Related papers
- Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation [12.07880147193174]
We show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the computational burdens.
We demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models.
arXiv Detail & Related papers (2024-06-06T14:29:49Z)
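As a toy illustration of the low-rank dynamics above (an assumed setup, not the paper's algorithm), the following NumPy sketch fits an overparameterized depth-3 factorization to partially observed entries of a low-rank matrix; depth, step size, and initialization scale are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 30, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 target
mask = rng.random((m, n)) < 0.5                                # observed entries

# Overparameterized depth-3 factorization X = W1 @ W2 @ W3, full-width factors.
Ws = [0.1 * rng.standard_normal((m, m)),
      0.1 * rng.standard_normal((m, n)),
      0.1 * rng.standard_normal((n, n))]

lr = 0.02
for step in range(10000):
    X = Ws[0] @ Ws[1] @ Ws[2]
    G = mask * (X - M)  # gradient of 0.5 * ||mask * (X - M)||_F^2 w.r.t. X
    grads = [G @ (Ws[1] @ Ws[2]).T,
             Ws[0].T @ G @ Ws[2].T,
             (Ws[0] @ Ws[1]).T @ G]
    Ws = [W - lr * g for W, g in zip(Ws, grads)]

X = Ws[0] @ Ws[1] @ Ws[2]
print("held-out error:", np.linalg.norm((~mask) * (X - M)))
```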
- Data-free Weight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
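A minimal sketch of the underlying rank-k idea (the paper's joint, data-free procedure is more involved; names and sizes here are illustrative):

```python
import numpy as np

def rank_k_approx(W, k):
    """Approximate W with thin factors A @ B using only the top-k singular
    directions; no calibration data is required."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]                 # (out, k), singular values folded in
    return A, Vt[:k]                     # (k, in)

W = np.random.randn(768, 768)            # one transformer-sized weight matrix
A, B = rank_k_approx(W, k=96)            # (real weights are closer to low-rank
orig, kept = W.size, A.size + B.size     #  than this random toy)
print(f"params: {orig} -> {kept} ({kept / orig:.0%} kept)")
print("rel. error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```

Storing the two thin factors instead of W cuts the parameter count to 2*768*96 / 768^2 = 25% in this toy setting.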
- Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics [10.673414267895355]
We present a novel approach for compressing overparameterized models.
Our algorithm improves the training efficiency by more than 2x, without compromising generalization.
arXiv Detail & Related papers (2023-11-08T23:57:03Z)
- Low-Rank Prune-And-Factorize for Language Model Compression [18.088550230146247]
Matrix factorization fails to retain satisfactory performance under moderate to high compression rates.
We propose two techniques: sparsity-aware SVD and mixed-rank fine-tuning.
arXiv Detail & Related papers (2023-06-25T07:38:43Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
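A hedged PyTorch sketch of the sharing scheme described in this entry (the class, shapes, and initialization are illustrative assumptions, not the paper's architecture): one large central core is tied across layers while each layer keeps its own small auxiliary cores.

```python
import math
import torch
import torch.nn as nn

class MPOLinear(nn.Module):
    """Linear map contracted from three MPO cores; the central core is
    passed in so the caller can tie one copy across many layers."""
    def __init__(self, central, in_dims=(4, 8, 4), out_dims=(4, 8, 4)):
        super().__init__()
        r = central.shape[0]            # central core shape: (r, i2, j2, r)
        self.central = central          # shared nn.Parameter
        self.left = nn.Parameter(0.02 * torch.randn(1, in_dims[0], out_dims[0], r))
        self.right = nn.Parameter(0.02 * torch.randn(r, in_dims[2], out_dims[2], 1))
        self.in_size = math.prod(in_dims)

    def forward(self, x):
        # Contract the three cores into a dense (in, out) weight, then apply.
        W = torch.einsum('aijb,bklc,cmnd->ikmjln',
                         self.left, self.central, self.right)
        return x @ W.reshape(self.in_size, -1)

central = nn.Parameter(0.02 * torch.randn(4, 8, 8, 4))   # one copy for all layers
layers = nn.ModuleList([MPOLinear(central) for _ in range(6)])
aux = layers[0].left.numel() + layers[0].right.numel()
print(f"shared central params: {central.numel()}, auxiliary per layer: {aux}")
y = layers[0](torch.randn(2, 128))                        # sanity check: (2, 128)
```

Because only one copy of the central tensor exists, total model size grows with the small per-layer auxiliary cores rather than with full per-layer weight matrices.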
- Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can perform better than current SOTA methods in neural-based language models.
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
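A minimal sketch of row-weighted low-rank estimation in the spirit of this entry (the weighting scheme and names are assumptions; the paper's numerical optimizations go beyond this closed form):

```python
import numpy as np

def weighted_rank_k(W, row_weights, k):
    """Rank-k factorization that approximates high-importance rows more
    faithfully than standard (unweighted) SVD would."""
    s = np.sqrt(row_weights)[:, None]       # per-row scaling, shape (rows, 1)
    U, S, Vt = np.linalg.svd(s * W, full_matrices=False)
    A = (U[:, :k] * S[:k]) / s              # undo the scaling on the way out
    return A, Vt[:k]                        # W ~= A @ Vt[:k]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
importance = rng.random(64) + 0.05          # stand-in per-row importance scores
A, B = weighted_rank_k(W, importance, k=16)
row_err = np.abs(W - A @ B).mean(axis=1)
print("mean |error| on heavy rows:", row_err[importance > 0.7].mean())
print("mean |error| on light rows:", row_err[importance < 0.3].mean())
```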
- Spectral Tensor Train Parameterization of Deep Learning Layers [136.4761580842396]
We study low-rank parameterizations of weight matrices with embedded spectral properties in the Deep Learning context.
We show the effects of neural network compression in the classification setting and both compression and improved stability training in the generative adversarial training setting.
arXiv Detail & Related papers (2021-03-07T00:15:44Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameter transmission.
In this paper, we propose effective covert model poisoning (CMP) algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for such ansatzes used in variational quantum algorithms, which we call parameter-efficient circuit training (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
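The sequencing idea can be sketched without quantum hardware: in the hedged toy below, a quadratic cost stands in for the variational objective and parameters are optimized one block at a time rather than all at once (schedule and sizes are illustrative, not the actual PECT procedure).

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((8, 8))
A = G @ G.T                                  # fixed PSD "cost landscape"
cost = lambda th: float(th @ A @ th)         # stand-in variational objective

theta = rng.standard_normal(8)
lr = 0.01
# A sequence of small sub-optimizations over parameter blocks, in the
# spirit of PECT, instead of one descent over all parameters at once.
for block in [slice(0, 4), slice(4, 8)] * 3:
    for _ in range(300):
        g = 2 * (A @ theta)                  # gradient of the quadratic cost
        theta[block] -= lr * g[block]        # only the active block moves
print("final cost:", cost(theta))            # approaches the minimum at 0
```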
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.