Numerical Optimizations for Weighted Low-rank Estimation on Language
Model
- URL: http://arxiv.org/abs/2211.09718v1
- Date: Wed, 2 Nov 2022 00:58:02 GMT
- Title: Numerical Optimizations for Weighted Low-rank Estimation on Language
Model
- Authors: Ting Hua, Yen-Chang Hsu, Felicity Wang, Qian Lou, Yilin Shen, Hongxia
Jin
- Abstract summary: Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can outperform current SOTA methods when compressing Transformer-based language models.
- Score: 73.12941276331316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Singular value decomposition (SVD) is one of the most popular compression
methods that approximate a target matrix with smaller matrices. However,
standard SVD treats the parameters within the matrix with equal importance,
which is a simple but unrealistic assumption. The parameters of a trained
neural network model may affect task performance unevenly, which suggests
non-equal importance among the parameters. Compared to SVD, a decomposition
method that accounts for parameter importance is the more practical choice in
real cases. Unlike standard SVD, weighted value decomposition is a non-convex
optimization problem that lacks a closed-form solution. We systematically
investigated multiple optimization strategies to tackle the problem and
examined our method by compressing Transformer-based language models. Further,
we designed a metric to predict when the SVD may introduce a significant
performance drop, for which our method can be a rescue strategy. The extensive
evaluations demonstrate that our method can perform better than current SOTA
methods in compressing Transformer-based language models.
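To make the lack of a closed form concrete, here is a minimal numpy sketch of elementwise-weighted low-rank approximation solved by alternating least squares, one standard strategy for this kind of non-convex problem. It is an illustration under assumed names and shapes, not the authors' exact algorithm.

```python
import numpy as np

def weighted_low_rank(A, W, r, n_iters=50, seed=0):
    """Alternating least squares for min ||W * (A - U @ V.T)||_F^2.

    A : (m, n) target matrix; W : (m, n) non-negative importance weights.
    Unlike standard SVD there is no closed-form solution for general W,
    so we alternate exact weighted least-squares updates for U and V.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V = rng.standard_normal((n, r))
    U = np.zeros((m, r))
    for _ in range(n_iters):
        for i in range(m):                     # row of U, weights W[i, :]
            D = W[i]
            G = V.T @ (D[:, None] * V)
            U[i] = np.linalg.solve(G, V.T @ (D * A[i]))
        for j in range(n):                     # row of V, weights W[:, j]
            D = W[:, j]
            G = U.T @ (D[:, None] * U)
            V[j] = np.linalg.solve(G, U.T @ (D * A[:, j]))
    return U, V

# Sanity check: with uniform weights the result matches truncated SVD.
A = np.random.default_rng(1).standard_normal((40, 30))
U, V = weighted_low_rank(A, np.ones_like(A), r=5)
Us, s, Vt = np.linalg.svd(A, full_matrices=False)
print("SVD error:", np.linalg.norm(A - Us[:, :5] * s[:5] @ Vt[:5]))
print("ALS error:", np.linalg.norm(A - U @ V.T))
```

With non-uniform W the two answers diverge, which is exactly the regime the paper targets.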
Related papers
- Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation [53.88562288388169]
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks.
We propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix.
SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix.
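For reference, a quick numpy check of the factorization described above (reduced SVD, so the factors are column-orthonormal rather than full square unitary matrices):

```python
import numpy as np

# M = U @ diag(S) @ Vt: orthonormal left factor, diagonal scaling values,
# orthonormal right factor.
M = np.random.default_rng(0).standard_normal((6, 4))
U, S, Vt = np.linalg.svd(M, full_matrices=False)

assert np.allclose(U @ np.diag(S) @ Vt, M)   # exact reconstruction
assert np.allclose(U.T @ U, np.eye(4))       # left factor orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(4))     # right factor orthonormal
```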
arXiv Detail & Related papers (2024-10-30T12:08:30Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
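As a minimal illustration of the BLO structure only (not the paper's stochastic reformulation), the hypothetical toy problem below tunes a ridge penalty at the outer level against a validation loss, with the inner level solved in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
Xtr, Xval = rng.standard_normal((80, 5)), rng.standard_normal((40, 5))
w_true = rng.standard_normal(5)
ytr = Xtr @ w_true + 0.5 * rng.standard_normal(80)
yval = Xval @ w_true + 0.5 * rng.standard_normal(40)

def inner_solution(lam):
    # Inner level: argmin_w ||Xtr w - ytr||^2 + lam ||w||^2 (closed form).
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(5), Xtr.T @ ytr)

def outer_loss(lam):
    # Outer level: validation loss of the inner solution.
    w = inner_solution(lam)
    return np.mean((Xval @ w - yval) ** 2)

# Outer optimization by simple grid search over the hyperparameter.
lams = np.logspace(-3, 2, 30)
best = min(lams, key=outer_loss)
print(f"best lambda = {best:.3g}, val loss = {outer_loss(best):.4f}")
```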
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values [12.137869917556415]
Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks.
However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments.
We propose SVFit, a novel PEFT approach that leverages singular value decomposition (SVD) to initialize low-rank matrices, using critical singular values as trainable parameters.
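A hedged sketch of that idea; the rank, which singular values are trained, and how the adapted weight is rebuilt are all assumptions for illustration, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))        # stand-in for a pretrained weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)

r = 8
s_train = S[:r].copy()                   # the only trainable parameters

def adapted_weight(s_train):
    # Freeze singular vectors; rebuild W with the trained top-r values.
    S_new = S.copy()
    S_new[:r] = s_train
    return U @ np.diag(S_new) @ Vt

print(f"trainable params: {r} vs {W.size} for full fine-tuning")
```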
arXiv Detail & Related papers (2024-09-09T08:44:53Z)
- Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction.
We find that our resulting task accuracy is much closer to the original model's performance.
Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
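A minimal sketch in this spirit, assuming squared gradients as the empirical Fisher proxy and a reduction to per-row weights, which is what admits a closed-form solution via a row-scaled SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16))           # layer weight to compress
grads = rng.standard_normal((100, 32, 16))  # per-example grads (stand-in)

fisher = (grads ** 2).mean(axis=0)          # empirical Fisher proxy
row_w = np.sqrt(fisher.sum(axis=1, keepdims=True))  # per-row importance

# Weighted problem min ||D (W - A @ B)||_F with D = diag(row_w) reduces
# to a plain truncated SVD of the row-scaled matrix.
U, S, Vt = np.linalg.svd(row_w * W, full_matrices=False)
r = 4
A = (U[:, :r] * S[:r]) / row_w              # undo the row scaling
B = Vt[:r]
print("weighted rank-4 error:", np.linalg.norm(row_w * (W - A @ B)))
```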
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
- Large-Scale System Identification Using a Randomized SVD [4.567810220723372]
We show that an approximate matrix factorization can replace the standard SVD in the realization algorithm.
For the largest problem instances considered, this is the only method capable of producing a model.
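For reference, a compact randomized SVD in the style of Halko et al., the kind of approximate factorization that can stand in for the exact SVD here; the oversampling and power-iteration counts are illustrative defaults:

```python
import numpy as np

def randomized_svd(A, r, oversample=10, n_power=2, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, r + oversample))  # random test matrix
    Y = A @ Omega
    for _ in range(n_power):        # power iterations sharpen the subspace
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)          # orthonormal basis for the range of A
    B = Q.T @ A                     # small (r + oversample) x n problem
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :r], S[:r], Vt[:r]

A = np.random.default_rng(1).standard_normal((500, 200))
U, S, Vt = randomized_svd(A, r=20)
print("rank-20 approx error:", np.linalg.norm(A - U * S @ Vt))
```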
arXiv Detail & Related papers (2021-09-06T19:25:15Z)
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
These conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of model-based optimization (MBO) problems.
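A hedged toy sketch of the conservative training signal: fit the objective model while pushing its predictions down on off-distribution probes (sampled here for brevity, where the paper finds them by gradient ascent on the model; alpha is an illustrative penalty weight):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(200, 1))
y = -(X ** 2).sum(axis=1)                     # ground-truth objective

phi = lambda Z: np.c_[np.ones(len(Z)), Z, Z ** 2]   # quadratic features
X_ood = rng.uniform(-2.0, 2.0, size=(200, 1))       # off-distribution probes

alpha, lr = 0.5, 0.1
w = np.zeros(3)
for _ in range(500):
    err = phi(X) @ w - y
    grad = phi(X).T @ err / len(X)            # fit the data
    grad += alpha * phi(X_ood).mean(axis=0)   # push OOD predictions down
    w -= lr * grad

print("in-dist fit MSE:", np.mean((phi(X) @ w - y) ** 2))
print("mean OOD prediction:", np.mean(phi(X_ood) @ w))
```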
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
- Direction is what you need: Improving Word Embedding Compression in Large Language Models [7.736463504706344]
This paper presents a novel loss objective to compress token embeddings in Transformer-based models by leveraging an AutoEncoder architecture.
Our method significantly outperforms the commonly used SVD-based matrix-factorization approach in terms of initial language model Perplexity.
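A minimal sketch of the setup, using a linear autoencoder (equivalent to PCA) on L2-normalized embeddings as a stand-in for the paper's direction-aware loss; the sizes and the normalization trick are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((5000, 256))     # vocab x dim embedding table
En = E / np.linalg.norm(E, axis=1, keepdims=True)   # keep direction only

# The optimal linear autoencoder is PCA: encode to k dims, decode back.
k = 64
mu = En.mean(axis=0)
U, S, Vt = np.linalg.svd(En - mu, full_matrices=False)
enc, dec = Vt[:k].T, Vt[:k]              # 256 -> 64 and 64 -> 256 maps

recon = (En - mu) @ enc @ dec + mu
cos = np.sum(recon * En, axis=1) / np.linalg.norm(recon, axis=1)
print(f"mean cosine similarity after 4x compression: {cos.mean():.3f}")
```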
arXiv Detail & Related papers (2021-06-15T14:28:00Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)