Numerical Optimizations for Weighted Low-rank Estimation on Language
Model
- URL: http://arxiv.org/abs/2211.09718v1
- Date: Wed, 2 Nov 2022 00:58:02 GMT
- Title: Numerical Optimizations for Weighted Low-rank Estimation on Language
Model
- Authors: Ting Hua, Yen-Chang Hsu, Felicity Wang, Qian Lou, Yilin Shen, Hongxia
Jin
- Abstract summary: Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can perform better than current SOTA methods in neural-based language models.
- Score: 73.12941276331316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Singular value decomposition (SVD) is one of the most popular compression
methods that approximate a target matrix with smaller matrices. However,
standard SVD treats the parameters within the matrix with equal importance,
which is a simple but unrealistic assumption. The parameters of a trained
neural network model may affect task performance unevenly, which suggests
non-equal importance among the parameters. Compared to SVD, the decomposition
method aware of parameter importance is the more practical choice in real
cases. Unlike standard SVD, weighted value decomposition is a non-convex
optimization problem that lacks a closed-form solution. We systematically
investigated multiple optimization strategies to tackle the problem and
examined our method by compressing Transformer-based language models. Further,
we designed a metric to predict when the SVD may introduce a significant
performance drop, for which our method can be a rescue strategy. The extensive
evaluations demonstrate that our method can perform better than current SOTA
methods in compressing Transformer-based language models.
Related papers
- Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Partial Least Square Regression via Three-factor SVD-type Manifold
Optimization for EEG Decoding [4.0204191666595595]
We propose a new method to solve the partial least square regression, named PLSR via optimization on bi-Grassmann manifold (PLSRbiGr)
qlPLSRbiGr is validated with a variety of experiments for decoding EEG signals at motor imagery (MI) and steady-state visual evoked potential (SSVEP) task.
arXiv Detail & Related papers (2022-08-09T11:57:02Z) - Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction.
We find that our resulting task accuracy is much closer to the original model's performance.
Our method can directly compress a task-specific model while achieving better performance than other compact model strategies.
arXiv Detail & Related papers (2022-06-30T21:57:07Z) - Large-Scale System Identification Using a Randomized SVD [4.567810220723372]
We show that an approximate matrix factorization can replace the standard SVD in the realization algorithm.
This is the only method capable of producing a model.
arXiv Detail & Related papers (2021-09-06T19:25:15Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Direction is what you need: Improving Word Embedding Compression in
Large Language Models [7.736463504706344]
This paper presents a novel loss objective to compress token embeddings in Transformer-based models by leveraging an AutoEncoder architecture.
Our method significantly outperforms the commonly used SVD-based matrix-factorization approach in terms of initial language model Perplexity.
arXiv Detail & Related papers (2021-06-15T14:28:00Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.