Vision Transformer Pruning Via Matrix Decomposition
- URL: http://arxiv.org/abs/2308.10839v1
- Date: Mon, 21 Aug 2023 16:40:51 GMT
- Title: Vision Transformer Pruning Via Matrix Decomposition
- Authors: Tianyi Sun
- Abstract summary: Vision Transformer Pruning prunes the dimensions of the linear projections by learning their associated importance scores.
In this paper we further reduce the dimension and complexity of the linear projections by implementing and comparing several matrix decomposition methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This is a further development of Vision Transformer Pruning via matrix
decomposition. The purpose of Vision Transformer Pruning is to prune the
dimensions of the linear projections by learning their associated importance
scores, in order to reduce the storage, run-time memory, and computational
demands. In this paper we further reduce the dimension and complexity of the
linear projections by implementing and comparing several matrix decomposition
methods while preserving the generated important features. We ultimately
selected Singular Value Decomposition as the method to achieve our goal,
comparing the accuracy scores reported in the original GitHub repository with
the accuracy scores obtained using these matrix decomposition methods:
Singular Value Decomposition, four versions of QR Decomposition, and LU
factorization.
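As a rough sketch of the selected approach (not the authors' code; the layer shape and rank below are illustrative assumptions), a truncated SVD replaces a dense projection with two thinner factors:

```python
import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Replace a dense projection W (d_out x d_in) with two smaller factors.

    Keeping the top-`rank` singular triplets preserves the dominant
    directions of the projection while shrinking storage from
    d_out*d_in to rank*(d_out + d_in) parameters.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # d_out x rank
    B = Vt[:rank, :]                  # rank  x d_in
    return A, B

# Toy check: a ViT-style projection (e.g. 768 -> 768) truncated to rank 64.
rng = np.random.default_rng(0)
W = rng.standard_normal((768, 768))
A, B = svd_compress(W, rank=64)
x = rng.standard_normal(768)
print(np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))
```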
Related papers
- Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation [53.88562288388169]
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks.
We propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix.
SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix.
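A minimal sketch of how an SVD-shaped adapter could look, assuming the unitary factors are frozen and only the scaling values are updated; this illustrates the decomposition, not the paper's Householder-based parameterization:

```python
import numpy as np

# Hypothetical SVD-shaped adapter: freeze the unitary factors of the
# pre-trained weight and learn only the diagonal scaling values.
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64))            # pre-trained weight (toy size)
U, S, Vt = np.linalg.svd(W, full_matrices=False)

delta_s = np.zeros_like(S)                   # trainable per-direction scales
delta_s[:8] = 0.1                            # pretend fine-tuning nudged a few

W_adapted = U @ np.diag(S + delta_s) @ Vt    # adapted weight
print(np.linalg.norm(W_adapted - W))
```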
arXiv Detail & Related papers (2024-10-30T12:08:30Z)
- An Alternative Graphical Lasso Algorithm for Precision Matrices [0.0]
We present a new/improved (dual-primal) DP-GLasso algorithm for estimating sparse precision matrices.
We show that the regularized normal log-likelihood naturally decouples into a sum of two easy-to-minimize convex functions, one of which is a Lasso regression problem.
Our algorithm has the precision matrix as its optimization target right at the outset, and retains all the favorable properties of the DP-GLasso algorithm.
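For context, a baseline sparse-precision fit with scikit-learn's GraphicalLasso (the generic solver, not the paper's DP-GLasso; alpha and data are arbitrary):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Estimate a sparse precision matrix from Gaussian samples.
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 10))           # 500 samples, 10 variables
model = GraphicalLasso(alpha=0.1).fit(X)
print((np.abs(model.precision_) > 1e-4).sum(), "nonzero precision entries")
```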
arXiv Detail & Related papers (2024-03-19T02:01:01Z)
- Learning Unorthogonalized Matrices for Rotation Estimation [83.94986875750455]
Estimating 3D rotations is a common procedure in 3D computer vision.
One form of representation -- rotation matrices -- is popular due to its continuity.
We propose unorthogonalized 'Pseudo' Rotation Matrices (PRoM).
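For contrast, the standard SVD projection onto SO(3) that PRoM avoids during training looks roughly like this (a generic sketch; the 3x3 raw output is illustrative):

```python
import numpy as np

def nearest_rotation(M: np.ndarray) -> np.ndarray:
    """Project an arbitrary 3x3 matrix onto SO(3) via SVD.

    PRoM's point is that skipping this projection during training can help
    optimization; a projection is still needed at inference to obtain a
    valid rotation.
    """
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))       # fix reflections so det = +1
    return U @ np.diag([1.0, 1.0, d]) @ Vt

M = np.random.default_rng(3).standard_normal((3, 3))  # raw network output
R = nearest_rotation(M)
print(np.allclose(R @ R.T, np.eye(3), atol=1e-8), np.linalg.det(R))
```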
arXiv Detail & Related papers (2023-12-01T09:56:29Z)
- Optimal Projections for Discriminative Dictionary Learning using the JL-lemma [0.5461938536945723]
Dimensionality reduction-based dictionary learning methods have often used iterative random projections.
This paper proposes a constructive approach to derandomize the projection matrix using the Johnson-Lindenstrauss lemma.
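A sketch of the randomized baseline that the paper derandomizes: a scaled Gaussian JL projection with target dimension k = O(log n / eps^2) (the constant and sizes are arbitrary choices):

```python
import numpy as np

# Random JL projection: the randomized baseline the paper derandomizes.
# The target dimension k = O(log n / eps^2) comes from the
# Johnson-Lindenstrauss lemma; the constant 8 is one common choice.
rng = np.random.default_rng(4)
n, d, eps = 1000, 512, 0.5
k = int(np.ceil(8 * np.log(n) / eps ** 2))    # ~222 for these settings
X = rng.standard_normal((n, d))
P = rng.standard_normal((d, k)) / np.sqrt(k)  # scaled Gaussian projection
Y = X @ P

# Spot-check pairwise-distance distortion on a few disjoint pairs.
pairs = rng.choice(n, size=(5, 2), replace=False)
orig = np.linalg.norm(X[pairs[:, 0]] - X[pairs[:, 1]], axis=1)
proj = np.linalg.norm(Y[pairs[:, 0]] - Y[pairs[:, 1]], axis=1)
print(np.max(np.abs(proj / orig - 1.0)))      # should be well below eps
```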
arXiv Detail & Related papers (2023-08-27T02:59:59Z)
- Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can perform better than current SOTA methods in neural-based language models.
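A hedged sketch of one weighted low-rank scheme in this spirit, using diagonal row-importance weights (the paper derives its weights from Fisher information; the weights here are made up):

```python
import numpy as np

def weighted_svd_compress(W, row_importance, rank):
    """Low-rank approximation that respects per-row importance weights.

    Minimizes ||D (W - W_hat)||_F with D = diag(sqrt(importance)) by taking
    a plain truncated SVD of D @ W and undoing the scaling afterwards.
    """
    d = np.sqrt(row_importance)
    U, S, Vt = np.linalg.svd(d[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * S[:rank]) / d[:, None]   # undo the row scaling
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(5)
W = rng.standard_normal((128, 64))
imp = rng.uniform(0.1, 10.0, size=128)          # hypothetical importances
A, B = weighted_svd_compress(W, imp, rank=16)
print(A.shape, B.shape)
```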
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
- A Validation Approach to Over-parameterized Matrix and Image Recovery [29.29430943998287]
We consider the problem of recovering a low-rank matrix from several random linear measurements.
We show that the proposed validation approach can also be efficiently used for the deep image prior, which over-parameterizes an image with a deep network.
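A minimal sketch of the validation idea, assuming random Gaussian measurements and a full-rank factorization fitted by gradient descent (sizes, step count, and rates are illustrative):

```python
import numpy as np

# Fit X = L @ R.T to random linear measurements; held-out measurements
# pick the stopping time, as in the paper's validation approach.
rng = np.random.default_rng(6)
n, r_true, m = 30, 2, 600
X_star = rng.standard_normal((n, r_true)) @ rng.standard_normal((r_true, n))
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
y = np.einsum('mij,ij->m', A, X_star)
train, val = slice(0, 500), slice(500, 600)

L = 0.01 * rng.standard_normal((n, n))       # full-rank over-parameterization
R = 0.01 * rng.standard_normal((n, n))
best = (np.inf, None)
for step in range(1000):
    resid = np.einsum('mij,ij->m', A[train], L @ R.T) - y[train]
    G = np.einsum('m,mij->ij', resid, A[train])   # grad of 0.5*||resid||^2
    L, R = L - 0.2 * (G @ R), R - 0.2 * (G.T @ L)
    val_err = np.mean((np.einsum('mij,ij->m', A[val], L @ R.T) - y[val])**2)
    if val_err < best[0]:
        best = (val_err, step)
print("best validation error at step:", best)
```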
arXiv Detail & Related papers (2022-09-21T22:01:23Z)
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
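A simplified stand-in for the idea (not the paper's exact randomized-matmul estimator): store only a random subset of input rows and form an unbiased weight-gradient estimate from them:

```python
import torch

class SubsampledLinear(torch.autograd.Function):
    """Linear layer that stores only a random subset of input rows.

    The weight gradient x^T grad_y is replaced by an unbiased estimate from
    k << n sampled rows, cutting stored-activation memory by a factor n/k.
    """
    @staticmethod
    def forward(ctx, x, weight, k):
        idx = torch.randperm(x.shape[0], device=x.device)[:k]
        ctx.save_for_backward(x[idx], weight, idx)
        ctx.scale = x.shape[0] / k               # makes the estimate unbiased
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_y):
        x_sub, weight, idx = ctx.saved_tensors
        grad_x = grad_y @ weight                      # exact, needs no x
        grad_w = ctx.scale * grad_y[idx].t() @ x_sub  # unbiased estimate
        return grad_x, grad_w, None

x = torch.randn(256, 64, requires_grad=True)
w = torch.randn(32, 64, requires_grad=True)
y = SubsampledLinear.apply(x, w, 64)                  # keep 64 of 256 rows
y.sum().backward()
print(w.grad.shape, x.grad.shape)
```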
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
- Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Prior methods such as Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
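A toy version of the low-dimensional-projection family analyzed here, in the style of Linformer (Skeinformer's sketching and row selection are not reproduced; shapes are arbitrary):

```python
import torch

def linformer_attention(Q, K, V, E):
    """Self-attention with Linformer-style key/value projection.

    Projecting the sequence length n down to k via E (k x n) shrinks the
    n x n attention matrix to n x k, giving linear cost in n.
    """
    Kp, Vp = E @ K, E @ V                       # (k, d): compressed sequence
    scores = Q @ Kp.t() / K.shape[-1] ** 0.5    # (n, k) instead of (n, n)
    return torch.softmax(scores, dim=-1) @ Vp

n, d, k = 2048, 64, 128
Q, K, V = (torch.randn(n, d) for _ in range(3))
E = torch.randn(k, n) / k ** 0.5                # fixed random projection
out = linformer_attention(Q, K, V, E)
print(out.shape)                                # torch.Size([2048, 64])
```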
arXiv Detail & Related papers (2021-12-10T06:58:05Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
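A simplified $l_{2,1}$ sketch of this kind of model (a self-representation stand-in for the paper's $l_{2,p}$ formulation), solved by proximal gradient with row-wise shrinkage:

```python
import numpy as np

def l21_feature_select(X, lam=1.0, lr=1e-3, steps=500):
    """Self-representation feature selection with l_{2,1} regularization.

    Solves min_W ||X - X W||_F^2 + lam * ||W||_{2,1} by proximal gradient;
    rows of W driven to zero mark features that can be dropped.
    """
    d = X.shape[1]
    W = np.zeros((d, d))
    for _ in range(steps):
        W -= lr * 2 * X.T @ (X @ W - X)                       # gradient step
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W *= np.maximum(1 - lr * lam / (norms + 1e-12), 0.0)  # row shrinkage
    return np.linalg.norm(W, axis=1)                          # feature scores

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 20))
X[:, 5] = X[:, 3] + 0.01 * rng.standard_normal(200)  # redundant feature
print(np.round(l21_feature_select(X), 2))
```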
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R [45.24529956312764]
This paper describes an R package named flare, which implements a family of new high dimensional regression methods.
The package flare is coded in double precision C, and called from R by a user-friendly interface.
Experiments show that flare is efficient and can scale up to large problems.
arXiv Detail & Related papers (2020-06-27T18:01:56Z)