ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations
- URL: http://arxiv.org/abs/2506.02818v1
- Date: Tue, 03 Jun 2025 12:47:23 GMT
- Title: ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations
- Authors: Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
- Abstract summary: Large language models (LLMs) demonstrate impressive results in natural language processing tasks. Structured matrix representations are a promising way to reduce the number of parameters of these models. We utilize the fact that LLM output is invariant under certain transformations of weight matrices. This insight can be leveraged to identify transformations that significantly improve the compressibility of weights within structured classes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) demonstrate impressive results in natural language processing tasks but require a significant amount of computational and memory resources. Structured matrix representations are a promising way to reduce the number of parameters of these models. However, it seems unrealistic to expect that weight matrices of pretrained models can be accurately represented by structured matrices without any fine-tuning. To overcome this issue, we utilize the fact that LLM output is invariant under certain orthogonal transformations of weight matrices. This insight can be leveraged to identify transformations that significantly improve the compressibility of weights within structured classes. The proposed approach is applicable to various types of structured matrices that support efficient projection operations. Code is available at https://github.com/GrishKate/ProcrustesGPT
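The core idea in the abstract, finding an orthogonal transformation that makes a weight matrix more compressible, can be sketched as an alternating scheme built on the classical orthogonal Procrustes problem. The sketch below is a minimal illustration, not the paper's algorithm: it uses low-rank matrices as a stand-in for the structured class (the paper supports other classes with efficient projections), and `project_low_rank` and `procrustes_compress` are hypothetical names introduced here.

```python
import numpy as np

def project_low_rank(M, rank):
    """Projection onto the structured class (here: fixed-rank matrices via
    truncated SVD, as a stand-in for any class with an efficient projection)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def procrustes_compress(W, rank, iters=20):
    """Alternately (1) project the rotated weight W @ Q onto the structured
    class and (2) solve the orthogonal Procrustes problem
        min_Q ||W Q - S||_F  s.t.  Q^T Q = I,
    whose closed-form minimizer is Q = U V^T with U, V from the SVD of W^T S."""
    Q = np.eye(W.shape[1])
    for _ in range(iters):
        S = project_low_rank(W @ Q, rank)   # structured approximation of W Q
        U, _, Vt = np.linalg.svd(W.T @ S)   # Procrustes step
        Q = U @ Vt                          # orthogonal by construction
    return Q, S
```

Because each step minimizes the same Frobenius-norm objective, the fit of the structured matrix S to the rotated weight W Q never gets worse than projecting W directly; since Q is orthogonal, it can in principle be absorbed into adjacent layers so the network's output is unchanged while only the structured factor is stored.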
Related papers
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy [57.54306942529943]
We propose an Approximately Orthogonal Fine-Tuning (AOFT) strategy for representing the low-rank weight matrices. Our method achieves competitive performance across a range of downstream image classification tasks.
arXiv Detail & Related papers (2025-07-17T16:09:05Z) - MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers [43.39466934693055]
We present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective.
This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers.
We conduct extensive experiments on various benchmarks to demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2024-11-20T02:41:53Z) - Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation [53.88562288388169]
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks.
We propose a novel PEFT approach inspired by Singular Value Decomposition (SVD) for representing the adaptation matrix.
SVD decomposes a matrix into the product of a left unitary matrix, a diagonal matrix of scaling values, and a right unitary matrix.
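The SVD factorization described in that summary can be checked numerically in a few lines (a generic NumPy illustration, unrelated to any of the listed papers' code):

```python
import numpy as np

# SVD writes A as U @ diag(s) @ Vt: a left orthonormal factor, a diagonal of
# non-negative scaling values, and a right orthonormal factor.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

assert np.allclose(U @ np.diag(s) @ Vt, A)  # exact reconstruction
assert np.allclose(U.T @ U, np.eye(2))      # U has orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(2))    # rows of Vt are orthonormal
```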
arXiv Detail & Related papers (2024-10-30T12:08:30Z) - BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference [15.519068157865023]
We introduce the Block-Level Adaptive STructured (BLAST) matrix to learn and leverage efficient structures prevalent in the weight matrices of linear layers within deep learning models.
We demonstrate the efficiency of using the matrix for compressing both language and vision tasks.
arXiv Detail & Related papers (2024-10-28T17:56:18Z) - Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization [27.907707931902547]
We study two phenomena related to the attention mechanism during the fine-tuning of Large Language Models. We propose a new strategy that improves fine-tuning efficiency in terms of both storage and time.
arXiv Detail & Related papers (2024-10-03T06:37:37Z) - Optimal Matrix-Mimetic Tensor Algebras via Variable Projection [0.0]
Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogous to matrices.
We learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data.
We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm.
arXiv Detail & Related papers (2024-06-11T04:52:23Z) - Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block-Train, which we show performs better than dense for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z) - Monarch: Expressive Structured Matrices for Efficient and Accurate Training [64.6871423399431]
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones.
We propose a class of matrices (Monarch) that is hardware-efficient.
arXiv Detail & Related papers (2022-04-01T17:37:29Z) - Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and "square-roots" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z) - Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps [20.151950843660973]
We introduce kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space.
K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures.
We use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
arXiv Detail & Related papers (2020-12-29T22:51:29Z) - FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs [53.710405006523274]
This work focuses on the representation learning question: how can we learn such features?
Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem.
We develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
arXiv Detail & Related papers (2020-06-18T19:11:18Z) - Sketching Transformed Matrices with Applications to Natural Language Processing [76.6222695417524]
We propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
We show that our approach obtains small error and is efficient in both space and time.
arXiv Detail & Related papers (2020-02-23T03:07:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.