Compressing Large Language Models with PCA Without Performance Loss
- URL: http://arxiv.org/abs/2508.04307v1
- Date: Wed, 06 Aug 2025 10:47:22 GMT
- Title: Compressing Large Language Models with PCA Without Performance Loss
- Authors: Magnus Bengtsson
- Abstract summary: We show that Principal Component Analysis enables extreme compression of neural models without sacrificing performance. A one-layer classifier trained on PCA-compressed polar MNIST achieves over 98 percent accuracy using only 840 parameters. A two-layer transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62 percent accuracy on the 20 Newsgroups dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate that Principal Component Analysis (PCA), when applied in a structured manner, either to polar-transformed images or segment-wise to token sequences, enables extreme compression of neural models without sacrificing performance. Across three case studies, we show that a one-layer classifier trained on PCA-compressed polar MNIST achieves over 98 percent accuracy using only 840 parameters. A two-layer transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62 percent accuracy on the 20 Newsgroups dataset with just 81000 parameters. A decoder-only transformer generates coherent token sequences from 70-dimensional PCA embeddings while preserving over 97 percent cosine similarity with full MiniLM representations, using less than 17 percent of the parameter count of GPT-2. These results highlight PCA-based input compression as a general and effective strategy for aligning model capacity with information content, enabling lightweight architectures across multiple modalities.
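To make the input-compression recipe concrete, here is a minimal sketch in the spirit of the second case study, assuming the sentence-transformers and scikit-learn packages are available; the logistic-regression head and the subset sizes are illustrative stand-ins for the paper's two-layer transformer and full training setup, not its implementation.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sentence_transformers import SentenceTransformer

# Small subsets keep the sketch quick; the paper uses the full dataset.
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))
encoder = SentenceTransformer("all-MiniLM-L6-v2")          # 384-dim embeddings
X_train = encoder.encode(train.data[:4000], show_progress_bar=False)
X_test = encoder.encode(test.data[:2000], show_progress_bar=False)
y_train, y_test = train.target[:4000], test.target[:2000]

# Compress the 384-dimensional embeddings to 70 principal components.
pca = PCA(n_components=70).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# A lightweight classifier on the compressed inputs (stand-in for the
# paper's two-layer transformer head).
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("accuracy on PCA-compressed embeddings:",
      accuracy_score(y_test, clf.predict(Z_test)))

# How much of the original embedding survives the 70-dimensional bottleneck?
X_rec = pca.inverse_transform(Z_test)
cos = np.sum(X_test * X_rec, axis=1) / (
    np.linalg.norm(X_test, axis=1) * np.linalg.norm(X_rec, axis=1))
print("mean cosine similarity after reconstruction:", cos.mean())
```

The sketch only shows how the 70-dimensional compression and the cosine-similarity check fit together; the 76.62 percent and 97 percent figures above come from the paper's own models and full training setup.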
Related papers
- Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning [6.346469177254699]
Inspired by dictionary learning in CNNs, we propose a framework for structured weight sharing across transformer layers. Our approach decomposes attention projection matrices into shared dictionary atoms, reducing the attention module's parameters by 66.7%.
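A rough sketch of the dictionary-sharing idea summarized above, using random stand-in weights and scikit-learn's DictionaryLearning rather than the authors' code; layer count, width, and atom count are toy values chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_layers, d_model, n_atoms = 12, 64, 4          # toy sizes

# Stand-in per-layer query projections, built to share structure so the
# dictionary has something to find (real trained weights would replace this).
base = rng.normal(size=(n_atoms, d_model, d_model))
coeff = rng.normal(size=(n_layers, n_atoms))
W_q = np.einsum("lk,kij->lij", coeff, base)
W_q += 0.01 * rng.normal(size=W_q.shape)

# One flattened projection matrix per row; learn shared atoms + sparse codes.
X = W_q.reshape(n_layers, -1)
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="lasso_lars",
                        max_iter=100, random_state=0)
codes = dl.fit_transform(X)                     # (n_layers, n_atoms)
atoms = dl.components_                          # (n_atoms, d_model * d_model)

# Each layer's projection is rebuilt on the fly from the shared atoms.
W_q_hat = (codes @ atoms).reshape(n_layers, d_model, d_model)

orig_params = n_layers * d_model * d_model
shared_params = n_atoms * d_model * d_model + n_layers * n_atoms
print("relative reconstruction error:",
      np.linalg.norm(W_q - W_q_hat) / np.linalg.norm(W_q))
print("parameters:", orig_params, "->", shared_params)
```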
arXiv Detail & Related papers (2025-08-06T16:06:43Z)
- Efficient Token Compression for Vision Transformer with Spatial Information Preserved [59.79302182800274]
Token compression is essential for reducing the computational and memory requirements of transformer models. We propose an efficient and hardware-compatible token compression method called Prune and Merge.
arXiv Detail & Related papers (2025-03-30T14:23:18Z)
- Krony-PT: GPT2 compressed with Kronecker Products [0.6372911857214884]
We introduce Krony-PT, a compression technique for GPT2 (Radford et al., 2019) based on Kronecker products. We specifically target the layers of the original transformer and systematically compress the feed-forward layers to various degrees.
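The Kronecker-product idea can be illustrated with the classic nearest-Kronecker-product (Van Loan) factorization; the sketch below uses a random stand-in weight matrix and arbitrary factor shapes, and is a generic illustration rather than Krony-PT's actual procedure.

```python
import numpy as np

def nearest_kronecker(W, a_shape, b_shape):
    """Best A (a_shape) and B (b_shape) such that kron(A, B) ~ W in Frobenius norm."""
    m1, n1 = a_shape
    m2, n2 = b_shape
    # Rearrange W so the optimal Kronecker factors come from its best
    # rank-1 approximation (Van Loan / Pitsianis rearrangement).
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

rng = np.random.default_rng(0)
# GPT-2-sized feed-forward weight (768 x 3072) with random stand-in values;
# random weights have no Kronecker structure, so the error will be large here.
W = rng.normal(size=(768, 3072))
A, B = nearest_kronecker(W, a_shape=(32, 128), b_shape=(24, 24))
W_hat = np.kron(A, B)

print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
print("parameters:", W.size, "->", A.size + B.size)
```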
arXiv Detail & Related papers (2024-12-16T20:44:01Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z)
- Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells [38.647921189039934]
This work presents a set of optimal machine learning (ML) models to represent the temporal degradation suffered by the power conversion efficiency (PCE) of organic solar cells (OSCs). We generated a database with 996 entries, which includes up to 7 variables regarding both the manufacturing process and environmental conditions for more than 180 days. The accuracy achieved reaches values of the coefficient of determination (R2) widely exceeding 0.90, whereas the root mean squared error (RMSE), sum of squared error (SSE), and mean absolute error (MAE) remain below 1% of the target value, the PCE.
arXiv Detail & Related papers (2024-03-29T22:05:26Z)
- Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules [111.98205411431402]
Variator is a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins.
We show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%.
arXiv Detail & Related papers (2023-10-24T11:00:07Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Test-Time Adaptation with Principal Component Analysis [1.0323063834827415]
We propose Test-Time Adaptation with Principal Component Analysis (TTAwPCA).
TTAwPCA combines three components: the output of a given layer is decomposed using Principal Component Analysis (PCA), filtered by a penalization of its singular values, and reconstructed with the PCA inverse transform.
Experiments on CIFAR-10-C and CIFAR-100-C demonstrate the effectiveness and limits of our method.
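A minimal sketch of that three-step pipeline on stand-in activations; the particular shrinkage used to penalize small singular values is an assumption chosen for illustration, not the paper's specific penalty.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 128))        # stand-in layer outputs (batch x features)

# Fit PCA on the incoming test batch and project the activations.
pca = PCA(n_components=128).fit(acts)
Z = pca.transform(acts)

# Penalize components with small singular values (soft shrinkage here is an
# illustrative choice of filter, not the paper's exact penalization).
sv = pca.singular_values_
weights = np.clip((sv - 0.5 * sv.mean()) / sv, 0.0, 1.0)

# Reconstruct the filtered activations with the PCA inverse transform.
acts_filtered = pca.inverse_transform(Z * weights)
print(acts_filtered.shape)
```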
arXiv Detail & Related papers (2022-09-13T07:24:40Z)
- Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding [20.75335227098455]
Large pre-trained Transformer networks have demonstrated dramatic improvements in many natural language understanding tasks.
New hardware supporting both NxM semi-structured sparsity and low-precision integer computation is a promising solution to boost model serving efficiency.
We propose a flexible compression framework NxMiFormer that performs simultaneous sparsification and quantization.
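A toy illustration of combining NxM (here 2-of-4) sparsification with int8 quantization on a random weight matrix; it shows the general mechanism only and is not NxMiFormer's actual, training-aware procedure.

```python
import numpy as np

def nm_sparsify(W, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m along the last axis."""
    rows, cols = W.shape
    groups = W.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    idx = np.argsort(np.abs(groups), axis=-1)[..., :m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, idx, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

def int8_quantize(W):
    """Symmetric per-tensor int8 quantization."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 256))
W_sparse = nm_sparsify(W, n=2, m=4)          # 50% structured sparsity
q, scale = int8_quantize(W_sparse)
W_deq = q.astype(np.float32) * scale
print("nonzero fraction:", np.count_nonzero(W_sparse) / W_sparse.size)
print("max dequantization error:", np.abs(W_sparse - W_deq).max())
```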
arXiv Detail & Related papers (2022-06-30T04:33:50Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves comparable performance with only about 0.2% of the parameters (100k) of large models on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.