Compressing Large Language Models with PCA Without Performance Loss
- URL: http://arxiv.org/abs/2508.04307v1
- Date: Wed, 06 Aug 2025 10:47:22 GMT
- Title: Compressing Large Language Models with PCA Without Performance Loss
- Authors: Magnus Bengtsson
- Abstract summary: We show that Principal Component Analysis enables extreme compression of neural models without sacrificing performance. A one-layer classifier trained on PCA-compressed polar MNIST achieves over 98 percent accuracy using only 840 parameters. A two-layer transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62 percent accuracy on the 20 Newsgroups dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate that Principal Component Analysis (PCA), when applied in a structured manner, either to polar-transformed images or segment-wise to token sequences, enables extreme compression of neural models without sacrificing performance. Across three case studies, we show that a one-layer classifier trained on PCA-compressed polar MNIST achieves over 98 percent accuracy using only 840 parameters. A two-layer transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62 percent accuracy on the 20 Newsgroups dataset with just 81000 parameters. A decoder-only transformer generates coherent token sequences from 70-dimensional PCA embeddings while preserving over 97 percent cosine similarity with full MiniLM representations, using less than 17 percent of the parameter count of GPT-2. These results highlight PCA-based input compression as a general and effective strategy for aligning model capacity with information content, enabling lightweight architectures across multiple modalities.
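To make the input-compression recipe concrete, here is a minimal sketch in the spirit of the second case study, assuming the sentence-transformers and scikit-learn packages are available; the logistic-regression head and the subset sizes are illustrative stand-ins for the paper's two-layer transformer and full training setup, not its implementation.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sentence_transformers import SentenceTransformer

# Small subsets keep the sketch quick; the paper uses the full dataset.
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))
encoder = SentenceTransformer("all-MiniLM-L6-v2")          # 384-dim embeddings
X_train = encoder.encode(train.data[:4000], show_progress_bar=False)
X_test = encoder.encode(test.data[:2000], show_progress_bar=False)
y_train, y_test = train.target[:4000], test.target[:2000]

# Compress the 384-dimensional embeddings to 70 principal components.
pca = PCA(n_components=70).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# A lightweight classifier on the compressed inputs (stand-in for the
# paper's two-layer transformer head).
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("accuracy on PCA-compressed embeddings:",
      accuracy_score(y_test, clf.predict(Z_test)))

# How much of the original embedding survives the 70-dimensional bottleneck?
X_rec = pca.inverse_transform(Z_test)
cos = np.sum(X_test * X_rec, axis=1) / (
    np.linalg.norm(X_test, axis=1) * np.linalg.norm(X_rec, axis=1))
print("mean cosine similarity after reconstruction:", cos.mean())
```

The sketch only shows how the 70-dimensional compression and the cosine-similarity check fit together; the 76.62 percent and 97 percent figures above come from the paper's own models and full training setup.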
Related papers
- Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning [6.346469177254699]
Inspired by dictionary learning in CNNs, we propose a framework for structured weight sharing across transformer layers. Our approach decomposes attention projection matrices into shared dictionary atoms, reducing the attention module's parameters by 66.7%.
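A rough sketch of the dictionary-sharing idea summarized above, using random stand-in weights and scikit-learn's DictionaryLearning rather than the authors' code; layer count, width, and atom count are toy values chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_layers, d_model, n_atoms = 12, 64, 4          # toy sizes

# Stand-in per-layer query projections, built to share structure so the
# dictionary has something to find (real trained weights would replace this).
base = rng.normal(size=(n_atoms, d_model, d_model))
coeff = rng.normal(size=(n_layers, n_atoms))
W_q = np.einsum("lk,kij->lij", coeff, base)
W_q += 0.01 * rng.normal(size=W_q.shape)

# One flattened projection matrix per row; learn shared atoms + sparse codes.
X = W_q.reshape(n_layers, -1)
dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="lasso_lars",
                        max_iter=100, random_state=0)
codes = dl.fit_transform(X)                     # (n_layers, n_atoms)
atoms = dl.components_                          # (n_atoms, d_model * d_model)

# Each layer's projection is rebuilt on the fly from the shared atoms.
W_q_hat = (codes @ atoms).reshape(n_layers, d_model, d_model)

orig_params = n_layers * d_model * d_model
shared_params = n_atoms * d_model * d_model + n_layers * n_atoms
print("relative reconstruction error:",
      np.linalg.norm(W_q - W_q_hat) / np.linalg.norm(W_q))
print("parameters:", orig_params, "->", shared_params)
```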
arXiv Detail & Related papers (2025-08-06T16:06:43Z)
- Efficient Token Compression for Vision Transformer with Spatial Information Preserved [59.79302182800274]
Token compression is essential for reducing the computational and memory requirements of transformer models. We propose an efficient and hardware-compatible token compression method called Prune and Merge.
arXiv Detail & Related papers (2025-03-30T14:23:18Z)
- Krony-PT: GPT2 compressed with Kronecker Products [0.6372911857214884]
We introduce Krony-PT, a compression technique for GPT2 (Radford et al., 2019) based on Kronecker products. We specifically target the layers of the original transformer and systematically compress the feed-forward layers to various degrees.
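The Kronecker-product idea can be illustrated with the classic nearest-Kronecker-product (Van Loan) factorization; the sketch below uses a random stand-in weight matrix and arbitrary factor shapes, and is a generic illustration rather than Krony-PT's actual procedure.

```python
import numpy as np

def nearest_kronecker(W, a_shape, b_shape):
    """Best A (a_shape) and B (b_shape) such that kron(A, B) ~ W in Frobenius norm."""
    m1, n1 = a_shape
    m2, n2 = b_shape
    # Rearrange W so the optimal Kronecker factors come from its best
    # rank-1 approximation (Van Loan / Pitsianis rearrangement).
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

rng = np.random.default_rng(0)
# GPT-2-sized feed-forward weight (768 x 3072) with random stand-in values;
# random weights have no Kronecker structure, so the error will be large here.
W = rng.normal(size=(768, 3072))
A, B = nearest_kronecker(W, a_shape=(32, 128), b_shape=(24, 24))
W_hat = np.kron(A, B)

print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
print("parameters:", W.size, "->", A.size + B.size)
```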
arXiv Detail & Related papers (2024-12-16T20:44:01Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z)
- Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells [38.647921189039934]
This work presents a set of optimal machine learning (ML) models to represent the temporal degradation suffered by the power conversion efficiency (PCE) of organic solar cells (OSCs). We generated a database with 996 entries, which includes up to 7 variables regarding both the manufacturing process and environmental conditions for more than 180 days. The accuracy achieved reaches values of the coefficient of determination (R2) widely exceeding 0.90, whereas the root mean squared error (RMSE), sum of squared error (SSE), and mean absolute error (MAE) remain below 1% of the target value, the PCE.
arXiv Detail & Related papers (2024-03-29T22:05:26Z)
- Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules [111.98205411431402]
Variator is a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins.
We show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%.
arXiv Detail & Related papers (2023-10-24T11:00:07Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Test-Time Adaptation with Principal Component Analysis [1.0323063834827415]
We propose Test-Time Adaptation with Principal Component Analysis (TTAwPCA).
TTAwPCA combines three components: the output of a given layer is decomposed using Principal Component Analysis (PCA), filtered by a penalization of its singular values, and reconstructed with the PCA inverse transform.
Experiments on CIFAR-10-C and CIFAR-100-C demonstrate the effectiveness and limits of our method.
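A minimal sketch of that three-step pipeline on stand-in activations; the particular shrinkage used to penalize small singular values is an assumption chosen for illustration, not the paper's specific penalty.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 128))        # stand-in layer outputs (batch x features)

# Fit PCA on the incoming test batch and project the activations.
pca = PCA(n_components=128).fit(acts)
Z = pca.transform(acts)

# Penalize components with small singular values (soft shrinkage here is an
# illustrative choice of filter, not the paper's exact penalization).
sv = pca.singular_values_
weights = np.clip((sv - 0.5 * sv.mean()) / sv, 0.0, 1.0)

# Reconstruct the filtered activations with the PCA inverse transform.
acts_filtered = pca.inverse_transform(Z * weights)
print(acts_filtered.shape)
```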
arXiv Detail & Related papers (2022-09-13T07:24:40Z)
- Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding [20.75335227098455]
Large pre-trained Transformer networks have demonstrated dramatic improvements in many natural language understanding tasks.
New hardware supporting both NxM semi-structured sparsity and low-precision integer computation is a promising solution to boost model serving efficiency.
We propose a flexible compression framework NxMiFormer that performs simultaneous sparsification and quantization.
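A toy illustration of combining NxM (here 2-of-4) sparsification with int8 quantization on a random weight matrix; it shows the general mechanism only and is not NxMiFormer's actual, training-aware procedure.

```python
import numpy as np

def nm_sparsify(W, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m along the last axis."""
    rows, cols = W.shape
    groups = W.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    idx = np.argsort(np.abs(groups), axis=-1)[..., :m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, idx, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

def int8_quantize(W):
    """Symmetric per-tensor int8 quantization."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 256))
W_sparse = nm_sparsify(W, n=2, m=4)          # 50% structured sparsity
q, scale = int8_quantize(W_sparse)
W_deq = q.astype(np.float32) * scale
print("nonzero fraction:", np.count_nonzero(W_sparse) / W_sparse.size)
print("max dequantization error:", np.abs(W_sparse - W_deq).max())
```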
arXiv Detail & Related papers (2022-06-30T04:33:50Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves comparable performance with only about 0.2% of the parameters (100k) of large models on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.