Related papers: Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks

Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks

URL: http://arxiv.org/abs/2502.02766v1
Date: Tue, 04 Feb 2025 23:10:13 GMT
Title: Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks
Authors: Shihao Zhang, Rayan Saab,
Abstract summary: Deep neural networks have achieved state-of-the-art performance across numerous applications.<n>Low-rank approximation techniques offer a promising solution by reducing the size and complexity of these networks.<n>We develop an analytical framework for data-driven post-training low-rank compression.
Score: 5.582683296425384
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks have achieved state-of-the-art performance across numerous applications, but their high memory and computational demands present significant challenges, particularly in resource-constrained environments. Model compression techniques, such as low-rank approximation, offer a promising solution by reducing the size and complexity of these networks while only minimally sacrificing accuracy. In this paper, we develop an analytical framework for data-driven post-training low-rank compression. We prove three recovery theorems under progressively weaker assumptions about the approximate low-rank structure of activations, modeling deviations via noise. Our results represent a step toward explaining why data-driven low-rank compression methods outperform data-agnostic approaches and towards theoretically grounded compression algorithms that reduce inference costs while maintaining performance.

Related papers

A Benchmark Study of Neural Network Compression Methods for Hyperspectral Image Classification [0.0]
compression techniques have been proposed to reduce model size and computational cost while maintaining predictive performance.<n>We examine three widely used compression strategies for convolutional neural networks: pruning, quantization, and knowledge distillation.<n>Our results demonstrate that compressed models can significantly reduce model size and computational cost while maintaining competitive classification performance.
arXiv Detail & Related papers (2026-03-05T01:48:30Z)
Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression [55.63153956934198]
Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs)<n>Existing CoT compression methods often suffer from a critical loss of logical fidelity at high compression ratios.<n>We propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy.
arXiv Detail & Related papers (2026-02-09T06:57:15Z)
On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization [5.952537659103525]
We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection.<n>We prove convergence of iterative singular value thresholding for training neural networks subject to a soft rank constraint.
arXiv Detail & Related papers (2025-07-12T23:39:14Z)
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks [1.7068557927955383]
We introduce a low-rank training scheme enhanced with a novel spectral regularizer that controls the condition number of the low-rank core in each layer.<n>This approach mitigates the sensitivity of compressed models to adversarial perturbations without sacrificing clean accuracy.
arXiv Detail & Related papers (2025-05-12T19:46:29Z)
Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning [15.78336840511033]
This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates. We employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity.
arXiv Detail & Related papers (2024-08-25T09:30:34Z)
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning. Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance. We use sparsity-sensitive $ell_q$-norm to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression. We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization. We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC) Our evaluation shows SIDCo speeds up training by up to 41:7%, 7:6%, and 1:9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on the recent progress on sparse optimization. We achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation of accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet.
arXiv Detail & Related papers (2020-11-10T03:03:55Z)
ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique technique (ALF) ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression. The proposed method automatically induces structured sparsity on the convolutional weights. We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.