Related papers: CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition

URL: http://arxiv.org/abs/2511.11716v1
Date: Wed, 12 Nov 2025 18:25:46 GMT
Title: CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition
Authors: Sudhakar Sah, Nikhil Chabbra, Matthieu Durnerin
Abstract summary: We introduce CompressNAS, a framework that treats rank selection as a global search problem. On ImageNet, CompressNAS compresses ResNet-18 by 8x with less than 4% accuracy drop; on COCO, we achieve 2x compression of YOLOv5s without any accuracy drop. We present a new family of compressed models, STResNet, with competitive performance compared to other efficient models.
Score: 1.9556774372563988
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8x with less than 4% accuracy drop; on COCO, we achieve 2x compression of YOLOv5s without any accuracy drop and 2x compression of YOLOv5n with a 2.5% drop. Finally, we present a new family of compressed models, STResNet, with competitive performance compared to other efficient models.
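
Illustration (not from the paper): the abstract describes choosing Tucker ranks for all layers jointly under a memory constraint. The sketch below shows what that can look like in the simplest case. It assumes a Tucker-2 factorization of each k x k convolution into a 1x1 conv, a smaller k x k conv, and another 1x1 conv, and it enumerates every rank combination by brute force. The function names and the proxy_score heuristic are hypothetical placeholders; in particular, proxy_score merely rewards keeping more rank and stands in for the fast accuracy estimator, whose actual form is not given here.

```python
from itertools import product

def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weight count after a Tucker-2 factorization:
    1x1 conv (c_in -> r_in), k x k conv (r_in -> r_out), 1x1 conv (r_out -> c_out)."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out

def proxy_score(layers, ranks):
    """Hypothetical stand-in for an accuracy estimator: the mean fraction
    of channel rank retained across layers. Higher is assumed to be better."""
    fractions = [
        (r_in / c_in + r_out / c_out) / 2
        for (c_in, c_out, _k), (r_in, r_out) in zip(layers, ranks)
    ]
    return sum(fractions) / len(fractions)

def global_rank_search(layers, candidates, budget):
    """Try every combination of per-layer ranks and return
    (score, ranks, total_params) for the best one within the budget,
    or None if no combination fits."""
    best = None
    for ranks in product(*candidates):
        total = sum(
            tucker2_params(c_in, c_out, k, r_in, r_out)
            for (c_in, c_out, k), (r_in, r_out) in zip(layers, ranks)
        )
        if total > budget:
            continue  # violates the memory constraint
        score = proxy_score(layers, ranks)
        if best is None or score > best[0]:
            best = (score, ranks, total)
    return best

# Two example layers given as (c_in, c_out, kernel_size), with two
# candidate (r_in, r_out) pairs each. All numbers are made up.
layers = [(64, 64, 3), (64, 128, 3)]
candidates = [[(16, 16), (32, 32)], [(16, 32), (32, 64)]]
print(global_rank_search(layers, candidates, budget=25_000))
```

Because the budget applies to the whole network, the search can spend more parameters on one layer and fewer on another, which a layer-by-layer rank choice cannot do. Brute-force enumeration grows exponentially with the number of layers, so it is only workable for toy cases like this one.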

Related papers

Arbitrary Ratio Feature Compression via Next Token Prediction [52.10426317889982]
The Arbitrary Ratio Feature Compression (ARFC) framework supports any compression ratio with a single model. ARC is an auto-regressive model that performs compression via next-token prediction. The MoS module refines the compressed tokens by utilizing multiple compression results. ERGC is integrated into the training process to preserve semantic and structural relationships during compression.
arXiv Detail & Related papers (2026-02-12T02:38:57Z)
TinySense: Effective CSI Compression for Scalable and Accurate Wi-Fi Sensing [10.777079283826003]
This paper introduces TinySense, an efficient compression framework that enhances the scalability of Wi-Fi-based human sensing. Our approach is based on a new vector quantization-based generative adversarial network (VQGAN).
arXiv Detail & Related papers (2026-01-22T10:44:40Z)
Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding. Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z)
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way. CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning. CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compressing model updates. However, the trade-off between compression and model accuracy in the networked environment remains unclear. We present a framework to maximize the final model accuracy by strategically adjusting the compression in each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization. We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while remaining at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
DKM: Differentiable K-Means Clustering Layer for Neural Network Compression [20.73169804006698]
We propose a differentiable k-means clustering layer (DKM) for train-time, weight clustering-based model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the parameters and clustering centroids. We show that DKM delivers a superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks.
arXiv Detail & Related papers (2021-08-28T14:35:41Z)
Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which combines channel pruning and tensor decomposition to compress CNN models. We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on recent progress in sparse optimization. We achieve up to 7.2x and 2.9x FLOPs reduction at the same level of evaluation accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet, respectively.
arXiv Detail & Related papers (2020-11-10T03:03:55Z)
Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization [51.26579110596767]
We propose a novel Barrier Penalty based NAS (BP-NAS) for mixed precision quantization. BP-NAS sets a new state of the art on both classification (Cifar-10, ImageNet) and detection (COCO).
arXiv Detail & Related papers (2020-07-20T12:00:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.