Related papers: Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle

Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle

URL: http://arxiv.org/abs/2002.08127v2
Date: Fri, 14 May 2021 05:50:42 GMT
Title: Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle
Authors: Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, and Ming-Hsuan Yang
Abstract summary: We propose a novel structured sparsification method for efficient network compression. The proposed method automatically induces structured sparsity on the convolutional weights. We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
Score: 117.95823660228537
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in convolutional neural networks(CNNs) usually come with the expense of excessive computational overhead and memory footprint. Network compression aims to alleviate this issue by training compact models with comparable performance. However, existing compression techniques either entail dedicated expert design or compromise with a moderate performance drop. In this paper, we propose a novel structured sparsification method for efficient network compression. The proposed method automatically induces structured sparsity on the convolutional weights, thereby facilitating the implementation of the compressed model with the highly-optimized group convolution. We further address the problem of inter-group communication with a learnable channel shuffle mechanism. The proposed approach can be easily applied to compress many network architectures with a negligible performance drop. Extensive experimental results and analysis demonstrate that our approach gives a competitive performance against the recent network compression counterparts with a sound accuracy-complexity trade-off.

Related papers

Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost.<n>This work re-thinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation.<n> Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z)
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks [1.7068557927955383]
We introduce a low-rank training scheme enhanced with a novel spectral regularizer that controls the condition number of the low-rank core in each layer.<n>This approach mitigates the sensitivity of compressed models to adversarial perturbations without sacrificing clean accuracy.
arXiv Detail & Related papers (2025-05-12T19:46:29Z)
Theoretical Guarantees for Low-Rank Compression of Deep Neural Networks [5.582683296425384]
Deep neural networks have achieved state-of-the-art performance across numerous applications. Low-rank approximation techniques offer a promising solution by reducing the size and complexity of these networks. We develop an analytical framework for data-driven post-training low-rank compression.
arXiv Detail & Related papers (2025-02-04T23:10:13Z)
Accelerating Distributed Deep Learning using Lossless Homomorphic Compression [17.654138014999326]
We introduce a novel compression algorithm that effectively merges worker-level compression with in-network aggregation. We show up to a 6.33$times$ improvement in aggregation throughput and a 3.74$times$ increase in per-iteration training speed.
arXiv Detail & Related papers (2024-02-12T09:57:47Z)
Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees [53.950234267704]
We introduce Global-QSGD, an All-reduce gradient-compatible quantization method.<n>We show that it accelerates distributed training by up to 3.51% over baseline quantization methods.
arXiv Detail & Related papers (2023-05-29T21:32:15Z)
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning. Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition. A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression. For ResNet18 with ImageNet2012, our reduced model can reach more than twi times speed up in terms of GMAC with merely 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks. It automatically analyzes each layer to identify the optimal per-layer compression ratio. Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC) Our evaluation shows SIDCo speeds up training by up to 41:7%, 7:6%, and 1:9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on the recent progress on sparse optimization. We achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation of accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet.
arXiv Detail & Related papers (2020-11-10T03:03:55Z)
PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers. Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.