Layer-Wise Data-Free CNN Compression
- URL: http://arxiv.org/abs/2011.09058v3
- Date: Thu, 19 May 2022 21:28:08 GMT
- Title: Layer-Wise Data-Free CNN Compression
- Authors: Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari
- Abstract summary: We show how to generate layer-wise training data using only a pretrained network.
We present results for layer-wise compression using quantization and pruning.
- Score: 49.73757297936685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a computationally efficient method for compressing a trained
neural network without using real data. We break the problem of data-free
network compression into independent layer-wise compressions. We show how to
efficiently generate layer-wise training data using only a pretrained network.
We use this data to perform independent layer-wise compressions on the
pretrained network. We also show how to precondition the network to improve the
accuracy of our layer-wise compression method. We present results for
layer-wise compression using quantization and pruning. When quantizing, we
compress with higher accuracy than related works while using orders of
magnitude less compute. When compressing MobileNetV2 and evaluating on
ImageNet, our method outperforms existing methods for quantization at all
bit-widths, achieving a $+0.34\%$ improvement in $8$-bit quantization, and a
stronger improvement at lower bit-widths (up to a $+28.50\%$ improvement at $5$
bits). When pruning, we outperform baselines of a similar compute envelope,
achieving $1.5$ times the sparsity rate at the same accuracy. We also show how
to combine our efficient method with high-compute generative methods to improve
upon their results.
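The abstract describes the pipeline only at a high level. As a rough illustration of the layer-wise, data-free idea, the sketch below quantizes a single convolution so that it matches the float layer's outputs on synthetic inputs drawn from the preceding BatchNorm's running statistics. The sampling scheme, the uniform fake-quantizer, the straight-through update, and the helper names (`sample_layer_input`, `compress_layer`) are assumptions of this sketch, not the authors' implementation.

```python
# Minimal sketch of layer-wise, data-free quantization (illustrative only;
# not the paper's exact procedure or preconditioning).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_layer_input(bn: nn.BatchNorm2d, batch: int = 32, hw: int = 14) -> torch.Tensor:
    """Draw per-channel Gaussian activations from the preceding BatchNorm's
    running statistics, then apply ReLU so they resemble post-BN activations
    (an assumption of this sketch)."""
    c = bn.num_features
    std = bn.running_var.add(bn.eps).sqrt().view(1, c, 1, 1)
    mean = bn.running_mean.view(1, c, 1, 1)
    return torch.relu(torch.randn(batch, c, hw, hw) * std + mean)


def fake_quantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Uniform symmetric per-tensor fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale


def compress_layer(conv: nn.Conv2d, bn_before: nn.BatchNorm2d,
                   bits: int = 8, steps: int = 200) -> nn.Conv2d:
    """Independent layer-wise compression: train a quantized copy of `conv`
    to reproduce the float layer's outputs using synthetic inputs only."""
    q_conv = copy.deepcopy(conv)
    opt = torch.optim.Adam(q_conv.parameters(), lr=1e-4)
    for _ in range(steps):
        x = sample_layer_input(bn_before)
        with torch.no_grad():
            target = conv(x)                        # float "teacher" output
        latent = q_conv.weight.data.clone()         # keep latent float weights
        q_conv.weight.data = fake_quantize(latent, bits)
        loss = F.mse_loss(q_conv(x), target)
        opt.zero_grad()
        loss.backward()                             # straight-through gradient
        q_conv.weight.data = latent                 # restore before the update
        opt.step()
    q_conv.weight.data = fake_quantize(q_conv.weight.data, bits)
    return q_conv
```

Because each layer is fitted independently against its own float teacher, the per-layer problems can be solved in parallel and require no real training data, which is the source of the compute savings the abstract claims.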
Related papers
- Accelerating Distributed Deep Learning using Lossless Homomorphic Compression [17.654138014999326]
We introduce a novel compression algorithm that effectively merges worker-level compression with in-network aggregation.
We show up to a $6.33\times$ improvement in aggregation throughput and a $3.74\times$ increase in per-iteration training speed.
arXiv Detail & Related papers (2024-02-12T09:57:47Z)
- Compression of Recurrent Neural Networks using Matrix Factorization [0.9208007322096533]
We propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix.
Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.
arXiv Detail & Related papers (2023-10-19T12:35:30Z)
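Rank-Tuning is only summarized above; as a hedged illustration of post-training, per-matrix rank selection, the snippet below picks the smallest truncated-SVD rank that meets a relative reconstruction-error budget and returns the two low-rank factors. The error criterion and the helper names (`choose_rank`, `factorize`) are assumptions of this sketch, not the paper's algorithm.

```python
# Hedged sketch: per-matrix rank selection by truncated SVD under a relative
# Frobenius reconstruction-error budget (not the paper's exact Rank-Tuning).
import numpy as np


def choose_rank(w: np.ndarray, rel_err: float = 0.05) -> int:
    """Smallest rank r such that the best rank-r approximation of w has
    relative Frobenius error at most `rel_err`."""
    sq = np.linalg.svd(w, compute_uv=False) ** 2        # squared singular values
    tail = np.append(np.cumsum(sq[::-1])[::-1], 0.0)    # tail[r] = sum_{i>=r} s_i^2
    rel = np.sqrt(tail / sq.sum())                       # error of keeping rank r
    return int(np.argmax(rel <= rel_err))


def factorize(w: np.ndarray, rank: int):
    """Return (A, B) with w ~= A @ B, A: (m, rank), B: (rank, n)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    root = np.sqrt(s[:rank])
    return u[:, :rank] * root, root[:, None] * vt[:rank]


# Example: an approximately low-rank weight matrix compresses to a small rank.
w = np.random.randn(256, 16) @ np.random.randn(16, 512)
w += 0.01 * np.random.randn(*w.shape)
r = choose_rank(w, rel_err=0.05)
a, b = factorize(w, r)
print(r, np.linalg.norm(w - a @ b) / np.linalg.norm(w))
```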
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use a sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
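The $\ell_q$ characterization is stated abstractly; as a small numeric illustration (under assumed definitions, which may not match the paper's), the snippet below computes an $\ell_q$ quasi-norm and a derived "effective nonzero" count, which never exceeds the true nonzero count and equals it when all nonzeros share the same magnitude.

```python
# Illustrative only: one way to turn an l_q quasi-norm into a soft-sparsity
# ("effective nonzero") measure; the paper's exact definitions may differ.
import numpy as np


def lq_norm(w: np.ndarray, q: float) -> float:
    """(sum_i |w_i|^q)^(1/q); a quasi-norm for 0 < q < 1."""
    return float(np.sum(np.abs(w) ** q) ** (1.0 / q))


def effective_nonzeros(w: np.ndarray, q: float = 0.5) -> float:
    """(||w||_q / ||w||_2)^(2q/(2-q)); at most the true nonzero count, with
    equality when all nonzeros have equal magnitude."""
    return float((lq_norm(w, q) / np.linalg.norm(w)) ** (2 * q / (2 - q)))


w = np.zeros(10_000)
w[:200] = np.random.randn(200)             # a hard-sparse weight vector
print(effective_nonzeros(w, q=0.5))         # <= 200: soft sparsity of the weights
```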
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme that jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
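The Collaborative Compression entry names the two ingredients but not how they compose. Below is a hedged sketch that assumes one simple composition (L1-based channel pruning of a convolution weight followed by an SVD split into a small convolution plus a 1x1 convolution); it is not the paper's actual scheme, and `prune_and_decompose` is a hypothetical helper.

```python
# Hedged sketch: channel pruning followed by a low-rank split of a conv weight
# (one simple way to combine the two ingredients; not the paper's scheme).
import numpy as np


def prune_and_decompose(w: np.ndarray, keep_ratio: float = 0.5, rank: int = 16):
    """w: conv weight (out_c, in_c, kh, kw).
    1) Keep the output channels with the largest L1 norm.
    2) Factor the kept (out_c', in_c*kh*kw) matrix into a rank-`rank` conv
       followed by a 1x1 conv."""
    out_c = w.shape[0]
    l1 = np.abs(w).reshape(out_c, -1).sum(axis=1)
    keep = np.sort(np.argsort(l1)[::-1][: int(out_c * keep_ratio)])
    flat = w[keep].reshape(len(keep), -1)               # (out_c', in_c*kh*kw)
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    root = np.sqrt(s[:rank])
    first = (root[:, None] * vt[:rank]).reshape(rank, *w.shape[1:])
    second = (u[:, :rank] * root)[:, :, None, None]     # 1x1 conv (out_c', rank, 1, 1)
    return keep, first, second


keep, first, second = prune_and_decompose(np.random.randn(128, 64, 3, 3))
print(keep.shape, first.shape, second.shape)            # (64,) (16, 64, 3, 3) (64, 16, 1, 1)
```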
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
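The permutation observation above is easy to verify numerically. The toy check below shows, for a two-layer ReLU network, that permuting one layer's output units together with the next layer's matching input weights leaves the function unchanged; it demonstrates only the invariance the paper exploits, not its rate-distortion-guided search for permutations.

```python
# Numeric check of the invariance the paper exploits: permuting one layer's
# output units and the next layer's matching input weights preserves the
# network's function (toy two-layer ReLU example, illustration only).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 8, 16, 4
w1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
w2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)
x = rng.normal(size=(5, d_in))


def forward(a1, c1, a2, c2):
    h = np.maximum(x @ a1.T + c1, 0.0)        # hidden ReLU layer
    return h @ a2.T + c2


p = rng.permutation(d_hid)                     # shuffle the hidden units
same = np.allclose(forward(w1, b1, w2, b2),
                   forward(w1[p], b1[p], w2[:, p], b2))
print(same)                                    # True: identical function
```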
- Compression-aware Continual Learning using Singular Value Decomposition [2.4283778735260686]
We propose a compression-based continual task learning method that can dynamically grow a neural network.
Inspired by the recent model compression techniques, we employ compression-aware training and perform low-rank weight approximations.
Our method achieves compressed representations with minimal performance degradation without the need for costly fine-tuning.
arXiv Detail & Related papers (2020-09-03T23:29:50Z)
- Compressing Deep Neural Networks via Layer Fusion [32.80630183210368]
Layer fusion is a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional, and attention layers.
Layer fusion can significantly reduce the number of layers of the original network with little additional overhead, while maintaining competitive performance.
arXiv Detail & Related papers (2020-07-29T15:43:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.