Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
- URL: http://arxiv.org/abs/2302.14623v1
- Date: Tue, 28 Feb 2023 15:03:18 GMT
- Title: Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
- Authors: Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia
Ponomareva, Zhe Zhao, Rahul Mazumder
- Abstract summary: We propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint.
Our approach, CHITA, extends the classical Brain Surgeon framework and results in significant improvements in speed, memory, and performance.
- Score: 9.440450886684603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The sheer size of modern neural networks makes model serving a serious
computational challenge. A popular class of compression techniques overcomes
this challenge by pruning or sparsifying the weights of pretrained networks.
While useful, these techniques often face serious tradeoffs between
computational requirements and compression quality. In this work, we propose a
novel optimization-based pruning framework that considers the combined effect
of pruning (and updating) multiple weights subject to a sparsity constraint.
Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and
results in significant improvements in speed, memory, and performance over
existing optimization-based approaches for network pruning. CHITA's main
workhorse performs combinatorial optimization updates on a memory-friendly
representation of local quadratic approximation(s) of the loss function. On a
standard benchmark of pretrained models and datasets, CHITA leads to
significantly better sparsity-accuracy tradeoffs than competing methods. For
example, for MLPNet with only 2% of the weights retained, our approach improves
the accuracy by 63% relative to the state of the art. Furthermore, when used in
conjunction with fine-tuning SGD steps, our method achieves significant
accuracy gains over the state-of-the-art approaches.
Related papers
- Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks [10.229120811024162]
deep neural networks (DNNs) pose significant challenges to their deployment on edge devices.
Common approaches to address this issue are pruning and mixed-precision quantization.
We propose a novel methodology to apply them jointly via a lightweight gradient-based search.
arXiv Detail & Related papers (2024-07-01T08:07:02Z) - ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models [14.310720048047136]
ALPS is an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned gradient conjugate-based post-processing step.
Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency.
On the OPT-30B model with 70% sparsity, ALPS achieves a 13% reduction in test perplexity on the WikiText dataset and a 19% improvement in zero-shot benchmark performance compared to existing methods.
arXiv Detail & Related papers (2024-06-12T02:57:41Z) - FALCON: FLOP-Aware Combinatorial Optimization for Neural Network Pruning [17.60353530072587]
Network pruning offers a solution to reduce model size and computational cost while maintaining performance.
Most current pruning methods focus primarily on improving sparsity by reducing the number of nonzero parameters.
We propose FALCON, a novel-optimization-based framework for network pruning that jointly takes into account model accuracy (fidelity), FLOPs, and sparsity constraints.
arXiv Detail & Related papers (2024-03-11T18:40:47Z) - Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce quantization error of weight.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based
Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - Optimal Brain Compression: A Framework for Accurate Post-Training
Quantization and Pruning [29.284147465251685]
We introduce a new compression framework which covers both weight pruning and quantization in a unified setting.
We show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods.
arXiv Detail & Related papers (2022-08-24T14:33:35Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight- parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Learning from Images: Proactive Caching with Parallel Convolutional
Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce 71.6% computation time with only 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.