CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
- URL: http://arxiv.org/abs/2210.09223v2
- Date: Wed, 31 May 2023 09:59:46 GMT
- Title: CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
- Authors: Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh
- Abstract summary: Correlation Aware Pruner (CAP) significantly pushes the compressibility limits for state-of-the-art architectures.
A new, theoretically-justified pruner handles complex weight correlations accurately and efficiently during the pruning process itself.
We show for the first time that extremely-accurate large vision models, trained via self-supervised techniques, can also be pruned to moderate sparsities, with negligible accuracy loss.
- Score: 22.055655390093722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by significant improvements in architectural design and training
pipelines, computer vision has recently experienced dramatic progress in terms
of accuracy on classic benchmarks such as ImageNet. These highly-accurate
models are challenging to deploy, as they appear harder to compress using
standard techniques such as pruning. We address this issue by introducing the
Correlation Aware Pruner (CAP), a new unstructured pruning framework which
significantly pushes the compressibility limits for state-of-the-art
architectures. Our method is based on two technical advancements: a new
theoretically-justified pruner, which can handle complex weight correlations
accurately and efficiently during the pruning process itself, and an efficient
finetuning procedure for post-compression recovery. We validate our approach
via extensive experiments on several modern vision models such as Vision
Transformers (ViT), modern CNNs, and ViT-CNN hybrids, showing for the first
time that these can be pruned to high sparsity levels (e.g. $\geq 75$%) with
low impact on accuracy ($\leq 1$% relative drop). Our approach is also
compatible with structured pruning and quantization, and can lead to practical
speedups of 1.5 to 2.4x without accuracy loss. To further showcase CAP's
accuracy and scalability, we use it to show for the first time that
extremely-accurate large vision models, trained via self-supervised techniques,
can also be pruned to moderate sparsities, with negligible accuracy loss.
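
The abstract describes the pruner only at a high level. As a point of reference, here is a minimal sketch, assuming a single weight row of a linear layer, a Hessian proxy H = X Xᵀ accumulated from calibration inputs, and the classic Optimal Brain Surgeon (OBS) saliency and compensation rule that correlation-aware pruners such as CAP build on; the function name, damping constant, and greedy loop are illustrative choices, not the authors' implementation.

```python
# Illustrative OBS-style correlation-aware pruning of one weight row.
# NOT the paper's CAP code; it only sketches the classic saliency/update rule.
import torch

def obs_prune_row(w: torch.Tensor, H: torch.Tensor,
                  sparsity: float = 0.75, damp: float = 1e-4):
    """Prune a weight row `w` (shape [d]) given a Hessian proxy H = X @ X.T
    (shape [d, d]) accumulated from calibration inputs X (shape [d, n])."""
    w = w.clone().float()
    d = w.numel()
    # Damping keeps the Hessian proxy invertible and well conditioned.
    H = H.float() + damp * H.float().diagonal().mean() * torch.eye(d)
    Hinv = torch.linalg.inv(H)
    mask = torch.ones(d, dtype=torch.bool)
    for _ in range(int(sparsity * d)):
        # OBS saliency: estimated loss increase from zeroing each remaining weight.
        saliency = w.pow(2) / (2.0 * Hinv.diagonal())
        saliency[~mask] = float("inf")
        i = int(saliency.argmin())
        d_i = Hinv[i, i].item()
        # Compensation: redistribute the pruned weight onto correlated weights.
        w -= (w[i] / d_i) * Hinv[:, i]
        w[i] = 0.0
        mask[i] = False
        # Downdate the inverse Hessian so coordinate i is excluded from now on.
        Hinv -= torch.outer(Hinv[:, i], Hinv[i, :]) / d_i
        Hinv[i, :] = 0.0
        Hinv[:, i] = 0.0
        Hinv[i, i] = 1.0  # placeholder; index i is never selected again
    return w, mask
```

The greedy one-weight-at-a-time loop is kept only for clarity; CAP's contribution lies in making this kind of correlation-aware selection accurate and tractable at the scale of modern vision models, together with an efficient post-compression finetuning procedure, both of which are omitted from this sketch.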
Related papers
- Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression [24.119415458653616]
We propose a novel unified pruning framework Comb, Prune, Distill (CPD) to address both model-agnostic and task-agnostic concerns simultaneously.
Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence.
In image classification we achieve a speedup of up to 4.3x with an accuracy loss of 1.8%, and in semantic segmentation up to 1.89x with a 5.1% loss in mIoU.
arXiv Detail & Related papers (2024-08-06T09:02:31Z)
- Task-Agnostic Structured Pruning of Speech Representation Models [18.555223754089905]
We propose a fine-grained attention head pruning method to compensate for the performance degradation.
Experiments on the SUPERB benchmark show that our model can achieve comparable performance to the dense model in multiple tasks.
arXiv Detail & Related papers (2023-06-02T09:11:06Z)
- GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer [76.2625311630021]
Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks.
To mitigate their heavy computational and memory cost, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
arXiv Detail & Related papers (2023-01-13T00:40:24Z)
- Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks [1.1602089225841632]
Modern deep neural networks are generally poorly calibrated due to the overestimation of predictive confidence.
We propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating the DNN during training.
We demonstrate that our method achieves state-of-the-art model calibration performance without post-processing.
arXiv Detail & Related papers (2022-12-27T21:21:58Z)
- CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models [23.12519490211362]
This paper studies the accuracy-compression trade-off for unstructured weight pruning in the context of BERT models.
We introduce Optimal BERT Surgeon (O-BERT-S), an efficient and accurate weight pruning method based on approximate second-order information.
We investigate the impact of this pruning method when compounding compression approaches for Transformer-based models.
arXiv Detail & Related papers (2022-03-14T16:40:31Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models (see the sketch after this list).
Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned in various depth on demand, improving real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success on computer vision tasks, despite their large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
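
As referenced in the Powerpropagation entry above, here is a minimal sketch of that reparameterisation, assuming the trainable parameter $v$ is mapped to an effective weight $w = v\,|v|^{\alpha-1}$ with $\alpha > 1$; the module name, the initialisation, and the choice $\alpha = 2$ are illustrative, not the authors' code.

```python
# Minimal sketch of a Powerpropagation-style layer: the trainable parameter v
# is mapped to an effective weight w = v * |v|**(alpha - 1), which concentrates
# the weight distribution around zero and makes magnitude pruning less damaging.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer with power reparameterisation; alpha = 1 recovers nn.Linear."""

    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha
        self.v = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.v, a=5 ** 0.5)  # same init as nn.Linear

    def effective_weight(self) -> torch.Tensor:
        return self.v * self.v.abs().pow(self.alpha - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.effective_weight(), self.bias)

# After training, magnitude-prune the *effective* weights, e.g. to 75% sparsity.
layer = PowerpropLinear(128, 64)
w = layer.effective_weight()
k = int(0.75 * w.numel())
threshold = w.abs().flatten().kthvalue(k).values
mask = (w.abs() > threshold).float()  # 1 = keep, 0 = prune
```

Because the gradient with respect to $v$ is scaled by roughly $|v|^{\alpha-1}$, small weights receive proportionally smaller updates, which drives mass toward zero and is what allows more parameters to be pruned safely after training.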