Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints
- URL: http://arxiv.org/abs/2208.04425v1
- Date: Mon, 8 Aug 2022 21:24:20 GMT
- Title: Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints
- Authors: Jose Gallego-Posada and Juan Ramirez and Akram Erraqabi and Yoshua
Bengio and Simon Lacoste-Julien
- Abstract summary: We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
- Score: 81.46143788046892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of trained neural networks is robust to harsh levels of
pruning. Coupled with the ever-growing size of deep learning models, this
observation has motivated extensive research on learning sparse models. In this
work, we focus on the task of controlling the level of sparsity when performing
sparse learning. Existing methods based on sparsity-inducing penalties involve
expensive trial-and-error tuning of the penalty factor, thus lacking direct
control of the resulting model sparsity. In response, we adopt a constrained
formulation: using the gate mechanism proposed by Louizos et al. (2018), we
formulate a constrained optimization problem where sparsification is guided by
the training objective and the desired sparsity target in an end-to-end
fashion. Experiments on CIFAR-10/100, TinyImageNet, and ImageNet using
WideResNet and ResNet{18, 50} models validate the effectiveness of our proposal
and demonstrate that we can reliably achieve pre-determined sparsity targets
without compromising on predictive performance.
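To make the constrained formulation concrete, the following is a minimal PyTorch sketch of the general recipe the abstract describes: minimize the task loss subject to a bound on the expected fraction of active gates, running gradient descent on the weights and gates and gradient ascent on a Lagrange multiplier. This is a hedged illustration rather than the authors' released implementation; the simplified sigmoid gates (a stand-in for the hard-concrete gates of Louizos et al., 2018) and names such as `GatedLinear`, `target_density`, and `dual_lr` are assumptions made for the example.

```python
# Hedged sketch (not the authors' released code): one constrained-sparsity
# training step in PyTorch. Each weight has a learnable gate parameter
# (a simplified stand-in for the hard-concrete gates of Louizos et al., 2018).
# We minimize the task loss subject to
#     E[fraction of active gates] <= target_density,
# via descent on the model/gates and ascent on a Lagrange multiplier.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedLinear(nn.Module):
    """Linear layer whose weights are multiplied by stochastic sigmoid gates."""

    def __init__(self, in_features, out_features, beta=2.0 / 3.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Gate logits: larger values make the corresponding gate more likely open.
        self.log_alpha = nn.Parameter(torch.zeros(out_features, in_features))
        self.beta = beta

    def expected_density(self):
        # Probability that each gate is "on": a smooth surrogate for the L0 norm.
        return torch.sigmoid(self.log_alpha).mean()

    def forward(self, x):
        if self.training:
            # Gumbel-sigmoid (concrete) relaxation: differentiable noisy gates.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta)
        else:
            s = (self.log_alpha > 0).float()  # deterministic gates at eval time
        return F.linear(x, self.weight * s, self.bias)


model = nn.Sequential(GatedLinear(784, 300), nn.ReLU(), GatedLinear(300, 10))
lmbda = torch.zeros(1)          # Lagrange multiplier (dual variable), updated manually
target_density = 0.10           # i.e. a 90% sparsity target
primal_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dual_lr = 1e-2


def train_step(x, y):
    global lmbda
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    density = torch.stack([m.expected_density()
                           for m in model if isinstance(m, GatedLinear)]).mean()
    violation = density - target_density      # <= 0 once the constraint holds
    lagrangian = task_loss + lmbda * violation

    primal_opt.zero_grad()
    lagrangian.backward()
    primal_opt.step()                         # descent on weights and gates

    # Ascent on the multiplier: it grows while the model is too dense and
    # shrinks back toward 0 once the sparsity target is met.
    lmbda = (lmbda + dual_lr * violation.detach()).clamp(min=0.0)
    return task_loss.item(), density.item()
```

The contrast with penalty-based methods is that `lmbda` is never hand-tuned: ascent pushes it up while the model is too dense and lets it relax toward zero once the constraint is satisfied, so the achieved sparsity is set directly by `target_density`.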
Related papers
- Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning [4.421875265386832]
Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks.
Recent sparse learning methods have shown promising performance up to moderate sparsity levels such as 95% and 98%.
We propose a collection of techniques that enable the continuous learning of networks without accuracy collapse even at extreme sparsities.
arXiv Detail & Related papers (2024-11-20T18:54:53Z)
- Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification [0.0]
We first pre-train the model with self-supervision so that it learns general feature representations from a large amount of unlabeled data.
We then fine-tune it on the few-shot dataset Mini-ImageNet to improve the model's accuracy and generalization under limited data.
arXiv Detail & Related papers (2024-11-19T01:01:56Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off [19.230329532065635]
Sparse training can significantly reduce training costs by shrinking the model size.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we consider dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar predictive performance, but their weight distribution has markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
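A minimal, hedged sketch of this reparameterisation idea appears after this list.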
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance suffers a cliff fall when the model size decreases.
We propose a simple yet effective method, Distilled Contrastive Learning (DisCo), which eases the issue by a large margin.
arXiv Detail & Related papers (2021-04-19T08:22:52Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a $\textit{slow start, fast decay}$ learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
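As a complement to the Powerpropagation entry above, here is a minimal sketch of the sign-preserving power reparameterisation that paper is built around, expressed as a PyTorch parametrization. The `PowerProp` wrapper and the choice alpha = 2.0 are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the Powerpropagation idea referenced above: reparameterise
# each weight as w = phi * |phi|**(alpha - 1) with alpha >= 1, so gradients
# reaching phi are scaled by the weight's own magnitude. Small weights then
# receive small updates and cluster near zero, which makes later magnitude
# pruning safer. The wrapper below is illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class PowerProp(nn.Module):
    def __init__(self, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha

    def forward(self, phi: torch.Tensor) -> torch.Tensor:
        # Sign-preserving power: w = sign(phi) * |phi|**alpha
        return phi * phi.abs().pow(self.alpha - 1.0)


layer = nn.Linear(128, 64)
# Register the reparameterisation: layer.weight is now computed from the
# underlying parameter phi whenever it is accessed.
parametrize.register_parametrization(layer, "weight", PowerProp(alpha=2.0))

x = torch.randn(32, 128)
loss = layer(x).pow(2).mean()
loss.backward()  # gradients flow to phi, scaled by alpha * |phi|**(alpha - 1)
```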
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.