Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints
- URL: http://arxiv.org/abs/2208.04425v1
- Date: Mon, 8 Aug 2022 21:24:20 GMT
- Title: Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints
- Authors: Jose Gallego-Posada and Juan Ramirez and Akram Erraqabi and Yoshua
Bengio and Simon Lacoste-Julien
- Abstract summary: We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
- Score: 81.46143788046892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of trained neural networks is robust to harsh levels of
pruning. Coupled with the ever-growing size of deep learning models, this
observation has motivated extensive research on learning sparse models. In this
work, we focus on the task of controlling the level of sparsity when performing
sparse learning. Existing methods based on sparsity-inducing penalties involve
expensive trial-and-error tuning of the penalty factor, thus lacking direct
control of the resulting model sparsity. In response, we adopt a constrained
formulation: using the gate mechanism proposed by Louizos et al. (2018), we
formulate a constrained optimization problem where sparsification is guided by
the training objective and the desired sparsity target in an end-to-end
fashion. Experiments on CIFAR-10/100, TinyImageNet, and ImageNet using
WideResNet and ResNet{18, 50} models validate the effectiveness of our proposal
and demonstrate that we can reliably achieve pre-determined sparsity targets
without compromising on predictive performance.
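The constrained formulation described in the abstract can be illustrated in a few lines. The following is a minimal sketch, not the authors' code. It assumes the hard concrete gates of Louizos et al. (2018) with their commonly used hyperparameters (β = 2/3, γ = −0.1, ζ = 1.1), a constraint of the form "mean over gates of P(z ≠ 0) ≤ target", and plain simultaneous gradient descent-ascent on the Lagrangian, with the multiplier projected onto λ ≥ 0. The network's training loss is replaced by a hypothetical toy objective in which closing gate i costs importance[i]. The helper names (sample_gate, prob_nonzero, train_toy) and all step sizes are illustrative.

```python
import math
import random

# Hard concrete hyperparameters commonly used with the L0 gates of Louizos et al. (2018).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Draw one stretched-and-clipped hard concrete gate z in [0, 1]."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))


def prob_nonzero(log_alpha: float) -> float:
    """Closed-form P(z > 0); its mean over all gates is the expected density."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def train_toy(importance, target, steps=2000, lr=1.0, lr_dual=0.05):
    """Gradient descent-ascent on the Lagrangian of
        min  sum_i importance[i] * (1 - p_i)   s.t.  mean_i p_i <= target,
    where p_i = prob_nonzero(log_alpha[i]). The objective is a hypothetical
    stand-in for a network's training loss.
    """
    n = len(importance)
    log_alpha = [0.0] * n
    lam = 0.0  # Lagrange multiplier for the density constraint
    for _ in range(steps):
        p = [prob_nonzero(a) for a in log_alpha]
        density = sum(p) / n
        # Primal descent: dLagrangian/dlog_alpha_i = (lam / n - importance_i) * p_i * (1 - p_i).
        log_alpha = [
            a - lr * (lam / n - w) * pi * (1.0 - pi)
            for a, w, pi in zip(log_alpha, importance, p)
        ]
        # Dual ascent on the constraint violation, projected onto lam >= 0.
        lam = max(0.0, lam + lr_dual * (density - target))
    return log_alpha, lam
```

For example, `train_toy([0.1 * (k + 1) for k in range(10)], target=0.3)` returns the gate parameters and the final multiplier, and averaging `prob_nonzero` over the returned parameters gives the expected density that the constraint controls. The point the abstract makes is visible in the sketch: the user specifies the density target directly, and λ is adjusted by ascent on the constraint violation instead of being tuned by trial and error as a fixed penalty factor.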
Related papers
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of 1.25× with a minimal accuracy drop of 1.1% for the ResNet18 model on the ImageNet dataset.
Date: 2023-09-12T22:28:53Z
- Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off [19.230329532065635]
Sparse training can significantly lower training costs by shrinking the model.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we cast dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform state-of-the-art sparse training methods.
Date: 2022-11-30T01:22:25Z
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
Date: 2021-10-01T10:03:57Z
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance drops sharply as model size decreases.
We propose Distilled Contrastive Learning (DisCo), a simple yet effective method that alleviates this issue by a large margin.
Date: 2021-04-19T08:22:52Z
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
Date: 2020-12-25T20:50:15Z
- A Deep Marginal-Contrastive Defense against Adversarial Attacks on 1D Models [3.9962751777898955]
Deep learning algorithms have recently been targeted by attackers because of their vulnerability.
Non-continuous deep models are still not robust against adversarial attacks.
We propose a novel objective/loss function that forces the features to lie under a specified margin to facilitate their prediction.
Date: 2020-12-08T20:51:43Z
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
Date: 2020-06-12T15:07:08Z
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
Date: 2020-06-10T08:22:41Z
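The Powerpropagation entry in the list above describes a reparameterisation that pushes weights toward zero. As we understand that paper, the effective weight is w = v·|v|^(α−1) with α > 1. The gradient reaching v is then scaled by α·|v|^(α−1), so parameters near zero receive small updates. The sketch below is written for this page: the helper names are hypothetical, and weights are plain scalars for brevity. It implements that map, its chain-rule gradient, and a simple magnitude mask of the kind used by a traditional pruning step.

```python
def powerprop_weight(v: float, alpha: float) -> float:
    """Effective weight w = v * |v|**(alpha - 1): sign of v kept, magnitude raised to alpha."""
    return v * abs(v) ** (alpha - 1.0)


def powerprop_grad(v: float, alpha: float, grad_w: float) -> float:
    """Chain rule: dL/dv = dL/dw * alpha * |v|**(alpha - 1).

    For alpha > 1 this damps updates to parameters near zero, which, per the
    summary, yields a weight distribution with higher density at zero.
    """
    return grad_w * alpha * abs(v) ** (alpha - 1.0)


def magnitude_mask(weights, keep_fraction):
    """Keep the round(keep_fraction * n) largest-|w| entries (at least one)."""
    k = max(1, round(keep_fraction * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(weights))]
```

With α = 2, v = −3.0 maps to w = −9.0, and an upstream gradient of 1.0 becomes 6.0 at v, while v = 0.1 receives only 0.2. Pruning is then applied to the effective weights, e.g. `magnitude_mask([powerprop_weight(v, 2.0) for v in vs], 0.1)`.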
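The Dynamic Sparse Training entry in the list above contrasts random-based and greedy-based drop-and-grow strategies. These two baseline families differ only in how regrown connections are chosen. The following generic sketch is written for this page and is not taken from that paper. It performs one mask update on a flat weight list: it drops the k active weights of smallest magnitude, then regrows k previously inactive connections, either uniformly at random ("random") or by largest gradient magnitude ("greedy"). The function name, the flat-list representation, and the zero initialisation of regrown weights are assumptions. The paper's exploration-exploitation criterion is not reproduced.

```python
import random


def drop_and_grow(weights, mask, grads, k, strategy="random", rng=None):
    """Return (new_weights, new_mask) after one drop-and-grow step.

    weights, mask, grads: equal-length lists; mask[i] is 1 for an active connection.
    The number of active connections is unchanged, so sparsity stays fixed.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    if not 0 <= k <= min(len(active), len(inactive)):
        raise ValueError("k must not exceed the number of active or inactive connections")

    # Drop: the k active weights with the smallest magnitude.
    drop = sorted(active, key=lambda i: abs(weights[i]))[:k]

    # Grow: k connections that were inactive before this step.
    if strategy == "random":
        rng = rng or random.Random()
        grow = rng.sample(inactive, k)
    elif strategy == "greedy":
        grow = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    new_mask = list(mask)
    for i in drop:
        new_mask[i] = 0
    for i in grow:
        new_mask[i] = 1
    # Dropped weights are zeroed; regrown weights also start at zero (an assumption).
    grown = set(grow)
    new_weights = [
        0.0 if (not new_mask[i] or i in grown) else w
        for i, w in enumerate(weights)
    ]
    return new_weights, new_mask
```

The greedy variant needs gradients for currently masked weights, which in practice means computing a dense gradient at mask-update steps. The random variant avoids that cost at the price of less informed regrowth.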