Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints
- URL: http://arxiv.org/abs/2208.04425v1
- Date: Mon, 8 Aug 2022 21:24:20 GMT
- Title: Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints
- Authors: Jose Gallego-Posada and Juan Ramirez and Akram Erraqabi and Yoshua
Bengio and Simon Lacoste-Julien
- Abstract summary: We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
- Score: 81.46143788046892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of trained neural networks is robust to harsh levels of
pruning. Coupled with the ever-growing size of deep learning models, this
observation has motivated extensive research on learning sparse models. In this
work, we focus on the task of controlling the level of sparsity when performing
sparse learning. Existing methods based on sparsity-inducing penalties involve
expensive trial-and-error tuning of the penalty factor, thus lacking direct
control of the resulting model sparsity. In response, we adopt a constrained
formulation: using the gate mechanism proposed by Louizos et al. (2018), we
formulate a constrained optimization problem where sparsification is guided by
the training objective and the desired sparsity target in an end-to-end
fashion. Experiments on CIFAR-10/100, TinyImageNet, and ImageNet using
WideResNet and ResNet{18, 50} models validate the effectiveness of our proposal
and demonstrate that we can reliably achieve pre-determined sparsity targets
without compromising on predictive performance.
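To make the constrained formulation concrete, the following is a minimal PyTorch sketch of the general recipe the abstract describes: minimize the task loss subject to a bound on the expected fraction of active gates, running gradient descent on the weights and gates and gradient ascent on a Lagrange multiplier. This is a hedged illustration rather than the authors' released implementation; the simplified sigmoid gates (a stand-in for the hard-concrete gates of Louizos et al., 2018) and names such as `GatedLinear`, `target_density`, and `dual_lr` are assumptions made for the example.

```python
# Hedged sketch (not the authors' released code): one constrained-sparsity
# training step in PyTorch. Each weight has a learnable gate parameter
# (a simplified stand-in for the hard-concrete gates of Louizos et al., 2018).
# We minimize the task loss subject to
#     E[fraction of active gates] <= target_density,
# via descent on the model/gates and ascent on a Lagrange multiplier.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedLinear(nn.Module):
    """Linear layer whose weights are multiplied by stochastic sigmoid gates."""

    def __init__(self, in_features, out_features, beta=2.0 / 3.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Gate logits: larger values make the corresponding gate more likely open.
        self.log_alpha = nn.Parameter(torch.zeros(out_features, in_features))
        self.beta = beta

    def expected_density(self):
        # Probability that each gate is "on": a smooth surrogate for the L0 norm.
        return torch.sigmoid(self.log_alpha).mean()

    def forward(self, x):
        if self.training:
            # Gumbel-sigmoid (concrete) relaxation: differentiable noisy gates.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta)
        else:
            s = (self.log_alpha > 0).float()  # deterministic gates at eval time
        return F.linear(x, self.weight * s, self.bias)


model = nn.Sequential(GatedLinear(784, 300), nn.ReLU(), GatedLinear(300, 10))
lmbda = torch.zeros(1)          # Lagrange multiplier (dual variable), updated manually
target_density = 0.10           # i.e. a 90% sparsity target
primal_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dual_lr = 1e-2


def train_step(x, y):
    global lmbda
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    density = torch.stack([m.expected_density()
                           for m in model if isinstance(m, GatedLinear)]).mean()
    violation = density - target_density      # <= 0 once the constraint holds
    lagrangian = task_loss + lmbda * violation

    primal_opt.zero_grad()
    lagrangian.backward()
    primal_opt.step()                         # descent on weights and gates

    # Ascent on the multiplier: it grows while the model is too dense and
    # shrinks back toward 0 once the sparsity target is met.
    lmbda = (lmbda + dual_lr * violation.detach()).clamp(min=0.0)
    return task_loss.item(), density.item()
```

The contrast with penalty-based methods is that `lmbda` is never hand-tuned: ascent pushes it up while the model is too dense and lets it relax toward zero once the constraint is satisfied, so the achieved sparsity is set directly by `target_density`.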
Related papers
- Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning [4.421875265386832]
Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks.
Recent sparse learning methods have shown promising performance up to moderate sparsity levels such as 95% and 98%.
We propose a collection of techniques that enable the continuous learning of networks without accuracy collapse even at extreme sparsities.
arXiv Detail & Related papers (2024-11-20T18:54:53Z)
- Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification [0.0]
We first pre-train the model with self-supervision so that it learns general feature representations from a large amount of unlabeled data.
We then fine-tune it on the few-shot dataset Mini-ImageNet to improve the model's accuracy and generalization under limited data.
arXiv Detail & Related papers (2024-11-19T01:01:56Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off [19.230329532065635]
Sparse training can significantly reduce training costs by shrinking the model size.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we consider dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar predictive performance, but their weight distribution has markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
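A minimal, hedged sketch of this reparameterisation idea appears after this list.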
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance suffers a cliff fall when the model size decreases.
We propose a simple yet effective method, Distilled Contrastive Learning (DisCo), which eases the issue by a large margin.
arXiv Detail & Related papers (2021-04-19T08:22:52Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a $\textit{slow start, fast decay}$ learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
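As a complement to the Powerpropagation entry above, here is a minimal sketch of the sign-preserving power reparameterisation that paper is built around, expressed as a PyTorch parametrization. The `PowerProp` wrapper and the choice alpha = 2.0 are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the Powerpropagation idea referenced above: reparameterise
# each weight as w = phi * |phi|**(alpha - 1) with alpha >= 1, so gradients
# reaching phi are scaled by the weight's own magnitude. Small weights then
# receive small updates and cluster near zero, which makes later magnitude
# pruning safer. The wrapper below is illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class PowerProp(nn.Module):
    def __init__(self, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha

    def forward(self, phi: torch.Tensor) -> torch.Tensor:
        # Sign-preserving power: w = sign(phi) * |phi|**alpha
        return phi * phi.abs().pow(self.alpha - 1.0)


layer = nn.Linear(128, 64)
# Register the reparameterisation: layer.weight is now computed from the
# underlying parameter phi whenever it is accessed.
parametrize.register_parametrization(layer, "weight", PowerProp(alpha=2.0))

x = torch.randn(32, 128)
loss = layer(x).pow(2).mean()
loss.backward()  # gradients flow to phi, scaled by alpha * |phi|**(alpha - 1)
```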
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.