APP: Anytime Progressive Pruning
- URL: http://arxiv.org/abs/2204.01640v1
- Date: Mon, 4 Apr 2022 16:38:55 GMT
- Title: APP: Anytime Progressive Pruning
- Authors: Diganta Misra, Bharat Runwal, Tianlong Chen, Zhangyang Wang, Irina
Rish
- Abstract summary: We propose a novel way of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA).
The proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training.
- Score: 104.36308667437397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the latest advances in deep learning, there has been a lot of focus on
the online learning paradigm due to its relevance in practical settings.
Although many methods have been investigated for optimal learning settings in
scenarios where the data stream is continuous over time, sparse network
training in such settings has often been overlooked. In this paper, we explore
the problem of training a neural network with a target sparsity in a particular
case of online learning: the anytime learning at macroscale paradigm (ALMA). We
propose a novel way of progressive pruning, referred to as \textit{Anytime
Progressive Pruning} (APP); the proposed approach significantly outperforms the
baseline dense and Anytime OSP models across multiple architectures and
datasets under short, moderate, and long-sequence training. Our method, for
example, shows an improvement in accuracy of $\approx 7\%$ and a reduction in
the generalization gap by $\approx 22\%$, while being $\approx 1/3$ the size
of the dense baseline model in few-shot Restricted ImageNet training. We
further observe interesting nonmonotonic transitions in the generalization gap
in ALMA runs with a high number of megabatches. The code and experiment
dashboards can be accessed at
\url{https://github.com/landskape-ai/Progressive-Pruning} and
\url{https://wandb.ai/landskape/APP}, respectively.
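As a rough illustration of the progressive-pruning idea described in the abstract (pruning a bit further toward the target sparsity after each incoming megabatch, so the model remains usable at every point in the stream), the following is a minimal PyTorch sketch. It is not the authors' implementation (see the linked repository for that); the linear sparsity schedule, the global magnitude criterion, and all function names here are illustrative assumptions.

```python
# Minimal sketch of progressive global magnitude pruning toward a target
# sparsity across ALMA megabatches. Illustrative only; not the official
# APP implementation (see the repository linked above for that).
import torch
import torch.nn as nn


def current_sparsity(step: int, total_steps: int, target: float) -> float:
    """Fraction of weights to zero after `step` megabatches (linear ramp; an assumption)."""
    return target * min(step / total_steps, 1.0)


def magnitude_prune(model: nn.Module, sparsity: float) -> None:
    """Zero out the globally smallest-magnitude weights in Linear/Conv2d layers."""
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(scores, k).values
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())  # apply the pruning mask


# Hypothetical usage inside an ALMA loop: train on each megabatch as it
# arrives, then ramp the sparsity and prune again.
#
#   for step, megabatch in enumerate(stream, start=1):
#       train_on(model, megabatch)  # placeholder training routine
#       magnitude_prune(model, current_sparsity(step, total_steps, 0.8))
```

Pruning after every megabatch, rather than once at the end of the stream, is what keeps intermediate models deployable, which is what the "anytime" aspect of the setting asks for.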
Related papers
- COSCO: A Sharpness-Aware Training Framework for Few-shot Multivariate Time Series Classification [19.593625378366472]
We propose a new learning framework named COSCO consisting of a sharpness-aware minimization (SAM) optimization and a Prototypical loss function.
Our experiments demonstrate our proposed method outperforms the existing baseline methods.
arXiv Detail & Related papers (2024-09-15T07:41:55Z) - iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning [22.14627083675405]
We propose incremental neural mesh models that can be extended with new meshes over time.
We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets.
Our work also presents the first incremental learning approach for pose estimation.
arXiv Detail & Related papers (2024-07-12T13:57:49Z) - Automated Sizing and Training of Efficient Deep Autoencoders using
Second Order Algorithms [0.46040036610482665]
We propose a multi-step training method for generalized linear classifiers.
Validation error is minimized by pruning unnecessary inputs.
Desired outputs are improved via a method similar to the Ho-Kashyap rule.
arXiv Detail & Related papers (2023-08-11T16:48:31Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot
Learning [2.4366811507669115]
GPr-Net is a lightweight and computationally efficient geometric network that captures the prototypical topology of point clouds.
We show that GPr-Net outperforms state-of-the-art methods in few-shot learning on point clouds.
arXiv Detail & Related papers (2023-04-12T17:32:18Z) - Improving Representational Continuity via Continued Pretraining [76.29171039601948]
A simple baseline from the transfer learning community, linear probing followed by fine-tuning (LP-FT), outperforms naive training and other continual learning methods.
LP-FT also reduces forgetting on a real-world satellite remote sensing dataset (FMoW).
A variant of LP-FT achieves state-of-the-art accuracies on an NLP continual learning benchmark.
arXiv Detail & Related papers (2023-02-26T10:39:38Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Class Incremental Online Streaming Learning [40.97848249237289]
We propose a novel approach for class-incremental learning in an online streaming setting to address these challenges.
The proposed approach leverages implicit and explicit dual weight regularization and experience replay.
Also, we propose an efficient online memory replay and replacement buffer strategy that significantly boosts the model's performance (a generic replay-buffer sketch is given after this list).
arXiv Detail & Related papers (2021-10-20T19:24:31Z) - Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
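The online memory replay mentioned in the Class Incremental Online Streaming Learning entry above can be illustrated with a generic reservoir-sampling buffer. The sketch below shows only the replay idea, not that paper's specific replay-and-replacement strategy; the class name and methods are hypothetical.

```python
# Generic reservoir-sampling experience-replay buffer (illustration only).
import random
from typing import Any, List, Tuple


class ReplayBuffer:
    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.buffer: List[Tuple[Any, Any]] = []
        self.seen = 0                      # total examples observed so far
        self.rng = random.Random(seed)

    def add(self, example: Any, label: Any) -> None:
        """Reservoir sampling: every observed example has an equal chance of being stored."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((example, label))
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = (example, label)  # replace a stored example

    def sample(self, batch_size: int) -> List[Tuple[Any, Any]]:
        """Draw a rehearsal mini-batch to mix into each online update."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(self.buffer, k)
```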