Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse
Independent Subnetworks
- URL: http://arxiv.org/abs/2202.11782v1
- Date: Wed, 23 Feb 2022 20:53:54 GMT
- Title: Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse
Independent Subnetworks
- Authors: Tim Whitaker, Darrell Whitley
- Abstract summary: We introduce a fast, low-cost method for creating diverse ensembles of neural networks without needing to train multiple models from scratch.
We create child networks by cloning the parent and dramatically pruning the parameters of each child to create an ensemble of members with unique and diverse topologies.
This diversity enables "Prune and Tune" ensembles to achieve results that are competitive with traditional ensembles at a fraction of the training cost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble Learning is an effective method for improving generalization in
machine learning. However, as state-of-the-art neural networks grow larger, the
computational cost associated with training several independent networks
becomes expensive. We introduce a fast, low-cost method for creating diverse
ensembles of neural networks without needing to train multiple models from
scratch. We do this by first training a single parent network. We then create
child networks by cloning the parent and dramatically pruning the parameters of
each child to create an ensemble of members with unique and diverse topologies.
We then briefly train each child network for a small number of epochs; the
children now converge significantly faster than when trained from scratch. We
explore various ways to maximize diversity in the child networks, including the
use of anti-random pruning and one-cycle tuning. This diversity enables "Prune
and Tune" ensembles to achieve results that are competitive with traditional
ensembles at a fraction of the training cost. We benchmark our approach against
state-of-the-art low-cost ensemble methods and display marked improvement in
both accuracy and uncertainty estimation on CIFAR-10 and CIFAR-100.
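As a rough illustration of the recipe described above (train one parent, clone it, prune each clone dramatically, briefly tune each child), here is a minimal PyTorch sketch. The pairing of complementary masks to approximate anti-random pruning, the 50% sparsity level, the SGD/one-cycle tuning settings, and all function names are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the Prune and Tune recipe (assumptions noted above).
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def make_children(parent: nn.Module, n_pairs: int = 2, sparsity: float = 0.5):
    """Clone the trained parent and prune each clone.

    Anti-random pruning is approximated here by creating children in pairs
    whose masks are complements of each other, so the two members of a pair
    keep disjoint subsets of the parent's weights.
    """
    children = []
    for _ in range(n_pairs):
        child_a, child_b = copy.deepcopy(parent), copy.deepcopy(parent)
        for mod_a, mod_b in zip(child_a.modules(), child_b.modules()):
            if isinstance(mod_a, (nn.Linear, nn.Conv2d)):
                mask = (torch.rand_like(mod_a.weight) > sparsity).float()
                prune.custom_from_mask(mod_a, "weight", mask)        # random mask
                prune.custom_from_mask(mod_b, "weight", 1.0 - mask)  # its complement
        children += [child_a, child_b]
    return children


def tune_child(child, loader, epochs=1, max_lr=0.1, device="cpu"):
    """Briefly fine-tune one child with a one-cycle learning-rate schedule."""
    child.to(device).train()
    opt = torch.optim.SGD(child.parameters(), lr=max_lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        opt, max_lr=max_lr, total_steps=epochs * len(loader))
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(child(x), y).backward()
            opt.step()
            sched.step()
    return child


@torch.no_grad()
def ensemble_predict(children, x):
    """Average the members' softmax outputs (members assumed in eval mode)."""
    return torch.stack([torch.softmax(c(x), dim=-1) for c in children]).mean(0)
```

In this sketch, one would call make_children on the trained parent, run tune_child on each member, and then use ensemble_predict at test time; whether the parent itself is kept as an extra member is not specified in the abstract and is left out here.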
Related papers
- Network Fission Ensembles for Low-Cost Self-Ensembles [20.103367702014474]
We propose Network Fission Ensembles (NFE), a low-cost method for ensemble learning and inference.
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits (a hedged sketch of this construction appears after the list of related papers below).
arXiv Detail & Related papers (2024-08-05T08:23:59Z)
- Harnessing Increased Client Participation with Cohort-Parallel Federated Learning
Federated Learning (FL) is a machine learning approach where nodes collaboratively train a global model.
We introduce Cohort-Parallel Federated Learning (CPFL), a novel learning approach where each cohort independently trains a global model.
CPFL with four cohorts, non-IID data distribution, and CIFAR-10 yields a 1.9× reduction in train time and a 1.3× reduction in resource usage.
arXiv Detail & Related papers (2024-05-24T15:34:09Z)
- Neural Subnetwork Ensembles [2.44755919161855]
This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles.
Child networks are formed by sampling, perturbing, and optimizing subnetworks from a trained parent model.
Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance.
arXiv Detail & Related papers (2023-11-23T17:01:16Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- Efficient Diversity-Driven Ensemble for Deep Neural Networks [28.070540722925152]
We propose Efficient Diversity-Driven Ensemble (EDDE) to address both the diversity and the efficiency of an ensemble.
Compared with other well-known ensemble methods, EDDE achieves the highest ensemble accuracy with the lowest training cost.
We evaluate EDDE on Computer Vision (CV) and Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2021-12-26T04:28:47Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with performance superior to training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks [97.08677678499075]
We introduce MixMo, a new framework for learning multi-input multi-output deep subnetworks.
We show that binary mixing in features - particularly with patches from CutMix - enhances results by making subnetworks stronger and more diverse.
In addition to being easy to implement and adding no cost at inference, our models outperform much costlier data augmented deep ensembles.
arXiv Detail & Related papers (2021-03-10T15:31:02Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning [46.768185367275564]
BatchEnsemble is an ensemble method whose computational and memory costs are significantly lower than typical ensembles.
We show that BatchEnsemble yields accuracy and uncertainty estimates competitive with those of typical ensembles.
We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks.
arXiv Detail & Related papers (2020-02-17T00:00:59Z)
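Below is the hedged sketch referenced in the Network Fission Ensembles entry above. It is reconstructed only from the two summary sentences (prune some weights, group the rest into sets, and build one auxiliary exit per set); the single-layer backbone, the random grouping, the class and method names, and the averaged-softmax inference are all assumptions, not the NFE paper's actual architecture.

```python
# Hedged sketch of the Network Fission Ensembles idea, built only from the
# summary above: prune, split surviving weights into disjoint groups, and
# attach one exit head per group. Structural details are assumptions.
import torch
import torch.nn as nn


class FissionLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_groups=3, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        keep = torch.rand(out_dim, in_dim) > sparsity              # prune first
        group_id = torch.randint(0, n_groups, (out_dim, in_dim))   # then split
        # One disjoint mask per group, covering only the unpruned weights.
        masks = torch.stack([(keep & (group_id == g)).float()
                             for g in range(n_groups)])
        self.register_buffer("masks", masks)

    def forward(self, x):
        # One feature vector per group, i.e. one auxiliary path per mask.
        return [x @ (self.weight * m).t() for m in self.masks]


class NFESketch(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes=10, n_groups=3):
        super().__init__()
        self.fission = FissionLayer(in_dim, hidden, n_groups)
        self.exits = nn.ModuleList(nn.Linear(hidden, n_classes)
                                   for _ in range(n_groups))

    def forward(self, x):
        # Multi-exits: one set of logits per auxiliary path.
        return [head(torch.relu(f))
                for head, f in zip(self.exits, self.fission(x))]

    @torch.no_grad()
    def ensemble_predict(self, x):
        # The exits act as a free self-ensemble: average their softmax outputs.
        logits = self.forward(x)
        return torch.stack([torch.softmax(l, dim=-1) for l in logits]).mean(0)
```

A training loop would presumably sum the cross-entropy losses over all exits so that each path learns its own classifier, but that detail is not stated in the summary.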
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.