Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse
Independent Subnetworks
- URL: http://arxiv.org/abs/2202.11782v1
- Date: Wed, 23 Feb 2022 20:53:54 GMT
- Title: Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse
Independent Subnetworks
- Authors: Tim Whitaker, Darrell Whitley
- Abstract summary: We introduce a fast, low-cost method for creating diverse ensembles of neural networks without needing to train multiple models from scratch.
We create child networks by cloning the parent and dramatically pruning the parameters of each child to create an ensemble of members with unique and diverse topologies.
This diversity enables "Prune and Tune" ensembles to achieve results that are competitive with traditional ensembles at a fraction of the training cost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble Learning is an effective method for improving generalization in
machine learning. However, as state-of-the-art neural networks grow larger, the
computational cost associated with training several independent networks
becomes expensive. We introduce a fast, low-cost method for creating diverse
ensembles of neural networks without needing to train multiple models from
scratch. We do this by first training a single parent network. We then create
child networks by cloning the parent and dramatically pruning the parameters of
each child to create an ensemble of members with unique and diverse topologies.
We then briefly train each child network for a small number of epochs; the
children now converge significantly faster than when trained from scratch. We
explore various ways to maximize diversity in the child networks, including the
use of anti-random pruning and one-cycle tuning. This diversity enables "Prune
and Tune" ensembles to achieve results that are competitive with traditional
ensembles at a fraction of the training cost. We benchmark our approach against
state-of-the-art low-cost ensemble methods and display marked improvement in
both accuracy and uncertainty estimation on CIFAR-10 and CIFAR-100.
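As a rough illustration of the recipe described above (train one parent, clone it, prune each clone dramatically, briefly tune each child), here is a minimal PyTorch sketch. The pairing of complementary masks to approximate anti-random pruning, the 50% sparsity level, the SGD/one-cycle tuning settings, and all function names are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the Prune and Tune recipe (assumptions noted above).
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def make_children(parent: nn.Module, n_pairs: int = 2, sparsity: float = 0.5):
    """Clone the trained parent and prune each clone.

    Anti-random pruning is approximated here by creating children in pairs
    whose masks are complements of each other, so the two members of a pair
    keep disjoint subsets of the parent's weights.
    """
    children = []
    for _ in range(n_pairs):
        child_a, child_b = copy.deepcopy(parent), copy.deepcopy(parent)
        for mod_a, mod_b in zip(child_a.modules(), child_b.modules()):
            if isinstance(mod_a, (nn.Linear, nn.Conv2d)):
                mask = (torch.rand_like(mod_a.weight) > sparsity).float()
                prune.custom_from_mask(mod_a, "weight", mask)        # random mask
                prune.custom_from_mask(mod_b, "weight", 1.0 - mask)  # its complement
        children += [child_a, child_b]
    return children


def tune_child(child, loader, epochs=1, max_lr=0.1, device="cpu"):
    """Briefly fine-tune one child with a one-cycle learning-rate schedule."""
    child.to(device).train()
    opt = torch.optim.SGD(child.parameters(), lr=max_lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        opt, max_lr=max_lr, total_steps=epochs * len(loader))
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(child(x), y).backward()
            opt.step()
            sched.step()
    return child


@torch.no_grad()
def ensemble_predict(children, x):
    """Average the members' softmax outputs (members assumed in eval mode)."""
    return torch.stack([torch.softmax(c(x), dim=-1) for c in children]).mean(0)
```

In this sketch, one would call make_children on the trained parent, run tune_child on each member, and then use ensemble_predict at test time; whether the parent itself is kept as an extra member is not specified in the abstract and is left out here.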
Related papers
- Network Fission Ensembles for Low-Cost Self-Ensembles [20.103367702014474]
We propose Network Fission Ensembles (NFE), a low-cost method for ensemble learning and inference.
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits (a hedged sketch of this construction appears after the list of related papers below).
arXiv Detail & Related papers (2024-08-05T08:23:59Z)
- Harnessing Increased Client Participation with Cohort-Parallel Federated Learning
Federated Learning (FL) is a machine learning approach where nodes collaboratively train a global model.
We introduce Cohort-Parallel Federated Learning (CPFL), a novel learning approach where each cohort independently trains a global model.
CPFL with four cohorts, non-IID data distribution, and CIFAR-10 yields a 1.9× reduction in train time and a 1.3× reduction in resource usage.
arXiv Detail & Related papers (2024-05-24T15:34:09Z)
- Neural Subnetwork Ensembles [2.44755919161855]
This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles.
Child networks are formed by sampling, perturbing, and optimizing subnetworks from a trained parent model.
Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance.
arXiv Detail & Related papers (2023-11-23T17:01:16Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- Efficient Diversity-Driven Ensemble for Deep Neural Networks [28.070540722925152]
We propose Efficient Diversity-Driven Ensemble (EDDE) to address both the diversity and the efficiency of an ensemble.
Compared with other well-known ensemble methods, EDDE achieves the highest ensemble accuracy with the lowest training cost.
We evaluate EDDE on Computer Vision (CV) and Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2021-12-26T04:28:47Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with performance superior to training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks [97.08677678499075]
We introduce MixMo, a new framework for learning multi-input multi-output deep subnetworks.
We show that binary mixing in features - particularly with patches from CutMix - enhances results by making subnetworks stronger and more diverse.
In addition to being easy to implement and adding no cost at inference, our models outperform much costlier data augmented deep ensembles.
arXiv Detail & Related papers (2021-03-10T15:31:02Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning [46.768185367275564]
BatchEnsemble is an ensemble method whose computational and memory costs are significantly lower than typical ensembles.
We show that BatchEnsemble yields accuracy and uncertainty estimates competitive with those of typical ensembles.
We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks.
arXiv Detail & Related papers (2020-02-17T00:00:59Z)
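Below is the hedged sketch referenced in the Network Fission Ensembles entry above. It is reconstructed only from the two summary sentences (prune some weights, group the rest into sets, and build one auxiliary exit per set); the single-layer backbone, the random grouping, the class and method names, and the averaged-softmax inference are all assumptions, not the NFE paper's actual architecture.

```python
# Hedged sketch of the Network Fission Ensembles idea, built only from the
# summary above: prune, split surviving weights into disjoint groups, and
# attach one exit head per group. Structural details are assumptions.
import torch
import torch.nn as nn


class FissionLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_groups=3, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        keep = torch.rand(out_dim, in_dim) > sparsity              # prune first
        group_id = torch.randint(0, n_groups, (out_dim, in_dim))   # then split
        # One disjoint mask per group, covering only the unpruned weights.
        masks = torch.stack([(keep & (group_id == g)).float()
                             for g in range(n_groups)])
        self.register_buffer("masks", masks)

    def forward(self, x):
        # One feature vector per group, i.e. one auxiliary path per mask.
        return [x @ (self.weight * m).t() for m in self.masks]


class NFESketch(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes=10, n_groups=3):
        super().__init__()
        self.fission = FissionLayer(in_dim, hidden, n_groups)
        self.exits = nn.ModuleList(nn.Linear(hidden, n_classes)
                                   for _ in range(n_groups))

    def forward(self, x):
        # Multi-exits: one set of logits per auxiliary path.
        return [head(torch.relu(f))
                for head, f in zip(self.exits, self.fission(x))]

    @torch.no_grad()
    def ensemble_predict(self, x):
        # The exits act as a free self-ensemble: average their softmax outputs.
        logits = self.forward(x)
        return torch.stack([torch.softmax(l, dim=-1) for l in logits]).mean(0)
```

A training loop would presumably sum the cross-entropy losses over all exits so that each path learns its own classifier, but that detail is not stated in the summary.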
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.