BatchEnsemble: An Alternative Approach to Efficient Ensemble and
Lifelong Learning
- URL: http://arxiv.org/abs/2002.06715v2
- Date: Thu, 20 Feb 2020 03:38:44 GMT
- Title: BatchEnsemble: An Alternative Approach to Efficient Ensemble and
Lifelong Learning
- Authors: Yeming Wen, Dustin Tran, Jimmy Ba
- Abstract summary: BatchEnsemble is an ensemble method whose computational and memory costs are significantly lower than those of typical ensembles.
We show that BatchEnsemble yields accuracy and uncertainties competitive with those of typical ensembles.
We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100 it yields performance comparable to progressive neural networks.
- Score: 46.768185367275564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembles, where multiple neural networks are trained individually and their
predictions are averaged, have been shown to be widely successful for improving
both the accuracy and predictive uncertainty of single neural networks.
However, an ensemble's cost for both training and testing increases linearly
with the number of networks, which quickly becomes untenable.
In this paper, we propose BatchEnsemble, an ensemble method whose
computational and memory costs are significantly lower than those of typical ensembles.
BatchEnsemble achieves this by defining each weight matrix to be the Hadamard
product of a shared weight among all ensemble members and a rank-one matrix per
member. Unlike ensembles, BatchEnsemble is not only parallelizable across
devices, where one device trains one member, but also parallelizable within a
device, where multiple ensemble members are updated simultaneously for a given
mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and
out-of-distribution tasks, BatchEnsemble yields accuracy and uncertainties
competitive with typical ensembles; for an ensemble of size 4, the test-time
speedup is 3X and the memory reduction is 3X. We also apply BatchEnsemble to
lifelong learning, where on Split-CIFAR-100 it achieves performance comparable
to progressive neural networks at much lower computational and memory cost. We
further show that BatchEnsemble easily scales up to lifelong learning on
Split-ImageNet, which involves 100 sequential
learning tasks.
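The abstract describes the mechanism precisely: a shared "slow" weight modulated by per-member rank-one "fast" weights, with all members sharing a single mini-batch. The block below is a minimal PyTorch sketch written from that description; the class and parameter names (BatchEnsembleLinear, fast_r, fast_s) and the initialization are illustrative assumptions, not the authors' reference implementation.
```python
# Minimal sketch of a BatchEnsemble-style linear layer (illustrative only).
import torch
import torch.nn as nn

class BatchEnsembleLinear(nn.Module):
    def __init__(self, in_features, out_features, ensemble_size):
        super().__init__()
        self.ensemble_size = ensemble_size
        # Shared "slow" weight, common to all ensemble members.
        self.weight = nn.Parameter(0.01 * torch.randn(in_features, out_features))
        # Rank-one "fast" weights: an input-side vector s_i and an output-side
        # vector r_i per member, so that W_i = weight * outer(s_i, r_i).
        self.fast_s = nn.Parameter(torch.ones(ensemble_size, in_features))
        self.fast_r = nn.Parameter(torch.ones(ensemble_size, out_features))
        self.bias = nn.Parameter(torch.zeros(ensemble_size, out_features))

    def forward(self, x):
        # x: (batch, in_features), with batch a multiple of ensemble_size.
        # Consecutive chunks of the mini-batch are assigned to different
        # members, so every member is updated in one forward/backward pass.
        repeats = x.shape[0] // self.ensemble_size
        s = self.fast_s.repeat_interleave(repeats, dim=0)  # (batch, in_features)
        r = self.fast_r.repeat_interleave(repeats, dim=0)  # (batch, out_features)
        b = self.bias.repeat_interleave(repeats, dim=0)
        # (x * s_i) @ W * r_i + b_i  ==  x @ (W * outer(s_i, r_i)) + b_i,
        # i.e. the Hadamard-product weight without ever materializing W_i.
        return (x * s) @ self.weight * r + b

# Usage: at test time the same input is tiled ensemble_size times and the
# member predictions are averaged.
layer = BatchEnsembleLinear(in_features=16, out_features=8, ensemble_size=4)
x = torch.randn(32, 16)
y = layer(x)  # shape (32, 8)
```
The point of the factorization is that the member-specific weight is never materialized: each member adds only two vectors and a bias on top of the shared matrix, which is where the memory reduction and within-device parallelism come from.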
Related papers
- Network Fission Ensembles for Low-Cost Self-Ensembles [20.103367702014474]
We propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE).
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits.
arXiv Detail & Related papers (2024-08-05T08:23:59Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- SAE: Single Architecture Ensemble Neural Networks [7.011763596804071]
Ensembles of separate neural networks (NNs) have shown superior accuracy and confidence calibration compared to a single NN across tasks.
Recent methods create ensembles within a single network by adding early exits or by using multi-input multi-output approaches.
Our novel Single Architecture Ensemble framework enables an automatic and joint search through the early-exit and multi-input multi-output configurations.
arXiv Detail & Related papers (2024-02-09T17:55:01Z)
- On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session includes only a few training instances per class, while preserving the knowledge of previously learned sessions.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
- Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse Independent Subnetworks [0.0]
We introduce a fast, low-cost method for creating diverse ensembles of neural networks without needing to train multiple models from scratch.
We create child networks by cloning the parent and dramatically pruning the parameters of each child to create an ensemble of members with unique and diverse topologies.
This diversity enables "Prune and Tune" ensembles to achieve results that are competitive with traditional ensembles at a fraction of the training cost.
arXiv Detail & Related papers (2022-02-23T20:53:54Z)
- SAE: Sequential Anchored Ensembles [7.888755225607877]
We present Sequential Anchored Ensembles (SAE), a lightweight alternative to anchored ensembles.
Instead of training each member of the ensemble from scratch, the members are trained sequentially on losses sampled with high auto-correlation.
SAE outperforms anchored ensembles on some benchmarks for a given computational budget, while showing comparable performance on the others.
arXiv Detail & Related papers (2021-12-30T12:47:27Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield many diverse and accurate tickets "for free" in one shot during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two machine-learning applications of minimizing SoRR: the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification (a short illustrative sketch of SoRR follows this list).
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well [7.262048441360133]
We propose Stochastic Weight Averaging in Parallel (SWAP) to accelerate DNN training.
Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel.
The resulting models generalize equally well as those trained with small mini-batches but are produced in a substantially shorter time.
arXiv Detail & Related papers (2020-01-07T23:13:35Z)
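As referenced in the Learning by Minimizing the Sum of Ranked Range entry above, the ranked-range idea is simple enough to illustrate directly. The sketch below is written only from that entry's one-line description; the function names and the exact (k, m) indexing convention are illustrative assumptions and may differ from the paper's formal definitions of SoRR and AoRR.
```python
# Illustrative sketch of a sum-of-ranked-range style objective (assumptions:
# 1-indexed ranks, descending sort; not the paper's reference implementation).
import numpy as np

def sum_of_ranked_range(values, k, m):
    """Sum of the k-th through m-th largest values (1-indexed, k <= m)."""
    ranked = np.sort(np.asarray(values, dtype=float))[::-1]  # descending
    return ranked[k - 1:m].sum()

def aorr_style_aggregate_loss(individual_losses, k, m):
    """An AoRR-style aggregate: average over the ranked range of per-example
    losses, ignoring the k-1 largest (likely outliers) and everything beyond
    the m-th largest (easy examples)."""
    return sum_of_ranked_range(individual_losses, k, m) / (m - k + 1)

# Example: per-example losses from a binary classifier.
losses = [2.3, 0.1, 0.7, 5.0, 0.4, 1.2]
print(sum_of_ranked_range(losses, 2, 4))        # 2.3 + 1.2 + 0.7 = 4.2
print(aorr_style_aggregate_loss(losses, 2, 4))  # 4.2 / 3 = 1.4
```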