Network Fission Ensembles for Low-Cost Self-Ensembles
- URL: http://arxiv.org/abs/2408.02301v1
- Date: Mon, 5 Aug 2024 08:23:59 GMT
- Title: Network Fission Ensembles for Low-Cost Self-Ensembles
- Authors: Hojung Lee, Jong-Seok Lee
- Abstract summary: We propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE).
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits.
- Score: 20.103367702014474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent ensemble learning methods for image classification have been shown to improve classification accuracy with low extra cost. However, they still require multiple trained models for ensemble inference, which eventually becomes a significant burden when the model size increases. In this paper, we propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE), by converting a conventional network itself into a multi-exit structure. Starting from a given initial network, we first prune some of the weights to reduce the training burden. We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits. We call this process Network Fission. Through this, multiple outputs can be obtained from a single network, which enables ensemble learning. Since this process simply changes the existing network structure to multi-exits without using additional networks, there is no extra computational burden for ensemble learning and inference. Moreover, by learning from multiple losses of all exits, the multi-exits improve performance via regularization, and high performance can be achieved even with increased network sparsity. With our simple yet effective method, we achieve significant improvement compared to existing ensemble methods. The code is available at https://github.com/hjdw2/NFE.
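As a rough illustration of the multi-exit self-ensemble idea, the PyTorch sketch below builds a shared trunk with several exit heads whose weights are masked to mimic the prune-and-group ("Network Fission") step, sums the per-exit losses during training, and averages the exit softmaxes at inference. This is a minimal sketch under assumed simplifications (an MLP backbone and independent random masks instead of the paper's partitioning of surviving weights); the names NFESketch, train_step, and ensemble_predict are illustrative, and the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NFESketch(nn.Module):
    """Multi-exit network sketch: one shared trunk, several masked exit heads."""

    def __init__(self, in_dim=784, hidden=256, num_classes=10,
                 num_exits=3, sparsity=0.5):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.exits = nn.ModuleList(
            [nn.Linear(hidden, num_classes) for _ in range(num_exits)])
        # Fixed random masks stand in for grouping the surviving (unpruned)
        # weights into one set per exit; the paper partitions them instead.
        for i, exit_head in enumerate(self.exits):
            mask = (torch.rand_like(exit_head.weight) > sparsity).float()
            self.register_buffer(f"mask_{i}", mask)

    def forward(self, x):
        h = self.trunk(x)  # shared computation reused by every exit
        return [F.linear(h, e.weight * getattr(self, f"mask_{i}"), e.bias)
                for i, e in enumerate(self.exits)]

def train_step(model, x, y, optimizer):
    # Ensemble learning from multiple losses: one cross-entropy per exit.
    optimizer.zero_grad()
    loss = sum(F.cross_entropy(logits, y) for logits in model(x))
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def ensemble_predict(model, x):
    # Self-ensemble inference: average the softmax outputs of all exits.
    probs = torch.stack([F.softmax(logits, dim=1) for logits in model(x)])
    return probs.mean(dim=0).argmax(dim=1)
```

Because every exit reuses the same forward pass through the trunk and only regroups existing weights rather than duplicating them, the ensemble adds essentially no inference cost beyond the single network.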
Related papers
- Neural Subnetwork Ensembles [2.44755919161855]
This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles.
Child networks are formed by sampling, perturbing, and optimizing subnetworks from a trained parent model.
Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance.
arXiv Detail & Related papers (2023-11-23T17:01:16Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Prune and Tune Ensembles: Low-Cost Ensemble Learning With Sparse Independent Subnetworks [0.0]
We introduce a fast, low-cost method for creating diverse ensembles of neural networks without needing to train multiple models from scratch.
We create child networks by cloning the parent and dramatically pruning the parameters of each child to create an ensemble of members with unique and diverse topologies.
This diversity enables "Prune and Tune" ensembles to achieve results that are competitive with traditional ensembles at a fraction of the training cost (a minimal clone-and-prune sketch appears after this list).
arXiv Detail & Related papers (2022-02-23T20:53:54Z) - Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core yields a low-rank model that outperforms the same low-rank model trained alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z) - MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations [51.85020143716815]
We propose MutualNet to train a single network that can run at a diverse set of resource constraints.
Our method trains a cohort of model configurations with various network widths and input resolutions.
MutualNet is a general training methodology that can be applied to various network structures.
arXiv Detail & Related papers (2021-05-14T22:30:13Z) - MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks [97.08677678499075]
We introduce MixMo, a new framework for learning multi-input multi-output deep subnetworks.
We show that binary mixing in features - particularly with patches from CutMix - enhances results by making subnetworks stronger and more diverse.
In addition to being easy to implement and adding no cost at inference, our models outperform much costlier data augmented deep ensembles.
arXiv Detail & Related papers (2021-03-10T15:31:02Z) - Learning to Branch for Multi-Task Learning [12.49373126819798]
We present an automated multi-task learning algorithm that learns where to share or branch within a network.
We propose a novel tree-structured design space that casts a tree branching operation as a gumbel-softmax sampling procedure.
arXiv Detail & Related papers (2020-06-02T19:23:21Z) - Semantic Drift Compensation for Class-Incremental Learning [48.749630494026086]
Class-incremental learning of deep networks sequentially increases the number of classes to be classified.
We propose a new method to estimate the drift of features, called semantic drift, and compensate for it without the need for any exemplars.
arXiv Detail & Related papers (2020-04-01T13:31:19Z) - Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
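For contrast with NFE's single-network exits, the Prune and Tune entry above describes cloning a trained parent and pruning each clone differently. Below is a minimal sketch of that clone-and-prune step, assuming random unstructured pruning and hypothetical helper names (make_children, predict_ensemble); in the actual method each child is additionally fine-tuned briefly before ensembling.

```python
import copy
import torch
import torch.nn.functional as F

def make_children(parent, num_children=4, sparsity=0.5):
    """Clone a trained parent and zero a different random subset of each clone's weights."""
    children = []
    for _ in range(num_children):
        child = copy.deepcopy(parent)
        with torch.no_grad():
            for p in child.parameters():
                # Random mask keeps roughly (1 - sparsity) of the weights.
                p.mul_((torch.rand_like(p) > sparsity).float())
        children.append(child)  # each child would then be briefly fine-tuned ("tuned")
    return children

@torch.no_grad()
def predict_ensemble(children, x):
    # Average the softmax outputs of the diverse pruned children.
    probs = torch.stack([F.softmax(child(x), dim=1) for child in children])
    return probs.mean(dim=0).argmax(dim=1)
```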