Ex uno plures: Splitting One Model into an Ensemble of Subnetworks
- URL: http://arxiv.org/abs/2106.04767v1
- Date: Wed, 9 Jun 2021 01:49:49 GMT
- Title: Ex uno plures: Splitting One Model into an Ensemble of Subnetworks
- Authors: Zhilu Zhang, Vianne R. Gao, Mert R. Sabuncu
- Abstract summary: We propose a strategy to compute an ensemble of subnetworks, each corresponding to a non-overlapping dropout mask computed via a pruning strategy and trained independently.
We show that the proposed subnetwork ensembling method can perform as well as standard deep ensembles in both accuracy and uncertainty estimates.
We experimentally demonstrate that subnetwork ensembling also consistently outperforms recently proposed approaches that efficiently ensemble neural networks.
- Score: 18.814965334083425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte Carlo (MC) dropout is a simple and efficient ensembling method that can
improve the accuracy and confidence calibration of high-capacity deep neural
network models. However, MC dropout is not as effective as more
compute-intensive methods such as deep ensembles. This performance gap can be
attributed to the relatively poor quality of individual models in the MC
dropout ensemble and their lack of diversity. These issues can in turn be
traced back to the coupled training and substantial parameter sharing of the
dropout models. Motivated by this perspective, we propose a strategy to compute
an ensemble of subnetworks, each corresponding to a non-overlapping dropout
mask computed via a pruning strategy and trained independently. We show that
the proposed subnetwork ensembling method can perform as well as standard deep
ensembles in both accuracy and uncertainty estimates, yet with a computational
efficiency similar to MC dropout. Lastly, using several computer vision
datasets like CIFAR10/100, CUB200, and Tiny-Imagenet, we experimentally
demonstrate that subnetwork ensembling also consistently outperforms recently
proposed approaches that efficiently ensemble neural networks.
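As a rough illustration of the approach described in the abstract, the following PyTorch sketch splits one model's weight tensors into non-overlapping masks, builds one masked copy per ensemble member, and averages the members' predictions. The magnitude-based round-robin assignment is only a stand-in for the paper's pruning-based mask computation, and the function names (`make_disjoint_masks`, `build_subnetworks`, `ensemble_predict`) are illustrative rather than taken from the paper.
```python
# Minimal sketch: split one network's weight tensors into K non-overlapping
# masks, keep one masked copy per ensemble member, train each copy
# independently, and average the members' predictions at test time.
import copy
import torch
import torch.nn as nn


def make_disjoint_masks(model: nn.Module, k: int):
    """Assign every weight of each 2D+ tensor to exactly one of k members."""
    masks = [{} for _ in range(k)]
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # leave biases / norm params unmasked
            continue
        order = torch.argsort(param.detach().abs().flatten(), descending=True)
        member_of = torch.empty_like(order)
        # Round-robin by weight magnitude; a stand-in for the pruning-based split.
        member_of[order] = torch.arange(order.numel(), device=order.device) % k
        member_of = member_of.view_as(param)
        for i in range(k):
            masks[i][name] = (member_of == i).float()
    return masks


def build_subnetworks(model: nn.Module, masks):
    """One masked copy per member; each copy would then be trained independently."""
    members = []
    for mask in masks:
        member = copy.deepcopy(model)
        for name, param in member.named_parameters():
            if name in mask:
                param.data.mul_(mask[name])
                # Zero the gradients of masked-out weights so the masks stay
                # disjoint throughout each member's independent training run.
                param.register_hook(lambda g, m=mask[name]: g * m)
        members.append(member)
    return members


@torch.no_grad()
def ensemble_predict(members, x):
    """Average softmax outputs across members, as in a standard deep ensemble."""
    return torch.stack([m(x).softmax(dim=-1) for m in members]).mean(dim=0)
```
Because the masks are disjoint, the members' nonzero weights together amount to roughly one dense model, which is where the MC-dropout-like efficiency claimed in the abstract would come from.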
Related papers
- Self-Supervised Learning in Deep Networks: A Pathway to Robust Few-Shot Classification [0.0]
We first pre-train the model with self-supervision so that it learns general feature representations from a large amount of unlabeled data.
We then fine-tune it on the few-shot dataset Mini-ImageNet to improve the model's accuracy and generalization ability under limited data.
arXiv Detail & Related papers (2024-11-19T01:01:56Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Pushing Boundaries: Mixup's Influence on Neural Collapse [3.6919724596215615]
Mixup is a data augmentation strategy that employs convex combinations of training instances and their respective labels to improve the robustness and calibration of deep neural networks.
This study investigates the last-layer activations of training data for deep networks subjected to mixup.
We show that mixup's last-layer activations predominantly converge to a distinctive configuration, different from what one might expect (a minimal mixup sketch is given after this list).
arXiv Detail & Related papers (2024-02-09T04:01:25Z)
- Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance [6.599368083393398]
Stochastic Gradient Descent (SGD) often converges only to local optima, which limits model performance.
The proposed soft merging method reduces the influence of local-optima models that would otherwise yield undesirable results.
Experiments underscore the effectiveness of the merged networks.
arXiv Detail & Related papers (2023-09-21T17:07:31Z)
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for combining the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
- Robust low-rank training via approximate orthonormal constraints [2.519906683279153]
We introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold.
The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy.
arXiv Detail & Related papers (2023-06-02T12:22:35Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Ensembles of Spiking Neural Networks [0.3007949058551534]
This paper demonstrates how to construct ensembles of spiking neural networks that produce state-of-the-art results.
We achieve classification accuracies of 98.71%, 100.0%, and 99.09%, on the MNIST, NMNIST and DVS Gesture datasets respectively.
We formalize spiking neural networks as GLM predictors, identifying a suitable representation for their target domain.
arXiv Detail & Related papers (2020-10-15T17:45:18Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Deep Unfolding Network for Image Super-Resolution [159.50726840791697]
This paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods.
The proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model.
arXiv Detail & Related papers (2020-03-23T17:55:42Z)
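For the mixup entry above, here is a minimal sketch of the augmentation it describes: each training batch is replaced by convex combinations of shuffled example pairs and of their one-hot labels. The Beta(alpha, alpha) mixing distribution and alpha = 0.2 are conventional choices, not values taken from the cited paper.
```python
# Minimal mixup sketch (PyTorch): convex combinations of inputs and labels.
import torch
import torch.nn.functional as F


def mixup_batch(x, y, num_classes, alpha=0.2):
    """Return mixed inputs and soft labels for one training batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed


# Usage with a soft cross-entropy loss:
#   x_mixed, y_mixed = mixup_batch(x, y, num_classes=100)
#   loss = -(y_mixed * model(x_mixed).log_softmax(dim=-1)).sum(dim=-1).mean()
```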