MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
- URL: http://arxiv.org/abs/2103.06132v1
- Date: Wed, 10 Mar 2021 15:31:02 GMT
- Title: MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
- Authors: Alexandre Rame, Remy Sun, Matthieu Cord
- Abstract summary: We introduce MixMo, a new framework for learning multi-input multi-output deep subnetworks.
We show that binary mixing in features - particularly with patches from CutMix - enhances results by making subnetworks stronger and more diverse.
In addition to being easy to implement and adding no cost at inference, our models outperform much costlier data augmented deep ensembles.
- Score: 97.08677678499075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent strategies achieved ensembling for free by concurrently fitting
diverse subnetworks inside a single base network. The main idea during training
is that each subnetwork learns to classify only one of the multiple inputs
simultaneously provided. However, the question of how these multiple inputs
should be mixed has not been studied yet. In this paper, we introduce MixMo, a
new generalized framework for learning multi-input multi-output deep
subnetworks. Our key motivation is to replace the suboptimal summing operation
hidden in previous approaches by a more appropriate mixing mechanism. For that
purpose, we draw inspiration from successful mixed sample data augmentations.
We show that binary mixing in features - particularly with patches from CutMix
- enhances results by making subnetworks stronger and more diverse. We improve
state of the art on the CIFAR-100 and Tiny-ImageNet classification datasets. In
addition to being easy to implement and adding no cost at inference, our models
outperform much costlier data augmented deep ensembles. We open a new line of
research complementary to previous works, as we operate in features and better
leverage the expressiveness of large networks.
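As a concrete illustration of the mixing mechanism described in the abstract, the sketch below mixes two inputs in feature space with a CutMix-style binary patch mask and attaches one classification head per input. It is only a minimal reading of the abstract, not the authors' released implementation: the two small convolutional encoders, the core network's `out_dim` attribute, the box-mask construction, and the fixed rescaling factor of 2 are illustrative assumptions.

```python
import torch
import torch.nn as nn

def cutmix_feature_mask(feat_shape, lam):
    """Binary patch mask whose 'on' region covers roughly a lam fraction of the feature map."""
    _, _, h, w = feat_shape
    cut_h, cut_w = int(h * lam ** 0.5), int(w * lam ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mask = torch.zeros(1, 1, h, w)
    mask[..., top:bottom, left:right] = 1.0
    return mask

class BinaryFeatureMixingNet(nn.Module):
    """Two inputs -> two encoders -> binary feature mixing -> shared core -> two heads."""
    def __init__(self, core, num_classes, width=64):
        super().__init__()
        self.enc0 = nn.Conv2d(3, width, 3, padding=1)  # encoder for input 0 (illustrative)
        self.enc1 = nn.Conv2d(3, width, 3, padding=1)  # encoder for input 1 (illustrative)
        self.core = core  # shared backbone; assumed to map (B, width, H, W) -> (B, core.out_dim)
        self.head0 = nn.Linear(core.out_dim, num_classes)
        self.head1 = nn.Linear(core.out_dim, num_classes)

    def forward(self, x0, x1, lam):
        f0, f1 = self.enc0(x0), self.enc1(x1)
        m = cutmix_feature_mask(f0.shape, lam).to(f0.device)  # 1 where subnetwork 0 "owns" the patch
        # Binary mixing in features instead of summing the two representations;
        # the factor 2 roughly preserves the activation scale since each input
        # contributes only part of the mixed feature map.
        mixed = 2.0 * (m * f0 + (1.0 - m) * f1)
        z = self.core(mixed)
        return self.head0(z), self.head1(z)

# Training pairs random images (x0, y0) and (x1, y1) in each batch, samples lam
# (e.g. from a Beta distribution), and sums the two cross-entropy losses,
# typically reweighted as a function of lam.
```

At inference, the same image can be fed to both encoders and the two head predictions averaged, which is why this family of multi-input multi-output models adds no cost at test time.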
Related papers
- Network Fission Ensembles for Low-Cost Self-Ensembles [20.103367702014474]
We propose a low-cost ensemble learning and inference method, called Network Fission Ensembles (NFE).
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and use each set to create an auxiliary path, constructing multiple exits.
arXiv Detail & Related papers (2024-08-05T08:23:59Z) - The Benefits of Mixup for Feature Learning [117.93273337740442]
We first show that Mixup using different linear parameters for features and labels can still achieve similar performance to standard Mixup.
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features from its mixture with the common features.
In contrast, standard training can only learn the common features but fails to learn the rare features, thus suffering from bad performance.
arXiv Detail & Related papers (2023-03-15T08:11:47Z) - MixupE: Understanding and Improving Mixup from Directional Derivative
Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z) - Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z) - OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning [53.57075147367114]
We introduce OpenMixup, the first mixup augmentation toolbox and benchmark for visual representation learning.
We train 18 representative mixup baselines from scratch and rigorously evaluate them across 11 image datasets.
We also open-source our modular backbones, including a collection of popular vision backbones, optimization strategies, and analysis toolkits.
arXiv Detail & Related papers (2022-09-11T12:46:01Z) - Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This not only allows robust training with noisy video data, but also makes it possible to scale up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z) - Rethinking Coarse-to-Fine Approach in Single Image Deblurring [19.195704769925925]
We present a fast and accurate deblurring network design using a multi-input multi-output U-net.
The proposed network outperforms the state-of-the-art methods in terms of both accuracy and computational complexity.
arXiv Detail & Related papers (2021-08-11T06:37:01Z) - Mixup Without Hesitation [38.801366276601414]
We propose mixup Without hesitation (mWh), a concise, effective, and easy-to-use training algorithm.
mWh strikes a good balance between exploration and exploitation by gradually replacing mixup with basic data augmentation.
Our code is open-source and available at https://github.com/yuhao318318/mWh.
arXiv Detail & Related papers (2021-01-12T08:11:08Z) - Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup [19.680580983094323]
Puzzle Mix is a mixup method that explicitly exploits the saliency information and underlying local statistics of natural examples (a minimal sketch of the standard input-level mixup and CutMix operations is given after this list).
Our experiments show that Puzzle Mix achieves state-of-the-art generalization and adversarial robustness results.
arXiv Detail & Related papers (2020-09-15T10:10:23Z) - Digit Image Recognition Using an Ensemble of One-Versus-All Deep Network
Classifiers [2.385916960125935]
We implement a novel technique for digit image recognition and evaluate it on that task.
Every network in the ensemble is trained with a one-versus-all (OVA) training technique using gradient descent with momentum (SGDMA).
Our proposed technique outperforms the baseline on digit image recognition for all datasets.
arXiv Detail & Related papers (2020-06-28T15:37:39Z)
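Several of the related papers above (the Mixup analyses, MixupE, OpenMixup, mWh, Puzzle Mix) build on input-level mixed sample data augmentation, which MixMo instead moves into feature space. For reference, here is a minimal NumPy sketch of the two standard input-level operations, vanilla mixup and CutMix; the Beta(alpha, alpha) sampling and the one-hot label assumption follow common practice and are not taken from any specific implementation listed here.

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=1.0):
    """Vanilla mixup: convex combination of two images and their one-hot labels."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(x1, x2, y1, y2, alpha=1.0):
    """CutMix: paste a rectangular patch of x2 onto x1; labels are mixed by kept area.

    Images are assumed channel-first arrays of shape (C, H, W); labels are one-hot vectors.
    """
    lam = np.random.beta(alpha, alpha)
    h, w = x1.shape[-2:]
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    top, bottom = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    left, right = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    x = x1.copy()
    x[..., top:bottom, left:right] = x2[..., top:bottom, left:right]
    lam_adj = 1 - (bottom - top) * (right - left) / (h * w)  # fraction of x1 actually kept
    return x, lam_adj * y1 + (1 - lam_adj) * y2
```

MixMo's contribution, by contrast, is to apply this binary (CutMix-like) mixing to the intermediate features of two subnetworks rather than to the input pixels, so that each subnetwork can be supervised with its own label.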