Mean Embeddings with Test-Time Data Augmentation for Ensembling of
Representations
- URL: http://arxiv.org/abs/2106.08038v1
- Date: Tue, 15 Jun 2021 10:49:46 GMT
- Title: Mean Embeddings with Test-Time Data Augmentation for Ensembling of
Representations
- Authors: Arsenii Ashukha, Andrei Atanov, Dmitry Vetrov
- Abstract summary: We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA).
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that spreading the success of ensembles to the inference of higher-quality representations is an important step that will open many new applications of ensembling.
- Score: 8.336315962271396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Averaging predictions over a set of models -- an ensemble -- is widely used
to improve predictive performance and uncertainty estimation of deep learning
models. At the same time, many machine learning systems, such as search,
matching, and recommendation systems, heavily rely on embeddings.
Unfortunately, due to misalignment of features of independently trained models,
embeddings cannot be improved with a naive deep-ensemble-like approach. In
this work, we look at the ensembling of representations and propose mean
embeddings with test-time augmentation (MeTTA), a simple yet well-performing
recipe for ensembling representations. Empirically, we demonstrate that MeTTA
significantly boosts the quality of linear evaluation on ImageNet for both
supervised and self-supervised models. Even more exciting, we draw connections
between MeTTA, image retrieval, and transformation invariant models. We believe
that spreading the success of ensembles to the inference of higher-quality
representations is an important step that will open many new applications of
ensembling.
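The idea named in the abstract, averaging the embeddings of several augmented views of one input, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `toy_encoder`, `random_flip_and_shift`, `mean_embedding`, the number of views, and the choice of augmentations are all hypothetical stand-ins, since the abstract does not specify them.

```python
import numpy as np


def random_flip_and_shift(image, rng, max_shift=2):
    """Hypothetical augmentation: random horizontal flip plus a small circular shift."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    dy, dx = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    return np.roll(image, shift=(dy, dx), axis=(0, 1))


def toy_encoder(image, weights):
    """Stand-in for a trained network: a fixed linear map followed by ReLU."""
    return np.maximum(weights @ image.ravel(), 0.0)


def mean_embedding(image, encode, augment, n_views=8, seed=0):
    """Average the embeddings of n_views randomly augmented copies of image."""
    rng = np.random.default_rng(seed)
    views = [encode(augment(image, rng)) for _ in range(n_views)]
    return np.mean(views, axis=0)


# Toy usage with random data (assumed shapes: 8x8 image, 16-dimensional embedding).
data_rng = np.random.default_rng(0)
image = data_rng.random((8, 8))
weights = data_rng.standard_normal((16, 64))
emb = mean_embedding(image, lambda x: toy_encoder(x, weights), random_flip_and_shift)
print(emb.shape)  # (16,)
```

Because every view passes through the same encoder, the averaged vectors live in one feature space, which sidesteps the misalignment that the abstract attributes to independently trained models.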
Related papers
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- Multi-View Conformal Learning for Heterogeneous Sensor Fusion [0.12086712057375555]
We build and test multi-view and single-view conformal models for heterogeneous sensor fusion.
Our models provide theoretical marginal confidence guarantees since they are based on the conformal prediction framework.
Our results also showed that multi-view models generate prediction sets with less uncertainty compared to single-view models.
arXiv Detail & Related papers (2024-02-19T17:30:09Z)
- Instance-Conditioned GAN Data Augmentation for Representation Learning [29.36473147430433]
We introduce DA_IC-GAN, a learnable data augmentation module that can be used off-the-shelf in conjunction with most state-of-the-art training recipes.
We show that DA_IC-GAN can boost accuracy by between 1%p and 2%p with the highest capacity models.
We additionally couple DA_IC-GAN with a self-supervised training recipe and show that we can also achieve an improvement of 1%p in accuracy in some settings.
arXiv Detail & Related papers (2023-03-16T22:45:43Z)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
- Revisiting Weakly Supervised Pre-Training of Visual Perception Models [27.95816470075203]
Large-scale weakly supervised pre-training can outperform fully supervised approaches.
This paper revisits weakly-supervised pre-training of models using hashtag supervision.
Our results provide a compelling argument for the use of weakly supervised learning in the development of visual recognition systems.
arXiv Detail & Related papers (2022-01-20T18:55:06Z)
- Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles [60.97922557957857]
We provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time.
This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting.
arXiv Detail & Related papers (2021-10-19T22:24:57Z)
- Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling [8.329870357145927]
Coherence evaluation of machine-generated text is one of the principal applications of coherence models that needs to be investigated.
We explore training data and self-supervision objectives that result in a model that generalizes well across tasks.
We show empirically that increasing the density of negative samples improves the basic model, and using a global negative queue further improves and stabilizes the model while training with hard negative samples.
arXiv Detail & Related papers (2021-10-14T07:44:14Z)
- Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z)
- Self-supervised Pre-training with Hard Examples Improves Visual Representations [110.23337264762512]
Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.
We first present a modeling framework that unifies existing SSP methods as learning to predict pseudo-labels.
Then, we propose new data augmentation methods of generating training examples whose pseudo-labels are harder to predict than those generated via random image transformations.
arXiv Detail & Related papers (2020-12-25T02:44:22Z)
- Improving Unsupervised Image Clustering With Robust Learning [21.164537402069712]
Unsupervised image clustering methods often introduce alternative objectives to indirectly train the model and are subject to faulty predictions and overconfident results.
This research proposes an innovative model, RUC, that is inspired by robust learning.
Extensive experiments show that the proposed model can adjust the model confidence with better calibration and gain additional robustness against adversarial noise.
arXiv Detail & Related papers (2020-12-21T07:02:11Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)