Learning Rich Nearest Neighbor Representations from Self-supervised
Ensembles
- URL: http://arxiv.org/abs/2110.10293v1
- Date: Tue, 19 Oct 2021 22:24:57 GMT
- Title: Learning Rich Nearest Neighbor Representations from Self-supervised
Ensembles
- Authors: Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong
- Abstract summary: We provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time.
This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting.
- Score: 60.97922557957857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining convolutional neural networks via self-supervision, and applying
them in transfer learning, is an incredibly fast-growing field that is rapidly
and iteratively improving performance across practically all image domains.
Meanwhile, model ensembling is one of the most universally applicable
techniques in supervised learning literature and practice, offering a simple
solution to reliably improve performance. But how to optimally combine
self-supervised models to maximize representation quality has largely remained
unaddressed. In this work, we provide a framework to perform self-supervised
model ensembling via a novel method of learning representations directly
through gradient descent at inference time. This technique improves
representation quality, as measured by k-nearest neighbors, both on the
in-domain dataset and in the transfer setting, with models transferable from
the former setting to the latter. Additionally, this direct learning of features
through backpropagation improves representations from even a single model,
echoing the improvements found in self-distillation.
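The abstract does not spell out the loss or architecture, so the following is only a minimal sketch of one way the idea could look: a free representation per image is learned directly by gradient descent so that it predicts the features of each frozen ensemble member, and quality is then measured with k-nearest neighbors. The linear prediction heads, the squared-error objective, the toy data, and every function name and hyperparameter here are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def make_toy_features(n_per_class=20, n_classes=3, dims=(8, 12), seed=0):
    """Hypothetical stand-in for features from several frozen models."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_classes), n_per_class)
    feats = []
    for d in dims:
        centers = 2.0 * rng.normal(size=(n_classes, d))
        feats.append(centers[labels] + rng.normal(size=(labels.size, d)))
    return feats, labels


def learn_ensemble_representations(features, dim=16, steps=500, lr=0.01, seed=0):
    """Learn one vector per image by gradient descent.

    features: list of (N, d_m) arrays, one per frozen model.
    Each model m gets a linear head W_m (an assumption); the learned
    matrix Z is trained so that Z @ W_m matches the normalized features.
    Returns Z and the total squared-error loss recorded at every step.
    """
    rng = np.random.default_rng(seed)
    targets = [l2_normalize(f) for f in features]
    n = targets[0].shape[0]
    Z = rng.normal(scale=0.1, size=(n, dim))
    heads = [rng.normal(scale=0.1, size=(dim, t.shape[1])) for t in targets]
    losses = []
    for _ in range(steps):
        grad_Z = np.zeros_like(Z)
        loss = 0.0
        for m, (W, T) in enumerate(zip(heads, targets)):
            R = Z @ W - T                    # residual for model m
            loss += 0.5 * np.sum(R ** 2)
            grad_Z += R @ W.T                # gradient w.r.t. Z (uses old W)
            heads[m] = W - lr * (Z.T @ R)    # gradient step on the head
        Z = Z - lr * grad_Z                  # gradient step on the representations
        losses.append(loss)
    return Z, losses


def knn_accuracy(reps, labels, k=5):
    """Leave-one-out k-NN accuracy using cosine similarity."""
    reps = l2_normalize(reps)
    sims = reps @ reps.T
    np.fill_diagonal(sims, -np.inf)          # exclude each point from its own neighbors
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    preds = np.array([np.bincount(v).argmax() for v in labels[neighbors]])
    return float(np.mean(preds == labels))


if __name__ == "__main__":
    feats, labels = make_toy_features()
    Z, losses = learn_ensemble_representations(feats)
    print("first/last loss:", losses[0], losses[-1])
    print("k-NN accuracy of learned representations:", knn_accuracy(Z, labels))
```

The step size is tuned for this small toy problem only; with more images or higher-dimensional features it would likely need to be reduced, and a nonlinear head or a cosine-based loss could be substituted without changing the overall structure.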
Related papers
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z)
- On-the-Fly Guidance Training for Medical Image Registration [14.309599960641242]
This study introduces a novel On-the-Fly Guidance (OFG) training framework for enhancing existing learning-based image registration models.
Our method trains registration models in a supervised fashion, without the need for any labeled data.
Tested across several benchmark datasets and leading models, our method significantly enhances performance.
arXiv Detail & Related papers (2023-08-29T11:12:53Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Iterative autoregression: a novel trick to improve your low-latency
speech enhancement model [2.2999148299770047]
Streaming models are an essential component of real-time speech enhancement tools.
We propose a straightforward yet effective alternative technique for training autoregressive low-latency speech enhancement models.
arXiv Detail & Related papers (2022-11-03T12:32:33Z)
- Mean Embeddings with Test-Time Data Augmentation for Ensembling of
Representations [8.336315962271396]
We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA).
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that extending the success of ensembles to the inference of higher-quality representations is an important step that will open many new applications of ensembling.
arXiv Detail & Related papers (2021-06-15T10:49:46Z)
- Learning by Distillation: A Self-Supervised Learning Framework for
Optical Flow Estimation [71.76008290101214]
DistillFlow is a knowledge distillation approach to learning optical flow.
It achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets.
Our models ranked 1st among all monocular methods on the KITTI 2015 benchmark, and outperform all published methods on the Sintel Final benchmark.
arXiv Detail & Related papers (2021-06-08T09:13:34Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.