Weighted Ensemble Self-Supervised Learning
- URL: http://arxiv.org/abs/2211.09981v3
- Date: Sun, 9 Apr 2023 19:15:39 GMT
- Title: Weighted Ensemble Self-Supervised Learning
- Authors: Yangjun Ruan, Saurabh Singh, Warren Morningstar, Alexander A. Alemi,
Sergey Ioffe, Ian Fischer, Joshua V. Dillon
- Abstract summary: Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms two state-of-the-art SSL methods, DINO and MSN, on multiple evaluation metrics on ImageNet-1K.
- Score: 67.24482854208783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensembling has proven to be a powerful technique for boosting model
performance, uncertainty estimation, and robustness in supervised learning.
Advances in self-supervised learning (SSL) enable leveraging large unlabeled
corpora for state-of-the-art few-shot and supervised learning performance. In
this paper, we explore how ensemble methods can improve recent SSL techniques
by developing a framework that permits data-dependent weighted cross-entropy
losses. We refrain from ensembling the representation backbone; this choice
yields an efficient ensemble method that incurs a small training cost and
requires no architectural changes or computational overhead to downstream
evaluation. The effectiveness of our method is demonstrated with two
state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al.,
2022). Our method outperforms both in multiple evaluation metrics on
ImageNet-1K, particularly in the few-shot setting. We explore several weighting
schemes and find that those which increase the diversity of ensemble heads lead
to better downstream evaluation results. Thorough experiments yield improved
prior art baselines which our method still surpasses; e.g., our overall
improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
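To make the recipe concrete, below is a minimal, illustrative sketch of the general idea: a single shared backbone feeds several lightweight projection heads, and their per-head cross-entropy losses are combined with data-dependent weights. The class names, the confidence-based weighting rule, and all dimensions are hypothetical stand-ins, not the paper's actual implementation or its specific weighting schemes.

```python
# Hedged sketch only: multiple SSL projection heads on one shared backbone,
# with their cross-entropy losses combined via data-dependent weights.
# Names, the weighting rule, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnsembleHeads(nn.Module):
    """K lightweight projection heads on top of a single shared backbone."""

    def __init__(self, embed_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(embed_dim, 2048), nn.GELU(),
                          nn.Linear(2048, out_dim))
            for _ in range(num_heads)
        )

    def forward(self, z: torch.Tensor) -> list[torch.Tensor]:
        # z: (batch, embed_dim) embeddings from the shared backbone.
        return [head(z) for head in self.heads]


def weighted_ensemble_loss(student_logits, teacher_probs, temperature=0.1):
    """Combine per-head cross-entropy losses with per-example weights.

    student_logits / teacher_probs: lists of K tensors of shape (batch, out_dim).
    The confidence-based weighting below is just one possible data-dependent
    scheme; the paper studies several and finds diversity-promoting ones best.
    """
    ce_per_head, w_per_head = [], []
    for logits, targets in zip(student_logits, teacher_probs):
        log_p = F.log_softmax(logits / temperature, dim=-1)
        ce = -(targets * log_p).sum(dim=-1)            # cross-entropy per example
        entropy = -(log_p.exp() * log_p).sum(dim=-1)   # head uncertainty per example
        ce_per_head.append(ce)
        w_per_head.append(torch.exp(-entropy))         # more confident -> larger weight
    ce = torch.stack(ce_per_head)                      # (K, batch)
    w = torch.stack(w_per_head)                        # (K, batch)
    w = (w / w.sum(dim=0, keepdim=True)).detach()      # normalise over heads, no grad
    return (w * ce).sum(dim=0).mean()
```

For example, given backbone embeddings z and per-head teacher target distributions (e.g., from an EMA teacher, as in DINO or MSN), weighted_ensemble_loss(EnsembleHeads(768, 4096)(z), teacher_probs) yields a single scalar to backpropagate. Because only the heads are ensembled, the extra training cost is a handful of MLPs, and downstream evaluation can still use the single shared backbone unchanged.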
Related papers
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models [57.582219834039506]
We introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts.
It is based on the pre-existing dense checkpoints of our Skywork-13B model.
arXiv Detail & Related papers (2024-06-03T03:58:41Z) - On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning [18.318758111829386]
We propose an efficient single-branch SSL method based on non-parametric instance discrimination.
We also propose a novel self-distillation loss that minimizes the KL divergence between the probability distribution and its square root version (a plausible formalization is sketched after this list).
arXiv Detail & Related papers (2024-04-30T06:39:04Z) - Augmentations vs Algorithms: What Works in Self-Supervised Learning [9.194402355758164]
We study the relative effects of data augmentations, pretraining algorithms, and model architectures in Self-Supervised Learning (SSL).
We propose a new framework which unifies many seemingly disparate SSL methods into a single shared template.
arXiv Detail & Related papers (2024-03-08T23:42:06Z) - From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning [32.18543787821028]
This paper proposes an adaptive technique of batch fusion for self-supervised contrastive learning.
It achieves state-of-the-art performance under equitable comparisons.
We suggest that the proposed method may contribute to the advancement of data-driven self-supervised learning research.
arXiv Detail & Related papers (2023-11-16T15:47:49Z) - Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding [89.92513889132825]
We introduce an evaluation framework that improves previous evaluation procedures in three key aspects, i.e., test performance, dev-test correlation, and stability.
We open-source our toolkit, FewNLU, that implements our evaluation framework along with a number of state-of-the-art methods.
arXiv Detail & Related papers (2021-09-27T00:57:30Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We show that even a much smaller dataset with well-matched annotations can help models achieve better performance and generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - ReSSL: Relational Self-Supervised Learning with Weak Augmentation [68.47096022526927]
Self-supervised learning has achieved great success in learning visual representations without data annotations.
We introduce a novel relational SSL paradigm that learns representations by modeling the relationship between different instances.
Our proposed ReSSL significantly outperforms the previous state-of-the-art algorithms in terms of both performance and training efficiency.
arXiv Detail & Related papers (2021-07-20T06:53:07Z) - Generalized Reinforcement Meta Learning for Few-Shot Optimization [3.7675996866306845]
We present a generic and flexible Reinforcement Learning (RL) based meta-learning framework for the problem of few-shot learning.
Our framework could be easily extended to do network architecture search.
arXiv Detail & Related papers (2020-05-04T03:21:05Z)
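The self-distillation loss mentioned above for "On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning" is described only in words. One plausible reading, assuming the "square root version" means the elementwise square root of the predicted distribution renormalized to sum to one, is

\[
\mathcal{L}_{\mathrm{sd}} \;=\; \mathrm{KL}\big(p \,\|\, q\big) \;=\; \sum_i p_i \log \frac{p_i}{q_i},
\qquad
q_i \;=\; \frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}},
\]

where p is the model's predicted probability distribution. Since q is a flattened copy of p, minimizing this term would act roughly as a confidence regularizer; the direction of the KL divergence and the exact construction of q are assumptions here, not details taken from that paper.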
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.