Q-Match: Self-supervised Learning by Matching Distributions Induced by a
Queue
- URL: http://arxiv.org/abs/2302.05444v1
- Date: Fri, 10 Feb 2023 18:59:05 GMT
- Title: Q-Match: Self-supervised Learning by Matching Distributions Induced by a
Queue
- Authors: Thomas Mulc and Debidatta Dwibedi
- Abstract summary: We introduce our algorithm, Q-Match, and show it is possible to induce the student-teacher distributions without any knowledge of downstream classes.
We show that our method is sample efficient--in terms of both the labels required for downstream training and the amount of unlabeled data required for pre-training--and scales well to the sizes of both the labeled and unlabeled data.
- Score: 6.1678491628787455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In semi-supervised learning, student-teacher distribution matching has been
successful in improving performance of models using unlabeled data in
conjunction with few labeled samples. In this paper, we aim to replicate that
success in the self-supervised setup where we do not have access to any labeled
data during pre-training. We introduce our algorithm, Q-Match, and show it is
possible to induce the student-teacher distributions without any knowledge of
downstream classes by using a queue of embeddings of samples from the unlabeled
dataset. We focus our study on tabular datasets and show that Q-Match
outperforms previous self-supervised learning techniques when measuring
downstream classification performance. Furthermore, we show that our method is
sample efficient--in terms of both the labels required for downstream training
and the amount of unlabeled data required for pre-training--and scales well to
the sizes of both the labeled and unlabeled data.
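As a reading aid, here is a minimal NumPy sketch of the mechanism the abstract describes: two views of each unlabeled sample are embedded by a student and a teacher, their similarities to a queue of past embeddings are turned into probability distributions with a softmax, and the student is trained to match the teacher's distribution via cross-entropy. The function names, temperatures, L2 normalization, and FIFO queue update below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project embeddings onto the unit sphere."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def q_match_loss(z_student, z_teacher, queue, tau_student=0.1, tau_teacher=0.04):
    """Cross-entropy between the queue-induced teacher and student distributions.

    z_student, z_teacher: (batch, dim) embeddings of two views of the same samples.
    queue: (queue_size, dim) embeddings of previously seen unlabeled samples.
    The temperatures are illustrative placeholders, not values from the paper.
    """
    zs, zt, q = l2_normalize(z_student), l2_normalize(z_teacher), l2_normalize(queue)
    p_teacher = softmax(zt @ q.T / tau_teacher)  # target distribution; treated as fixed
    p_student = softmax(zs @ q.T / tau_student)  # distribution the student must match
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=1).mean())

def update_queue(queue, new_embeddings):
    """FIFO update: the newest embeddings displace the oldest queue entries."""
    return np.concatenate([l2_normalize(new_embeddings), queue], axis=0)[: len(queue)]

# Toy usage with random embeddings.
rng = np.random.default_rng(0)
queue = l2_normalize(rng.normal(size=(512, 64)))
z_s, z_t = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
print(q_match_loss(z_s, z_t, queue))
queue = update_queue(queue, z_t)
```

Because the target distribution is induced by the queue rather than by class labels, this loss requires no knowledge of downstream classes during pre-training, which is the property the abstract emphasizes.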
Related papers
- Semi-Supervised Variational Adversarial Active Learning via Learning to Rank and Agreement-Based Pseudo Labeling [6.771578432805963]
Active learning aims to reduce the labor involved in data labeling by automating the selection of which unlabeled samples to annotate.
We introduce novel techniques that significantly improve the use of abundant unlabeled data during training.
We demonstrate the superior performance of our approach over the state of the art on various image classification and segmentation benchmark datasets.
arXiv Detail & Related papers (2024-08-23T00:35:07Z) - Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST (incremental self-training) is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify IST on five datasets and two types of backbone, improving both recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z) - Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and
Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - Enhancing Self-Training Methods [0.0]
Semi-supervised learning approaches train on small sets of labeled data along with large sets of unlabeled data.
Self-training is a semi-supervised teacher-student approach that often suffers from the problem of "confirmation bias".
arXiv Detail & Related papers (2023-01-18T03:56:17Z) - AuxMix: Semi-Supervised Learning with Unconstrained Unlabeled Data [6.633920993895286]
We show that state-of-the-art SSL algorithms suffer a degradation in performance in the presence of unlabeled auxiliary data.
We propose AuxMix, an algorithm that leverages self-supervised learning tasks to learn generic features in order to mask auxiliary data that are not semantically similar to the labeled set.
arXiv Detail & Related papers (2022-06-14T16:25:20Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose Dash, a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Dash adaptively selects which unlabeled examples to train on, via a threshold that is adjusted dynamically during training.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Investigating a Baseline Of Self Supervised Learning Towards Reducing
Labeling Costs For Image Classification [0.0]
The study uses the kaggle.com cats-vs-dogs dataset, MNIST, and Fashion-MNIST to investigate the self-supervised learning task.
Results show that the self-supervised pretext task improves accuracy by around 15% on the downstream classification task.
arXiv Detail & Related papers (2021-08-17T06:43:05Z) - Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
arXiv Detail & Related papers (2020-12-21T12:25:04Z) - SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both the labels and the pseudo labels to generate the final feature embeddings (a generic version of this teacher-student loop is sketched after the list of related papers).
arXiv Detail & Related papers (2020-11-20T08:26:10Z) - Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our methods, leveraging only 20-30 labeled samples per class per task for training and validation, can perform within 3% of fully supervised pre-trained language models.
arXiv Detail & Related papers (2020-06-27T08:13:58Z)
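Several of the entries above (Enhancing Self-Training Methods, SLADE, and Uncertainty-aware Self-training) build on the same teacher-student pseudo-labeling loop. The sketch below, referenced from the SLADE entry, is a generic scikit-learn version under assumed details (classifier choice, confidence threshold, number of rounds); it does not reproduce the exact recipe of any paper above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(x_labeled, y_labeled, x_unlabeled, rounds=3, threshold=0.9):
    """Generic teacher-student pseudo-label self-training loop.

    A teacher is fit on the labeled data, its confident predictions on the
    unlabeled data become pseudo labels, and a student is refit on the union.
    The threshold and number of rounds are illustrative assumptions.
    """
    model = LogisticRegression(max_iter=1000).fit(x_labeled, y_labeled)
    for _ in range(rounds):
        proba = model.predict_proba(x_unlabeled)
        confident = proba.max(axis=1) >= threshold  # keep only confident pseudo labels
        if not confident.any():
            break
        pseudo_y = model.classes_[proba[confident].argmax(axis=1)]
        x_train = np.concatenate([x_labeled, x_unlabeled[confident]])
        y_train = np.concatenate([y_labeled, pseudo_y])
        model = LogisticRegression(max_iter=1000).fit(x_train, y_train)  # student becomes the new teacher
    return model

# Toy usage with synthetic two-class data.
rng = np.random.default_rng(0)
x_l = rng.normal(size=(40, 5)); y_l = (x_l[:, 0] > 0).astype(int)
x_u = rng.normal(size=(400, 5))
clf = self_train(x_l, y_l, x_u)
```

The fixed confidence threshold is the simplest guard against the confirmation bias noted in the Enhancing Self-Training Methods entry; the papers above refine it with incremental schedules, dynamic thresholds, or uncertainty estimates.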
This list is automatically generated from the titles and abstracts of the papers in this site.