AuxMix: Semi-Supervised Learning with Unconstrained Unlabeled Data
- URL: http://arxiv.org/abs/2206.06959v1
- Date: Tue, 14 Jun 2022 16:25:20 GMT
- Title: AuxMix: Semi-Supervised Learning with Unconstrained Unlabeled Data
- Authors: Amin Banitalebi-Dehkordi, Pratik Gujjar, and Yong Zhang
- Abstract summary: We show that state-of-the-art SSL algorithms suffer a degradation in performance in the presence of unlabeled auxiliary data.
We propose AuxMix, an algorithm that leverages self-supervised learning tasks to learn generic features in order to mask auxiliary data that are not semantically similar to the labeled set.
- Score: 6.633920993895286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning (SSL) has seen great strides when labeled data is
scarce but unlabeled data is abundant. Critically, most recent work assumes that
such unlabeled data is drawn from the same distribution as the labeled data. In
this work, we show that state-of-the-art SSL algorithms suffer a degradation in
performance in the presence of unlabeled auxiliary data that does not
necessarily possess the same class distribution as the labeled set. We term
this problem Auxiliary-SSL and propose AuxMix, an algorithm that leverages
self-supervised learning tasks to learn generic features in order to mask
auxiliary data that are not semantically similar to the labeled set. We also
propose to regularize learning by maximizing the predicted entropy for
dissimilar auxiliary samples. We show an improvement of 5% over existing
baselines with a ResNet-50 model trained on the CIFAR10 dataset with 4k labeled
samples, where all unlabeled data is drawn from the Tiny-ImageNet dataset. We
report competitive results on several datasets and conduct ablation studies.
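To make the abstract's mechanism concrete, here is a minimal PyTorch sketch of how the two ideas could fit together: auxiliary samples are masked by feature similarity to the labeled set, and the dissimilar remainder is regularized by maximizing predictive entropy. The cosine-similarity retrieval and the threshold `tau` are our illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(probs, eps=1e-8):
    # Shannon entropy of each row of a batch of softmax outputs.
    return -(probs * (probs + eps).log()).sum(dim=1)

def auxmix_style_masking(logits_aux, feats_aux, feats_labeled, tau=0.7):
    """Illustrative AuxMix-style treatment of auxiliary data (not the paper's
    released code). Auxiliary samples close to the labeled set in feature
    space are kept for the usual SSL objective; dissimilar ones are pushed
    toward maximum-entropy (uncertain) predictions.
    """
    # Cosine similarity of each auxiliary sample to its nearest labeled feature.
    sim = F.normalize(feats_aux, dim=1) @ F.normalize(feats_labeled, dim=1).T
    keep = sim.max(dim=1).values >= tau          # True = semantically similar

    ent = predictive_entropy(logits_aux.softmax(dim=1))
    if (~keep).any():
        # Maximize entropy on dissimilar samples == minimize its negation.
        loss_dissimilar = -ent[~keep].mean()
    else:
        loss_dissimilar = logits_aux.new_zeros(())
    return keep, loss_dissimilar

# Toy usage with random tensors (10 classes, 128-d features).
keep, reg = auxmix_style_masking(torch.randn(32, 10), torch.randn(32, 128),
                                 torch.randn(100, 128))
```

The returned mask would select which auxiliary samples feed the standard SSL loss, while `loss_dissimilar` is added as the entropy regularizer.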
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art long-tailed semi-supervised learning (LTSSL) approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
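As a rough illustration of contrastive learning with smoothed pseudo-labels, the sketch below treats pairs as positives in proportion to how much their soft pseudo-labels agree. This is our reading of the CCL summary, not the authors' implementation; the temperature and weighting scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(z, q, temperature=0.2):
    """Contrastive loss with smoothed pseudo-labels (illustrative sketch).
    z: (N, D) embeddings; q: (N, C) soft pseudo-label distributions.
    Pairs count as positives in proportion to the agreement q_i . q_j.
    """
    z = F.normalize(z, dim=1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = (z @ z.T / temperature).masked_fill(eye, -1e9)  # drop self-pairs
    log_p = logits.log_softmax(dim=1)

    w = (q @ q.T).masked_fill(eye, 0.0)          # soft positive-pair weights
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return -(w * log_p).sum(dim=1).mean()
```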
- Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation [87.17768598044427]
Traditional semi-supervised learning assumes that the feature distributions of labeled and unlabeled data are consistent.
We propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from different distributions.
Our proposed SSFA is applicable to various pseudo-label-based SSL learners and significantly improves performance in labeled, unlabeled, and even unseen distributions.
arXiv Detail & Related papers (2024-05-31T03:13:45Z)
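One way to picture SSFA's adapt-then-pseudo-label recipe is the following sketch, which adapts a copy of the feature extractor with a self-supervised task on the unlabeled batch before producing pseudo-labels. Rotation prediction stands in for whatever self-supervised task the paper actually uses, and `ssl_head`, `cls_head`, and the step count are hypothetical.

```python
import copy
import torch
import torch.nn.functional as F

def adapt_then_pseudolabel(encoder, cls_head, ssl_head, x_unlabeled,
                           steps=1, lr=1e-3):
    """Hypothetical SSFA-style routine: adapt the feature extractor with a
    self-supervised task on the unlabeled batch, then pseudo-label with the
    adapted features. Rotation prediction is an assumed stand-in task.
    """
    enc = copy.deepcopy(encoder)     # adapt a copy; the original stays intact
    opt = torch.optim.SGD(enc.parameters(), lr=lr)
    for _ in range(steps):
        # Build a 4-way rotation-prediction task from the unlabeled images.
        xs = torch.cat([torch.rot90(x_unlabeled, k, dims=(2, 3))
                        for k in range(4)])
        ys = torch.arange(4, device=xs.device).repeat_interleave(len(x_unlabeled))
        loss = F.cross_entropy(ssl_head(enc(xs)), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return cls_head(enc(x_unlabeled)).softmax(dim=1)  # soft pseudo-labels
```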
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch, which minimizes a cross-sharpness measure to ensure consistent learning performance between the labeled and unlabeled datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
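The cross-sharpness idea can be pictured with a SAM-style probe: perturb the weights in the direction that increases the labeled loss, and measure how much the unlabeled loss worsens at that point. The sketch below only evaluates such a measure (FlatMatch minimizes it during training); `rho` and the pseudo-labeling step are our assumptions.

```python
import torch
import torch.nn.functional as F

def cross_sharpness(model, x_lab, y_lab, x_unlab, rho=0.05):
    """Evaluates a SAM-style 'cross-sharpness' probe (our simplified reading
    of FlatMatch): how much the unlabeled loss worsens at the worst-case
    weight perturbation found on the labeled loss.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(F.cross_entropy(model(x_lab), y_lab), params)
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (rho / (norm + 1e-12)).item()

    with torch.no_grad():
        pseudo = model(x_unlab).argmax(dim=1)       # fixed pseudo-labels
        base = F.cross_entropy(model(x_unlab), pseudo)
        for p, g in zip(params, grads):             # ascend on labeled loss
            p.add_(g, alpha=scale)
        perturbed = F.cross_entropy(model(x_unlab), pseudo)
        for p, g in zip(params, grads):             # restore the weights
            p.sub_(g, alpha=scale)
    return (perturbed - base).item()
```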
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
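A minimal sketch of instance-weighted adversarial training in the spirit of soft curriculum learning: suspicious real samples are down-weighted rather than discarded. How the weights are produced (e.g., from classifier confidence) is left abstract here and is an assumption on our part.

```python
import torch
import torch.nn.functional as F

def weighted_discriminator_loss(d_real_logits, d_fake_logits, weights):
    """Instance-weighted adversarial loss (illustrative). `weights` in [0, 1]
    should reflect how much each real sample's (possibly noisy) label is
    trusted; the source of the weights is assumed, not given by the paper.
    """
    loss_real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits), reduction='none')
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    # Down-weight suspicious real samples instead of discarding them outright.
    return (weights * loss_real).mean() + loss_fake
```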
- Exploring the Boundaries of Semi-Supervised Facial Expression Recognition: Learning from In-Distribution, Out-of-Distribution, and Unconstrained Data [19.442685015494316]
We present a study of 11 of the most recent semi-supervised methods in the context of facial expression recognition (FER).
Our investigation covers semi-supervised learning from in-distribution, out-of-distribution, unconstrained, and very small unlabelled data.
Our results demonstrate that FixMatch consistently achieves better performance on in-distribution unlabelled data, while ReMixMatch stands out among all methods for out-of-distribution, unconstrained, and scarce unlabelled data scenarios.
arXiv Detail & Related papers (2023-06-02T01:40:08Z)
- Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism [4.675583319625962]
Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models.
It can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled than others.
We propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm.
arXiv Detail & Related papers (2023-02-15T09:18:46Z)
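The inverse-propensity-weighting idea is standard enough to sketch: if `propensity[c]` estimates the probability that a class-c example gets labeled, weighting the labeled loss by its inverse counteracts the labeling bias. Estimating the propensities (the paper's missing-data mechanism) is not shown; they are simply an input here.

```python
import torch
import torch.nn.functional as F

def ipw_labeled_loss(logits, targets, propensity):
    """Inverse-propensity-weighted labeled loss (a sketch of the debiasing
    idea, not the paper's estimator). propensity[c] estimates
    P(labeled | class c); over-labeled classes are down-weighted accordingly.
    """
    per_sample = F.cross_entropy(logits, targets, reduction='none')
    w = 1.0 / propensity[targets].clamp_min(1e-6)
    return (w * per_sample).sum() / w.sum()   # self-normalized IPW estimator
```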
- Q-Match: Self-supervised Learning by Matching Distributions Induced by a Queue [6.1678491628787455]
We introduce our algorithm, Q-Match, and show it is possible to induce the student-teacher distributions without any knowledge of downstream classes.
We show that our method is sample efficient, in terms of both the labels required for downstream training and the amount of unlabeled data required for pre-training, and that it scales well to the sizes of both the labeled and unlabeled data.
arXiv Detail & Related papers (2023-02-10T18:59:05Z)
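A sketch of what matching queue-induced distributions might look like: the teacher view's softmax over similarities to a queue of past embeddings is the target for the student view. The two temperatures and the cross-entropy form are illustrative assumptions, not the Q-Match implementation.

```python
import torch
import torch.nn.functional as F

def queue_distribution(z, queue, temperature):
    # Softmax over cosine similarities to the queue entries.
    sims = F.normalize(z, dim=1) @ F.normalize(queue, dim=1).T
    return (sims / temperature).softmax(dim=1)

def q_match_style_loss(z_student, z_teacher, queue,
                       t_student=0.1, t_teacher=0.04):
    """Matches the student view's queue-induced distribution to the teacher
    view's (our reading of Q-Match; temperatures are assumptions).
    """
    with torch.no_grad():
        p_teacher = queue_distribution(z_teacher, queue, t_teacher)
    p_student = queue_distribution(z_student, queue, t_student)
    return -(p_teacher * p_student.clamp_min(1e-8).log()).sum(dim=1).mean()
```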
- Towards Realistic Semi-Supervised Learning [73.59557447798134]
We propose a novel approach to tackle SSL in an open-world setting, where we simultaneously learn to classify known and unknown classes.
Our approach substantially outperforms the existing state-of-the-art on seven diverse datasets.
arXiv Detail & Related papers (2022-07-05T19:04:43Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
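The one-vs-all scoring idea can be sketched compactly: each class has a binary inlier/outlier head, and a sample's novelty score is the "outlier" probability of the head belonging to its predicted class. The tensor layout below is an assumption for illustration.

```python
import torch

def ova_outlier_score(closed_logits, ova_logits):
    """OVA-based novelty scoring in the spirit of OpenMatch (illustrative
    layout: ova_logits is (N, C, 2) with per-class inlier/outlier logits).

    A sample's novelty score is the 'outlier' probability under the OVA head
    of the class predicted by the closed-set classifier.
    """
    pred = closed_logits.argmax(dim=1)                  # (N,) predicted class
    ova_probs = ova_logits.softmax(dim=2)               # (N, C, 2)
    return ova_probs[torch.arange(len(pred)), pred, 1]  # (N,) scores in [0, 1]
```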
- Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning [34.560447389853614]
Unsupervised semantic aggregation based on T-MI loss is explored to generate semantic labels for unlabeled data.
A feature pool that stores the labeled samples is dynamically updated to assign proxy labels for unlabeled data.
Experiments and analysis validate that USADTM achieves top performance.
arXiv Detail & Related papers (2020-10-12T08:17:56Z)
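Proxy-label assignment from a feature pool, as the summary describes it, can be sketched as a nearest-neighbor lookup; the T-MI loss itself is not shown, and the cosine-similarity choice is our assumption.

```python
import torch
import torch.nn.functional as F

def proxy_labels_from_pool(feats_unlabeled, pool_feats, pool_labels):
    """Proxy-label assignment from a dynamically updated pool of labeled
    features (a nearest-neighbor sketch; the T-MI loss is not shown).
    """
    sims = F.normalize(feats_unlabeled, dim=1) @ F.normalize(pool_feats, dim=1).T
    return pool_labels[sims.argmax(dim=1)]  # label of the nearest pool entry
```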