They are Not Completely Useless: Towards Recycling Transferable
Unlabeled Data for Class-Mismatched Semi-Supervised Learning
- URL: http://arxiv.org/abs/2011.13529v4
- Date: Tue, 12 Apr 2022 05:56:04 GMT
- Title: They are Not Completely Useless: Towards Recycling Transferable
Unlabeled Data for Class-Mismatched Semi-Supervised Learning
- Authors: Zhuo Huang, Ying Tai, Chengjie Wang, Jian Yang, Chen Gong
- Abstract summary: Semi-Supervised Learning (SSL) with mismatched classes deals with the problem that the classes of interest in the limited labeled data are only a subset of the classes in the massive unlabeled data.
This paper proposes a "Transferable OOD data Recycling" (TOOR) method to enrich the information for conducting class-mismatched SSL.
- Score: 61.46572463531167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-Supervised Learning (SSL) with mismatched classes deals with
the problem that the classes of interest in the limited labeled data are only
a subset of the classes in the massive unlabeled data. As a result, the
classes possessed only by the unlabeled data may mislead the classifier
training and thus hinder the practical deployment of various SSL methods. To
solve this problem, existing methods usually divide unlabeled data into
in-distribution (ID) data and out-of-distribution (OOD) data, and directly
discard or down-weight the OOD data to avoid their adverse impact. In other
words, they treat OOD data as completely useless, so the potentially valuable
information for classification that they contain is totally ignored. To
remedy this defect, this paper proposes a "Transferable OOD data Recycling"
(TOOR) method which properly utilizes ID data as well as "recyclable" OOD
data to enrich the information available for class-mismatched SSL.
Specifically, TOOR first attributes every unlabeled datum to either ID data
or OOD data, among which the ID data are directly used for training. The OOD
data that have a close relationship with the ID data and the labeled data are
then treated as recyclable, and adversarial domain adaptation is employed to
project them into the space of the ID data and labeled data. In other words,
the recyclability of an OOD datum is evaluated by its transferability, and
the recyclable OOD data are transferred so that they become compatible with
the distribution of the known classes of interest. Consequently, our TOOR
method extracts more information from unlabeled data than existing
approaches, and thus achieves improved performance, as demonstrated by
experiments on typical benchmark datasets.
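To make the recycling pipeline concrete, the sketch below walks through the
three steps named in the abstract: attribute unlabeled data to ID or OOD,
train directly on the ID portion, and adversarially align the transferable
OOD portion with the labeled/ID feature space. This is a minimal PyTorch
sketch under stated assumptions, not the authors' implementation: the
confidence-based ID/OOD split, the discriminator-based transferability score,
the thresholds, and the names `TOORNet`, `toor_step`, and `grad_reverse` are
all illustrative.

```python
# Minimal sketch of a TOOR-style training step (illustrative assumptions:
# confidence threshold for the ID/OOD split, discriminator output as the
# transferability score; not the paper's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class TOORNet(nn.Module):
    def __init__(self, in_dim=128, feat_dim=64, num_known=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_known)  # known classes only
        self.discriminator = nn.Linear(feat_dim, 1)       # labeled/ID vs. OOD

def toor_step(net, opt, x_lab, y_lab, x_unl,
              conf_thresh=0.95, trans_thresh=0.5):
    f_lab, f_unl = net.encoder(x_lab), net.encoder(x_unl)

    # Supervised loss on labeled data.
    loss = F.cross_entropy(net.classifier(f_lab), y_lab)

    # (1) Attribute unlabeled data to ID vs. OOD by prediction confidence.
    probs = F.softmax(net.classifier(f_unl).detach(), dim=1)
    conf, pseudo = probs.max(dim=1)
    id_mask = conf >= conf_thresh

    # (2) ID data join training directly via their pseudo-labels.
    if id_mask.any():
        loss = loss + F.cross_entropy(net.classifier(f_unl[id_mask]),
                                      pseudo[id_mask])

    # (3) Score OOD data by transferability (here: how "in-domain" the
    # discriminator finds them) and recycle only the transferable ones,
    # aligning them to the labeled/ID feature space adversarially.
    ood_mask = ~id_mask
    if ood_mask.any():
        with torch.no_grad():
            trans = torch.sigmoid(net.discriminator(f_unl[ood_mask]))
        recyclable = trans.squeeze(1) >= trans_thresh
        if recyclable.any():
            src = net.discriminator(grad_reverse(f_lab))
            tgt = net.discriminator(grad_reverse(f_unl[ood_mask][recyclable]))
            loss = loss + \
                F.binary_cross_entropy_with_logits(src, torch.ones_like(src)) + \
                F.binary_cross_entropy_with_logits(tgt, torch.zeros_like(tgt))

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The gradient-reversal layer is the standard DANN-style device: the
discriminator learns to tell labeled/ID features from recyclable OOD
features, while the reversed gradient pushes the encoder to make them
indistinguishable, which is one plausible reading of "projecting OOD data
into the space of ID data and labeled data".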
Related papers
- RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure [21.809270017579806]
Deep learning models often face challenges from both imbalanced (long-tailed) and out-of-distribution (OOD) data.
Our research shows that data mixing can generate pseudo-OOD data that exhibit the features of both in-distribution (ID) data and OOD data.
We propose a unified framework called Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure (RICASSO).
arXiv Detail & Related papers (2024-10-14T14:29:32Z)
- Safe Semi-Supervised Contrastive Learning Using In-Distribution Data as Positive Examples [3.4546761246181696]
We propose a self-supervised contrastive learning approach to fully exploit a large amount of unlabeled data.
The results show that self-supervised contrastive learning significantly improves classification accuracy.
arXiv Detail & Related papers (2024-08-03T22:33:13Z)
- How Does Unlabeled Data Provably Help Out-of-Distribution Detection? [63.41681272937562]
Unlabeled in-the-wild data is non-trivial to exploit due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
arXiv Detail & Related papers (2024-02-05T20:36:33Z) - FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness
for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch, which minimizes a cross-sharpness measure to ensure consistent learning performance between the labeled and unlabeled data.
arXiv Detail & Related papers (2023-10-25T06:57:59Z) - On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning [50.48888534815361]
In this paper, we empirically analyze Pseudo-Labeling (PL) in class-mismatched SSL.
PL is a simple and representative SSL method that transforms SSL problems into supervised learning by creating pseudo-labels for unlabeled data (a minimal sketch of vanilla PL follows after this list).
We propose to improve PL in class-mismatched SSL with two components -- Re-balanced Pseudo-Labeling (RPL) and Semantic Exploration Clustering (SEC).
arXiv Detail & Related papers (2023-01-15T03:21:59Z) - Exploiting Mixed Unlabeled Data for Detecting Samples of Seen and Unseen
Out-of-Distribution Classes [5.623232537411766]
Out-of-Distribution (OOD) detection is essential in real-world applications and has attracted increasing attention in recent years.
Most existing OOD detection methods require a large amount of labeled In-Distribution (ID) data, incurring a heavy labeling cost.
In this paper, we focus on the more realistic scenario, where limited labeled data and abundant unlabeled data are available.
We propose the Adaptive In-Out-aware Learning (AIOL) method, in which we adaptively select potential ID and OOD samples from the mixed unlabeled data.
arXiv Detail & Related papers (2022-10-13T08:34:25Z) - ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for
Semi-supervised Continual Learning [52.831894583501395]
Continual learning usually assumes that incoming data are fully labeled, which may not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z) - Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)
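As referenced in the pseudo-labeling entry above, the following is a minimal
sketch of vanilla Pseudo-Labeling (PL), the baseline that RPL and SEC refine:
confident model predictions on unlabeled data are converted into training
targets. The `pseudo_label_loss` name and the 0.95 confidence threshold are
illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of vanilla Pseudo-Labeling (PL): confident predictions on
# unlabeled data become supervised targets. The threshold is illustrative.
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, x_lab, y_lab, x_unl, threshold=0.95):
    # Standard supervised loss on the labeled batch.
    loss = F.cross_entropy(model(x_lab), y_lab)

    # Predict on unlabeled data; detach so label selection is not trained.
    logits_unl = model(x_unl)
    conf, pseudo = F.softmax(logits_unl.detach(), dim=1).max(dim=1)

    # Keep only confident predictions as pseudo-labels. In class-mismatched
    # SSL, OOD samples forced into known classes are exactly the failure
    # mode that RPL/SEC aim to handle; vanilla PL has no such safeguard.
    mask = conf >= threshold
    if mask.any():
        loss = loss + F.cross_entropy(logits_unl[mask], pseudo[mask])
    return loss
```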