Exploration and Exploitation of Unlabeled Data for Open-Set
Semi-Supervised Learning
- URL: http://arxiv.org/abs/2306.17699v1
- Date: Fri, 30 Jun 2023 14:25:35 GMT
- Title: Exploration and Exploitation of Unlabeled Data for Open-Set
Semi-Supervised Learning
- Authors: Ganlong Zhao, Guanbin Li, Yipeng Qin, Jinjin Zhang, Zhenhua Chai,
Xiaolin Wei, Liang Lin, Yizhou Yu
- Abstract summary: We address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.
Our proposed method achieves state-of-the-art in several challenging benchmarks, and improves upon existing SSL methods even when ID samples are totally absent in unlabeled data.
- Score: 130.56124475528475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address a complex but practical scenario in semi-supervised
learning (SSL) named open-set SSL, where unlabeled data contain both
in-distribution (ID) and out-of-distribution (OOD) samples. Unlike previous
methods that only consider ID samples to be useful and aim to filter out OOD
ones completely during training, we argue that the exploration and exploitation
of both ID and OOD samples can benefit SSL. To support our claim, i) we propose
a prototype-based clustering and identification algorithm that explores the
inherent similarity and difference among samples at feature level and
effectively cluster them around several predefined ID and OOD prototypes,
thereby enhancing feature learning and facilitating ID/OOD identification; ii)
we propose an importance-based sampling method that exploits the difference in
importance of each ID and OOD sample to SSL, thereby reducing the sampling bias
and improving the training. Our proposed method achieves state-of-the-art in
several challenging benchmarks, and improves upon existing SSL methods even
when ID samples are totally absent in unlabeled data.
Related papers
- SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning [25.508200663171625]
Open-set semi-supervised learning (OSSL) uses practical open-set unlabeled data.
Prior OSSL methods suffer from the tendency to overtrust the labeled ID data.
We propose SCOMatch, a novel OSSL that treats OOD samples as an additional class, forming a new SSL process.
arXiv Detail & Related papers (2024-09-26T03:47:34Z) - Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation [87.17768598044427]
Traditional semi-supervised learning assumes that the feature distributions of labeled and unlabeled data are consistent.
We propose Self-Supervised Feature Adaptation (SSFA), a generic framework for improving SSL performance when labeled and unlabeled data come from different distributions.
Our proposed SSFA is applicable to various pseudo-label-based SSL learners and significantly improves performance in labeled, unlabeled, and even unseen distributions.
arXiv Detail & Related papers (2024-05-31T03:13:45Z) - ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection [47.16254775587534]
We propose a novel OOD detection framework that discovers idlike outliers using CLIP citeDBLP:conf/icml/RadfordKHRGASAM21.
Benefiting from the powerful CLIP, we only need a small number of ID samples to learn the prompts of the model.
Our method achieves superior few-shot learning performance on various real-world image datasets.
arXiv Detail & Related papers (2023-11-26T09:06:40Z) - InstanT: Semi-supervised Learning with Instance-dependent Thresholds [75.91684890150283]
We propose the study of instance-dependent thresholds, which has the highest degree of freedom compared with existing methods.
We devise a novel instance-dependent threshold function for all unlabeled instances by utilizing their instance-level ambiguity and the instance-dependent error rates of pseudo-labels.
arXiv Detail & Related papers (2023-10-29T05:31:43Z) - On the Effectiveness of Out-of-Distribution Data in Self-Supervised
Long-Tail Learning [15.276356824489431]
We propose Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT)
We empirically identify the counter-intuitive usefulness of OOD samples in SSL long-tailed learning.
Our method significantly improves the performance of SSL on long-tailed datasets by a large margin.
arXiv Detail & Related papers (2023-06-08T04:32:10Z) - Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative Inconsistency-based virtual aDvErial algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.