Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning
- URL: http://arxiv.org/abs/2108.05617v1
- Date: Thu, 12 Aug 2021 09:14:44 GMT
- Title: Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning
- Authors: Junkai Huang, Chaowei Fang, Weikai Chen, Zhenhua Chai, Xiaolin Wei,
Pengxu Wei, Liang Lin, Guanbin Li
- Abstract summary: Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
- Score: 101.28281124670647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-set semi-supervised learning (open-set SSL) investigates a challenging
but practical scenario where out-of-distribution (OOD) samples are contained in
the unlabeled data. While the mainstream technique seeks to completely filter
out the OOD samples for semi-supervised learning (SSL), we propose a novel
training mechanism that could effectively exploit the presence of OOD data for
enhanced feature learning while avoiding its adverse impact on the SSL. We
achieve this goal by first introducing a warm-up training that leverages all
the unlabeled data, including both the in-distribution (ID) and OOD samples.
Specifically, we perform a pretext task that enforces our feature extractor to
obtain a high-level semantic understanding of the training images, leading to
more discriminative features that can benefit the downstream tasks. Since the
OOD samples are inevitably detrimental to SSL, we propose a novel cross-modal
matching strategy to detect OOD samples. Instead of directly applying binary
classification, we train the network to predict whether the data sample is
matched to an assigned one-hot class label. The appeal of the proposed
cross-modal matching over binary classification is the ability to generate a
compatible feature space that aligns with the core classification task.
Extensive experiments show that our approach substantially lifts the
performance on open-set SSL and outperforms the state-of-the-art by a large
margin.
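As a rough illustration of the two-stage recipe above, the sketch below (PyTorch; the toy backbone, the choice of rotation prediction as the warm-up pretext task, and all dimensions and hyperparameters are illustrative assumptions, not the authors' implementation) shows (1) a warm-up step trained on all unlabeled images, ID and OOD alike, and (2) a cross-modal matching head that scores whether an image feature agrees with an assigned one-hot class label, instead of classifying ID vs. OOD directly.

```python
# Minimal sketch (not the authors' code) of the two training stages described
# in the abstract. Backbone, pretext task, and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10   # assumed number of in-distribution classes
FEAT_DIM = 128     # assumed feature dimension

class Backbone(nn.Module):
    """Toy stand-in for the paper's feature extractor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, FEAT_DIM),
        )

    def forward(self, x):
        return self.net(x)

class CrossModalMatcher(nn.Module):
    """Scores whether an image feature matches an assigned one-hot class label.

    Unlike a plain ID-vs-OOD binary classifier, conditioning on a label
    embedding keeps the matching score in a feature space aligned with the
    core classification task.
    """
    def __init__(self):
        super().__init__()
        self.label_embed = nn.Linear(NUM_CLASSES, FEAT_DIM)
        self.scorer = nn.Sequential(
            nn.Linear(2 * FEAT_DIM, FEAT_DIM), nn.ReLU(),
            nn.Linear(FEAT_DIM, 1),
        )

    def forward(self, feats, one_hot_labels):
        lab = self.label_embed(one_hot_labels)
        return self.scorer(torch.cat([feats, lab], dim=1)).squeeze(1)

def warmup_pretext_step(backbone, rot_head, images, optimizer):
    """Warm-up: a self-supervised pretext task (rotation prediction, assumed
    here) trained on ALL unlabeled images, both ID and OOD."""
    angles = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, angles)])
    loss = F.cross_entropy(rot_head(backbone(rotated)), angles)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def matching_step(backbone, matcher, images, labels, optimizer):
    """Cross-modal matching: positives pair an image with its true one-hot
    label, negatives with a randomly reassigned wrong label."""
    feats = backbone(images)
    pos = F.one_hot(labels, NUM_CLASSES).float()
    wrong = (labels + torch.randint(1, NUM_CLASSES, labels.shape)) % NUM_CLASSES
    neg = F.one_hot(wrong, NUM_CLASSES).float()
    scores = matcher(torch.cat([feats, feats]), torch.cat([pos, neg]))
    targets = torch.cat([torch.ones_like(labels, dtype=torch.float),
                         torch.zeros_like(labels, dtype=torch.float)])
    loss = F.binary_cross_entropy_with_logits(scores, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here rot_head would be a small classifier such as nn.Linear(FEAT_DIM, 4), and at inference time the matching score of an unlabeled sample paired with its pseudo-label can be thresholded to flag likely OOD data before running standard SSL.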
Related papers
- SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning [25.508200663171625]
Open-set semi-supervised learning (OSSL) uses practical open-set unlabeled data.
Prior OSSL methods suffer from the tendency to overtrust the labeled ID data.
We propose SCOMatch, a novel OSSL method that treats OOD samples as an additional class, forming a new SSL process.
arXiv Detail & Related papers (2024-09-26T03:47:34Z)
- Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Detecting out-of-distribution (OOD) samples is crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLMs) to envision potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
arXiv Detail & Related papers (2024-06-02T17:09:48Z)
- Robust Semi-supervised Learning by Wisely Leveraging Open-set Data [48.67897991121204]
Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set.
We propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model.
arXiv Detail & Related papers (2024-05-11T10:22:32Z)
- Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning [76.00798972439004]
Collaborative Sample Selection (CSS) removes noisy samples from the identified clean set.
We introduce a co-training mechanism with a contrastive loss in semi-supervised learning.
arXiv Detail & Related papers (2023-10-24T05:37:20Z)
- Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
- Exploration and Exploitation of Unlabeled Data for Open-Set Semi-Supervised Learning [130.56124475528475]
We address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.
Our proposed method achieves state-of-the-art in several challenging benchmarks, and improves upon existing SSL methods even when ID samples are totally absent in unlabeled data.
arXiv Detail & Related papers (2023-06-30T14:25:35Z)
- Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)