OpenMatch: Open-set Consistency Regularization for Semi-supervised
Learning with Outliers
- URL: http://arxiv.org/abs/2105.14148v1
- Date: Fri, 28 May 2021 23:57:15 GMT
- Title: OpenMatch: Open-set Consistency Regularization for Semi-supervised
Learning with Outliers
- Authors: Kuniaki Saito, Donghyun Kim, Kate Saenko
- Abstract summary: We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
- Score: 71.08167292329028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised learning (SSL) is an effective means to leverage unlabeled
data to improve a model's performance. Typical SSL methods like FixMatch assume
that labeled and unlabeled data share the same label space. However, in
practice, unlabeled data can contain categories unseen in the labeled set,
i.e., outliers, which can significantly harm the performance of SSL algorithms.
To address this problem, we propose a novel Open-set Semi-Supervised Learning
(OSSL) approach called OpenMatch. Learning representations of inliers while
rejecting outliers is essential for the success of OSSL. To this end, OpenMatch
unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
The OVA-classifier outputs the confidence score of a sample being an inlier,
providing a threshold to detect outliers. Another key contribution is an
open-set soft-consistency regularization loss, which enhances the smoothness of
the OVA-classifier with respect to input transformations and greatly improves
outlier detection. OpenMatch achieves state-of-the-art performance on three
datasets, and even outperforms a fully supervised model in detecting outliers
unseen in unlabeled data on CIFAR10.
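
The two mechanisms in the abstract, OVA-based inlier scoring with a rejection threshold and a soft consistency term over OVA outputs, can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' released implementation: it assumes the network exposes a closed-set head with K logits plus K two-way OVA heads stacked as a (B, 2, K) tensor whose channel 0 holds the inlier logit, and the 0.5 rejection threshold and the squared-distance form of the consistency term are modeling assumptions made here for illustration.

```python
import torch
import torch.nn.functional as F

def ova_probs(ova_logits):
    """Normalize per-class (inlier, outlier) logit pairs.

    ova_logits: (B, 2, K) tensor -- one two-way OVA head per class,
    with channel 0 assumed to hold the inlier logit.
    """
    return F.softmax(ova_logits, dim=1)

def detect_outliers(closed_logits, ova_logits, threshold=0.5):
    """Flag samples whose inlier confidence falls below the threshold.

    closed_logits: (B, K) logits from the standard closed-set classifier.
    Returns a boolean mask, True where a sample is rejected as an outlier.
    """
    probs = ova_probs(ova_logits)                        # (B, 2, K)
    pred = closed_logits.argmax(dim=1)                   # predicted class per sample
    batch = torch.arange(pred.size(0), device=pred.device)
    inlier_score = probs[batch, 0, pred]                 # p(inlier) under the predicted class's OVA head
    return inlier_score < threshold                      # the 0.5 default is an assumption

def soft_open_set_consistency(ova_logits_weak, ova_logits_strong):
    """Squared-distance consistency between OVA outputs on two augmented
    views of the same unlabeled batch (an assumed form of the paper's
    open-set soft-consistency regularization)."""
    p_weak = ova_probs(ova_logits_weak)
    p_strong = ova_probs(ova_logits_strong)
    return ((p_weak - p_strong) ** 2).sum(dim=(1, 2)).mean()
```

Applying the consistency term to the OVA probabilities of two augmentations of each unlabeled image is what enforces smoothness of the OVA classifier with respect to input transformations, which the abstract credits for the improved outlier detection.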
Related papers
- OwMatch: Conditional Self-Labeling with Consistency for Open-World Semi-Supervised Learning [4.462726364160216]
Semi-supervised learning (SSL) offers a robust framework for harnessing the potential of unannotated data.
The emergence of open-world SSL (OwSSL) introduces a more practical challenge, wherein unlabeled data may encompass samples from unseen classes.
We propose an effective framework called OwMatch, combining conditional self-labeling and open-world hierarchical thresholding.
arXiv Detail & Related papers (2024-11-04T06:07:43Z)
- SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning [106.46648817126984]
In this paper, we study the challenging and realistic open-set SSL setting.
The goal is to both correctly classify inliers and to detect outliers.
We find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data.
arXiv Detail & Related papers (2023-11-17T15:14:40Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning methods.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization [36.102831230805755]
In many real-world applications, unlabeled data will inevitably contain unseen-class outliers not belonging to any of the labeled classes.
We introduce a novel open-set SSL framework, IOMatch, which can jointly utilize inliers and outliers, even when it is difficult to distinguish exactly between them.
arXiv Detail & Related papers (2023-08-25T04:14:02Z)
- Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
- NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning [8.749830466953584]
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data.
In this work we introduce a new framework for SSL named NorMatch.
We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
arXiv Detail & Related papers (2022-11-17T15:39:18Z)
- OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
- OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data [65.19205979542305]
Unlabeled data may include out-of-class samples in practice.
OpenCoS is a method for handling this realistic semi-supervised learning scenario.
arXiv Detail & Related papers (2021-06-29T06:10:05Z)
- Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning [26.069534478556527]
Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce.
Most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets.
In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations.
arXiv Detail & Related papers (2021-06-01T03:58:18Z)