On The Consistency Training for Open-Set Semi-Supervised Learning
- URL: http://arxiv.org/abs/2101.08237v1
- Date: Tue, 19 Jan 2021 12:38:17 GMT
- Title: On The Consistency Training for Open-Set Semi-Supervised Learning
- Authors: Huixiang Luo, Hao Cheng, Yuting Gao, Ke Li, Mengdan Zhang, Fanxu Meng,
Xiaowei Guo, Feiyue Huang, Xing Sun
- Abstract summary: We study how OOD samples affect training in both low- and high-dimensional spaces.
Our method makes better use of OOD samples and achieves state-of-the-art results.
- Score: 44.046578996049654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional semi-supervised learning (SSL) methods, e.g., MixMatch,
achieve great performance when both the labeled and the unlabeled datasets are
drawn from the same distribution. However, these methods often suffer severe
performance degradation in a more realistic setting, where the unlabeled dataset
contains out-of-distribution (OOD) samples. Recent approaches mitigate the
negative influence of OOD samples by filtering them out of the unlabeled data.
Our studies show that it is not necessary to get rid of OOD samples during
training. On the contrary, the network can benefit from OOD samples if they are
properly utilized. We thoroughly study how OOD samples affect DNN training in
both low- and high-dimensional spaces, considering two fundamental SSL methods:
Pseudo Labeling (PL) and Data Augmentation based Consistency Training (DACT).
The conclusion is twofold: (1) unlike PL, which suffers performance degradation,
DACT improves model performance; (2) the improvement is closely related to the
class-wise distribution gap between the labeled and the unlabeled datasets.
Motivated by this observation, we further improve model performance by bridging
the gap between the labeled and the unlabeled datasets (which contain OOD
samples). Compared to previous algorithms that focus on distinguishing between
ID and OOD samples, our method makes better use of OOD samples and achieves
state-of-the-art results.
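To make the PL vs. DACT contrast in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of the two unlabeled-data losses. It is not the paper's implementation; `model`, the confidence `threshold`, and the `augment` transform are illustrative placeholders.

```python
# Illustrative sketch only (not the paper's code): the two unlabeled-data losses
# compared in the abstract, written with hypothetical placeholder components.
import torch
import torch.nn.functional as F


def pseudo_labeling_loss(model, x_unlabeled, threshold=0.95):
    """Pseudo Labeling (PL): confident hard predictions become training targets."""
    with torch.no_grad():
        probs = F.softmax(model(x_unlabeled), dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
        mask = (confidence >= threshold).float()   # keep only confident samples
    logits = model(x_unlabeled)
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (per_sample * mask).mean()


def consistency_loss(model, x_unlabeled, augment):
    """Data Augmentation based Consistency Training (DACT): predictions on two
    augmented views of the same sample are pulled together; no hard labels."""
    with torch.no_grad():
        target = F.softmax(model(augment(x_unlabeled)), dim=1)   # first view
    logits = model(augment(x_unlabeled))                         # second view
    return F.mse_loss(F.softmax(logits, dim=1), target)


if __name__ == "__main__":
    # Toy demo: a linear classifier and random "unlabeled" inputs (some may be OOD).
    torch.manual_seed(0)
    model = torch.nn.Linear(32, 10)
    x_u = torch.randn(16, 32)
    noise_aug = lambda x: x + 0.1 * torch.randn_like(x)   # stand-in augmentation
    print("PL loss:  ", pseudo_labeling_loss(model, x_u).item())
    print("DACT loss:", consistency_loss(model, x_u, noise_aug).item())
```

The structural difference matters for the abstract's finding: PL commits to hard class labels for every confident unlabeled sample, so OOD samples can inject wrong targets, whereas DACT only enforces agreement between augmented views, which is consistent with the reported result that DACT tolerates, and even benefits from, OOD samples while PL degrades.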
Related papers
- SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning [25.508200663171625]
Open-set semi-supervised learning (OSSL) uses practical open-set unlabeled data.
Prior OSSL methods suffer from the tendency to overtrust the labeled ID data.
We propose SCOMatch, a novel OSSL method that treats OOD samples as an additional class, forming a new SSL process.
arXiv Detail & Related papers (2024-09-26T03:47:34Z)
- Deep Metric Learning-Based Out-of-Distribution Detection with Synthetic Outlier Exposure [0.0]
We propose a label-mixup approach to generate synthetic OOD data using Denoising Diffusion Probabilistic Models (DDPMs).
In the experiments, we found that metric learning-based loss functions perform better than the softmax loss.
Our approach outperforms strong baselines in conventional OOD detection metrics.
arXiv Detail & Related papers (2024-05-01T16:58:22Z)
- How Does Unlabeled Data Provably Help Out-of-Distribution Detection? [63.41681272937562]
Harnessing unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
arXiv Detail & Related papers (2024-02-05T20:36:33Z)
- ReSmooth: Detecting and Utilizing OOD Samples when Training with Data Augmentation [57.38418881020046]
Recent data augmentation (DA) techniques consistently pursue diversity in augmented training samples.
A highly diverse augmentation strategy, however, usually introduces out-of-distribution (OOD) augmented samples.
We propose ReSmooth, a framework that first detects OOD samples among the augmented samples and then leverages them.
arXiv Detail & Related papers (2022-05-25T09:29:27Z)
- NGC: A Unified Framework for Learning with Open-World Noisy Data [36.96188289965334]
We propose a new graph-based framework, namely Noisy Graph Cleaning (NGC), which collects clean samples by leveraging the geometric structure of the data and the model's predictive confidence.
We conduct experiments on multiple benchmarks with different types of noise, and the results demonstrate the superior performance of our method against state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-25T04:04:46Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex and novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in the unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.