Out-distribution aware Self-training in an Open World Setting
- URL: http://arxiv.org/abs/2012.12372v1
- Date: Mon, 21 Dec 2020 12:25:04 GMT
- Title: Out-distribution aware Self-training in an Open World Setting
- Authors: Maximilian Augustin, Matthias Hein
- Abstract summary: We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
- Score: 62.19882458285749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning heavily depends on large labeled datasets, which limits further
improvements. While unlabeled data is available in large amounts, in particular in image
recognition, it does not fulfill the closed world assumption of semi-supervised learning
that all unlabeled data are task-related. The goal of this paper is to leverage unlabeled
data in an open world setting to further improve prediction performance. For this purpose,
we introduce out-distribution aware self-training, which includes a careful sample
selection strategy based on the confidence of the classifier. While normal self-training
deteriorates prediction performance, our iterative scheme improves performance while using
up to 15 times the amount of originally labeled data. Moreover, our classifiers are by
design out-distribution aware and can thus distinguish task-related inputs from unrelated
ones.
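The abstract describes the method only in words; as a reading aid, here is a minimal, hypothetical sketch of the confidence-based sample selection step inside one self-training round. The function names (select_pseudo_labeled, train_fn, predict_fn), the threshold value, and the data layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_pseudo_labeled(probs, threshold=0.95):
    # probs: (N, C) softmax outputs of the current classifier on the
    # open-world unlabeled pool. Only confidently predicted samples are
    # kept; low-confidence samples are treated as likely out-distribution
    # and discarded (the threshold here is illustrative, not the paper's).
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    pseudo_labels = probs[keep].argmax(axis=1)
    return keep, pseudo_labels

def self_training_round(train_fn, predict_fn, labeled, unlabeled_x, threshold=0.95):
    # One hypothetical round: (1) train on the current labeled pool,
    # (2) predict on the open-world unlabeled pool, (3) enlarge the
    # labeled pool with confidently pseudo-labeled samples only.
    model = train_fn(labeled["x"], labeled["y"])
    probs = predict_fn(model, unlabeled_x)
    keep, pseudo = select_pseudo_labeled(probs, threshold)
    new_labeled = {
        "x": np.concatenate([labeled["x"], unlabeled_x[keep]]),
        "y": np.concatenate([labeled["y"], pseudo]),
    }
    return model, new_labeled
```

Iterating such rounds grows the pseudo-labeled pool, which is how far more data than the originally labeled set can eventually be used; the rejected low-confidence samples are the inputs an out-distribution aware classifier would flag as task-unrelated.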
Related papers
- Semi-Supervised Variational Adversarial Active Learning via Learning to Rank and Agreement-Based Pseudo Labeling [6.771578432805963]
Active learning aims to reduce the labeling effort by automating the selection of which unlabeled samples to annotate.
We introduce novel techniques that significantly improve the use of abundant unlabeled data during training.
We demonstrate the superior performance of our approach over the state of the art on various image classification and segmentation benchmark datasets.
arXiv Detail & Related papers (2024-08-23T00:35:07Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Enhancing Self-Training Methods [0.0]
Semi-supervised learning approaches train on small sets of labeled data along with large sets of unlabeled data.
Self-training is a semi-supervised teacher-student approach that often suffers from the problem of "confirmation bias".
arXiv Detail & Related papers (2023-01-18T03:56:17Z)
- Self-Training: A Survey [5.772546394254112]
Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations.
Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years.
We present self-training methods for binary and multi-class classification; as well as their variants and two related approaches.
arXiv Detail & Related papers (2022-02-24T11:40:44Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads (a minimal sketch of this idea appears after the related-papers list below).
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
- Learning the Prediction Distribution for Semi-Supervised Learning with Normalising Flows [6.789370732159177]
Impressive results have been achieved in semi-supervised learning (SSL) for image classification, nearing fully supervised performance.
We propose a probabilistically principled general approach to SSL that considers the distribution over label predictions.
We demonstrate the general applicability of this approach on a range of computer vision tasks with varying output complexity.
arXiv Detail & Related papers (2020-07-06T13:36:00Z)
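As referenced in the Debiased Pseudo Labeling entry above, below is a minimal, hypothetical sketch of decoupling pseudo-label generation from utilization with two independent heads on a shared backbone. The module layout, head roles, dimensions, and threshold are assumptions made from the one-sentence summary alone, not the paper's actual architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadClassifier(nn.Module):
    # Shared backbone with two independent linear heads (a hypothetical
    # reading of the summary): the "labeler" head generates pseudo labels,
    # the "learner" head is the one trained on them, so pseudo-label errors
    # do not feed straight back into the head that produced them.
    def __init__(self, in_dim=784, feat_dim=128, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.labeler_head = nn.Linear(feat_dim, num_classes)  # pseudo-label generation
        self.learner_head = nn.Linear(feat_dim, num_classes)  # pseudo-label utilization

    def forward(self, x):
        feats = self.backbone(x)
        return self.labeler_head(feats), self.learner_head(feats)

def unlabeled_loss(model, unlabeled_x, threshold=0.9):
    # Pseudo labels come from the labeler head without gradient flow;
    # only the learner head (and the shared backbone) receives this loss.
    with torch.no_grad():
        labeler_logits, _ = model(unlabeled_x)
        conf, pseudo = labeler_logits.softmax(dim=1).max(dim=1)
        mask = conf >= threshold
    if not mask.any():
        return torch.zeros((), requires_grad=True)  # no confident samples this batch
    _, learner_logits = model(unlabeled_x)
    return F.cross_entropy(learner_logits[mask], pseudo[mask])
```

In this sketch the labeler head would be fit on labeled data only (again an assumption), which is one straightforward way to keep the head that generates pseudo labels independent of the head that learns from them.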