Pseudo Outlier Exposure for Out-of-Distribution Detection using
Pretrained Transformers
- URL: http://arxiv.org/abs/2307.09455v2
- Date: Wed, 19 Jul 2023 04:13:11 GMT
- Authors: Jaeyoung Kim, Kyuheon Jung, Dongbin Na, Sion Jang, Eunbin Park,
Sungchul Choi
- Abstract summary: A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples.
We propose a method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes.
Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For real-world language applications, detecting an out-of-distribution (OOD)
sample is helpful to alert users or reject such unreliable samples. However,
modern over-parameterized language models often produce overconfident
predictions for both in-distribution (ID) and OOD samples. In particular,
language models suffer from OOD samples with a similar semantic representation
to ID samples since these OOD samples lie near the ID manifold. A rejection
network can be trained with ID and diverse outlier samples to detect test OOD
samples, but explicitly collecting auxiliary OOD datasets brings an additional
burden for data collection. In this paper, we propose a simple but effective
method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD
dataset by sequentially masking tokens related to ID classes. The surrogate OOD
sample introduced by POE shows a similar representation to ID data, which is
most effective in training a rejection network. Our method does not require any
external OOD data and can be easily implemented within off-the-shelf
Transformers. A comprehensive comparison with state-of-the-art algorithms
demonstrates POE's competitiveness on several text classification benchmarks.
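The recipe in the abstract can be made concrete. Below is a minimal sketch, assuming a fine-tuned HuggingFace sequence classifier; token relevance to the predicted ID class is approximated by input-gradient saliency, a stand-in for (not necessarily identical to) the paper's token-selection criterion, and the checkpoint name and label count are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint/label count; any fine-tuned ID classifier works here.
name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4)
model.eval()

def pseudo_outlier(text: str, n_mask: int = 3) -> str:
    """Sequentially replace the tokens most tied to the predicted ID class
    with [MASK]. Importance is approximated by input-gradient magnitude,
    a stand-in for the paper's token-selection criterion."""
    enc = tok(text, return_tensors="pt")
    ids = enc["input_ids"].clone()
    for _ in range(n_mask):
        embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
        logits = model(inputs_embeds=embeds,
                       attention_mask=enc["attention_mask"]).logits
        logits.max().backward()                        # saliency of top class
        scores = embeds.grad.norm(dim=-1).squeeze(0)   # per-token importance
        scores[0] = scores[-1] = 0.0                   # never mask [CLS]/[SEP]
        scores[ids[0] == tok.mask_token_id] = 0.0      # skip already-masked tokens
        ids[0, scores.argmax()] = tok.mask_token_id    # mask the top token
    return tok.decode(ids[0])

print(pseudo_outlier("the team won the championship game last night"))
```

Masking the most class-relevant tokens first keeps the surrogate samples near the ID manifold, which is exactly the regime the abstract identifies as most useful for training the rejection network.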
Related papers
- Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as the OOD data.
Some marginal OOD samples actually have semantic content close to that of the in-distribution (ID) samples, which makes deciding whether a sample is OOD a Sorites paradox.
We construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue.
arXiv Detail & Related papers (2024-06-14T09:27:56Z)
- ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
We propose a novel OOD detection framework that discovers ID-like outliers using CLIP (Radford et al., ICML 2021).
Benefiting from the powerful CLIP, we need only a small number of ID samples to learn the model's prompts.
Our method achieves superior few-shot learning performance on various real-world image datasets.
arXiv Detail & Related papers (2023-11-26T09:06:40Z)
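For intuition about the entry above: CLIP-based OOD detection typically scores an image by its similarity to text prompts for the ID classes. A minimal sketch of that scoring step follows, using fixed hand-written prompts as a stand-in for the learned ID-like prompts the paper proposes; the checkpoint and label set are illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

id_classes = ["cat", "dog", "car"]                   # illustrative ID label set
prompts = [f"a photo of a {c}" for c in id_classes]  # fixed stand-in prompts

@torch.no_grad()
def ood_score(image: Image.Image) -> float:
    """1 - max softmax over image-to-prompt similarities; higher = more OOD."""
    inputs = proc(text=prompts, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)
    return 1.0 - probs.max().item()

print(ood_score(Image.new("RGB", (224, 224), "white")))
```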
- Distilling the Unknown to Unveil Certainty
Out-of-distribution (OOD) detection is essential for identifying test samples that deviate from the in-distribution (ID) data on which a standard network is trained.
This paper introduces OOD knowledge distillation, a pioneering learning framework applicable whether or not training ID data is available.
arXiv Detail & Related papers (2023-11-14T08:05:02Z)
- Supervision Adaptation Balancing In-distribution Generalization and Out-of-distribution Detection
In-distribution (ID) and out-of-distribution (OOD) samples can lead to "distributional vulnerability" in deep neural networks.
We introduce a novel "supervision adaptation" approach to generate adaptive supervision information for OOD samples, making them more compatible with ID samples.
arXiv Detail & Related papers (2022-06-19T11:16:44Z)
- Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification
We compare the two types of anomalies (OOD and Adv samples) with in-distribution (ID) ones in three respects.
We find that OOD samples expose their aberration starting from the first layer, while the abnormalities of Adv samples do not emerge until the deeper layers of the model.
We propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output probabilities of the model.
arXiv Detail & Related papers (2022-04-09T12:11:59Z)
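The layer-wise observation in the entry above suggests a simple diagnostic: compare each layer's [CLS] representation with per-layer statistics collected on ID data. A minimal sketch, where the Euclidean distance to per-layer means is an illustrative stand-in for the paper's exact separation procedure:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

@torch.no_grad()
def layerwise_cls(text: str) -> torch.Tensor:
    """[CLS] vector from the embedding layer and every encoder layer."""
    enc = tok(text, return_tensors="pt")
    hidden = encoder(**enc, output_hidden_states=True).hidden_states
    return torch.stack([h[0, 0] for h in hidden])      # (n_layers + 1, hidden)

def deviation_profile(text: str, id_means: torch.Tensor) -> torch.Tensor:
    """Per-layer distance to precomputed ID means; per the paper, OOD inputs
    deviate from the first layer on, Adv inputs only in deeper layers."""
    return (layerwise_cls(text) - id_means).norm(dim=-1)

# id_means would be the per-layer mean of layerwise_cls over ID training data.
```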
- Energy-bounded Learning for Robust Models of Code
In programming, learned code representations have a variety of applications, including code classification, code search, comment generation, and bug prediction.
We propose an energy-bounded learning objective that assigns higher scores to in-distribution samples and lower scores to out-of-distribution samples, so that such samples can be incorporated into the training of source code models (see the sketch after this entry).
arXiv Detail & Related papers (2021-12-20T06:28:56Z)
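For reference, the energy score at the heart of energy-bounded objectives follows Liu et al. (2020): E(x) = -T * logsumexp(f(x)/T). A minimal sketch of the score and a squared-hinge bounding loss; the margin values are illustrative, not the ones used for source code models:

```python
import torch

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """E(x) = -T * logsumexp(f(x)/T); lower energy = more ID-like."""
    return -T * torch.logsumexp(logits / T, dim=-1)

def energy_bound_loss(id_logits, ood_logits, m_in=-25.0, m_out=-7.0):
    """Squared-hinge regularizer: pull ID energies below m_in and push
    (pseudo-)OOD energies above m_out. Margin values are illustrative."""
    e_id, e_ood = energy_score(id_logits), energy_score(ood_logits)
    return (torch.relu(e_id - m_in) ** 2).mean() + \
           (torch.relu(m_out - e_ood) ** 2).mean()

# Toy usage with random logits for a 4-class model:
loss = energy_bound_loss(torch.randn(8, 4), torch.randn(8, 4))
```

Thresholding the energy score at test time then flags high-energy inputs as OOD.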
- WOOD: Wasserstein-based Out-of-Distribution Detection
Training data for deep-neural-network-based classifiers are usually assumed to be sampled from the same distribution.
When some test samples are drawn from a distribution far from that of the training data, the trained network tends to make high-confidence predictions for these OOD samples.
We propose a Wasserstein-based out-of-distribution detection (WOOD) method to overcome these challenges.
arXiv Detail & Related papers (2021-12-13T02:35:15Z)
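As a toy illustration of the distance underlying the entry above, the sketch below computes a Wasserstein distance between a softmax output and each one-hot class distribution using the POT library; the 0/1 ground cost and the min-over-classes score are simplifying assumptions, not WOOD's exact formulation:

```python
import numpy as np
import ot  # Python Optimal Transport: pip install pot

def wood_like_score(probs: np.ndarray, cost: np.ndarray) -> float:
    """Smallest Wasserstein distance from the softmax output to any
    one-hot class distribution; larger = more OOD-like."""
    k = probs.shape[0]
    return min(ot.emd2(probs, np.eye(k)[c], cost) for c in range(k))

k = 3
cost = 1.0 - np.eye(k)  # 0/1 ground metric (illustrative choice)
print(wood_like_score(np.array([0.5, 0.3, 0.2]), cost))
```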
- Bridging In- and Out-of-distribution Samples for Their Better Discriminability
We consider samples lying between the two and use them to train a network.
We generate such samples using multiple image transformations that corrupt inputs in various ways and at different severity levels.
Using a network trained on clean ID samples, we estimate where the samples generated by a single image transformation lie between ID and OOD (see the sketch after this entry).
arXiv Detail & Related papers (2021-01-07T11:34:18Z)
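A minimal sketch of generating such intermediate samples with severity-indexed corruptions; the particular transformations and severity scaling are illustrative, and the paper uses a larger transformation set:

```python
from PIL import Image
from torchvision import transforms

def corrupt(img: Image.Image, severity: int) -> Image.Image:
    """Apply corruptions whose strength grows with `severity` (1..5)."""
    ops = transforms.Compose([
        transforms.GaussianBlur(kernel_size=9, sigma=0.5 * severity),
        transforms.ColorJitter(brightness=0.15 * severity),
    ])
    return ops(img)

# Increasing severity traces a path from near-ID toward OOD; a network
# trained on clean ID data then scores where each sample falls.
img = Image.new("RGB", (224, 224), "gray")
intermediates = [corrupt(img, s) for s in range(1, 6)]
```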
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a novel, more complex scenario named open-set SSL, in which out-of-distribution (OOD) samples are contained in the unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)