Pseudo Outlier Exposure for Out-of-Distribution Detection using
Pretrained Transformers
- URL: http://arxiv.org/abs/2307.09455v2
- Date: Wed, 19 Jul 2023 04:13:11 GMT
- Authors: Jaeyoung Kim, Kyuheon Jung, Dongbin Na, Sion Jang, Eunbin Park,
Sungchul Choi
- Abstract summary: A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples.
We propose a method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes.
Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For real-world language applications, detecting an out-of-distribution (OOD)
sample is helpful to alert users or reject such unreliable samples. However,
modern over-parameterized language models often produce overconfident
predictions for both in-distribution (ID) and OOD samples. In particular,
language models suffer from OOD samples with a similar semantic representation
to ID samples since these OOD samples lie near the ID manifold. A rejection
network can be trained with ID and diverse outlier samples to detect test OOD
samples, but explicitly collecting auxiliary OOD datasets brings an additional
burden for data collection. In this paper, we propose a simple but effective
method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD
dataset by sequentially masking tokens related to ID classes. The surrogate OOD
sample introduced by POE shows a similar representation to ID data, which is
most effective in training a rejection network. Our method does not require any
external OOD data and can be easily implemented within off-the-shelf
Transformers. A comprehensive comparison with state-of-the-art algorithms
demonstrates POE's competitiveness on several text classification benchmarks.
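The recipe in the abstract can be made concrete. Below is a minimal sketch, assuming a fine-tuned HuggingFace sequence classifier; token relevance to the predicted ID class is approximated by input-gradient saliency, a stand-in for (not necessarily identical to) the paper's token-selection criterion, and the checkpoint name and label count are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint/label count; any fine-tuned ID classifier works here.
name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4)
model.eval()

def pseudo_outlier(text: str, n_mask: int = 3) -> str:
    """Sequentially replace the tokens most tied to the predicted ID class
    with [MASK]. Importance is approximated by input-gradient magnitude,
    a stand-in for the paper's token-selection criterion."""
    enc = tok(text, return_tensors="pt")
    ids = enc["input_ids"].clone()
    for _ in range(n_mask):
        embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
        logits = model(inputs_embeds=embeds,
                       attention_mask=enc["attention_mask"]).logits
        logits.max().backward()                        # saliency of top class
        scores = embeds.grad.norm(dim=-1).squeeze(0)   # per-token importance
        scores[0] = scores[-1] = 0.0                   # never mask [CLS]/[SEP]
        scores[ids[0] == tok.mask_token_id] = 0.0      # skip already-masked tokens
        ids[0, scores.argmax()] = tok.mask_token_id    # mask the top token
    return tok.decode(ids[0])

print(pseudo_outlier("the team won the championship game last night"))
```

Masking the most class-relevant tokens first keeps the surrogate samples near the ID manifold, which is exactly the regime the abstract identifies as most useful for training the rejection network.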
Related papers
- Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as the OOD data.
Some marginal OOD samples actually have semantic content close to that of the in-distribution (ID) samples, which makes deciding whether a sample is OOD a Sorites paradox.
We construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue.
arXiv Detail & Related papers (2024-06-14T09:27:56Z)
- ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
We propose a novel OOD detection framework that discovers ID-like outliers using CLIP (Radford et al., ICML 2021).
Benefiting from the powerful CLIP, we need only a small number of ID samples to learn the model's prompts.
Our method achieves superior few-shot learning performance on various real-world image datasets.
arXiv Detail & Related papers (2023-11-26T09:06:40Z)
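For intuition about the entry above: CLIP-based OOD detection typically scores an image by its similarity to text prompts for the ID classes. A minimal sketch of that scoring step follows, using fixed hand-written prompts as a stand-in for the learned ID-like prompts the paper proposes; the checkpoint and label set are illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

id_classes = ["cat", "dog", "car"]                   # illustrative ID label set
prompts = [f"a photo of a {c}" for c in id_classes]  # fixed stand-in prompts

@torch.no_grad()
def ood_score(image: Image.Image) -> float:
    """1 - max softmax over image-to-prompt similarities; higher = more OOD."""
    inputs = proc(text=prompts, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)
    return 1.0 - probs.max().item()

print(ood_score(Image.new("RGB", (224, 224), "white")))
```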
- Distilling the Unknown to Unveil Certainty
Out-of-distribution (OOD) detection is essential for identifying test samples that deviate from the in-distribution (ID) data on which a standard network is trained.
This paper introduces OOD knowledge distillation, a pioneering learning framework applicable whether or not training ID data is available.
arXiv Detail & Related papers (2023-11-14T08:05:02Z)
- Supervision Adaptation Balancing In-distribution Generalization and Out-of-distribution Detection
In-distribution (ID) and out-of-distribution (OOD) samples can lead to "distributional vulnerability" in deep neural networks.
We introduce a novel "supervision adaptation" approach to generate adaptive supervision information for OOD samples, making them more compatible with ID samples.
arXiv Detail & Related papers (2022-06-19T11:16:44Z)
- Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification
We compare the two types of anomalies (OOD and Adv samples) with in-distribution (ID) ones in three respects.
We find that OOD samples expose their aberration starting from the first layer, while the abnormalities of Adv samples do not emerge until the deeper layers of the model.
We propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output probabilities of the model.
arXiv Detail & Related papers (2022-04-09T12:11:59Z)
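The layer-wise observation in the entry above suggests a simple diagnostic: compare each layer's [CLS] representation with per-layer statistics collected on ID data. A minimal sketch, where the Euclidean distance to per-layer means is an illustrative stand-in for the paper's exact separation procedure:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

@torch.no_grad()
def layerwise_cls(text: str) -> torch.Tensor:
    """[CLS] vector from the embedding layer and every encoder layer."""
    enc = tok(text, return_tensors="pt")
    hidden = encoder(**enc, output_hidden_states=True).hidden_states
    return torch.stack([h[0, 0] for h in hidden])      # (n_layers + 1, hidden)

def deviation_profile(text: str, id_means: torch.Tensor) -> torch.Tensor:
    """Per-layer distance to precomputed ID means; per the paper, OOD inputs
    deviate from the first layer on, Adv inputs only in deeper layers."""
    return (layerwise_cls(text) - id_means).norm(dim=-1)

# id_means would be the per-layer mean of layerwise_cls over ID training data.
```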
- Energy-bounded Learning for Robust Models of Code
In programming, learned code representations have a variety of applications, including code classification, code search, comment generation, and bug prediction.
We propose an energy-bounded learning objective that assigns higher scores to in-distribution samples and lower scores to out-of-distribution samples, so that such samples can be incorporated into the training of source code models (see the sketch after this entry).
arXiv Detail & Related papers (2021-12-20T06:28:56Z)
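For reference, the energy score at the heart of energy-bounded objectives follows Liu et al. (2020): E(x) = -T * logsumexp(f(x)/T). A minimal sketch of the score and a squared-hinge bounding loss; the margin values are illustrative, not the ones used for source code models:

```python
import torch

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """E(x) = -T * logsumexp(f(x)/T); lower energy = more ID-like."""
    return -T * torch.logsumexp(logits / T, dim=-1)

def energy_bound_loss(id_logits, ood_logits, m_in=-25.0, m_out=-7.0):
    """Squared-hinge regularizer: pull ID energies below m_in and push
    (pseudo-)OOD energies above m_out. Margin values are illustrative."""
    e_id, e_ood = energy_score(id_logits), energy_score(ood_logits)
    return (torch.relu(e_id - m_in) ** 2).mean() + \
           (torch.relu(m_out - e_ood) ** 2).mean()

# Toy usage with random logits for a 4-class model:
loss = energy_bound_loss(torch.randn(8, 4), torch.randn(8, 4))
```

Thresholding the energy score at test time then flags high-energy inputs as OOD.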
- WOOD: Wasserstein-based Out-of-Distribution Detection
Training data for deep-neural-network-based classifiers are usually assumed to be sampled from the same distribution.
When some test samples are drawn from a distribution far from that of the training data, the trained network tends to make high-confidence predictions for these OOD samples.
We propose a Wasserstein-based out-of-distribution detection (WOOD) method to overcome these challenges.
arXiv Detail & Related papers (2021-12-13T02:35:15Z)
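As a toy illustration of the distance underlying the entry above, the sketch below computes a Wasserstein distance between a softmax output and each one-hot class distribution using the POT library; the 0/1 ground cost and the min-over-classes score are simplifying assumptions, not WOOD's exact formulation:

```python
import numpy as np
import ot  # Python Optimal Transport: pip install pot

def wood_like_score(probs: np.ndarray, cost: np.ndarray) -> float:
    """Smallest Wasserstein distance from the softmax output to any
    one-hot class distribution; larger = more OOD-like."""
    k = probs.shape[0]
    return min(ot.emd2(probs, np.eye(k)[c], cost) for c in range(k))

k = 3
cost = 1.0 - np.eye(k)  # 0/1 ground metric (illustrative choice)
print(wood_like_score(np.array([0.5, 0.3, 0.2]), cost))
```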
- Bridging In- and Out-of-distribution Samples for Their Better Discriminability
We consider samples lying between the two and use them to train a network.
We generate such samples using multiple image transformations that corrupt inputs in various ways and at different severity levels.
Using a network trained on clean ID samples, we estimate where the samples generated by a single image transformation lie between ID and OOD (see the sketch after this entry).
arXiv Detail & Related papers (2021-01-07T11:34:18Z)
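A minimal sketch of generating such intermediate samples with severity-indexed corruptions; the particular transformations and severity scaling are illustrative, and the paper uses a larger transformation set:

```python
from PIL import Image
from torchvision import transforms

def corrupt(img: Image.Image, severity: int) -> Image.Image:
    """Apply corruptions whose strength grows with `severity` (1..5)."""
    ops = transforms.Compose([
        transforms.GaussianBlur(kernel_size=9, sigma=0.5 * severity),
        transforms.ColorJitter(brightness=0.15 * severity),
    ])
    return ops(img)

# Increasing severity traces a path from near-ID toward OOD; a network
# trained on clean ID data then scores where each sample falls.
img = Image.new("RGB", (224, 224), "gray")
intermediates = [corrupt(img, s) for s in range(1, 6)]
```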
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a novel, more complex scenario named open-set SSL, in which out-of-distribution (OOD) samples are contained in the unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)