A Survey on Out-of-Distribution Detection in NLP
- URL: http://arxiv.org/abs/2305.03236v2
- Date: Wed, 27 Dec 2023 07:15:20 GMT
- Title: A Survey on Out-of-Distribution Detection in NLP
- Authors: Hao Lang, Yinhe Zheng, Yixuan Li, Jian Sun, Fei Huang, Yongbin Li
- Abstract summary: Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world.
This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches.
- Score: 119.80687868012393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-distribution (OOD) detection is essential for the reliable and safe
deployment of machine learning systems in the real world. Great progress has
been made over the past years. This paper presents the first review of recent
advances in OOD detection with a particular focus on natural language
processing approaches. First, we provide a formal definition of OOD detection
and discuss several related fields. Second, we categorize recent algorithms into
three classes according to the data they use: (1) OOD data available, (2) OOD
data unavailable + in-distribution (ID) label available, and (3) OOD data
unavailable + ID label unavailable. Third, we introduce datasets, applications,
and metrics. Finally, we summarize existing work and present potential future
research topics.
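To make the detection setting and the role of metrics concrete, here is a minimal sketch of a score-based detector for class (2), where ID labels are available but OOD data is not. It assumes a maximum-softmax-probability score and a pairwise AUROC computation, both common baselines in the wider OOD literature and not methods stated in this abstract. The function names, the threshold value, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability; a higher value is treated as more ID-like."""
    return max(softmax(logits))


def is_ood(logits, threshold=0.5):
    """Flag an input as OOD when its confidence falls below the threshold.

    The threshold of 0.5 is an illustrative assumption; in practice it
    would be tuned on held-out ID data.
    """
    return msp_score(logits) < threshold


def auroc(id_scores, ood_scores):
    """AUROC as the probability that a random ID sample outscores a random
    OOD sample, with ties counted as one half."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


if __name__ == "__main__":
    # Hypothetical logits from a 3-class intent classifier.
    id_logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]
    ood_logits = [[1.0, 1.1, 0.9], [0.4, 0.5, 0.45]]
    id_scores = [msp_score(l) for l in id_logits]
    ood_scores = [msp_score(l) for l in ood_logits]
    print("AUROC:", auroc(id_scores, ood_scores))
    print("OOD flags:", [is_ood(l) for l in ood_logits])
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a small illustration. A rank-based computation would be the usual choice at scale.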
Related papers
- Recent Advances in OOD Detection: Problems and Approaches [40.27656150526273]
Out-of-distribution (OOD) detection aims to detect test samples outside the training category space.
We provide a discussion of the evaluation scenarios, a variety of applications, and several future research directions.
arXiv Detail & Related papers (2024-09-18T11:30:30Z)
- EAT: Towards Long-Tailed Out-of-Distribution Detection [55.380390767978554]
This paper addresses the challenging task of long-tailed OOD detection.
The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes.
We propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes, and (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data.
arXiv Detail & Related papers (2023-12-14T13:47:13Z)
- Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources [73.28967478098107]
Out-of-distribution (OOD) detection identifies OOD data, on which the predictor cannot make valid predictions as it can on in-distribution (ID) data.
It is typically hard to collect real OOD data for training a predictor capable of discerning OOD patterns.
We propose a data generation-based learning method named Auxiliary Task-based OOD Learning (ATOL) that can mitigate mistaken OOD generation.
arXiv Detail & Related papers (2023-11-06T16:26:52Z)
- APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection [40.846633965439956]
This paper focuses on a few-shot OOD setting with only a small amount of labeled IND data and a large amount of unlabeled mixed data.
We propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection.
arXiv Detail & Related papers (2023-10-20T09:48:52Z)
- Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD.
In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection.
We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
- Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need [52.88953913542445]
Surprisingly, we find that simply using reconstruction-based methods can significantly boost OOD detection performance.
We take Masked Image Modeling as a pretext task for our OOD detection framework (MOOD).
arXiv Detail & Related papers (2023-02-06T08:24:41Z)
- Pseudo-OOD training for robust language models [78.15712542481859]
OOD detection is a key component of a reliable machine-learning model for any industry-scale application.
We propose POORE (POsthoc pseudo-Ood REgularization), which generates pseudo-OOD samples using in-distribution (IND) data.
We extensively evaluate our framework on three real-world dialogue systems, achieving a new state of the art in OOD detection.
arXiv Detail & Related papers (2022-10-17T14:32:02Z)
- Igeood: An Information Geometry Approach to Out-of-Distribution Detection [35.04325145919005]
We introduce Igeood, an effective method for detecting out-of-distribution (OOD) samples.
Igeood applies to any pre-trained neural network and works under various degrees of access to the machine learning model.
We show that Igeood outperforms competing state-of-the-art methods on a variety of network architectures and datasets.
arXiv Detail & Related papers (2022-03-15T11:26:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.