Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
- URL: http://arxiv.org/abs/2406.09867v3
- Date: Thu, 31 Oct 2024 01:23:48 GMT
- Title: Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
- Authors: Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen
- Abstract summary: Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as OOD data.
Some marginal OOD samples actually have semantic content close to the in-distribution (ID) samples, which makes determining whether a sample is OOD a Sorites Paradox.
We construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue.
- Score: 70.57120710151105
- License:
- Abstract: Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as OOD data. However, some marginal OOD samples actually have semantic content close to the in-distribution (ID) samples, which makes determining whether a sample is OOD a Sorites Paradox. In this paper, we construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue, in which we divide the test samples into subsets with different degrees of semantic and covariate shift relative to the ID dataset. The data division is achieved through a shift measuring method based on our proposed Language Aligned Image feature Decomposition (LAID). Moreover, we construct a Synthetic Incremental Shift (Syn-IS) dataset that contains high-quality generated images with more diverse covariate contents to complement the IS-OOD benchmark. We evaluate current OOD detection methods on our benchmark and find several important insights: (1) The performance of most OOD detection methods significantly improves as the semantic shift increases; (2) Some methods, like GradNorm, may have different OOD detection mechanisms, as they rely less on semantic shifts to make decisions; (3) Excessive covariate shift in an image is also likely to be treated as OOD by some methods. Our code and data are released at https://github.com/qqwsad5/IS-OOD.
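The evaluation described above can be illustrated with a minimal, hypothetical sketch. This is not the paper's released code: it uses the classic maximum-softmax-probability (MSP) baseline as the OOD score and AUROC as the metric, with synthetic logits standing in for real model outputs. The shift levels and peak values below are illustrative assumptions, chosen so that OOD subsets with larger shift produce flatter, less confident logits.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def msp_score(logits):
    """Maximum softmax probability: higher means more ID-like."""
    return max(softmax(logits))

def auroc(id_scores, ood_scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly drawn ID sample scores higher than a random OOD sample."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

random.seed(0)

def make_logits(peak):
    # Hypothetical 10-class logits: one dominant class at `peak`, plus noise.
    # A smaller peak models a larger shift (the model is less confident).
    return [[peak + random.gauss(0, 0.5)] +
            [random.gauss(0, 0.5) for _ in range(9)]
            for _ in range(200)]

id_scores = [msp_score(l) for l in make_logits(peak=5.0)]
for shift_level, peak in [("small shift", 4.0), ("medium shift", 2.0), ("large shift", 0.0)]:
    ood_scores = [msp_score(l) for l in make_logits(peak=peak)]
    print(f"{shift_level}: AUROC = {auroc(id_scores, ood_scores):.3f}")
```

Run on these synthetic scores, AUROC rises as the shift grows, mirroring insight (1): detectors separate ID from OOD more easily when the shift is large, and the marginal "small shift" subset is where the Sorites-style ambiguity shows up.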
Related papers
- Margin-bounded Confidence Scores for Out-of-Distribution Detection [2.373572816573706]
We propose a novel method called Margin-bounded Confidence Scores (MaCS) to address the nontrivial OOD detection problem.
MaCS enlarges the disparity between ID and OOD scores, which in turn makes the decision boundary more compact.
Experiments on various benchmark datasets for image classification tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-09-22T05:40:25Z)
- Distilling the Unknown to Unveil Certainty [66.29929319664167]
Out-of-distribution (OOD) detection is essential in identifying test samples that deviate from the in-distribution (ID) data upon which a standard network is trained.
This paper introduces OOD knowledge distillation, a pioneering learning framework applicable whether or not training ID data is available.
arXiv Detail & Related papers (2023-11-14T08:05:02Z)
- General-Purpose Multi-Modal OOD Detection Framework [5.287829685181842]
Out-of-distribution (OOD) detection identifies test samples that differ from the training data, which is critical to ensuring the safety and reliability of machine learning (ML) systems.
We propose a general-purpose weakly-supervised OOD detection framework, called WOOD, that combines a binary classifier and a contrastive learning component.
We evaluate the proposed WOOD model on multiple real-world datasets, and the experimental results demonstrate that the WOOD model outperforms the state-of-the-art methods for multi-modal OOD detection.
arXiv Detail & Related papers (2023-07-24T18:50:49Z)
- Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning [17.409939628100517]
We propose a unified framework termed OOD Semantic Pruning (OSP), which aims at pruning OOD semantics out from in-distribution (ID) features.
OSP surpasses the previous state-of-the-art by 13.7% on accuracy for ID classification and 5.9% on AUROC for OOD detection on TinyImageNet dataset.
arXiv Detail & Related papers (2023-05-29T15:37:07Z)
- Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Evaluations of out-of-distribution (OOD) detection methods assume access to test ground truths, i.e., whether individual test samples are in-distribution (ID) or OOD.
In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection.
We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
- Estimating Soft Labels for Out-of-Domain Intent Detection [122.68266151023676]
Out-of-Domain (OOD) intent detection is important for practical dialog systems.
We propose an adaptive soft pseudo labeling (ASoul) method that can estimate soft labels for pseudo OOD samples.
arXiv Detail & Related papers (2022-11-10T13:31:13Z)
- Full-Spectrum Out-of-Distribution Detection [42.98617540431124]
We take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection.
We propose SEM, a simple feature-based semantics score function.
SEM significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2022-04-11T17:59:14Z)
- Exploring Covariate and Concept Shift for Detection and Calibration of Out-of-Distribution Data [77.27338842609153]
Our characterization reveals that sensitivity to each type of shift is important to the detection and confidence calibration of OOD data.
We propose a geometrically-inspired method to improve OOD detection under both shifts with only in-distribution data.
We are the first to propose a method that works well across both OOD detection and calibration and under different types of shifts.
arXiv Detail & Related papers (2021-10-28T15:42:55Z)
- No True State-of-the-Art? OOD Detection Methods are Inconsistent across Datasets [69.725266027309]
Out-of-distribution detection is an important component of reliable ML systems.
In this work, we show that none of the evaluated methods is inherently better at OOD detection than the others on a standardized set of 16 (ID, OOD) pairs.
We also show that a method outperforming another on a certain (ID, OOD) pair may not do so in a low-data regime.
arXiv Detail & Related papers (2021-09-12T16:35:00Z)
- Semantically Coherent Out-of-Distribution Detection [26.224146828317277]
Current out-of-distribution (OOD) detection benchmarks are commonly built by defining one dataset as in-distribution (ID) and all others as OOD.
We re-design the benchmarks and propose semantically coherent out-of-distribution detection (SC-OOD) benchmarks.
Our approach achieves state-of-the-art performance on SC-OOD benchmarks.
arXiv Detail & Related papers (2021-08-26T17:53:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.