Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric
Perspective
- URL: http://arxiv.org/abs/2302.08287v1
- Date: Thu, 16 Feb 2023 13:34:35 GMT
- Title: Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric
Perspective
- Authors: Yuhang Zhang, Weihong Deng, Liang Zheng
- Abstract summary: Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD.
In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection.
We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
- Score: 55.45202687256175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-distribution (OOD) detection methods assume that they have test ground
truths, i.e., whether individual test samples are in-distribution (IND) or OOD.
However, in the real world, we do not always have such ground truths, and thus
do not know which samples are correctly detected and cannot compute metrics
such as AUROC to evaluate the performance of different OOD detection methods. In
this paper, we are the first to introduce the unsupervised evaluation problem
in OOD detection, which aims to evaluate OOD detection methods in real-world
changing environments without OOD labels. We propose three methods to compute
Gscore as an unsupervised indicator of OOD detection performance. We further
introduce a new benchmark Gbench, which has 200 real-world OOD datasets of
various label spaces to train and evaluate our method. Through experiments, we
find a strong quantitative correlation between Gscore and the OOD detection
performance. Extensive experiments demonstrate that our Gscore achieves
state-of-the-art performance. Gscore also generalizes well with different
IND/OOD datasets, OOD detection methods, backbones and dataset sizes. We
further provide interesting analyses of the effects of backbones and IND/OOD
datasets on OOD detection performance. The data and code will be made available.
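The gap the paper targets can be made concrete with a small sketch. Below, AUROC is computed the standard supervised way from IND/OOD labels, and a label-free separation proxy is computed from the raw detection scores alone. The proxy is a hypothetical illustration of what an unsupervised indicator could look like; it is not the paper's actual Gscore construction.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated detector outputs: higher score means "more likely OOD".
ind_scores = rng.normal(0.0, 1.0, 1000)  # in-distribution test samples
ood_scores = rng.normal(2.0, 1.0, 1000)  # out-of-distribution test samples
scores = np.concatenate([ind_scores, ood_scores])

# Supervised evaluation: only possible when IND/OOD ground truth exists.
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
print("AUROC (requires labels):", roc_auc_score(labels, scores))

# Label-free proxy (illustrative assumption, NOT the paper's Gscore):
# fit a two-component Gaussian mixture to the unlabeled scores and measure
# how cleanly the two modes separate; a stronger detector should produce
# better-separated score modes.
gmm = GaussianMixture(n_components=2, random_state=0).fit(scores.reshape(-1, 1))
mu = gmm.means_.ravel()
sd = np.sqrt(gmm.covariances_.ravel())
print("Separation proxy (no labels):", abs(mu[0] - mu[1]) / (sd[0] + sd[1]))
```

Correlating such a label-free indicator with true AUROC across many datasets is the kind of relationship Gbench's 200 real-world OOD datasets are designed to measure.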
Related papers
- Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution [38.844580833635725]
We present a training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs.
Our method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks.
arXiv Detail & Related papers (2024-07-23T12:28:59Z) - Model-free Test Time Adaptation for Out-Of-Distribution Detection [62.49795078366206]
We propose a Non-Parametric Test Time Adaptation framework for Out-Of-Distribution Detection.
The framework utilizes online test samples for model adaptation during testing, enhancing adaptability to changing data distributions.
We demonstrate the framework's effectiveness through comprehensive experiments on multiple OOD detection benchmarks.
arXiv Detail & Related papers (2023-11-28T02:00:47Z) - Beyond AUROC & co. for evaluating out-of-distribution detection
performance [50.88341818412508]
Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs.
We propose a new metric, the Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples (see the sketch after this list).
arXiv Detail & Related papers (2023-06-26T12:51:32Z) - In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation [43.865923770543205]
Out-of-distribution (OOD) detection is the problem of identifying inputs unrelated to the in-distribution task.
Most of the currently used test OOD datasets, including datasets from the open set recognition (OSR) literature, have severe issues.
We introduce NINCO, a novel test OOD dataset in which each sample has been checked to be free of ID content, allowing for a detailed analysis of an OOD detector's strengths and failure modes.
arXiv Detail & Related papers (2023-06-01T15:48:10Z) - Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is
All You Need [52.88953913542445]
We find, surprisingly, that simply using reconstruction-based methods can significantly boost OOD detection performance.
We take Masked Image Modeling as a pretext task for our OOD detection framework (MOOD).
arXiv Detail & Related papers (2023-02-06T08:24:41Z) - Towards Realistic Out-of-Distribution Detection: A Novel Evaluation
Framework for Improving Generalization in OOD Detection [14.541761912174799]
This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection.
It aims to assess the performance of machine learning models in more realistic settings.
arXiv Detail & Related papers (2022-11-20T07:30:15Z) - Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data.
In this paper we propose a novel method that combines, from first principles, a certifiable OOD detector with a standard classifier into an OOD-aware classifier.
In this way we achieve the best of both worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy and with close to state-of-the-art OOD detection performance for non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z) - Practical Evaluation of Out-of-Distribution Detection Methods for Image
Classification [22.26009759606856]
In this paper, we experimentally evaluate the performance of representative OOD detection methods for three scenarios.
The results show that differences in scenarios and datasets alter the relative performance among the methods.
Our results can also be used as a guide for the selection of OOD detection methods.
arXiv Detail & Related papers (2021-01-07T09:28:45Z) - ATOM: Robustifying Out-of-distribution Detection Using Outlier Mining [51.19164318924997]
Adversarial Training with informative Outlier Mining (ATOM) improves the robustness of OOD detection.
ATOM achieves state-of-the-art performance under a broad family of classic and adversarial OOD evaluation tasks.
arXiv Detail & Related papers (2020-06-26T20:58:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.