Practical Evaluation of Out-of-Distribution Detection Methods for Image Classification
- URL: http://arxiv.org/abs/2101.02447v1
- Date: Thu, 7 Jan 2021 09:28:45 GMT
- Title: Practical Evaluation of Out-of-Distribution Detection Methods for Image Classification
- Authors: Engkarat Techapanurak, Takayuki Okatani
- Abstract summary: In this paper, we experimentally evaluate the performance of representative OOD detection methods for three scenarios.
The results show that differences in scenarios and datasets alter the relative performance among the methods.
Our results can also be used as a guide for the selection of OOD detection methods.
- Score: 22.26009759606856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We reconsider the evaluation of OOD detection methods for image recognition. Although many studies have been conducted to build better OOD detection methods, most of them follow the experimental evaluation protocol of Hendrycks and Gimpel. While a unified evaluation protocol is necessary for fair comparison, it is unclear whether its choice of tasks and datasets reflects real-world applications, and whether the evaluation results generalize to other OOD detection application scenarios. In this paper, we experimentally evaluate the performance of representative OOD detection methods for three scenarios, i.e., irrelevant input detection, novel class detection, and domain shift detection, on various datasets and classification tasks. The results show that differences in scenarios and datasets alter the relative performance among the methods. Our results can also serve as a guide for practitioners in selecting OOD detection methods.
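As a concrete reference point, the sketch below shows the maximum-softmax-probability (MSP) baseline of Hendrycks and Gimpel together with an AUROC evaluation, i.e., the kind of detector/metric pairing the paper's scenarios are built around. It is a minimal sketch, not code from the paper: the model, the data loaders, and the three scenario loaders named in the commented usage are placeholder assumptions.

```python
# Minimal sketch: maximum-softmax-probability (MSP) OOD scoring and AUROC.
# Assumes a trained classifier `model` and loaders that yield (images, labels).
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def msp_scores(model, loader, device="cpu"):
    """Max softmax probability per input (higher = more in-distribution)."""
    model.eval()
    scores = []
    for x, _ in loader:
        logits = model(x.to(device))
        scores.append(F.softmax(logits, dim=1).max(dim=1).values.cpu().numpy())
    return np.concatenate(scores)

def ood_auroc(id_scores, ood_scores):
    """AUROC for separating ID (label 1) from OOD (label 0) using the score."""
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    return roc_auc_score(labels, np.concatenate([id_scores, ood_scores]))

# Hypothetical usage for the paper's three scenarios, each of which simply
# swaps in a different OOD test set against the same in-distribution scores:
# id_s = msp_scores(model, id_test_loader)
# for name, loader in [("irrelevant input", irrelevant_loader),
#                      ("novel class", novel_class_loader),
#                      ("domain shift", domain_shift_loader)]:
#     print(name, ood_auroc(id_s, msp_scores(model, loader)))
```

Because each scenario only swaps the OOD test set, the relative ranking of detectors can change from one scenario to the next, which is the effect the paper measures.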
Related papers
- Beyond AUROC & co. for evaluating out-of-distribution detection performance [50.88341818412508]
Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs.
We propose a new metric - Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples.
arXiv Detail & Related papers (2023-06-26T12:51:32Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Evaluations of out-of-distribution (OOD) detection methods assume access to test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD.
In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection.
We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
- Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need [52.88953913542445]
We find, surprisingly, that simply using reconstruction-based methods can significantly boost OOD detection performance.
We take Masked Image Modeling as the pretext task for our OOD detection framework (MOOD); a generic reconstruction-error sketch follows this entry.
arXiv Detail & Related papers (2023-02-06T08:24:41Z)
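As a loose illustration of reconstruction-based scoring only (this is not the MOOD framework itself, which uses Masked Image Modeling as a pretraining task rather than reconstruction error as the detection score), a per-sample reconstruction-error OOD score with a tiny autoencoder could look like the following; the architecture and names are hypothetical.

```python
# Hedged sketch: per-sample reconstruction error as a generic OOD score.
# The autoencoder is assumed to be trained on in-distribution images only.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

@torch.no_grad()
def reconstruction_ood_score(model, x):
    """Mean squared reconstruction error per sample; higher suggests OOD."""
    model.eval()
    recon = model(x)
    return ((recon - x) ** 2).flatten(1).mean(dim=1)
```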
- Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities [104.02531442035483]
The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods.
We show that binary discrimination between in- and (different) out-distributions is equivalent to several distinct formulations of the OOD detection problem.
We also show that the confidence loss used by Outlier Exposure has an implicit scoring function that differs non-trivially from the theoretically optimal scoring function (a sketch of the Outlier Exposure objective follows this entry).
arXiv Detail & Related papers (2022-06-20T16:32:49Z)
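For reference, the Outlier Exposure confidence loss mentioned above is commonly written as standard cross-entropy on in-distribution data plus a term pulling the softmax on auxiliary outlier data toward the uniform distribution. The block below is a minimal rendering of that objective; the weight lambda_oe is illustrative rather than taken from either paper.

```python
# Hedged sketch of an Outlier Exposure-style confidence loss.
import torch
import torch.nn.functional as F

def outlier_exposure_loss(logits_in, targets_in, logits_out, lambda_oe=0.5):
    # Supervised cross-entropy on in-distribution samples.
    ce_in = F.cross_entropy(logits_in, targets_in)
    # Cross-entropy between the predicted distribution on outlier samples and
    # the uniform distribution over classes (average of -log p_k over classes).
    log_probs_out = F.log_softmax(logits_out, dim=1)
    ce_uniform = -log_probs_out.mean(dim=1).mean()
    return ce_in + lambda_oe * ce_uniform
```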
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- Out-of-Distribution Detection for Medical Applications: Guidelines for Practical Evaluation [0.0]
Detecting Out-of-Distribution (OOD) samples in real time is a crucial safety check for the deployment of machine learning models in the medical field.
There is a lack of evaluation guidelines on how to select OOD detection methods in practice.
Here, we propose a series of practical considerations and tests to choose the best OOD detector for a specific medical dataset.
arXiv Detail & Related papers (2021-09-30T07:05:20Z)
- Evaluation of Out-of-Distribution Detection Performance of Self-Supervised Learning in a Controllable Environment [27.28750644075659]
We evaluate the out-of-distribution (OOD) detection performance of self-supervised learning (SSL) techniques with a new evaluation framework.
Unlike the previous evaluation methods, the proposed framework adjusts the distance of OOD samples from the in-distribution samples.
arXiv Detail & Related papers (2020-11-26T04:11:48Z)
- Contrastive Training for Improved Out-of-Distribution Detection [36.61315534166451]
This paper proposes and investigates the use of contrastive training to boost OOD detection performance.
We show in extensive experiments that contrastive training significantly improves OOD detection performance on a number of common benchmarks (a generic contrastive-loss sketch follows this entry).
arXiv Detail & Related papers (2020-07-10T18:40:37Z)
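As a hedged sketch of what "contrastive training" typically means in this setting, the block below implements a generic SimCLR-style NT-Xent loss over two augmented views of the same images; the paper's exact objective, architecture, and training recipe may differ.

```python
# Generic NT-Xent (SimCLR-style) contrastive loss; illustrative only.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D) unit vectors
    sim = z @ z.t() / temperature                         # (2N, 2N) similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))            # drop self-similarity
    # The positive for sample i is its other augmented view: i <-> i + n (mod 2N).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```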
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.