Related papers: Influence of Classification Task and Distribution Shift Type on OOD Detection in Fetal Ultrasound

URL: http://arxiv.org/abs/2509.18326v1
Date: Mon, 22 Sep 2025 18:49:25 GMT
Title: Influence of Classification Task and Distribution Shift Type on OOD Detection in Fetal Ultrasound
Authors: Chun Kit Wong, Anders N. Christensen, Cosmin I. Bercea, Julia A. Schnabel, Martin G. Tolsgaard, Aasa Feragen
Abstract summary: OOD detection relies on estimating a classification model's uncertainty, which should increase for OOD samples. We show that OOD detection performance significantly varies with the task, and that the best task depends on the defined ID-OOD criteria. We reveal that superior OOD detection does not guarantee optimal abstained prediction, underscoring the necessity to align task selection and uncertainty strategies with the specific downstream application in medical image analysis.
Score: 6.857027660550724
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable out-of-distribution (OOD) detection is important for safe deployment of deep learning models in fetal ultrasound amidst heterogeneous image characteristics and clinical settings. OOD detection relies on estimating a classification model's uncertainty, which should increase for OOD samples. While existing research has largely focused on uncertainty quantification methods, this work investigates the impact of the classification task itself. Through experiments with eight uncertainty quantification methods across four classification tasks, we demonstrate that OOD detection performance significantly varies with the task, and that the best task depends on the defined ID-OOD criteria; specifically, whether the OOD sample is due to: i) an image characteristic shift or ii) an anatomical feature shift. Furthermore, we reveal that superior OOD detection does not guarantee optimal abstained prediction, underscoring the necessity to align task selection and uncertainty strategies with the specific downstream application in medical image analysis.
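The relationship the abstract describes, between uncertainty-based OOD detection and abstained prediction, can be illustrated with a minimal sketch. This is not the authors' pipeline: the choice of predictive entropy as the uncertainty score, the pairwise AUROC estimate, and the helper names (`entropy_uncertainty`, `auroc`, `abstained_accuracy`) are all assumptions made for illustration, and the paper's eight uncertainty quantification methods are not reproduced here.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy_uncertainty(probs):
    """Predictive entropy per sample; higher means more uncertain (assumed score)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def auroc(scores_id, scores_ood):
    """OOD-detection AUROC: probability that a random OOD sample receives a
    higher uncertainty than a random ID sample, counting ties as one half."""
    diff = scores_ood[:, None] - scores_id[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def abstained_accuracy(probs, labels, uncertainty, keep_fraction):
    """Accuracy on the least-uncertain `keep_fraction` of samples; the model
    abstains on the rest. `keep_fraction` is a hypothetical parameter."""
    n_keep = max(1, int(round(keep_fraction * len(labels))))
    keep = np.argsort(uncertainty)[:n_keep]
    return float((probs[keep].argmax(axis=1) == labels[keep]).mean())

# Toy logits (made up): confident ID predictions, nearly flat OOD predictions.
logits_id = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
logits_ood = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
u_id = entropy_uncertainty(softmax(logits_id))
u_ood = entropy_uncertainty(softmax(logits_ood))
print("OOD detection AUROC:", auroc(u_id, u_ood))
```

The two metrics answer different questions: AUROC measures how well uncertainty separates ID from OOD samples, while abstained accuracy measures how correct the retained predictions are. A score can rank well on one without ranking well on the other, which is the gap the abstract points to.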

Related papers

NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance [13.36825494924134]
We propose a novel OOD scoring mechanism, called NERO, that leverages neuron-level relevance at the feature layer. Specifically, we cluster neuron-level relevance for each in-distribution (ID) class to form representative centroids. We refine performance by incorporating scaled relevance in the bias term and combining feature norms.
arXiv Detail & Related papers (2025-06-18T12:22:17Z)
Semantic or Covariate? A Study on the Intractable Case of Out-of-Distribution Detection [70.57120710151105]
We provide a more precise definition of the Semantic Space for the ID distribution. We also define the "Tractable OOD" setting which ensures the distinguishability of OOD and ID distributions.
arXiv Detail & Related papers (2024-11-18T03:09:39Z)
Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection [24.557227100200215]
Out-of-distribution (OOD) detection is crucial for deploying reliable machine learning models in open-world applications. Recent advances in CLIP-based OOD detection have shown promising results via regularizing prompt tuning with OOD features extracted from ID data. We propose a novel framework, namely, Self-Calibrated Tuning (SCT), to mitigate this problem for effective OOD detection with only the given few-shot ID data.
arXiv Detail & Related papers (2024-11-05T02:29:16Z)
TTA-OOD: Test-time Augmentation for Improving Out-of-Distribution Detection in Gastrointestinal Vision [6.290783164114315]
We introduce a test-time augmentation segment into the OOD detection pipeline. This augmentation shifts the pixel space, which translates into a more distinct semantic representation for OOD examples. We evaluate our method against existing state-of-the-art OOD scores.
arXiv Detail & Related papers (2024-07-19T04:50:54Z)
Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox [70.57120710151105]
Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as the OOD data. Some marginal OOD samples actually have close semantic contents to the in-distribution (ID) sample, which makes determining the OOD sample a Sorites Paradox. We construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue.
arXiv Detail & Related papers (2024-06-14T09:27:56Z)
Model-free Test Time Adaptation for Out-Of-Distribution Detection [62.49795078366206]
We propose a Non-Parametric Test Time Adaptation framework for Distribution Detection (abbr). abbr utilizes online test samples for model adaptation during testing, enhancing adaptability to changing data distributions. We demonstrate the effectiveness of abbr through comprehensive experiments on multiple OOD detection benchmarks.
arXiv Detail & Related papers (2023-11-28T02:00:47Z)
Beyond AUROC & co. for evaluating out-of-distribution detection performance [50.88341818412508]
Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. We propose a new metric - Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples.
arXiv Detail & Related papers (2023-06-26T12:51:32Z)
LINe: Out-of-Distribution Detection by Leveraging Important Neurons [15.797257361788812]
We introduce a new aspect for analyzing the difference in model outputs between in-distribution data and OOD data. We propose a novel method, Leveraging Important Neurons (LINe), for post-hoc out-of-distribution detection.
arXiv Detail & Related papers (2023-03-24T13:49:05Z)
Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD. In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection. We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need [52.88953913542445]
Surprisingly, we find that simply using reconstruction-based methods can significantly boost OOD detection performance. We take Masked Image Modeling as a pretext task for our OOD detection framework (MOOD).
arXiv Detail & Related papers (2023-02-06T08:24:41Z)
Confidence-based Out-of-Distribution Detection: A Comparative Study and Analysis [17.398553230843717]
We assess the capability of various state-of-the-art approaches for confidence-based OOD detection. First, we leverage a computer vision benchmark to reproduce and compare multiple OOD detection methods. We then evaluate their capabilities on the challenging task of disease classification using chest X-rays.
arXiv Detail & Related papers (2021-07-06T12:10:09Z)
