Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images
- URL: http://arxiv.org/abs/2303.04862v1
- Date: Wed, 8 Mar 2023 19:58:41 GMT
- Title: Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images
- Authors: Lisa M. Koch, Christian M. Schürch, Christian F. Baumgartner, Arthur Gretton, Philipp Berens
- Abstract summary: We focus on the detection of subgroup shifts in machine learning systems.
Recent state-of-the-art statistical tests can be effectively applied to subgroup shift detection on medical imaging data.
- Score: 21.01688837312175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distribution shifts remain a fundamental problem for the safe application of
machine learning systems. If undetected, they may impact the real-world
performance of such systems or at least render the original performance claims
invalid. In this paper, we focus on the detection of subgroup shifts, a type of
distribution shift that can occur when subgroups have a different prevalence
during validation compared to the deployment setting. For example, algorithms
developed on data from various acquisition settings may be predominantly
applied in hospitals with lower quality data acquisition, leading to an
inadvertent performance drop. We formulate subgroup shift detection in the
framework of statistical hypothesis testing and show that recent
state-of-the-art statistical tests can be effectively applied to subgroup shift
detection on medical imaging data. We provide synthetic experiments as well as
an extensive evaluation of clinically meaningful subgroup shifts on histopathology
and retinal fundus images. We conclude that classifier-based subgroup
shift detection tests could be a particularly useful tool for post-market
surveillance of deployed ML systems.
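The classifier-based tests referenced in the abstract follow the classifier two-sample test (C2ST) idea: train a classifier to distinguish validation data from deployment data, and reject the no-shift hypothesis if its held-out accuracy exceeds chance. A minimal sketch, assuming plain feature vectors and a logistic-regression discriminator (the paper works with deep image representations, so this is illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.stats import binomtest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classifier_two_sample_test(X_val, X_deploy, seed=0):
    """C2ST: if a held-out classifier separates the two samples better
    than chance, their distributions likely differ."""
    X = np.concatenate([X_val, X_deploy])
    y = np.concatenate([np.zeros(len(X_val)), np.ones(len(X_deploy))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    n_correct = int((clf.predict(X_te) == y_te).sum())
    # Under H0 (no shift), held-out accuracy is Binomial(n, 0.5)
    return binomtest(n_correct, len(y_te), p=0.5, alternative="greater").pvalue
```

A p-value below the chosen significance level suggests the deployment distribution differs from validation, e.g. because a subgroup's prevalence has changed.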
Related papers
- Automatic dataset shift identification to support root cause analysis of AI performance drift [13.996602963045387]
Shifts in data distribution can substantially harm the performance of clinical AI models.
We propose the first unsupervised dataset shift identification framework.
We report promising results for the proposed framework on five types of real-world dataset shifts.
arXiv Detail & Related papers (2024-11-12T17:09:20Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with little labelled data.
The proposed approach comprises a fusion of a segmentation network, which acts as an attention module, and a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
- The Role of Subgroup Separability in Group-Fair Medical Image Classification [18.29079361470428]
We find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis.
Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
arXiv Detail & Related papers (2023-07-06T06:06:47Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Identification of Systematic Errors of Image Classifiers on Rare Subgroups [12.064692111429494]
Systematic errors can impact both fairness for demographic minority groups and robustness and safety under domain shift.
We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups where the target model has low performance.
We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy.
arXiv Detail & Related papers (2023-03-09T07:08:25Z)
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
- A unified framework for dataset shift diagnostics [2.449909275410288]
Supervised learning techniques typically assume training data originates from the target population.
Yet, dataset shift frequently arises and, if not adequately taken into account, may decrease the performance of the resulting predictors.
We propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts.
arXiv Detail & Related papers (2022-05-17T13:34:45Z)
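The entry above does not spell out DetectShift's test statistics, but the general recipe such frameworks build on (compute a shift statistic, calibrate it against a permutation null) can be sketched as follows; the label-prevalence statistic and the simulated data are illustrative assumptions:

```python
import numpy as np

def permutation_shift_test(source, target, statistic, n_perm=1000, seed=0):
    """Generic permutation test: is statistic(source, target) larger than
    expected if both samples came from the same distribution?"""
    rng = np.random.default_rng(seed)
    observed = statistic(source, target)
    pooled = np.concatenate([source, target])
    n_src = len(source)
    null = np.array([
        statistic(*np.split(rng.permutation(pooled), [n_src]))
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

# Illustrative statistic: difference in label prevalence (a crude label-shift measure)
prevalence_gap = lambda s, t: abs(s.mean() - t.mean())

rng = np.random.default_rng(1)
y_validation = rng.binomial(1, 0.30, size=500)  # labels seen at validation
y_deployment = rng.binomial(1, 0.45, size=500)  # labels seen in deployment
print(permutation_shift_test(y_validation, y_deployment, prevalence_gap))
```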
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show that combining recent results on equivariant representation learning on structured spaces with classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
We found in the permutation test that CV and RUB methods offer a false positive rate close to the significance level and an acceptable statistical power.
arXiv Detail & Related papers (2021-03-30T21:15:39Z)
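As a concrete illustration of the label permutation test just described, the sketch below estimates significance for a cross-validated accuracy by refitting on shuffled labels; the logistic-regression classifier is a placeholder, not the paper's deep learning pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def label_permutation_test(X, y, n_perm=200, seed=0):
    """Non-parametric significance of a cross-validated accuracy: how often
    does the classifier score as well when the labels are shuffled?"""
    rng = np.random.default_rng(seed)
    clf = LogisticRegression(max_iter=1000)
    true_score = cross_val_score(clf, X, y, cv=5).mean()
    null_scores = np.array([
        cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null_scores >= true_score)) / (n_perm + 1)
    return true_score, p_value
```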
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
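Dorfman's two-stage scheme mentioned above is simple to state: pool k samples, test the pool once, and retest individuals only if the pool is positive. A small sketch of the standard expected-cost formula (textbook Dorfman, not this paper's noisy Bayesian method):

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person under Dorfman's two-stage pooling: one
    pooled test shared by k people, plus k individual retests whenever
    the pool contains at least one positive sample."""
    return 1.0 / k + (1.0 - (1.0 - p) ** k)

# At 2% prevalence, pooling beats the 1 test/person of individual testing:
p = 0.02
best_k = min(range(2, 51), key=lambda k: dorfman_tests_per_person(p, k))
print(best_k, round(dorfman_tests_per_person(p, best_k), 3))  # k=8, ~0.274
```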
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.