Deployment of Image Analysis Algorithms under Prevalence Shifts
- URL: http://arxiv.org/abs/2303.12540v2
- Date: Mon, 24 Jul 2023 13:35:16 GMT
- Title: Deployment of Image Analysis Algorithms under Prevalence Shifts
- Authors: Patrick Godau and Piotr Kalinowski and Evangelia Christodoulou and
Annika Reinke and Minu Tizabi and Luciana Ferrer and Paul Jäger and Lena
Maier-Hein
- Abstract summary: Domain gaps are among the most relevant roadblocks in the clinical translation of machine learning (ML)-based solutions for medical image analysis.
We propose a workflow for prevalence-aware image classification that uses estimated deployment prevalences to adjust a trained classifier to a new environment.
- Score: 6.373765910269204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain gaps are among the most relevant roadblocks in the clinical
translation of machine learning (ML)-based solutions for medical image
analysis. While current research focuses on new training paradigms and network
architectures, little attention is given to the specific effect of prevalence
shifts on an algorithm deployed in practice. Such discrepancies between class
frequencies in the data used for a method's development/validation and that in
its deployment environment(s) are of great importance, for example in the
context of artificial intelligence (AI) democratization, as disease prevalences
may vary widely across time and location. Our contribution is twofold. First,
we empirically demonstrate the potentially severe consequences of missing
prevalence handling by analyzing (i) the extent of miscalibration, (ii) the
deviation of the decision threshold from the optimum, and (iii) the ability of
validation metrics to reflect neural network performance on the deployment
population as a function of the discrepancy between development and deployment
prevalence. Second, we propose a workflow for prevalence-aware image
classification that uses estimated deployment prevalences to adjust a trained
classifier to a new environment, without requiring additional annotated
deployment data. Comprehensive experiments based on a diverse set of 30 medical
classification tasks showcase the benefit of the proposed workflow in
generating better classifier decisions and more reliable performance estimates
compared to current practice.
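
To make the idea of adjusting a trained classifier to a new prevalence concrete, the sketch below applies the standard prior-shift re-weighting of predicted class probabilities. It is an illustration only: the abstract does not spell out the authors' workflow, so the function name `adjust_posteriors`, its parameters, and the example numbers are assumptions, and the sketch assumes the classifier's outputs are reasonably calibrated on the development data.

```python
import numpy as np


def adjust_posteriors(probs, dev_prevalence, deploy_prevalence):
    """Re-weight class probabilities for a changed class prevalence.

    Hypothetical helper, not taken from the paper. Each class probability
    is scaled by the ratio of deployment prevalence to development
    prevalence, then every row is renormalized to sum to one.
    """
    probs = np.asarray(probs, dtype=float)
    dev = np.asarray(dev_prevalence, dtype=float)
    deploy = np.asarray(deploy_prevalence, dtype=float)

    weights = deploy / dev              # per-class prior ratio
    reweighted = probs * weights        # broadcast over samples (rows)
    return reweighted / reweighted.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Made-up numbers: one sample, two classes, balanced development data,
    # and a deployment setting where class 1 is far more frequent.
    probs = [[0.8, 0.2]]
    adjusted = adjust_posteriors(probs, [0.5, 0.5], [0.1, 0.9])
    print(adjusted, adjusted.argmax(axis=1))
```

In this made-up example the predicted class changes from 0 to 1 once the deployment prevalence is taken into account, which is the kind of decision shift the abstract describes. The deployment prevalences are passed in as given here; estimating them from unlabeled deployment data is a separate step that this sketch does not cover.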