Estimating Test Performance for AI Medical Devices under Distribution Shift with Conformal Prediction
- URL: http://arxiv.org/abs/2207.05796v1
- Date: Tue, 12 Jul 2022 19:25:21 GMT
- Authors: Charles Lu, Syed Rakin Ahmed, Praveer Singh, Jayashree Kalpathy-Cramer
- Abstract summary: We consider the task of predicting the test accuracy of an arbitrary black-box model on an unlabeled target domain.
We propose a "black-box" test estimation technique based on conformal prediction and evaluate it against other methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the test performance of software AI-based medical devices under
distribution shifts is crucial for evaluating the safety, efficiency, and
usability prior to clinical deployment. Due to the nature of regulated medical
device software and the difficulty in acquiring large amounts of labeled
medical datasets, we consider the task of predicting the test accuracy of an
arbitrary black-box model on an unlabeled target domain without modification to
the original training process or any distributional assumptions of the original
source data (i.e., we treat the model as a "black-box" and only use the
predicted output responses). We propose a "black-box" test estimation technique
based on conformal prediction and evaluate it against other methods on three
medical imaging datasets (mammography, dermatology, and histopathology) under
several clinically relevant types of distribution shift (institution, hardware
scanner, atlas, hospital). We hope that by promoting practical and effective
estimation techniques for black-box models, manufacturers of medical devices
will develop more standardized and realistic evaluation procedures to improve
the robustness and trustworthiness of clinical AI tools.
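The abstract names conformal prediction as the basis of the estimator but does not spell out how a test-accuracy estimate is derived from it. As background only, here is a minimal sketch of the standard split-conformal building block applied to black-box softmax outputs: a threshold is calibrated on labeled source data, and prediction sets are then formed for unlabeled target inputs. If source and target data are exchangeable, each set contains the true label with probability at least 1 - alpha; distribution shift is exactly what can break that assumption. The function names (`conformal_threshold`, `prediction_sets`, `summarize_sets`) and the set-size summaries are hypothetical illustrations, not the authors' estimator or any library API.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration; NOT the paper's accuracy estimator.
# Each input row is a black-box model's softmax output over K classes.


def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int],
                        alpha: float = 0.1) -> float:
    """Split-conformal threshold from labeled source (calibration) data.

    Nonconformity score: 1 - probability assigned to the true class.
    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when that rank exceeds n (too few points for the requested level).
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # every class will be included in every set
    return scores[k - 1]


def prediction_sets(probs: Sequence[Sequence[float]],
                    qhat: float) -> List[List[int]]:
    """For each (possibly unlabeled) input, keep every class whose score is <= qhat."""
    return [[c for c, p in enumerate(row) if 1.0 - p <= qhat] for row in probs]


def summarize_sets(sets: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Label-free summaries of prediction sets (illustrative diagnostics only)."""
    sizes = [len(s) for s in sets]
    return {
        "mean_set_size": sum(sizes) / len(sizes),
        "singleton_fraction": sum(1 for s in sizes if s == 1) / len(sizes),
    }


if __name__ == "__main__":
    # Toy source calibration data (two classes) and unlabeled target data.
    cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
    cal_labels = [0, 0, 1, 1, 1]
    target_probs = [[0.95, 0.05], [0.5, 0.5], [0.35, 0.65]]

    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
    sets = prediction_sets(target_probs, qhat)
    print(qhat, sets, summarize_sets(sets))
```

With `alpha=0.2` and five calibration points, the corrected rank is 5, so the threshold is the largest calibration score; with `alpha=0.1` the rank exceeds the calibration size and every class is included. Comparing set-size statistics on target data with those on source data is one informal signal of shift, but turning such statistics into an accuracy estimate requires further assumptions beyond this sketch.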
Related papers
- Integrating Clinical Knowledge into Concept Bottleneck Models
Concept bottleneck models (CBMs) predict human-interpretable concepts before predicting the final output.
We propose integrating clinical knowledge to refine CBMs, better aligning them with clinicians' decision-making processes.
We validate our approach on two datasets of medical images: white blood cell and skin images.
(arXiv, 2024-07-09)
- Detecting algorithmic bias in medical-AI models using trees
This paper presents an innovative framework for detecting areas of algorithmic bias in medical-AI decision support systems.
Our approach efficiently identifies potential biases in medical-AI models, specifically in the context of sepsis prediction.
(arXiv, 2023-12-05)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
(arXiv, 2023-06-18)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
(arXiv, 2023-04-21)
- Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
(arXiv, 2023-01-01)
- Improving Trustworthiness of AI Disease Severity Rating in Medical Imaging with Ordinal Conformal Prediction Sets
A lack of statistically rigorous uncertainty quantification is a significant factor undermining trust in AI results.
Recent developments in distribution-free uncertainty quantification present practical solutions for these issues.
We demonstrate a technique for forming ordinal prediction sets that are guaranteed to contain the correct stenosis severity with a user-specified probability.
(arXiv, 2022-07-05)
- Literature-Augmented Clinical Outcome Prediction
We introduce techniques to help bridge the gap between evidence-based medicine (EBM) and AI-based clinical models.
We propose a novel system that automatically retrieves patient-specific literature based on intensive care unit (ICU) patient information.
Our model is able to substantially boost predictive accuracy on three challenging tasks in comparison to strong recent baselines.
(arXiv, 2021-11-16)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
(arXiv, 2021-02-08)
- Privacy-preserving medical image analysis
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
(arXiv, 2020-12-10)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and collapsing health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
(arXiv, 2020-05-10)
- Uncertainty estimation for classification and risk prediction on medical tabular data
This work advances the understanding of uncertainty estimation for classification and risk prediction on medical data.
In a data-scarce field such as healthcare, the ability to measure the uncertainty of a model's prediction could potentially lead to improved effectiveness of decision support tools.
(arXiv, 2020-04-13)