Recommendations on test datasets for evaluating AI solutions in
pathology
- URL: http://arxiv.org/abs/2204.14226v1
- Date: Thu, 21 Apr 2022 14:52:47 GMT
- Title: Recommendations on test datasets for evaluating AI solutions in
pathology
- Authors: André Homeyer, Christian Geißler, Lars Ole Schwen, Falk
Zakrzewski, Theodore Evans, Klaus Strohmenger, Max Westphal, Roman David
Bülow, Michaela Kargl, Aray Karjauv, Isidre Munné-Bertran, Carl Orge
Retzlaff, Adrià Romero-López, Tomasz Sołtysiński, Markus Plass, Rita
Carvalho, Peter Steinbach, Yu-Chia Lan, Nassim Bouteldja, David Haber, Mateo
Rojas-Carulla, Alireza Vafaei Sadr, Matthias Kraft, Daniel Krüger, Rutger
Fick, Tobias Lang, Peter Boor, Heimo Müller, Peter Hufnagl, Norman Zerbe
- Abstract summary: AI solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis.
Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval.
A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology.
- Score: 2.001521933638504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) solutions that automatically extract information
from digital histology images have shown great promise for improving
pathological diagnosis. Prior to routine use, it is important to evaluate their
predictive performance and obtain regulatory approval. This assessment requires
appropriate test datasets. However, compiling such datasets is challenging and
specific recommendations are missing.
A committee of various stakeholders, including commercial AI developers,
pathologists, and researchers, discussed key aspects and conducted extensive
literature reviews on test datasets in pathology. Here, we summarize the
results and derive general recommendations for the collection of test datasets.
We address several questions: Which and how many images are needed? How to
deal with low-prevalence subsets? How can potential bias be detected? How
should datasets be reported? What are the regulatory requirements in different
countries?
The recommendations are intended to help AI developers demonstrate the
utility of their products and to help regulatory agencies and end users verify
reported performance measures. Further research is needed to formulate criteria
for sufficiently representative test datasets so that AI solutions can operate
with less user intervention and better support diagnostic workflows in the
future.
Related papers
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z) - A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis [51.07114445705692]
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring.
As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs.
The current advances in artificial intelligence (AI) models enable automatic gait analysis for NDs identification and classification.
arXiv Detail & Related papers (2024-05-21T06:44:40Z) - Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology [35.284458448940796]
Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication.
Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images.
We present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders.
arXiv Detail & Related papers (2024-05-08T14:16:22Z) - BESTMVQA: A Benchmark Evaluation System for Medical Visual Question
Answering [8.547600133510551]
This paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted by BESTMVQA.
Our system provides a useful tool for users to automatically build Med-VQA datasets, which helps overcome the problem of insufficient data.
With simple configurations, our system automatically trains and evaluates the selected models over a benchmark dataset.
arXiv Detail & Related papers (2023-12-13T03:08:48Z) - Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - Weakly Supervised Anomaly Detection: A Survey [75.26180038443462]
Anomaly detection (AD) is a crucial task in machine learning with various applications.
We present the first comprehensive survey of weakly supervised anomaly detection (WSAD) methods.
For each setting, we provide formal definitions, key algorithms, and potential future directions.
arXiv Detail & Related papers (2023-02-09T10:27:21Z) - Benchmark datasets driving artificial intelligence development fail to
capture the needs of medical professionals [4.799783526620609]
We released a catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP).
A total of 450 NLP datasets were manually systematized and annotated with rich metadata.
Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed.
arXiv Detail & Related papers (2022-01-18T15:05:28Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Peri-Diagnostic Decision Support Through Cost-Efficient Feature
Acquisition at Test-Time [37.160335232396406]
A sub-problem in CADx is to guide the physician during the entire peri-diagnostic workflow, including the acquisition stage.
We propose a novel approach which is enticingly simple: use dropout at the input layer, and integrated gradients of the trained network at test-time to attribute feature importance dynamically.
Results show that our proposed approach is more cost- and feature-efficient than prior approaches and achieves a higher overall accuracy.
arXiv Detail & Related papers (2020-03-31T12:00:44Z) - DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment
Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)