Recommendations on test datasets for evaluating AI solutions in
pathology
- URL: http://arxiv.org/abs/2204.14226v1
- Date: Thu, 21 Apr 2022 14:52:47 GMT
- Title: Recommendations on test datasets for evaluating AI solutions in
pathology
- Authors: André Homeyer, Christian Geißler, Lars Ole Schwen, Falk
Zakrzewski, Theodore Evans, Klaus Strohmenger, Max Westphal, Roman David
Bülow, Michaela Kargl, Aray Karjauv, Isidre Munné-Bertran, Carl Orge
Retzlaff, Adrià Romero-López, Tomasz Sołtysiński, Markus Plass, Rita
Carvalho, Peter Steinbach, Yu-Chia Lan, Nassim Bouteldja, David Haber, Mateo
Rojas-Carulla, Alireza Vafaei Sadr, Matthias Kraft, Daniel Krüger, Rutger
Fick, Tobias Lang, Peter Boor, Heimo Müller, Peter Hufnagl, Norman Zerbe
- Abstract summary: AI solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis.
Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval.
A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology.
- Score: 2.001521933638504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) solutions that automatically extract information
from digital histology images have shown great promise for improving
pathological diagnosis. Prior to routine use, it is important to evaluate their
predictive performance and obtain regulatory approval. This assessment requires
appropriate test datasets. However, compiling such datasets is challenging and
specific recommendations are missing.
A committee of various stakeholders, including commercial AI developers,
pathologists, and researchers, discussed key aspects and conducted extensive
literature reviews on test datasets in pathology. Here, we summarize the
results and derive general recommendations for the collection of test datasets.
We address several questions: Which and how many images are needed? How to
deal with low-prevalence subsets? How can potential bias be detected? How
should datasets be reported? What are the regulatory requirements in different
countries?
The recommendations are intended to help AI developers demonstrate the
utility of their products and to help regulatory agencies and end users verify
reported performance measures. Further research is needed to formulate criteria
for sufficiently representative test datasets so that AI solutions can operate
with less user intervention and better support diagnostic workflows in the
future.
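To make the question "Which and how many images are needed?" concrete, below is a minimal illustrative sketch (not taken from the paper) of a standard binomial sample-size calculation: it estimates how many positive cases are needed so that the confidence interval around an expected sensitivity reaches a target half-width, and how large the overall test set must be at a given prevalence. The function names, parameters, and example numbers are hypothetical.

```python
# Illustrative sketch only: Wald (normal-approximation) sample-size estimate
# for evaluating sensitivity on a pathology test set. Not the paper's method.
import math
from statistics import NormalDist

def required_positives(expected_sensitivity: float,
                       half_width: float,
                       confidence: float = 0.95) -> int:
    """Positive cases needed so the sensitivity CI has the given half-width."""
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)  # two-sided z quantile
    p = expected_sensitivity
    return math.ceil(z ** 2 * p * (1.0 - p) / half_width ** 2)

def required_total(expected_sensitivity: float,
                   half_width: float,
                   prevalence: float,
                   confidence: float = 0.95) -> int:
    """Total cases needed so the expected number of positives is sufficient."""
    n_pos = required_positives(expected_sensitivity, half_width, confidence)
    return math.ceil(n_pos / prevalence)

# Hypothetical example: ~95% expected sensitivity, CI half-width of
# 3 percentage points, 10% prevalence in the sampled population.
print(required_positives(0.95, 0.03))                 # ~203 positive cases
print(required_total(0.95, 0.03, prevalence=0.10))    # ~2030 cases overall
```

Such back-of-the-envelope numbers also illustrate why low-prevalence subsets are difficult: under the same hypothetical precision target, a 1% prevalence would require on the order of twenty thousand cases overall.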
Related papers
- AI in radiological imaging of soft-tissue and bone tumours: a systematic review evaluating against CLAIM and FUTURE-AI guidelines [1.5332408886895255]
Soft-tissue and bone tumours (STBT) are rare, diagnostically challenging lesions with variable clinical behaviours and treatment approaches.
This systematic review provides an overview of Artificial Intelligence (AI) methods using radiological imaging for diagnosis and prognosis of these tumours.
arXiv Detail & Related papers (2024-08-22T15:31:48Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis [51.07114445705692]
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring.
As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs.
Current advances in artificial intelligence (AI) models enable automatic gait analysis for ND identification and classification.
arXiv Detail & Related papers (2024-05-21T06:44:40Z)
- Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology [35.284458448940796]
Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication.
Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images.
We present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders.
arXiv Detail & Related papers (2024-05-08T14:16:22Z)
- BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering [8.547600133510551]
This paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted by BESTMVQA.
Our system provides a useful tool for users to automatically build Med-VQA datasets, which helps overcome the problem of insufficient data.
With simple configurations, our system automatically trains and evaluates the selected models over a benchmark dataset.
arXiv Detail & Related papers (2023-12-13T03:08:48Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Weakly Supervised Anomaly Detection: A Survey [75.26180038443462]
Anomaly detection (AD) is a crucial task in machine learning with various applications.
We present the first comprehensive survey of weakly supervised anomaly detection (WSAD) methods.
For each setting, we provide formal definitions, key algorithms, and potential future directions.
arXiv Detail & Related papers (2023-02-09T10:27:21Z)
- Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals [4.799783526620609]
We released a catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP).
A total of 450 NLP datasets were manually systematized and annotated with rich metadata.
Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed.
arXiv Detail & Related papers (2022-01-18T15:05:28Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)