The METRIC-framework for assessing data quality for trustworthy AI in
medicine: a systematic review
- URL: http://arxiv.org/abs/2402.13635v1
- Date: Wed, 21 Feb 2024 09:15:46 GMT
- Title: The METRIC-framework for assessing data quality for trustworthy AI in
medicine: a systematic review
- Authors: Daniel Schwabe, Katinka Becker, Martin Seyferth, Andreas Klaß, Tobias Schäffter
- Abstract summary: Development of trustworthy AI is especially important in medicine.
We focus on the importance of data quality (training/test) in deep learning (DL).
We propose the METRIC-framework, a specialised data quality framework for medical training data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The adoption of machine learning (ML) and, more specifically, deep learning
(DL) applications into all major areas of our lives is underway. The
development of trustworthy AI is especially important in medicine due to the
large implications for patients' lives. While trustworthiness concerns various
aspects including ethical, technical and privacy requirements, we focus on the
importance of data quality (training/test) in DL. Since data quality dictates
the behaviour of ML products, evaluating data quality will play a key part in
the regulatory approval of medical AI products. We perform a systematic review
following PRISMA guidelines using the databases PubMed and ACM Digital Library.
We identify 2362 studies, out of which 62 records fulfil our eligibility
criteria. From this literature, we synthesise the existing knowledge on data
quality frameworks and combine it with the perspective of ML applications in
medicine. As a result, we propose the METRIC-framework, a specialised data
quality framework for medical training data comprising 15 awareness dimensions,
along which developers of medical ML applications should investigate a dataset.
This knowledge helps to reduce biases as a major source of unfairness, increase
robustness, facilitate interpretability and thus lays the foundation for
trustworthy AI in medicine. Incorporating such systematic assessment of medical
datasets into regulatory approval processes has the potential to accelerate the
approval of ML products and builds the basis for new standards.
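The abstract does not enumerate the 15 awareness dimensions, so the following is only an illustrative sketch of how a developer might start putting numbers on dataset properties before training. It computes two generic measures, per-field completeness and target class balance, on a toy dataset. The function names (`completeness`, `class_balance`, `quality_report`) and the toy records are hypothetical and are not taken from the METRIC-framework.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Fraction of records in which each field is present and not None."""
    n = len(records)
    return {
        field: (sum(1 for r in records if r.get(field) is not None) / n) if n else 0.0
        for field in fields
    }


def class_balance(records: List[Record], label_key: str) -> float:
    """Ratio of the smallest to the largest class count (1.0 = perfectly balanced)."""
    counts = Counter(r[label_key] for r in records if r.get(label_key) is not None)
    if not counts:
        return 0.0
    return min(counts.values()) / max(counts.values())


def quality_report(records: List[Record], fields: List[str], label_key: str) -> Dict[str, Any]:
    """Bundle the illustrative measures into one report for a dataset."""
    return {
        "n_records": len(records),
        "completeness": completeness(records, fields),
        "class_balance": class_balance(records, label_key),
    }


# Hypothetical toy records: two missing values and a skewed label distribution.
toy_records = [
    {"age": 64, "sex": "F", "label": "disease"},
    {"age": None, "sex": "M", "label": "healthy"},
    {"age": 51, "sex": None, "label": "healthy"},
    {"age": 47, "sex": "F", "label": "healthy"},
]
print(quality_report(toy_records, ["age", "sex"], "label"))
```

On the toy records, both fields are 75% complete and the class-balance ratio is 1/3 (one `disease` record against three `healthy` ones). Reporting such numbers alongside a dataset is one way to make class imbalance, a common source of bias, visible before a model is trained.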
Related papers
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
Date: 2024-11-14T06:19:18Z
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle medical information retrieval without relevance labels.
SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query.
We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
Date: 2024-10-26T02:53:20Z
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
Date: 2024-10-06T18:46:28Z
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
To date, GMAI-MMBench is the most comprehensive general medical AI benchmark, with a well-categorized data structure and multi-perceptual granularity.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
Date: 2024-08-06T17:59:21Z
- Scorecards for Synthetic Medical Data Evaluation and Reporting [2.8262986891348056]
The growing utilization of synthetic medical data (SMD) in training and testing AI-driven tools in healthcare requires a systematic framework for assessing its quality.
Here, we outline an evaluation framework designed to meet the unique requirements of medical applications.
We introduce the concept of scorecards, which can serve as comprehensive reports that accompany artificially generated datasets.
Date: 2024-06-17T02:11:59Z
- A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry [2.1717945745027425]
Large Language Models (LLMs) have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation.
This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare.
Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness.
Date: 2024-04-24T09:55:24Z
- Large Language Models for Biomedical Knowledge Graph Construction: Information extraction from EMR notes [0.0]
We propose an end-to-end machine learning solution based on large language models (LLMs).
The entities used in the KG construction process are diseases, factors, treatments, and the manifestations that coexist with the disease while the patient experiences it.
The application of the proposed methodology is demonstrated on age-related macular degeneration.
Date: 2023-01-29T15:52:33Z
- Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals [4.799783526620609]
We released a catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP).
A total of 450 NLP datasets were manually systematized and annotated with rich metadata.
Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed.
Date: 2022-01-18T15:05:28Z
- MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation [110.31526448744096]
We argue that unlocking the potential of AI in medicine requires a systematic way to measure the performance of medical AI models on large-scale heterogeneous data.
We are building MedPerf, an open framework for benchmarking machine learning in the medical domain.
Date: 2021-09-29T18:09:41Z
- The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
Date: 2021-06-08T10:38:09Z
- Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
Date: 2020-12-10T13:56:00Z
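The AutoMIR entry above describes SL-HyDE, which uses an LLM to generate hypothetical documents for a query. The sketch below shows only the basic hypothetical-document-embedding retrieval step, under loudly stated assumptions: `fake_generate` is a hypothetical stub that returns canned text instead of calling an LLM, `embed` is a toy bag-of-words encoder standing in for a learned dense retriever, and the self-learning part of SL-HyDE is not modelled at all.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy bag-of-words embedding, L2-normalised (stand-in for a dense encoder)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def dot(a: Vector, b: Vector) -> float:
    """Sparse dot product of two term-weight vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of several sparse vectors."""
    out: Dict[str, float] = {}
    for v in vectors:
        for term, w in v.items():
            out[term] = out.get(term, 0.0) + w / len(vectors)
    return out


def hyde_search(
    query: str,
    generate: Callable[[str, int], str],
    corpus: List[str],
    k: int = 2,
    n_hypotheses: int = 2,
) -> List[str]:
    """Rank corpus documents against the embeddings of generated hypothetical answers."""
    # 1. Ask the generator for several hypothetical documents answering the query.
    hypotheses = [generate(query, i) for i in range(n_hypotheses)]
    # 2. Embed them and average into a single query vector.
    q_vec = mean_vector([embed(h) for h in hypotheses])
    # 3. Document vectors are unit-length, so ranking by this dot product equals
    #    ranking by cosine similarity (the norm of q_vec is a constant factor).
    ranked = sorted(corpus, key=lambda doc: dot(q_vec, embed(doc)), reverse=True)
    return ranked[:k]


def fake_generate(query: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM call; ignores the query, returns canned text."""
    canned = [
        "metformin is a first line treatment for type 2 diabetes",
        "type 2 diabetes treatment often starts with metformin and lifestyle change",
    ]
    return canned[seed % len(canned)]


corpus = [
    "metformin lowers blood glucose in type 2 diabetes",
    "influenza vaccination schedule for adults",
    "statins reduce cholesterol",
]
print(hyde_search("first line drug for t2d?", fake_generate, corpus, k=1))
```

The query uses the abbreviation `t2d`, which occurs in no document; the first document is still ranked highest because the match is made against the (here canned) generated text rather than the raw query. This is the property that makes hypothetical documents useful when no relevance labels are available to train a retriever.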