Requirements for Quality Assurance of AI Models for Early Detection of Lung Cancer
- URL: http://arxiv.org/abs/2502.17639v1
- Date: Mon, 24 Feb 2025 20:38:29 GMT
- Title: Requirements for Quality Assurance of AI Models for Early Detection of Lung Cancer
- Authors: Horst K. Hahn, Matthias S. May, Volker Dicken, Michael Walz, Rainer Eßeling, Bianca Lassen-Schmidt, Robert Rischen, Jens Vogel-Claussen, Konstantin Nikolaou, Jörg Barkhausen,
- Abstract summary: Lung cancer is the second most common cancer and the leading cause of cancer-related deaths worldwide.<n>Under the EU AI Act, consistent quality assurance is required for AI-based nodule detection, measurement, and characterization.<n>This position paper proposes systematic quality assurance grounded in a validated reference dataset.
- Score: 0.5801420352256208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lung cancer is the second most common cancer and the leading cause of cancer-related deaths worldwide. Survival largely depends on tumor stage at diagnosis, and early detection with low-dose CT can significantly reduce mortality in high-risk patients. AI can improve the detection, measurement, and characterization of pulmonary nodules while reducing assessment time. However, the training data, functionality, and performance of available AI systems vary considerably, complicating software selection and regulatory evaluation. Manufacturers must specify intended use and provide test statistics, but they can choose their training and test data, limiting standardization and comparability. Under the EU AI Act, consistent quality assurance is required for AI-based nodule detection, measurement, and characterization. This position paper proposes systematic quality assurance grounded in a validated reference dataset, including real screening cases plus phantom data to verify volume and growth rate measurements. Regular updates shall reflect demographic shifts and technological advances, ensuring ongoing relevance. Consequently, ongoing AI quality assurance is vital. Regulatory challenges are also adressed. While the MDR and the EU AI Act set baseline requirements, they do not adequately address self-learning algorithms or their updates. A standardized, transparent quality assessment - based on sensitivity, specificity, and volumetric accuracy - enables an objective evaluation of each AI solution's strengths and weaknesses. Establishing clear testing criteria and systematically using updated reference data lay the groundwork for comparable performance metrics, informing tenders, guidelines, and recommendations.
Related papers
- Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [52.246494389096654]
This paper introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at both the word and sequence levels.
We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs)
arXiv Detail & Related papers (2024-02-22T03:46:08Z) - Can I trust my fake data -- A comprehensive quality assessment framework
for synthetic tabular data in healthcare [33.855237079128955]
In response to privacy concerns and regulatory requirements, using synthetic data has been suggested.
We present a conceptual framework for quality assurance of SD for AI applications in healthcare.
We propose stages necessary to support real-life applications.
arXiv Detail & Related papers (2024-01-24T08:14:20Z) - RAISE -- Radiology AI Safety, an End-to-end lifecycle approach [5.829180249228172]
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency.
The focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy.
The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
arXiv Detail & Related papers (2023-11-24T15:59:14Z) - From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Explainable AI for Malnutrition Risk Prediction from m-Health and
Clinical Data [3.093890460224435]
This paper presents a novel AI framework for early and explainable malnutrition risk detection based on heterogeneous m-health data.
We performed an extensive model evaluation including both subject-independent and personalised predictions.
We also investigated several benchmark XAI methods to extract global model explanations.
arXiv Detail & Related papers (2023-05-31T08:07:35Z) - Modeling Disagreement in Automatic Data Labelling for Semi-Supervised
Learning in Clinical Natural Language Processing [2.016042047576802]
We investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports.
arXiv Detail & Related papers (2022-05-29T20:20:49Z) - Detecting Spurious Correlations with Sanity Tests for Artificial
Intelligence Guided Radiology Systems [22.249702822013045]
A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety.
The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset.
We describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons.
arXiv Detail & Related papers (2021-03-04T14:14:05Z) - GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Interpretable Off-Policy Evaluation in Reinforcement Learning by
Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.