Exploring Patient Data Requirements in Training Effective AI Models for MRI-based Breast Cancer Classification
- URL: http://arxiv.org/abs/2502.18506v1
- Date: Sat, 22 Feb 2025 04:04:52 GMT
- Title: Exploring Patient Data Requirements in Training Effective AI Models for MRI-based Breast Cancer Classification
- Authors: Solha Kang, Wesley De Neve, Francois Rameau, Utku Ozbulak
- Abstract summary: We show that medical institutions do not need a decade's worth of MRI images to train an AI model. We observe that for patient counts greater than 50, the number of patients in the training set has a negligible impact on the performance of models.
- Score: 1.6904868487513736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The past decade has witnessed a substantial increase in the number of startups and companies offering AI-based solutions for clinical decision support in medical institutions. However, the critical nature of medical decision-making raises several concerns about relying on external software. Key issues include potential variations in image modalities and the medical devices used to obtain these images, potential legal issues, and adversarial attacks. Fortunately, the open-source nature of machine learning research has made foundation models publicly available and straightforward to use for medical applications. This accessibility allows medical institutions to train their own AI-based models, thereby mitigating the aforementioned concerns. Given this context, an important question arises: how much data do medical institutions need to train effective AI models? In this study, we explore this question in relation to breast cancer detection, a particularly contested area due to the prevalence of this disease, which affects approximately 1 in every 8 women. Through large-scale experiments with training sets of varying patient counts, we show that medical institutions do not need a decade's worth of MRI images to train an AI model that performs competitively with the state-of-the-art, provided the model leverages foundation models. Furthermore, we observe that for patient counts greater than 50, the number of patients in the training set has a negligible impact on the performance of the models, and that simple ensembles further improve the results without additional complexity.
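The paper's training code is not included in this listing; below is a minimal, hypothetical sketch of the recipe the abstract describes: per-image embeddings from a frozen foundation model, a lightweight classifier head trained on a small patient cohort, and a simple probability-averaging ensemble. All names, dimensions, and data are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: a frozen foundation model supplies per-image embeddings,
# and only a lightweight classifier head is trained per institution.
# All shapes and data here are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-in for embeddings extracted from MRI slices by a frozen
# foundation model (e.g., a 768-d vector per image).
n_patients, dim = 50, 768
X_train = rng.normal(size=(n_patients, dim))
y_train = rng.integers(0, 2, size=n_patients)   # benign=0, malignant=1
X_test = rng.normal(size=(200, dim))
y_test = rng.integers(0, 2, size=200)

# "Simple ensemble": train several heads on bootstrap resamples and
# average their predicted probabilities.
probs = []
for seed in range(5):
    idx = rng.integers(0, n_patients, size=n_patients)  # bootstrap sample
    head = LogisticRegression(max_iter=1000, random_state=seed)
    head.fit(X_train[idx], y_train[idx])
    probs.append(head.predict_proba(X_test)[:, 1])

ensemble_prob = np.mean(probs, axis=0)
print("AUC:", roc_auc_score(y_test, ensemble_prob))
```

In this setup, adding patients only changes n_patients while the frozen backbone does the heavy lifting, which is consistent with the paper's observation that performance saturates beyond roughly 50 patients.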
Related papers
- Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets.
These models can be adapted to various applications such as question answering and visual understanding.
This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv Detail & Related papers (2025-03-03T22:42:00Z)
- LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models [18.6994780408699]
Large Language Models (LLMs) face significant challenges in medical question answering.
We propose a novel approach incorporating similar case generation within a multi-agent medical question-answering system.
Our method capitalizes on the model's inherent medical knowledge and reasoning capabilities, eliminating the need for additional training data.
arXiv Detail & Related papers (2024-12-31T19:55:45Z)
- Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques [0.49000940389224884]
Machine learning has significantly advanced healthcare by aiding in disease prevention and treatment identification.
However, accessing patient data can be challenging due to privacy concerns and strict regulations.
Recent studies suggest that fine-tuning foundation models can produce such data effectively.
arXiv Detail & Related papers (2024-09-06T17:36:08Z)
- The Era of Foundation Models in Medical Imaging is Approaching: A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution.
Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models.
This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
LLaVA-Rad inference is fast and can run on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
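The exact CheXprompt prompt and scoring rubric are not reproduced in this summary; the following is a hedged sketch of the general GPT-4-as-judge factuality pattern it describes, with an assumed model name and illustrative reports.

```python
# Hedged sketch of an LLM-as-judge factuality check in the spirit of
# CheXprompt; the prompt wording and rubric are assumptions, not the
# paper's actual metric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def count_factual_errors(reference: str, candidate: str) -> str:
    """Ask the judge model to compare a generated report to a reference."""
    prompt = (
        "You are evaluating a draft chest X-ray report against a "
        "reference report written by a radiologist.\n"
        f"Reference report:\n{reference}\n\n"
        f"Draft report:\n{candidate}\n\n"
        "List every clinically meaningful factual error in the draft, "
        "then output the total error count on the last line."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(count_factual_errors(
    "No focal consolidation. Mild cardiomegaly.",
    "Right lower lobe consolidation. Normal heart size.",
))
```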
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data [28.632437578685842]
We focus on choroid neoplasias, the most prevalent form of eye cancer in adults, albeit rare, with an incidence of 5.1 per million.
Our work introduces a concept-based interpretable model that distinguishes between three types of choroidal tumors, integrating insights from domain experts via radiological reports.
Remarkably, this model not only achieves an F1 score of 0.91, rivaling black-box models, but also boosts the diagnostic accuracy of junior doctors by 42%.
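The abstract describes a concept-based (bottleneck-style) design; as a hedged illustration, the sketch below routes image features through explicit concept scores before the diagnosis head. The concept names and dimensions are invented placeholders, not the paper's actual concepts.

```python
# Hedged sketch of a concept-bottleneck classifier: the diagnosis is
# predicted only from human-readable concept scores, which is what
# makes the model inspectable. Concepts below are placeholders.
import torch
import torch.nn as nn

CONCEPTS = ["retinal_detachment", "orange_pigment", "hollow_on_ultrasound"]
NUM_TUMOR_TYPES = 3  # the paper distinguishes three choroidal tumor types

class ConceptBottleneck(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # image features -> concept scores (supervised with expert labels)
        self.concept_head = nn.Linear(feat_dim, len(CONCEPTS))
        # concept scores -> diagnosis; inspecting these weights shows
        # which concept drives each prediction
        self.task_head = nn.Linear(len(CONCEPTS), NUM_TUMOR_TYPES)

    def forward(self, feats: torch.Tensor):
        concepts = torch.sigmoid(self.concept_head(feats))
        logits = self.task_head(concepts)
        return concepts, logits

model = ConceptBottleneck()
feats = torch.randn(4, 512)          # stand-in for backbone features
concepts, logits = model(feats)
print(concepts.shape, logits.shape)  # (4, 3), (4, 3)
```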
arXiv Detail & Related papers (2024-03-08T07:15:53Z)
- Free Form Medical Visual Question Answering in Radiology [3.495246564946556]
Research in medical Visual Question Answering has been scant, only gaining momentum since 2018.
Our research delves into the effective representation of radiology images and the joint learning of multimodal representations.
Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models.
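The paper's exact encoders and fusion scheme are not given in this summary; the sketch below shows one common baseline for a joint image-question representation (projection plus element-wise product), purely as an assumption-laden illustration.

```python
# Hedged sketch of joint multimodal representation learning for VQA:
# image and question features are projected into a shared space and
# fused; the classifier picks from a fixed answer vocabulary.
import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, hidden=256, n_answers=100):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.classifier = nn.Linear(hidden, n_answers)  # answer vocabulary

    def forward(self, img_feats, txt_feats):
        # element-wise product is a classic lightweight fusion baseline
        fused = torch.tanh(self.img_proj(img_feats)) * torch.tanh(self.txt_proj(txt_feats))
        return self.classifier(fused)

model = SimpleVQA()
logits = model(torch.randn(2, 512), torch.randn(2, 768))
print(logits.shape)  # (2, 100)
```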
arXiv Detail & Related papers (2024-01-23T20:26:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data [40.97474177100237]
Large language models (LLMs) hold considerable promise for improving medical diagnostics, patient care, and education.
Yet, there is an urgent need for open-source models that can be deployed on-premises to safeguard patient privacy.
We present an innovative dataset consisting of over 160,000 entries, specifically crafted to fine-tune LLMs for effective medical applications.
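MedAlpaca builds on Alpaca-style instruction tuning; as a hedged illustration, the snippet below shows how such QA entries are commonly flattened into training strings. The field names and file layout are assumptions, not the dataset's exact schema.

```python
# Hedged sketch of instruction-style fine-tuning data preparation;
# the fields below are illustrative, not MedAlpaca's exact schema.
import json

entries = [
    {
        "instruction": "Answer the patient's question as a clinician.",
        "input": "What does an elevated troponin level indicate?",
        "output": "Elevated troponin most commonly indicates myocardial injury.",
    },
]

def to_prompt(entry: dict) -> str:
    """Flatten one entry into a single Alpaca-style training string."""
    return (
        f"### Instruction:\n{entry['instruction']}\n\n"
        f"### Input:\n{entry['input']}\n\n"
        f"### Response:\n{entry['output']}"
    )

# Write one JSON object per line, ready for a supervised fine-tuning run.
with open("sft_train.jsonl", "w") as f:
    for e in entries:
        f.write(json.dumps({"text": to_prompt(e)}) + "\n")
```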
arXiv Detail & Related papers (2023-04-14T11:28:08Z)
- Medical diffusion on a budget: Textual Inversion for medical image generation [3.0826983115939823]
Training from scratch requires large captioned datasets and significant computational resources.
This work shows that adapting pre-trained Stable Diffusion models to medical imaging modalities is achievable by training text embeddings.
The trained embeddings are compact (less than 1 MB), enabling easy data sharing with reduced privacy concerns.
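Conceptually, textual inversion optimizes a single new token embedding while the rest of the pre-trained pipeline stays frozen; the sketch below isolates that idea with a placeholder loss standing in for the actual diffusion denoising objective. Names and dimensions are illustrative.

```python
# Hedged sketch of the textual-inversion idea: everything pre-trained
# stays frozen except one new token embedding, optimized so that the
# pseudo-token "<new-modality>" steers generation toward target images.
import torch

vocab_size, emb_dim = 49408, 768          # CLIP-like text encoder sizes
token_embeddings = torch.randn(vocab_size, emb_dim)
token_embeddings.requires_grad_(False)     # frozen pre-trained weights

# The single trainable parameter: the embedding of the new pseudo-token.
new_token = torch.randn(emb_dim, requires_grad=True)
optimizer = torch.optim.AdamW([new_token], lr=5e-3)

for step in range(100):
    # Placeholder loss; in practice this is the diffusion denoising loss
    # computed with a prompt such as "an MRI scan of <new-modality>".
    loss = ((new_token - token_embeddings[0]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Sharing the result means sharing ~emb_dim floats, i.e. a few KB,
# which is why the embeddings are easy to distribute.
torch.save({"<new-modality>": new_token.detach()}, "embedding.pt")
```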
arXiv Detail & Related papers (2023-03-23T16:50:19Z)
- A Trustworthy Framework for Medical Image Analysis with Deep Learning [71.48204494889505]
TRUDLMIA is a trustworthy deep learning framework for medical image analysis.
It is anticipated that the framework will support researchers and clinicians in advancing the use of deep learning for dealing with public health crises including COVID-19.
arXiv Detail & Related papers (2022-12-06T05:30:22Z)
- Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI.
We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv Detail & Related papers (2022-05-19T17:34:18Z)
- Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
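UCADI's protocol details are not given in this summary; the sketch below shows the generic federated-averaging pattern it implies, where each hospital trains locally on its own scans and only model weights are shared and averaged.

```python
# Hedged sketch of federated averaging in the spirit of UCADI:
# no images ever leave a site, only weights are exchanged.
import copy
import torch
import torch.nn as nn

def local_update(model: nn.Module, data, epochs: int = 1) -> dict:
    """One hospital's local training pass; returns its weights."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x).squeeze(-1), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(states: list) -> dict:
    """Average weights across hospitals on the coordinating server."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

global_model = nn.Linear(32, 1)  # stand-in for a CT classifier backbone
fake_site = [(torch.randn(8, 32), torch.rand(8).round()) for _ in range(4)]
states = [local_update(global_model, fake_site) for _ in range(3)]  # 3 hospitals
global_model.load_state_dict(federated_average(states))
```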
arXiv Detail & Related papers (2021-11-18T00:43:41Z)