From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine
- URL: http://arxiv.org/abs/2502.09242v1
- Date: Thu, 13 Feb 2025 11:57:51 GMT
- Title: From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine
- Authors: Lukas Buess, Matthias Keicher, Nassir Navab, Andreas Maier, Soroosh Tayebi Arasteh
- Abstract summary: Multimodal AI is capable of integrating diverse data modalities, including imaging, text, and structured data, within a single model.
This scoping review explores the evolution of multimodal AI, highlighting its methods, applications, datasets, and evaluation in clinical settings.
Our findings underscore a shift from unimodal to multimodal approaches, driving innovations in diagnostic support, medical report generation, drug discovery, and conversational AI.
- Score: 40.23383597339471
- Abstract: Generative artificial intelligence (AI) models, such as diffusion models and OpenAI's ChatGPT, are transforming medicine by enhancing diagnostic accuracy and automating clinical workflows. The field has advanced rapidly, evolving from text-only large language models for tasks such as clinical documentation and decision support to multimodal AI systems capable of integrating diverse data modalities, including imaging, text, and structured data, within a single model. The diverse landscape of these technologies, along with rising interest, highlights the need for a comprehensive review of their applications and potential. This scoping review explores the evolution of multimodal AI, highlighting its methods, applications, datasets, and evaluation in clinical settings. Adhering to PRISMA-ScR guidelines, we systematically queried PubMed, IEEE Xplore, and Web of Science, prioritizing recent studies published up to the end of 2024. After rigorous screening, 144 papers were included, revealing key trends and challenges in this dynamic field. Our findings underscore a shift from unimodal to multimodal approaches, driving innovations in diagnostic support, medical report generation, drug discovery, and conversational AI. However, critical challenges remain, including the integration of heterogeneous data types, improving model interpretability, addressing ethical concerns, and validating AI systems in real-world clinical settings. This review summarizes the current state of the art, identifies critical gaps, and provides insights to guide the development of scalable, trustworthy, and clinically impactful multimodal AI solutions in healthcare.
Related papers
- A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data.
Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied.
We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z)
- Navigating the landscape of multimodal AI in medicine: a scoping review on technical challenges and clinical applications [2.3754862363513523]
This review examines the landscape of deep learning-based multimodal AI applications across the medical domain.
Multimodal AI models consistently outperform their unimodal counterparts, with an average improvement of 6.2 percentage points in AUC.
We identify key factors driving multimodal AI development and propose recommendations to accelerate the field's maturation.
arXiv Detail & Related papers (2024-11-06T09:18:05Z)
- The Era of Foundation Models in Medical Imaging is Approaching : A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution.
Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models.
This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
- Practical Applications of Advanced Cloud Services and Generative AI Systems in Medical Image Analysis [17.4235794108467]
The article explores the transformative potential of generative AI in medical imaging, emphasizing its ability to generate synthetic data.
By addressing limitations in dataset size and diversity, these models contribute to more accurate diagnoses and improved patient outcomes.
arXiv Detail & Related papers (2024-03-26T09:55:49Z)
- OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine [55.29668193415034]
We present OpenMEDLab, an open-source platform for multi-modality foundation models.
It encapsulates solutions of pioneering attempts in prompting and fine-tuning large language and vision models for frontline clinical and bioinformatic applications.
It opens access to a group of pre-trained foundation models for various medical image modalities, clinical text, protein engineering, etc.
arXiv Detail & Related papers (2024-02-28T03:51:02Z)
- Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects [2.1070612998322438]
The paper explores the transformative potential of multimodal models for clinical predictions.
Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist.
arXiv Detail & Related papers (2023-11-04T05:42:51Z)
- A Comprehensive Review of Generative AI in Healthcare [0.0]
Generative AI models, specifically transformers and diffusion models, have played a crucial role in analyzing diverse forms of data, including medical imaging, protein structure prediction, clinical documentation, diagnostic assistance, radiology interpretation, clinical decision support, medical coding, and billing.
This review paper aims to offer a thorough overview of the generative AI applications in healthcare, focusing on transformers and diffusion models.
arXiv Detail & Related papers (2023-10-01T21:13:14Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
This paper draws on the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address the limitations of task-specific and modality-specific models, owing to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.