Related papers: Navigating the landscape of multimodal AI in medicine: a scoping review on technical challenges and clinical applications

URL: http://arxiv.org/abs/2411.03782v1
Date: Wed, 06 Nov 2024 09:18:05 GMT
Title: Navigating the landscape of multimodal AI in medicine: a scoping review on technical challenges and clinical applications
Authors: Daan Schouten, Giulia Nicoletti, Bas Dille, Catherine Chia, Pierpaolo Vendittelli, Megan Schuurmans, Geert Litjens, Nadieh Khalili
Abstract summary: This review examines the landscape of deep learning-based multimodal AI applications across the medical domain. Multimodal AI models consistently outperform their unimodal counterparts, with an average improvement of 6.2 percentage points in AUC. We identify key factors driving multimodal AI development and propose recommendations to accelerate the field's maturation.
Score: 2.3754862363513523
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent technological advances in healthcare have led to unprecedented growth in patient data quantity and diversity. While artificial intelligence (AI) models have shown promising results in analyzing individual data modalities, there is increasing recognition that models integrating multiple complementary data sources, so-called multimodal AI, could enhance clinical decision-making. This scoping review examines the landscape of deep learning-based multimodal AI applications across the medical domain, analyzing 432 papers published between 2018 and 2024. We provide an extensive overview of multimodal AI development across different medical disciplines, examining various architectural approaches, fusion strategies, and common application areas. Our analysis reveals that multimodal AI models consistently outperform their unimodal counterparts, with an average improvement of 6.2 percentage points in AUC. However, several challenges persist, including cross-departmental coordination, heterogeneous data characteristics, and incomplete datasets. We critically assess the technical and practical challenges in developing multimodal AI systems and discuss potential strategies for their clinical implementation, including a brief overview of commercially available multimodal AI models for clinical decision-making. Additionally, we identify key factors driving multimodal AI development and propose recommendations to accelerate the field's maturation. This review provides researchers and clinicians with a thorough understanding of the current state, challenges, and future directions of multimodal AI in medicine.

Related papers

Towards deployment-centric multimodal AI beyond vision and language [67.02589156099391]
We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions. We identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
arXiv Detail & Related papers (2025-04-04T17:20:05Z)
From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine [40.23383597339471]
Multimodal AI is capable of integrating diverse data modalities, including imaging, text, and structured data, within a single model. This scoping review explores the evolution of multimodal AI, highlighting its methods, applications, datasets, and evaluation in clinical settings. Our findings underscore a shift from unimodal to multimodal approaches, driving innovations in diagnostic support, medical report generation, drug discovery, and conversational AI.
arXiv Detail & Related papers (2025-02-13T11:57:51Z)
Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
Explainable Artificial Intelligence for Medical Applications: A Review [42.33274794442013]
This article reviews recent research grounded in explainable artificial intelligence (XAI). It focuses on medical practices within the visual, audio, and multimodal perspectives. We endeavour to categorise and synthesise these practices, aiming to provide support and guidance for future researchers and healthcare professionals.
arXiv Detail & Related papers (2024-11-15T11:31:06Z)
Artificial intelligence techniques in inherited retinal diseases: A review [19.107474958408847]
Inherited retinal diseases (IRDs) are a diverse group of genetic disorders that lead to progressive vision loss and are a major cause of blindness in working-age adults. Recent advancements in artificial intelligence (AI) offer promising solutions to these challenges. This review consolidates existing studies, identifies gaps, and provides an overview of AI's potential in diagnosing and managing IRDs.
arXiv Detail & Related papers (2024-10-10T03:14:51Z)
The Era of Foundation Models in Medical Imaging is Approaching : A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution. Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models. This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning. AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design. We provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis [51.07114445705692]
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring. As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs. Current advances in artificial intelligence (AI) models enable automatic gait analysis for ND identification and classification.
arXiv Detail & Related papers (2024-05-21T06:44:40Z)
Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects [2.1070612998322438]
The paper explores the transformative potential of multimodal models for clinical predictions. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist.
arXiv Detail & Related papers (2023-11-04T05:42:51Z)
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
This paper validates polyp and instrument segmentation methods through the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
Artificial Intelligence-Based Methods for Fusion of Electronic Health Records and Imaging Data [0.9749560288448113]
We focus on synthesizing and analyzing the literature that uses AI techniques to fuse multimodal medical data for different clinical applications. We present a comprehensive analysis of the various fusion strategies, the diseases and clinical outcomes for which multimodal fusion was used, and the available multimodal medical datasets.
arXiv Detail & Related papers (2022-10-23T07:13:37Z)
Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review [8.014632186417423]
The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data produced during routine practice. With the recent advances in multi-modal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This review will include the (1) overview of current multi-modal learning, (2) summarization of multi-modal fusion methods, (3) discussion of the performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future
arXiv Detail & Related papers (2022-03-25T18:50:03Z)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models. Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.