Related papers: AdaFuse: Adaptive Multimodal Fusion for Lung Cancer Risk Prediction via Reinforcement Learning

AdaFuse: Adaptive Multimodal Fusion for Lung Cancer Risk Prediction via Reinforcement Learning

URL: http://arxiv.org/abs/2602.00347v1
Date: Fri, 30 Jan 2026 21:51:44 GMT
Title: AdaFuse: Adaptive Multimodal Fusion for Lung Cancer Risk Prediction via Reinforcement Learning
Authors: Chongyu Qu, Zhengyi Lu, Yuxiang Lai, Thomas Z. Li, Junchao Zhu, Junlin Guo, Juming Xiong, Yanfan Zhu, Yuechen Yang, Allen J. Luna, Kim L. Sandler, Bennett A. Landman, Yuankai Huo,
Abstract summary: We present AdaFuse, an adaptive multimodal fusion framework that learns patient-specific modality selection and fusion strategies for lung cancer risk prediction.<n>AdaFuse formulates multimodal fusion as a sequential decision process, where the policy network iteratively decides whether to incorporate an additional modality or proceed to prediction based on the information already acquired.
Score: 5.519724506536996
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal fusion has emerged as a promising paradigm for disease diagnosis and prognosis, integrating complementary information from heterogeneous data sources such as medical images, clinical records, and radiology reports. However, existing fusion methods process all available modalities through the network, either treating them equally or learning to assign different contribution weights, leaving a fundamental question unaddressed: for a given patient, should certain modalities be used at all? We present AdaFuse, an adaptive multimodal fusion framework that leverages reinforcement learning (RL) to learn patient-specific modality selection and fusion strategies for lung cancer risk prediction. AdaFuse formulates multimodal fusion as a sequential decision process, where the policy network iteratively decides whether to incorporate an additional modality or proceed to prediction based on the information already acquired. This sequential formulation enables the model to condition each selection on previously observed modalities and terminate early when sufficient information is available, rather than committing to a fixed subset upfront. We evaluate AdaFuse on the National Lung Screening Trial (NLST) dataset. Experimental results demonstrate that AdaFuse achieves the highest AUC (0.762) compared to the best single-modality baseline (0.732), the best fixed fusion strategy (0.759), and adaptive baselines including DynMM (0.754) and MoE (0.742), while using fewer FLOPs than all triple-modality methods. Our work demonstrates the potential of reinforcement learning for personalized multimodal fusion in medical imaging, representing a shift from uniform fusion strategies toward adaptive diagnostic pipelines that learn when to consult additional modalities and when existing information suffices for accurate prediction.

Related papers

A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis.<n>Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems.<n>We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer [28.475952006436227]
We present a missing-aware multimodal survival framework that integrates Computed Tomography (CT), Whole-Slide Histopathology (WSI) Images, and structured clinical variables for overall survival modeling in unresectable stage II-III NSCLC.<n> Experimental results demonstrate that intermediate fusion consistently outperforms unimodal baselines as well as early and late fusion strategies.
arXiv Detail & Related papers (2026-01-15T13:38:19Z)
impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.<n>It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.<n>Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
AdaFusion: Prompt-Guided Inference with Adaptive Fusion of Pathology Foundation Models [49.550545038402184]
We propose AdaFusion, a novel prompt-guided inference framework.<n>Our method compresses and aligns tile-level features from diverse models.<n>AdaFusion consistently surpasses individual PFMs across both classification and regression tasks.
arXiv Detail & Related papers (2025-08-07T07:09:31Z)
Latent Space Data Fusion Outperforms Early Fusion in Multimodal Mental Health Digital Phenotyping Data [0.0]
Mental illnesses such as depression and anxiety require improved methods for early detection and personalized intervention.<n>Traditional predictive models often rely on unimodal data or early fusion strategies that fail to capture the complex, multimodal nature of psychiatric data.<n>We evaluated intermediate (latent space) fusion for predicting daily depressive symptoms.
arXiv Detail & Related papers (2025-07-10T18:10:46Z)
GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning [60.03671205298294]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images.<n>Current methods still suffer from limited answer reliability and poor interpretability.<n>This work first proposes a Region-Aware Multimodal Chain-of-Thought dataset, in which the process of producing an answer is preceded by a sequence of intermediate reasoning steps.
arXiv Detail & Related papers (2025-06-22T08:09:58Z)
Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.<n>Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.<n>Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.<n>Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data. We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions. Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
Cross-modality Attention-based Multimodal Fusion for Non-small Cell Lung Cancer (NSCLC) Patient Survival Prediction [0.6476298550949928]
We propose a cross-modality attention-based multimodal fusion pipeline designed to integrate modality-specific knowledge for patient survival prediction in non-small cell lung cancer (NSCLC) Compared with single modality, which achieved c-index of 0.5772 and 0.5885 using solely tissue image data or RNA-seq data, respectively, the proposed fusion approach achieved c-index 0.6587 in our experiment.
arXiv Detail & Related papers (2023-08-18T21:42:52Z)
RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy [1.8161758803237067]
We develop a multimodal late fusion approach to predict radiation therapy outcomes for non-small-cell lung cancer patients. Experiments show that the proposed multimodal paradigm with an AUC equal to $90.9%$ outperforms each unimodal approach.
arXiv Detail & Related papers (2022-04-26T16:32:52Z)
Deep Orthogonal Fusion: Multimodal Prognostic Biomarker Discovery Integrating Radiology, Pathology, Genomic, and Clinical Data [0.32622301272834525]
We predict the overall survival (OS) of glioma patients from diverse multimodal data with a Deep Orthogonal Fusion model. The model learns to combine information from MRI exams, biopsy-based modalities, and clinical variables into a comprehensive multimodal risk score. It significantly stratifies glioma patients by OS within clinical subsets, adding further granularity to prognostic clinical grading and molecular subtyping.
arXiv Detail & Related papers (2021-07-01T17:59:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.