Robust Multimodal Representation Learning in Healthcare
- URL: http://arxiv.org/abs/2601.21941v1
- Date: Thu, 29 Jan 2026 16:27:54 GMT
- Title: Robust Multimodal Representation Learning in Healthcare
- Authors: Xiaoguang Zhu, Linxiao Gong, Lianlong Sun, Yang Liu, Haoyu Wang, Jing Liu,
- Abstract summary: Real-world medical datasets commonly contain systematic biases from multiple sources.<n>We propose a Dual-Stream Feature Decorrelation Framework that identifies and handles the biases.<n>Our method employs a causal-biased decorrelation framework with dual-stream neural networks to disentangle causal features from spurious correlations.
- Score: 12.190907451083765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical multimodal representation learning aims to integrate heterogeneous data into unified patient representations to support clinical outcome prediction. However, real-world medical datasets commonly contain systematic biases from multiple sources, which poses significant challenges for medical multimodal representation learning. Existing approaches typically focus on effective multimodal fusion, neglecting inherent biased features that affect the generalization ability. To address these challenges, we propose a Dual-Stream Feature Decorrelation Framework that identifies and handles the biases through structural causal analysis introduced by latent confounders. Our method employs a causal-biased decorrelation framework with dual-stream neural networks to disentangle causal features from spurious correlations, utilizing generalized cross-entropy loss and mutual information minimization for effective decorrelation. The framework is model-agnostic and can be integrated into existing medical multimodal learning methods. Comprehensive experiments on MIMIC-IV, eICU, and ADNI datasets demonstrate consistent performance improvements.
Related papers
- Revealing Multimodal Causality with Large Language Models [80.95511545591107]
We propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data.<n>It consists of three key components: (1) a novel contrastive factor discovery module to identify genuine multimodal factors; (2) a statistical causal structure discovery module to infer causal relationships among discovered factors; and (3) an iterative multimodal counterfactual reasoning module to refine the discovery outcomes.<n>Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLLM-CD.
arXiv Detail & Related papers (2025-09-22T13:45:17Z) - Causal Debiasing Medical Multimodal Representation Learning with Missing Modalities [6.02318066285653]
Real-world medical datasets often suffer from missing modalities due to cost, protocol, or patient-specific constraints.<n>Our method consists of two key components: (1) a missingness deconfounding module that approximates causal intervention based on backdoor adjustment and (2) a dual-branch neural network that explicitly disentangles causal features from spurious correlations.
arXiv Detail & Related papers (2025-09-06T06:27:10Z) - Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis [16.95583564875497]
We propose an Incomplete Modality Disentangled Representation (IMDR) strategy to disentangle features into explicit independent modal-common and modal-specific features.<n> Experiments on four multimodal datasets demonstrate that the proposed IMDR outperforms the state-of-the-art methods significantly.
arXiv Detail & Related papers (2025-02-17T12:10:35Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.<n>Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.<n>Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.<n>Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - Online Multi-modal Root Cause Analysis [61.94987309148539]
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems.
Existing online RCA methods handle only single-modal data overlooking, complex interactions in multi-modal systems.
We introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization.
arXiv Detail & Related papers (2024-10-13T21:47:36Z) - Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives [0.3749861135832073]
This research presents a novel multimodal data fusion methodology for pain behavior recognition.
We introduce two key innovations: 1) integrating data-driven statistical relevance weights into the fusion strategy, and 2) incorporating human-centric movement characteristics into multimodal representation learning.
Our findings have significant implications for promoting patient-centered healthcare interventions and supporting explainable clinical decision-making.
arXiv Detail & Related papers (2024-03-30T11:13:18Z) - Contrastive Learning on Multimodal Analysis of Electronic Health Records [15.392566551086782]
We propose a novel feature embedding generative model and design a multimodal contrastive loss to obtain the multimodal EHR feature representation.<n>Our theoretical analysis demonstrates the effectiveness of multimodal learning compared to single-modality learning.<n>This connection paves the way for a privacy-preserving algorithm tailored for multimodal EHR feature representation learning.
arXiv Detail & Related papers (2024-03-22T03:01:42Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities.<n>Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation.<n>Experiments on a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation
with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation with a domain transfer network (C2M-DoT)
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - Multi-modal Graph Learning for Disease Prediction [35.4310911850558]
We propose an end-to-end Multimodal Graph Learning framework (MMGL) for disease prediction.
Instead of defining the adjacency matrix manually as existing methods, the latent graph structure can be captured through a novel way of adaptive graph learning.
arXiv Detail & Related papers (2021-07-01T03:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.