Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation
- URL: http://arxiv.org/abs/2508.05008v1
- Date: Thu, 07 Aug 2025 03:41:41 GMT
- Title: Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation
- Authors: Xusheng Liang, Lihua Zhou, Nianxin Li, Miao Xu, Ziyang Song, Dong Yi, Jinlin Wu, Hongbin Liu, Jiebo Luo, Zhen Lei
- Abstract summary: We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
- Score: 56.52520416420957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language Models (VLMs), such as CLIP, have demonstrated remarkable zero-shot capabilities in various computer vision tasks. However, their application to medical imaging remains challenging due to the high variability and complexity of medical data. Specifically, medical images often exhibit significant domain shifts caused by various confounders, including equipment differences, procedure artifacts, and imaging modes, which can lead to poor generalization when models are applied to unseen domains. To address this limitation, we propose Multimodal Causal-Driven Representation Learning (MCDRL), a novel framework that integrates causal inference with the VLM to tackle domain generalization in medical image segmentation. MCDRL is implemented in two steps: first, it leverages CLIP's cross-modal capabilities to identify candidate lesion regions and construct a confounder dictionary through text prompts, specifically designed to represent domain-specific variations; second, it trains a causal intervention network that utilizes this dictionary to identify and eliminate the influence of these domain-specific variations while preserving the anatomical structural information critical for segmentation tasks. Extensive experiments demonstrate that MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
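The two-step procedure in the abstract is concrete enough to sketch. Below is a minimal, hypothetical PyTorch/Hugging Face illustration of the idea, not the authors' released code: text prompts describing domain-specific confounders are encoded with CLIP into a fixed dictionary (step 1), and a small intervention head estimates and subtracts the dictionary-explained component of an image feature, in the spirit of a backdoor adjustment (step 2). The prompts, the class name `CausalIntervention`, and the subtraction scheme are all assumptions.

```python
# Hypothetical sketch of MCDRL's two steps; names, prompts, and the
# subtraction-style intervention are assumptions, not the authors' code.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Step 1: build a confounder dictionary by encoding text prompts that
# describe domain-specific variation (equipment, artifacts, imaging modes).
confounder_prompts = [
    "an MRI scan with motion artifacts",
    "a low-dose CT image with heavy noise",
    "an ultrasound image from a different scanner vendor",
]
with torch.no_grad():
    batch = tokenizer(confounder_prompts, padding=True, return_tensors="pt").to(device)
    dictionary = clip_model.get_text_features(**batch)          # (K, D)
    dictionary = nn.functional.normalize(dictionary, dim=-1)

# Step 2: a toy causal-intervention head. It scores how much of an image
# feature each confounder explains and removes that component, keeping the
# residual (anatomy-related) part for the downstream segmentation decoder.
class CausalIntervention(nn.Module):
    def __init__(self, dictionary: torch.Tensor):
        super().__init__()
        self.register_buffer("dictionary", dictionary)          # (K, D), frozen
        self.proj = nn.Linear(dictionary.shape[1], dictionary.shape[1])

    def forward(self, feat: torch.Tensor) -> torch.Tensor:     # feat: (B, D)
        f = nn.functional.normalize(self.proj(feat), dim=-1)
        attn = torch.softmax(f @ self.dictionary.T, dim=-1)    # (B, K) weights
        confound = attn @ self.dictionary                      # (B, D) explained part
        return f - confound                                    # deconfounded feature

head = CausalIntervention(dictionary).to(device)
image_feat = torch.randn(2, dictionary.shape[1], device=device)  # stand-in image features
clean_feat = head(image_feat)
```

Normalizing both the feature and the dictionary keeps the attention weights scale-free; the paper's dictionary is also informed by CLIP-identified candidate lesion regions, which this sketch omits.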
Related papers
- MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network [51.68708264694361]
Medical images can be affected by confounding factors such as complex anatomical variations and imaging-modality limitations. We propose a multi-causal aware modeling backdoor-intervention optimization network for medical image segmentation. Our method significantly reduces the influence of these confounding factors, leading to enhanced segmentation accuracy.
arXiv Detail & Related papers (2025-05-28T01:40:10Z)
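Both this entry and MCDRL above invoke backdoor intervention. For reference, the standard backdoor-adjustment identity that such networks approximate (a textbook formula, not quoted from either paper) treats the segmentation Y, the image X, and the confounder Z (scanner, artifacts, modality) as:

```latex
% Backdoor adjustment: marginalizing a discrete confounder Z out at fixed X
% blocks the spurious path X <- Z -> Y while keeping the causal path X -> Y.
P(Y \mid \mathrm{do}(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)
```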
- MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation [4.760537994346813]
Medical image reporting aims to generate structured clinical descriptions from radiological images. We propose MicarVLMoE, a vision-language mixture-of-experts model with gated cross-aligned fusion. We extend MIR to CT scans, retinal imaging, MRI scans, and gross pathology images, reporting state-of-the-art results.
arXiv Detail & Related papers (2025-04-29T01:26:02Z)
- Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging [4.341503087761129]
Multimodal learning over visual and text modalities has been shown to be a solution, but collecting paired vision-language datasets is expensive and time-consuming. Inspired by the superior ability of Large Language Models (LLMs) in numerous cross-modal tasks, we propose a novel Vision-LLM union framework to address these issues.
arXiv Detail & Related papers (2025-04-09T23:33:35Z)
- Generalizable Single-Source Cross-modality Medical Image Segmentation via Invariant Causal Mechanisms [16.699205051836657]
Single-source domain generalization aims to learn a model from a single source domain that can generalize well on unseen target domains.
This is an important task in computer vision, particularly relevant to medical imaging where domain shifts are common.
We combine causality-inspired theoretical insights on learning domain-invariant representations with recent advancements in diffusion-based augmentation to improve generalization across diverse imaging modalities.
arXiv Detail & Related papers (2024-11-07T22:35:17Z)
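The entry above combines causal invariance with diffusion-based augmentation. A generic sketch of the underlying invariance principle follows; the encoder, the noise-based stand-in for a diffusion edit, and the loss are illustrative assumptions, not the authors' implementation.

```python
# Generic sketch of augmentation-consistency training for domain-invariant
# features: a clean image and a strongly augmented version (here plain noise
# stands in for a diffusion-based edit) are pulled together in feature space,
# so the encoder learns to ignore style/domain cues.
import torch
import torch.nn as nn

encoder = nn.Sequential(  # stand-in feature extractor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def invariance_loss(x: torch.Tensor, x_aug: torch.Tensor) -> torch.Tensor:
    """Cosine-distance consistency between clean and augmented views."""
    f, f_aug = encoder(x), encoder(x_aug)
    return 1.0 - nn.functional.cosine_similarity(f, f_aug, dim=1).mean()

x = torch.randn(4, 1, 64, 64)            # e.g. a batch of MRI slices
x_aug = x + 0.3 * torch.randn_like(x)    # toy surrogate for a diffusion edit
loss = invariance_loss(x, x_aug)         # added to the usual segmentation loss
loss.backward()
```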
- Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation? [10.20366295974822]
We introduce a novel decode head architecture, HQHSAM, which simply integrates elements from two state-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation performance.
Our experiments on multiple datasets, encompassing various anatomies and modalities, reveal that FMs, particularly with the HQHSAM decode head, improve domain generalization for medical image segmentation.
arXiv Detail & Related papers (2024-09-12T11:41:35Z)
- Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention [1.1155836879100416]
We propose a Modality-agnostic Domain Generalizable Network (MADGNet) for medical image segmentation.
The MFMSA block refines spatial feature extraction, particularly in capturing boundary features.
E-SDM mitigates information loss in multi-task learning with deep supervision.
arXiv Detail & Related papers (2024-05-10T07:34:36Z)
- QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge.
This variability directly impacts the development and evaluation of automated segmentation algorithms.
We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ).
arXiv Detail & Related papers (2024-03-19T17:57:24Z)
- C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
- Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
This work presents the first systematic study of transferring Vision-Language Models to 2D medical images. Although vision-language segmentation models (VLSMs) show competitive performance compared to image-only models, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z)
- Cross-Modality Brain Tumor Segmentation via Bidirectional Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme.
Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor.
The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of features of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
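The episodic few-shot setup described above is commonly realized with prototypes: support features are pooled under the support mask, and query pixels are scored by similarity to the prototype. The sketch below illustrates that generic recipe (masked average pooling plus cosine similarity); it is an illustrative assumption, not the paper's actual global correlation network.

```python
# Generic prototype-style few-shot segmentation step, for illustration only.
import torch
import torch.nn.functional as F

def masked_average_pool(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Class prototype from support features. feat: (C, H, W), mask: (H, W)."""
    mask = mask.float()
    return (feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)  # (C,)

def predict(query_feat: torch.Tensor, prototype: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of each query pixel to the prototype. -> (H, W)."""
    q = F.normalize(query_feat, dim=0)             # (C, H, W)
    p = F.normalize(prototype, dim=0)              # (C,)
    return torch.einsum("chw,c->hw", q, p)

# Toy episode: one support image/mask pair and one query image; features are
# assumed to come from a shared CNN backbone trained episodically.
support_feat = torch.randn(32, 16, 16)
support_mask = torch.rand(16, 16) > 0.5
query_feat = torch.randn(32, 16, 16)

proto = masked_average_pool(support_feat, support_mask)
score = predict(query_feat, proto)
pred_mask = score > 0.0                            # threshold for a binary mask
```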