Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
- URL: http://arxiv.org/abs/2411.06106v4
- Date: Thu, 24 Jul 2025 03:38:33 GMT
- Title: Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
- Authors: Zhaorui Tan, Xi Yang, Tan Pan, Tianyi Liu, Chen Jiang, Xin Guo, Qiufeng Wang, Anh Nguyen, Yuan Qi, Kaizhu Huang, Yuan Cheng
- Abstract summary: Existing methods often concentrate exclusively on common anatomical patterns, neglecting individual differences. We propose a two-stage approach: pre-training with invariant representation $\mathbb{X}_h$ for personalization, then fine-tuning for diverse downstream tasks. Our approach yields greater generalizability and transferability across diverse multi-modal medical tasks compared to methods lacking personalization.
- Score: 35.5423842780382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variations in medical imaging modalities and individual anatomical differences pose challenges to cross-modality generalization in multi-modal tasks. Existing methods often concentrate exclusively on common anatomical patterns, thereby neglecting individual differences and consequently limiting their generalization performance. This paper emphasizes the critical role of learning individual-level invariance, i.e., personalized representation $\mathbb{X}_h$, to enhance multi-modality generalization under both homogeneous and heterogeneous settings. It reveals that mappings from individual biological profile to different medical modalities remain static across the population, which is implied in the personalization process. We propose a two-stage approach: pre-training with invariant representation $\mathbb{X}_h$ for personalization, then fine-tuning for diverse downstream tasks. We provide both theoretical and empirical evidence demonstrating the feasibility and advantages of personalization, showing that our approach yields greater generalizability and transferability across diverse multi-modal medical tasks compared to methods lacking personalization. Extensive experiments further validate that our approach significantly enhances performance in various generalization scenarios.
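The abstract's key observation is that the mappings from an individual's biological profile to each imaging modality are static across the population, which makes a modality-invariant personalized representation recoverable. The toy sketch below illustrates that idea only; it is not the paper's method, and the per-modality affine maps, names, and dimensions are all illustrative assumptions.

```python
# Toy illustration (not the paper's implementation): if each static modality
# mapping f_m is known (here, hypothetical per-dimension affine maps), the
# individual profile x_h can be recovered from any modality, so an estimate
# invariant across modalities exists.

MODALITY_MAPS = {            # hypothetical static maps f_m: x_h -> modality image
    "CT":  (2.0, 0.5),       # y = a * x + b
    "MRI": (0.8, -1.0),
}

def render(x_h, modality):
    a, b = MODALITY_MAPS[modality]
    return [a * x + b for x in x_h]

def invert(y, modality):
    a, b = MODALITY_MAPS[modality]
    return [(v - b) / a for v in y]

def personalized_representation(observations):
    """Average the per-modality inversions into a modality-invariant estimate."""
    inverted = [invert(y, m) for m, y in observations.items()]
    n = len(inverted)
    return [sum(vals) / n for vals in zip(*inverted)]

x_h = [0.3, 1.2, -0.7]                      # an individual's latent profile
obs = {m: render(x_h, m) for m in MODALITY_MAPS}
x_hat = personalized_representation(obs)    # agrees with x_h for every modality
```

Because the maps are static, every modality yields the same inversion, which is what makes the recovered representation "invariant" in the abstract's sense.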
Related papers
- Robust Multimodal Representation Learning in Healthcare [12.190907451083765]
Real-world medical datasets commonly contain systematic biases from multiple sources. We propose a Dual-Stream Feature Decorrelation Framework that identifies and handles the biases. Our method employs a causal-biased decorrelation framework with dual-stream neural networks to disentangle causal features from spurious correlations.
arXiv Detail & Related papers (2026-01-29T16:27:54Z) - MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation [21.766481181140527]
We propose MedSAMix, a training-free model merging method for medical image segmentation. We show that MedSAMix consistently improves performance in both domain-specific accuracy and generalization. For clinical applications, we develop two regimes to meet the demand of domain-specificity and generalizability.
arXiv Detail & Related papers (2025-08-14T19:35:57Z) - Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z) - Semantic Alignment of Unimodal Medical Text and Vision Representations [1.8848810602776873]
General-purpose AI models can exhibit similar latent spaces when processing semantically related data.
We show how semantic alignment can bridge general-purpose AI with specialised medical knowledge.
We introduce a novel zero-shot classification approach for unimodal vision encoders that leverages semantic alignment across modalities.
arXiv Detail & Related papers (2025-03-06T14:28:17Z) - MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [52.106879463828044]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z) - Test-Time Modality Generalization for Medical Image Segmentation [0.9092907230570326]
Generalizable medical image segmentation is essential for ensuring consistent performance across diverse unseen clinical settings. We introduce a novel Test-Time Modality Generalization (TTMG) framework, which comprises two core components: Modality-Aware Style Projection (MASP) and Modality-Sensitive Instance Whitening (MSIW). MASP estimates the likelihood of a test instance belonging to each seen modality and maps it onto a distribution using modality-specific style bases, guiding its projection effectively. MSIW is applied during training to selectively suppress modality-sensitive information while retaining modality-invariant features.
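The abstract does not give MSIW's formulation, but it builds on instance whitening. The sketch below shows only the generic per-instance whitening operation (zero mean, unit variance per instance); the paper's selective, modality-sensitive variant is not reproduced here.

```python
import math

def instance_whiten(features):
    """Generic per-instance whitening: zero-mean, unit-variance features.
    A stand-in for the idea behind MSIW, not the paper's selective variant."""
    n = len(features)
    mean = sum(features) / n
    var = sum((f - mean) ** 2 for f in features) / n
    std = math.sqrt(var + 1e-5)        # epsilon for numerical stability
    return [(f - mean) / std for f in features]

whitened = instance_whiten([2.0, 4.0, 6.0])   # zero mean, ~unit variance
```

Whitening removes instance-specific first- and second-order statistics, which is why it is a common tool for suppressing style (here, modality-sensitive) information.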
arXiv Detail & Related papers (2025-02-27T01:32:13Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.
Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - Generalizable Single-Source Cross-modality Medical Image Segmentation via Invariant Causal Mechanisms [16.699205051836657]
Single-source domain generalization aims to learn a model from a single source domain that can generalize well on unseen target domains.
This is an important task in computer vision, particularly relevant to medical imaging where domain shifts are common.
We combine causality-inspired theoretical insights on learning domain-invariant representations with recent advancements in diffusion-based augmentation to improve generalization across diverse imaging modalities.
arXiv Detail & Related papers (2024-11-07T22:35:17Z) - MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation [40.9095393430871]
We introduce MedViLaM, a unified vision-language model towards a generalist model for medical data.
MedViLaM can flexibly encode and interpret various forms of medical data, including clinical language and imaging.
We present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.
arXiv Detail & Related papers (2024-09-29T12:23:10Z) - 3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection [9.469887408109251]
We introduce a Multimodal and Multi-Teacher Knowledge Distillation model for Mental Health Classification.
Unlike conventional approaches that often rely on simple concatenation to integrate diverse features, our model addresses the challenge of appropriately representing inputs of varying natures.
arXiv Detail & Related papers (2024-07-12T06:22:45Z) - Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z) - Diversified and Personalized Multi-rater Medical Image Segmentation [43.47142636000329]
We propose a two-stage framework named D-Persona (first Diversification and then Personalization).
In Stage I, we exploit multiple given annotations to train a Probabilistic U-Net model, with a bound-constrained loss to improve the prediction diversity.
In Stage II, we design multiple attention-based projection heads to adaptively query the corresponding expert prompts from the shared latent space, and then perform the personalized medical image segmentation.
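Stage II of D-Persona queries expert prompts from a shared latent space with attention-based heads. The minimal dot-product-attention sketch below conveys that querying step only; the query, prompt vectors, and dimensions are illustrative assumptions, not the paper's architecture.

```python
import math

def attend(query, prompts):
    """Dot-product attention: a rater-specific query selects a softmax-weighted
    mix of shared expert prompts. A minimal sketch of the Stage-II idea."""
    scores = [sum(q * p for q, p in zip(query, prompt)) for prompt in prompts]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(prompts[0])
    return [sum(w * prompt[d] for w, prompt in zip(weights, prompts))
            for d in range(dim)]

prompts = [[1.0, 0.0], [0.0, 1.0]]          # two shared expert prompts
mixed = attend([4.0, 0.0], prompts)          # query aligned with the first prompt
```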
arXiv Detail & Related papers (2024-03-20T09:00:19Z) - Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP)
We develop a generic and personalization generative framework, that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z) - Enhancing Multimodal Unified Representations for Cross Modal Generalization [52.16653133604068]
We propose Training-free Optimization of Codebook (TOC) and Fine and Coarse cross-modal Information Disentangling (FCID). These methods refine the unified discrete representations from pretraining and perform fine- and coarse-grained information disentanglement tailored to the specific characteristics of each modality.
arXiv Detail & Related papers (2024-03-08T09:16:47Z) - Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare [1.7894377200944511]
Stone Needle is a general multimodal large-scale model framework tailored explicitly for healthcare applications.
Our architecture can perform multi-modal interaction in multiple rounds of dialogue.
The fusion of different modalities and the ability to process complex medical information in Stone Needle benefits accurate diagnosis, treatment recommendations, and patient care.
arXiv Detail & Related papers (2023-06-28T09:04:56Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z) - Representational Ethical Model Calibration [0.7078141380481605]
Epistemic equity is the comparative fidelity of intelligence in decision-making.
No general framework for its quantification, let alone assurance, exists.
We introduce a comprehensive framework for Representational Ethical Model Calibration.
arXiv Detail & Related papers (2022-07-25T10:33:39Z) - Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models [67.2867506736665]
We propose an idea for out-of-distribution generalization of chest X-ray pathologies that uses a simple balanced batch sampling technique.
We observed that balanced sampling between the multiple training datasets improves the performance over baseline models trained without balancing.
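Balanced batch sampling as summarized above draws an equal share of each training dataset into every batch. The sketch below shows one simple way that idea can be implemented; the dataset names and sizes are hypothetical, and the paper may balance differently.

```python
import random

def balanced_batches(datasets, batch_size, n_batches, seed=0):
    """Yield batches containing an equal number of samples from each source
    dataset -- a minimal sketch of balanced multi-domain sampling."""
    rng = random.Random(seed)
    per_source = batch_size // len(datasets)
    for _ in range(n_batches):
        batch = []
        for name, items in datasets.items():
            batch.extend((name, rng.choice(items)) for _ in range(per_source))
        yield batch

# Hypothetical sources of very different sizes: balancing prevents the larger
# one from dominating each batch.
datasets = {"site_a": list(range(100)), "site_b": list(range(10))}
batch = next(balanced_batches(datasets, batch_size=8, n_batches=1))
```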
arXiv Detail & Related papers (2021-12-27T15:28:01Z) - Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis [103.69656907534456]
Recent multimodal learning methods with strong performance on human-centric tasks are often black boxes.
We propose Multimodal Routing, which adjusts weights between input modalities and output representations differently for each input sample.
arXiv Detail & Related papers (2020-04-29T13:42:22Z)
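Multimodal Routing adjusts per-sample weights between input modalities and output representations. A generic way to obtain such sample-dependent weights is a softmax over per-modality relevance scores, sketched below; the actual routing mechanism in the paper is more elaborate, and the scores here are illustrative.

```python
import math

def routing_weights(scores):
    """Softmax over per-modality relevance scores, giving per-sample modality
    weights -- a generic sketch of sample-dependent weighting, not the paper's
    routing algorithm."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for (text, audio, vision) on one sample:
w = routing_weights([2.0, 0.5, 0.5])
```

Because the weights are recomputed per input sample, they can be inspected directly, which is the source of the local interpretability the abstract highlights.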
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.