Towards Multimodal Domain Generalization with Few Labels
- URL: http://arxiv.org/abs/2602.22917v1
- Date: Thu, 26 Feb 2026 12:05:56 GMT
- Title: Towards Multimodal Domain Generalization with Few Labels
- Authors: Hongzhao Li, Hao Dong, Hualei Wan, Shupan Li, Mingliang Xu, Muhammad Haris Khan,
- Abstract summary: We introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG)<n>SSMDG aims to learn robust multimodal models from multi-source data with few labeled samples.<n>We propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, Disagreement-Aware Regularization, and Cross-Modal Prototype Alignment.
- Score: 37.21678123296403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal models ideally should generalize to unseen domains while remaining data-efficient to reduce annotation costs. To this end, we introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG), which aims to learn robust multimodal models from multi-source data with few labeled samples. We observe that existing approaches fail to address this setting effectively: multimodal domain generalization methods cannot exploit unlabeled data, semi-supervised multimodal learning methods ignore domain shifts, and semi-supervised domain generalization methods are confined to single-modality inputs. To overcome these limitations, we propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, which obtains reliable pseudo-labels through confident fused-unimodal consensus; Disagreement-Aware Regularization, which effectively utilizes ambiguous non-consensus samples; and Cross-Modal Prototype Alignment, which enforces domain- and modality-invariant representations while promoting robustness under missing modalities via cross-modal translation. We further establish the first SSMDG benchmarks, on which our method consistently outperforms strong baselines in both standard and missing-modality scenarios. Our benchmarks and code are available at https://github.com/lihongzhao99/SSMDG.
Related papers
- Reasoning-Driven Multimodal LLM for Domain Generalization [72.00754603114187]
We study the role of reasoning in domain generalization using DomainBed-Reasoning dataset.<n>We propose RD-MLDG, a framework with two components: MTCT (Multi-Task Cross-Training) and SARR (Self-Aligned Reasoning Regularization)<n>Experiments on standard DomainBed datasets demonstrate that RD-MLDG achieves complementary state-of-the-art performances.
arXiv Detail & Related papers (2026-02-27T08:10:06Z) - Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation [39.02105398462778]
Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data.<n>In multimodal scenarios, varying degrees of distribution shift across different modalities give rise to a complex coupling effect.<n>We propose a novel TTA framework, termed as Bridging Modalities via Progressive Re-alignment (BriMPR)
arXiv Detail & Related papers (2025-11-28T03:33:42Z) - Unsupervised Multi-Source Federated Domain Adaptation under Domain Diversity through Group-Wise Discrepancy Minimization [2.522791298432536]
Unsupervised multi-source domain adaptation (UMDA) aims to learn models that generalize to an unlabeled target domain by leveraging labeled data from multiple, diverse source domains.<n>We propose GALA, a scalable and robust federated UMDA framework that introduces two key components.<n>GALA consistently achieves competitive or state-of-the-art results on standard benchmarks and significantly outperforms prior methods in diverse multi-source settings.
arXiv Detail & Related papers (2025-10-09T12:34:37Z) - LADB: Latent Aligned Diffusion Bridges for Semi-Supervised Domain Translation [54.690154688667086]
Diffusion models excel at generating high-quality outputs but face challenges in data-scarce domains.<n>We propose Latent Aligned Diffusion Bridges (LADB), a semi-supervised framework for sample-to-sample translation.
arXiv Detail & Related papers (2025-09-10T14:23:07Z) - Multi-Prompt Progressive Alignment for Multi-Source Unsupervised Domain Adaptation [73.40696661117408]
We propose a progressive alignment strategy for adapting CLIP to unlabeled downstream task.<n>We name our approach MP2A and test it on three popular UDA benchmarks, namely ImageCLEF, Office-Home, and the most challenging DomainNet.<n> Experiments showcase that MP2A achieves state-of-the-art performance when compared with most recent CLIP-based MS-UDA approaches.
arXiv Detail & Related papers (2025-07-31T09:42:42Z) - Principled Multimodal Representation Learning [99.53621521696051]
Multimodal representation learning seeks to create a unified representation space by integrating diverse data modalities.<n>Recent advances have investigated the simultaneous alignment of multiple modalities, yet several challenges remain.<n>We propose Principled Multimodal Representation Learning (PMRL), a novel framework that achieves simultaneous alignment of multiple modalities.
arXiv Detail & Related papers (2025-07-23T09:12:25Z) - Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations [43.07575348801021]
Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains.<n>A key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set.<n>We propose a novel approach that leverages Unified Representations to map different paired modalities together.
arXiv Detail & Related papers (2025-07-04T05:17:32Z) - Towards Modality Generalization: A Benchmark and Prospective Analysis [68.20973671493203]
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities.<n>We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization.<n>Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z) - Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference [20.761803725098005]
Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities.
A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number of inference networks for all possible modality combinations.
We introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework.
arXiv Detail & Related papers (2024-10-15T08:49:38Z) - SimMMDG: A Simple and Effective Framework for Multi-modal Domain
Generalization [13.456240733175767]
SimMMDG is a framework to overcome the challenges of achieving domain generalization in multi-modal scenarios.
We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints.
Our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon dataset.
arXiv Detail & Related papers (2023-10-30T17:58:09Z) - MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
To design a framework that can take full advantage of multi-modality, each modality provides regularized self-supervisory signals to other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z) - Semi-Supervised Domain Generalization with Stochastic StyleMatch [90.98288822165482]
In real-world applications, we might have only a few labels available from each source domain due to high annotation cost.
In this work, we investigate semi-supervised domain generalization, a more realistic and practical setting.
Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling.
arXiv Detail & Related papers (2021-06-01T16:00:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.