Related papers: Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

URL: http://arxiv.org/abs/2507.03304v1
Date: Fri, 04 Jul 2025 05:17:32 GMT
Title: Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Authors: Hai Huang, Yan Xia, Sashuai Zhou, Hanting Wang, Shulei Wang, Zhou Zhao,
Abstract summary: Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains.<n>A key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set.<n>We propose a novel approach that leverages Unified Representations to map different paired modalities together.
Score: 43.07575348801021
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Domain Generalization (DG) aims to enhance model robustness in unseen or distributionally shifted target domains through training exclusively on source domains. Although existing DG techniques, such as data manipulation, learning strategies, and representation learning, have shown significant progress, they predominantly address single-modal data. With the emergence of numerous multi-modal datasets and increasing demand for multi-modal tasks, a key challenge in Multi-modal Domain Generalization (MMDG) has emerged: enabling models trained on multi-modal sources to generalize to unseen target distributions within the same modality set. Due to the inherent differences between modalities, directly transferring methods from single-modal DG to MMDG typically yields sub-optimal results. These methods often exhibit randomness during generalization due to the invisibility of target domains and fail to consider inter-modal consistency. Applying these methods independently to each modality in the MMDG setting before combining them can lead to divergent generalization directions across different modalities, resulting in degraded generalization capabilities. To address these challenges, we propose a novel approach that leverages Unified Representations to map different paired modalities together, effectively adapting DG methods to MMDG by enabling synchronized multi-modal improvements within the unified space. Additionally, we introduce a supervised disentanglement framework that separates modal-general and modal-specific information, further enhancing the alignment of unified representations. Extensive experiments on benchmark datasets, including EPIC-Kitchens and Human-Animal-Cartoon, demonstrate the effectiveness and superiority of our method in enhancing multi-modal domain generalization.

Related papers

Generative Classifier for Domain Generalization [84.92088101715116]
Domain generalization aims to the generalizability of computer vision models toward distribution shifts.<n>We propose Generative-driven Domain Generalization (GCDG)<n>GCDG consists of three key modules: Heterogeneity Learning(HLC), Spurious Correlation(SCB), and Diverse Component Balancing(DCB)
arXiv Detail & Related papers (2025-04-03T04:38:33Z)
Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification [57.945437355714155]
Cross-scene image classification aims to transfer prior knowledge of ground materials to annotate regions with different distributions.<n>Existing approaches focus on single-source domain generalization to unseen target domains.<n>We propose a novel multi-source collaborative domain generalization framework (MS-CDG) based on homogeneity and heterogeneity characteristics of multi-source remote sensing data.
arXiv Detail & Related papers (2024-12-05T06:15:08Z)
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision [9.03028904066824]
We introduce a novel approach to address Multimodal Open-Set Domain Generalization for the first time, utilizing self-supervision. We propose two innovative multimodal self-supervised pretext tasks: Masked Cross-modal Translation and Multimodal Jigsaw Puzzles. We extend our approach to tackle also the Multimodal Open-Set Domain Adaptation problem, especially in scenarios where unlabeled data from the target domain is available.
arXiv Detail & Related papers (2024-07-01T17:59:09Z)
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization [24.413415998529754]
We propose a new benchmark Hybrid Domain Generalization (HDG) and a novel metric $H2$-CV, which construct various splits to assess the robustness of algorithms. Our method outperforms state-of-the-art algorithms on multiple datasets, especially improving the robustness when confronting data scarcity.
arXiv Detail & Related papers (2024-04-13T13:41:13Z)
SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization [13.456240733175767]
SimMMDG is a framework to overcome the challenges of achieving domain generalization in multi-modal scenarios. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints. Our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon dataset.
arXiv Detail & Related papers (2023-10-30T17:58:09Z)
Compound Domain Generalization via Meta-Knowledge Encoding [55.22920476224671]
We introduce Style-induced Domain-specific Normalization (SDNorm) to re-normalize the multi-modal underlying distributions. We harness the prototype representations, the centroids of classes, to perform relational modeling in the embedding space. Experiments on four standard Domain Generalization benchmarks reveal that COMEN exceeds the state-of-the-art performance without the need of domain supervision.
arXiv Detail & Related papers (2022-03-24T11:54:59Z)
A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario. It is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-ware center regularization (DCR)
arXiv Detail & Related papers (2022-01-24T18:09:38Z)
Learning to Diversify for Single Domain Generalization [46.35670520201863]
Domain generalization (DG) aims to generalize a model trained on multiple source (i.e., training) domains to a distributionally different target (i.e., test) domain. This paper considers a more realistic yet challenging scenario, namely Single Domain Generalization (Single-DG), where only one source domain is available for training. In this scenario, the limited diversity may jeopardize the model generalization on unseen target domains. We propose a style-complement module to enhance the generalization power of the model by synthesizing images from diverse distributions that are complementary to the source ones.
arXiv Detail & Related papers (2021-08-26T12:04:32Z)
Dual Distribution Alignment Network for Generalizable Person Re-Identification [174.36157174951603]
Domain generalization (DG) serves as a promising solution to handle person Re-Identification (Re-ID) We present a Dual Distribution Alignment Network (DDAN) which handles this challenge by selectively aligning distributions of multiple source domains. We evaluate our DDAN on a large-scale Domain Generalization Re-ID (DG Re-ID) benchmark.
arXiv Detail & Related papers (2020-07-27T00:08:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.