Split to Merge: Unifying Separated Modalities for Unsupervised Domain
Adaptation
- URL: http://arxiv.org/abs/2403.06946v1
- Date: Mon, 11 Mar 2024 17:33:12 GMT
- Title: Split to Merge: Unifying Separated Modalities for Unsupervised Domain
Adaptation
- Authors: Xinyao Li, Yuke Li, Zhekai Du, Fengling Li, Ke Lu, Jingjing Li
- Abstract summary: We introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation.
We craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components.
Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information.
- Score: 25.499205902426716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large vision-language models (VLMs) like CLIP have demonstrated good
zero-shot learning performance in the unsupervised domain adaptation task. Yet,
most transfer approaches for VLMs focus on either the language or visual
branches, overlooking the nuanced interplay between both modalities. In this
work, we introduce a Unified Modality Separation (UniMoS) framework for
unsupervised domain adaptation. Leveraging insights from modality gap studies,
we craft a nimble modality separation network that distinctly disentangles
CLIP's features into language-associated and vision-associated components. Our
proposed Modality-Ensemble Training (MET) method fosters the exchange of
modality-agnostic information while maintaining modality-specific nuances. We
align features across domains using a modality discriminator. Comprehensive
evaluations on three benchmarks reveal our approach sets a new state-of-the-art
with minimal computational costs. Code: https://github.com/TL-UESTC/UniMoS
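As a concrete illustration of what the abstract describes, the following is a minimal PyTorch sketch of the modality-separation idea: two lightweight heads split a frozen CLIP image embedding into a language-associated component, scored against CLIP text prototypes, and a vision-associated component, scored by a learned classifier, with a gradient-reversal domain discriminator standing in for the cross-domain alignment step. The layer sizes, the logit scale, the averaging ensemble rule, and the DANN-style discriminator are assumptions made for exposition, not the authors' exact design; see the linked repository for the real implementation.

```python
# Illustrative sketch only: a lightweight modality-separation head in the
# spirit of UniMoS. Layer sizes, logit scale, and the gradient-reversal
# discriminator are assumptions for exposition, not the paper's exact design.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses gradients so the discriminator's
    signal pushes the separated features toward domain invariance."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class ModalitySeparator(nn.Module):
    """Disentangles a CLIP image embedding into a language-associated part
    (classified against CLIP text prototypes) and a vision-associated part
    (classified by a learned visual head)."""
    def __init__(self, dim=512, num_classes=65):
        super().__init__()
        self.to_lang = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.to_vis = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.vis_head = nn.Linear(dim, num_classes)   # vision-branch classifier
        self.domain_disc = nn.Linear(dim, 2)          # source vs. target domain

    def forward(self, img_feat, text_protos, lambd=1.0):
        f_lang = self.to_lang(img_feat)
        f_vis = self.to_vis(img_feat)
        # Language branch: cosine similarity to class text prototypes,
        # i.e. zero-shot-style logits.
        f_lang_n = nn.functional.normalize(f_lang, dim=-1)
        protos_n = nn.functional.normalize(text_protos, dim=-1)
        logits_lang = 100.0 * f_lang_n @ protos_n.t()
        logits_vis = self.vis_head(f_vis)
        # Adversarial alignment of the separated features across domains.
        dom_logits = self.domain_disc(GradReverse.apply(f_lang, lambd))
        return logits_lang, logits_vis, dom_logits

if __name__ == "__main__":
    model = ModalitySeparator()
    img = torch.randn(8, 512)        # frozen CLIP image features (assumed dim)
    protos = torch.randn(65, 512)    # frozen CLIP text features, one per class
    l_lang, l_vis, dom = model(img, protos)
    # Ensemble prediction: average the two branches' probabilities.
    probs = (l_lang.softmax(-1) + l_vis.softmax(-1)) / 2
    print(probs.shape, dom.shape)    # torch.Size([8, 65]) torch.Size([8, 2])
```

The `__main__` block mirrors the Modality-Ensemble Training idea at inference time: averaging the two branches' predictions exchanges modality-agnostic information while each branch retains its modality-specific view.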
Related papers
- Robust Domain Generalization for Multi-modal Object Recognition [14.128747255526012]
In multi-label classification, machine learning models face the challenge of domain generalization when handling tasks whose distributions differ from those of the training data.
Recent advancements in vision-language pre-training leverage supervision from extensive visual-language pairs, enabling learning across diverse domains.
This paper proposes solutions by inferring the actual loss, broadening evaluations to larger vision-language backbones, and introducing Mixup-CLIPood.
arXiv Detail & Related papers (2024-08-11T17:13:21Z)
- Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z)
- Cross-domain Multi-modal Few-shot Object Detection via Rich Text [21.36633828492347]
Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks.
We study the cross-domain few-shot generalization of multi-modal object detection (CDMM-FSOD) and propose a meta-learning-based multi-modal few-shot object detection method.
arXiv Detail & Related papers (2024-03-24T15:10:22Z)
- APoLLo: Unified Adapter and Prompt Learning for Vision Language Models [58.9772868980283]
We present APoLLo, a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models.
APoLLo achieves a relative gain of up to 6.03% over MaPLe (the prior state of the art) on novel classes across 10 diverse image recognition datasets.
arXiv Detail & Related papers (2023-12-04T01:42:09Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims to retrieve target instances that are semantically relevant to a given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Exploiting Domain Transferability for Collaborative Inter-level Domain Adaptive Object Detection [17.61278045720336]
Domain adaptation for object detection (DAOD) has recently drawn much attention owing to its capability of detecting target objects without any annotations.
Previous works focus on aligning features extracted from partial levels in a two-stage detector via adversarial training.
We introduce a novel framework for DAOD with three proposed components: Multi-scale-aware Uncertainty Attention (MUA), Transferable Region Proposal Network (TRPN), and Dynamic Instance Sampling (DIS).
arXiv Detail & Related papers (2022-07-20T01:50:26Z)
- Multi-level Consistency Learning for Semi-supervised Domain Adaptation [85.90600060675632]
Semi-supervised domain adaptation (SSDA) aims to apply knowledge learned from a fully labeled source domain to a scarcely labeled target domain.
We propose a Multi-level Consistency Learning framework for SSDA.
arXiv Detail & Related papers (2022-05-09T06:41:18Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Domain Attention Consistency for Multi-Source Domain Adaptation [100.25573559447551]
The key design is a feature channel attention module that aims to identify transferable features (attributes); a generic sketch of such a module appears after this list.
Experiments on three MSDA benchmarks show that our DAC-Net achieves new state-of-the-art performance on all of them.
arXiv Detail & Related papers (2021-11-06T15:56:53Z)
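The DAC-Net entry above centers on a feature channel attention module for identifying transferable features. As a generic illustration, here is a standard squeeze-and-excitation-style channel attention block; DAC-Net's actual module, and the consistency objective it enforces across source domains, may differ from this sketch.

```python
# Minimal squeeze-and-excitation-style channel attention, as a generic
# illustration of the "feature channel attention module" mentioned above.
# DAC-Net's actual design (and its cross-domain attention-consistency loss)
# may differ; this is a sketch, not the paper's implementation.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool
        return x * weights.view(b, c, 1, 1)     # excite: reweight channels

if __name__ == "__main__":
    attn = ChannelAttention(channels=64)
    feats = torch.randn(4, 64, 32, 32)
    print(attn(feats).shape)  # torch.Size([4, 64, 32, 32])
    # A consistency objective could then penalize, e.g., the distance between
    # attention weights computed on samples from different source domains.
```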