Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
- URL: http://arxiv.org/abs/2402.04416v2
- Date: Wed, 29 May 2024 13:56:14 GMT
- Title: Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
- Authors: Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis
- Abstract summary: We tackle the multimodal version of the unsupervised domain generalization problem.
Our framework relies on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space.
We show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization.
- Score: 11.96884248631201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain generalization (DG) is the problem of learning a model that generalizes to unseen test domains by leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg
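The paired k-means idea is concrete enough to sketch. The snippet below is a minimal illustration of one plausible reading of the abstract, not the authors' implementation (see the linked repository for that): it assumes each source image comes with a paired caption embedding in the same joint vision-language space (as in web-scale image-caption corpora), clusters images as usual, but stores each final centroid as the normalized mean of the paired *text* embeddings, so that a text query ranks clusters within its own modality. Function and parameter names (`paired_kmeans`, `coarse_search`, `n_probe`) are illustrative.

```python
import numpy as np

def paired_kmeans(img_emb, txt_emb, k, iters=20, seed=0):
    """Cluster paired (image, caption) embeddings; return query-space centroids.

    img_emb, txt_emb: (n, d) L2-normalized embeddings from a joint
    vision-language model, where txt_emb[i] is the caption paired with image i.
    """
    rng = np.random.default_rng(seed)
    n, d = img_emb.shape
    # Initialize image-space centroids from randomly chosen images.
    img_c = img_emb[rng.choice(n, size=k, replace=False)].copy()
    assign = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Assignment step in image space (cosine similarity on unit vectors).
        assign = (img_emb @ img_c.T).argmax(axis=1)
        # Update image-space centroids and renormalize (spherical k-means).
        for j in range(k):
            members = assign == j
            if members.any():
                img_c[j] = img_emb[members].mean(axis=0)
        img_c /= np.clip(np.linalg.norm(img_c, axis=1, keepdims=True), 1e-12, None)
    # Key step per the abstract: store each centroid in *query* (text) space,
    # here taken as the normalized mean of the paired caption embeddings.
    txt_c = np.zeros((k, d))
    for j in range(k):
        members = assign == j
        if members.any():
            txt_c[j] = txt_emb[members].mean(axis=0)
    txt_c /= np.clip(np.linalg.norm(txt_c, axis=1, keepdims=True), 1e-12, None)
    return txt_c, assign

def coarse_search(query_emb, txt_c, assign, n_probe=8):
    """Rank clusters for a text query against the text-space centroids."""
    probe = (txt_c @ query_emb).argsort()[::-1][:n_probe]
    # A fine search would then scan only images assigned to the probed clusters.
    return [np.flatnonzero(assign == j) for j in probe]
```

In an IVF-style index, this amounts to swapping the coarse quantizer's image-space centroids for query-space ones while leaving the inverted lists (cluster membership) unchanged, which is plausibly why it helps recall: query-to-centroid distances are no longer dominated by the roughly constant cross-modal offset.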
Related papers
- Robust Target Training for Multi-Source Domain Adaptation [110.77704026569499]
We propose a novel Bi-level Optimization based Robust Target Training (BORT$^2$) method for MSDA.
Our proposed method achieves state-of-the-art performance on three MSDA benchmarks, including the large-scale DomainNet dataset.
arXiv Detail & Related papers (2022-10-04T15:20:01Z)
- Domain Adaptive Person Search [20.442648584402917]
We present Domain Adaptive Person Search (DAPS), which aims to generalize the model from a labeled source domain to the unlabeled target domain.
We show that our framework achieves 34.7% in mAP and 80.6% in top-1 on PRW dataset.
arXiv Detail & Related papers (2022-07-25T04:02:39Z)
- Low-confidence Samples Matter for Domain Adaptation [47.552605279925736]
Domain adaptation (DA) aims to transfer knowledge from a label-rich source domain to a related but label-scarce target domain.
We propose a novel contrastive learning method by processing low-confidence samples.
We evaluate the proposed method in both unsupervised and semi-supervised DA settings.
arXiv Detail & Related papers (2022-02-06T15:45:45Z)
- Improving Multi-Domain Generalization through Domain Re-labeling [31.636953426159224]
We study the important link between pre-specified domain labels and the generalization performance.
We introduce a general approach for multi-domain generalization, MulDEns, that uses an ERM-based deep ensembling backbone.
We show that MulDEns does not require tailoring the augmentation strategy or the training process specific to a dataset.
arXiv Detail & Related papers (2021-12-17T23:21:50Z)
- Discover, Hallucinate, and Adapt: Open Compound Domain Adaptation for Semantic Segmentation [91.30558794056056]
Unsupervised domain adaptation (UDA) for semantic segmentation has been attracting attention recently.
We present a novel framework based on three main design principles: discover, hallucinate, and adapt.
We evaluate our solution on the standard GTA-to-C-driving benchmark and achieve new state-of-the-art results.
arXiv Detail & Related papers (2021-10-08T13:20:09Z)
- Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection [86.98573522894961]
We propose a framework that generalizes the components commonly used by Unsupervised Domain Adaptation (UDA) algorithms for detection.
Specifically, we propose a novel UDA algorithm, ViSGA, that leverages the best design choices and introduces a simple but effective method to aggregate features at instance-level.
We show that both similarity-based grouping and adversarial training allow our model to focus on coarsely aligning feature groups, without being forced to match all instances across loosely aligned domains.
arXiv Detail & Related papers (2021-10-04T13:09:56Z)
- Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation [74.71931918541748]
We propose an instance affinity based criterion for source to target transfer during adaptation, called ILA-DA.
We first propose a reliable and efficient method to extract similar and dissimilar samples across source and target, and utilize a multi-sample contrastive loss to drive the domain alignment process.
We verify the effectiveness of ILA-DA by observing consistent improvements in accuracy over popular domain adaptation approaches on a variety of benchmark datasets.
arXiv Detail & Related papers (2021-04-03T01:33:14Z)
- Divergence Optimization for Noisy Universal Domain Adaptation [32.05829135903389]
Universal domain adaptation (UniDA) has been proposed to transfer knowledge learned from a label-rich source domain to a label-scarce target domain.
This paper introduces a two-head convolutional neural network framework to solve all problems simultaneously.
arXiv Detail & Related papers (2021-04-01T04:16:04Z)
- Learning Target Domain Specific Classifier for Partial Domain Adaptation [85.71584004185031]
Unsupervised domain adaptation (UDA) aims at reducing the distribution discrepancy when transferring knowledge from a labeled source domain to an unlabeled target domain.
This paper focuses on a more realistic UDA scenario, where the target label space is subsumed by the source label space.
arXiv Detail & Related papers (2020-08-25T02:28:24Z)
- Discrepancy Minimization in Domain Generalization with Generative Nearest Neighbors [13.047289562445242]
Domain generalization (DG) deals with the problem of domain shift, where a machine learning model trained on multiple source domains fails to generalize well on a target domain with different statistics.
Multiple approaches attempt to solve domain generalization by learning domain-invariant representations across the source domains, but these fail to guarantee generalization on the shifted target domain.
We propose a Generative Nearest Neighbor based Discrepancy Minimization (GNNDM) method, which provides a theoretical guarantee that the target error is upper bounded by the error in the labeling process of the target domain.
arXiv Detail & Related papers (2020-07-28T14:54:25Z)
- Sparsely-Labeled Source Assisted Domain Adaptation [64.75698236688729]
This paper proposes a novel Sparsely-Labeled Source Assisted Domain Adaptation (SLSA-DA) algorithm.
Due to the label scarcity problem, the projected clustering is conducted on both the source and target domains.
arXiv Detail & Related papers (2020-05-08T15:37:35Z)