WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
- URL: http://arxiv.org/abs/2405.18405v1
- Date: Tue, 28 May 2024 17:46:27 GMT
- Title: WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
- Authors: Jiawei Ma, Yulei Niu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang,
- Abstract summary: We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
- Score: 63.98650220772378
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Language has been useful in extending the vision encoder to data from diverse distributions without empirical discovery in training domains. However, as the image description is mostly at coarse-grained level and ignores visual details, the resulted embeddings are still ineffective in overcoming complexity of domains at inference time. We present a self-supervision framework WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation, by only leveraging data in a single domain and without any test prior. Specifically, for each image, we first estimate the language embedding with fine-grained alignment, which can be consequently used to adaptively identify and then remove domain-specific counterpart from the raw visual embedding. WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT. Experimental studies on three domain generalization datasets demonstrate the effectiveness of our approach.
Related papers
- Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose a textbfDomain-Controlled Prompt Learning for the specific domains.
Our method achieves state-of-the-art performance in specific domain image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z) - Using Language to Extend to Unseen Domains [81.37175826824625]
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain as well as domains we want to extend to but do not have data for can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain.
arXiv Detail & Related papers (2022-10-18T01:14:02Z) - Single-domain Generalization in Medical Image Segmentation via Test-time
Adaptation from Shape Dictionary [64.5632303184502]
Domain generalization typically requires data from multiple source domains for model learning.
This paper studies the important yet challenging single domain generalization problem, in which a model is learned under the worst-case scenario with only one source domain to directly generalize to different unseen target domains.
We present a novel approach to address this problem in medical image segmentation, which extracts and integrates the semantic shape prior information of segmentation that are invariant across domains.
arXiv Detail & Related papers (2022-06-29T08:46:27Z) - Unsupervised Domain Generalization by Learning a Bridge Across Domains [78.855606355957]
Unsupervised Domain Generalization (UDG) setup has no training supervision in neither source nor target domains.
Our approach is based on self-supervised learning of a Bridge Across Domains (BrAD) - an auxiliary bridge domain accompanied by a set of semantics preserving visual (image-to-image) mappings to BrAD from each of the training domains.
We show how using an edge-regularized BrAD our approach achieves significant gains across multiple benchmarks and a range of tasks, including UDG, Few-shot UDA, and unsupervised generalization across multi-domain datasets.
arXiv Detail & Related papers (2021-12-04T10:25:45Z) - WEDGE: Web-Image Assisted Domain Generalization for Semantic
Segmentation [72.88657378658549]
We propose a WEb-image assisted Domain GEneralization scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation.
We also present a method which injects styles of the web-crawled images into training images on-the-fly during training, which enables the network to experience images of diverse styles with reliable labels for effective training.
arXiv Detail & Related papers (2021-09-29T05:19:58Z) - Semi-supervised Meta-learning with Disentanglement for
Domain-generalised Medical Image Segmentation [15.351113774542839]
Generalising models to new data from new centres (termed here domains) remains a challenge.
We propose a novel semi-supervised meta-learning framework with disentanglement.
We show that the proposed method is robust on different segmentation tasks and achieves state-of-the-art generalisation performance on two public benchmarks.
arXiv Detail & Related papers (2021-06-24T19:50:07Z) - Generalizable Model-agnostic Semantic Segmentation via Target-specific
Normalization [24.14272032117714]
We propose a novel domain generalization framework for the generalizable semantic segmentation task.
We exploit the model-agnostic learning to simulate the domain shift problem.
Considering the data-distribution discrepancy between seen source and unseen target domains, we develop the target-specific normalization scheme.
arXiv Detail & Related papers (2020-03-27T09:25:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.