INDIGO: Intrinsic Multimodality for Domain Generalization
- URL: http://arxiv.org/abs/2206.05912v1
- Date: Mon, 13 Jun 2022 05:41:09 GMT
- Title: INDIGO: Intrinsic Multimodality for Domain Generalization
- Authors: Puneet Mangla and Shivam Chandhok and Milan Aggarwal and Vineeth N
Balasubramanian and Balaji Krishnamurthy
- Abstract summary: We study how multimodal information can be leveraged in an "intrinsic" way to make systems generalize under unseen domains.
We propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO).
- Score: 26.344372409315177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For models to generalize under unseen domains (a.k.a. domain generalization),
it is crucial to learn feature representations that are domain-agnostic and
capture the underlying semantics that make up an object category. Recent
advances in weakly supervised vision-language models, which learn holistic
representations from cheap, noisy text annotations, have demonstrated strong
semantic understanding by capturing object characteristics that generalize
across domains. However, when multiple source domains are involved, the cost
of curating textual annotations for every image grows with the number of
domains, making the process tedious and often infeasible, and preventing us
from directly using these supervised vision-language approaches to achieve
the best generalization on an unseen domain. Motivated by this, we study how
multimodal information from
existing pre-trained multimodal networks can be leveraged in an "intrinsic" way
to make systems generalize under unseen domains. To this end, we propose
IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and
elegant way of leveraging the intrinsic modality present in these pre-trained
multimodal networks along with the visual modality to enhance generalization to
unseen domains at test time. We experiment on several Domain Generalization
settings (ClosedDG, OpenDG, and Limited sources) and show state-of-the-art
generalization performance on unseen domains. Further, we provide a thorough
analysis to develop a holistic understanding of INDIGO.
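To make the idea concrete, here is a minimal, hypothetical sketch (not the authors' actual architecture) of pairing a trainable visual backbone with the frozen "intrinsic" modality of a pre-trained multimodal network; the concat-and-project fusion, the ResNet-18 backbone, and the stand-in encoder are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class IntrinsicMultimodalClassifier(nn.Module):
    """Hypothetical sketch: fuse a trainable visual branch with the
    frozen 'intrinsic' modality of a pre-trained multimodal encoder
    (e.g., a CLIP image tower). Not the paper's exact architecture."""

    def __init__(self, multimodal_encoder, mm_dim, num_classes):
        super().__init__()
        self.visual = resnet18(weights=None)      # trainable visual branch
        self.visual.fc = nn.Identity()            # expose 512-d features
        self.multimodal = multimodal_encoder      # frozen pre-trained branch
        for p in self.multimodal.parameters():
            p.requires_grad = False
        self.fuse = nn.Linear(512 + mm_dim, 512)  # assumed fusion: concat + project
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        v = self.visual(x)                        # visual modality
        with torch.no_grad():
            m = self.multimodal(x)                # intrinsic multimodal features
        z = torch.relu(self.fuse(torch.cat([v, m], dim=1)))
        return self.head(z)

# Usage with a stand-in encoder (swap in e.g. a CLIP image encoder):
encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(256))
model = IntrinsicMultimodalClassifier(encoder, mm_dim=256, num_classes=7)
logits = model(torch.randn(4, 3, 224, 224))       # -> (4, 7)
```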
Related papers
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source Domain Generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapolation [19.944946262284123]
Humans can easily extrapolate to novel domains, which raises an intriguing question: how can neural networks extrapolate like humans and achieve OOD generalization?
We introduce a novel approach to domain extrapolation that leverages the reasoning ability and extensive knowledge of large language models (LLMs) to synthesize entirely new domains.
Our method performs well in this setting, even surpassing the supervised setting by approximately 1-2% on datasets such as VLCS.
arXiv Detail & Related papers (2024-03-08T18:44:23Z)
- Compound Domain Generalization via Meta-Knowledge Encoding [55.22920476224671]
We introduce Style-induced Domain-specific Normalization (SDNorm) to re-normalize the underlying multi-modal distributions.
We harness prototype representations, the centroids of classes, to perform relational modeling in the embedding space.
Experiments on four standard Domain Generalization benchmarks show that COMEN surpasses state-of-the-art performance without requiring domain supervision.
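As a rough illustration of the normalization idea (the exact SDNorm formulation is not reproduced here), the sketch below re-normalizes features with one BatchNorm branch per latent domain, mixed by a soft domain assignment; the branch count and mixing scheme are assumptions.

```python
import torch
import torch.nn as nn

class DomainSpecificNorm2d(nn.Module):
    """Sketch of domain-specific re-normalization: one BatchNorm
    branch per latent domain, mixed by a soft domain assignment.
    An illustrative stand-in, not the exact SDNorm formulation."""

    def __init__(self, channels, num_domains):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.BatchNorm2d(channels) for _ in range(num_domains)]
        )

    def forward(self, x, domain_weights):
        # domain_weights: (batch, num_domains) soft domain assignment
        outs = torch.stack([bn(x) for bn in self.branches], dim=1)  # (B, D, C, H, W)
        w = domain_weights.view(*domain_weights.shape, 1, 1, 1)
        return (w * outs).sum(dim=1)                                # (B, C, H, W)

norm = DomainSpecificNorm2d(channels=64, num_domains=3)
x = torch.randn(8, 64, 14, 14)
w = torch.softmax(torch.randn(8, 3), dim=1)  # e.g., inferred from style statistics
y = norm(x, w)                               # (8, 64, 14, 14)
```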
arXiv Detail & Related papers (2022-03-24T11:54:59Z)
- Unsupervised Domain Generalization by Learning a Bridge Across Domains [78.855606355957]
The Unsupervised Domain Generalization (UDG) setup has no training supervision in either source or target domains.
Our approach is based on self-supervised learning of a Bridge Across Domains (BrAD) - an auxiliary bridge domain accompanied by a set of semantics preserving visual (image-to-image) mappings to BrAD from each of the training domains.
We show how, using an edge-regularized BrAD, our approach achieves significant gains across multiple benchmarks and a range of tasks, including UDG, Few-shot UDA, and unsupervised generalization across multi-domain datasets.
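As a loose illustration of the bridge-domain idea: BrAD itself is learned, but a fixed edge-map transform shows the shape of a semantics-preserving image-to-image mapping into a shared edge-like domain. The Sobel operator below is a crude stand-in, not the paper's mapping.

```python
import torch
import torch.nn.functional as F

def to_edge_bridge(images):
    """Crude stand-in for a semantics-preserving mapping into an
    edge-like bridge domain (the actual BrAD mapping is learned):
    grayscale conversion followed by Sobel gradient magnitude."""
    gray = images.mean(dim=1, keepdim=True)                    # (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    gx = F.conv2d(gray, kx.reshape(1, 1, 3, 3), padding=1)     # horizontal edges
    gy = F.conv2d(gray, kx.t().reshape(1, 1, 3, 3), padding=1) # vertical edges
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)                # edge magnitude

# Images from every training domain map into the same bridge domain,
# so self-supervised alignment can happen in one shared space.
edges = to_edge_bridge(torch.randn(4, 3, 64, 64))              # (4, 1, 64, 64)
```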
arXiv Detail & Related papers (2021-12-04T10:25:45Z)
- Context-Conditional Adaptation for Recognizing Unseen Classes in Unseen Domains [48.17225008334873]
We propose a feature generative framework integrated with a COntext COnditional Adaptive (COCOA) Batch-Normalization.
The generated visual features better capture the underlying data distribution, enabling us to generalize to unseen classes and domains at test time.
We thoroughly evaluate and analyze our approach on the established large-scale benchmark DomainNet.
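A minimal sketch of the context-conditional normalization idea, assuming the affine parameters are predicted from a context vector; the layer sizes and the source of the context embedding are illustrative assumptions, not the paper's exact COCOA module.

```python
import torch
import torch.nn as nn

class ContextConditionalBatchNorm(nn.Module):
    """Sketch of context-conditional batch normalization: the affine
    parameters are predicted from a context vector rather than being
    fixed learned parameters. Illustrative, not the exact COCOA module."""

    def __init__(self, num_features, context_dim):
        super().__init__()
        self.bn = nn.BatchNorm1d(num_features, affine=False)
        self.gamma = nn.Linear(context_dim, num_features)
        self.beta = nn.Linear(context_dim, num_features)

    def forward(self, features, context):
        normed = self.bn(features)
        return self.gamma(context) * normed + self.beta(context)

layer = ContextConditionalBatchNorm(num_features=128, context_dim=32)
feats = torch.randn(16, 128)   # e.g., generated visual features
ctx = torch.randn(16, 32)      # e.g., a class/domain context embedding
out = layer(feats, ctx)        # (16, 128)
```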
arXiv Detail & Related papers (2021-07-15T17:51:16Z)
- Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
arXiv Detail & Related papers (2021-07-12T17:57:46Z)
- Learning to Balance Specificity and Invariance for In and Out of Domain Generalization [27.338573739304604]
We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance.
For domain generalization, the goal is to learn from a set of source domains to produce a single model that will best generalize to an unseen target domain.
We demonstrate competitive performance compared to naive baselines and state-of-the-art methods on both PACS and DomainNet.
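A small sketch of the domain-specific-mask idea under stated assumptions: each source domain learns a soft mask over shared features, and the masks are aggregated for an unseen target domain. The sigmoid parameterization and mean aggregation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DomainMaskedFeatures(nn.Module):
    """Sketch of domain-specific masks over shared features: each
    source domain learns a soft mask; for an unseen target domain the
    source masks are aggregated. Parameterization is an assumption."""

    def __init__(self, feat_dim, num_domains):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(num_domains, feat_dim))

    def forward(self, features, domain=None):
        masks = torch.sigmoid(self.mask_logits)       # (D, F) soft masks
        if domain is None:
            mask = masks.mean(dim=0)                  # unseen target: aggregate
        else:
            mask = masks[domain]                      # known source domain
        return features * mask

masker = DomainMaskedFeatures(feat_dim=512, num_domains=3)
train_out = masker(torch.randn(8, 512), domain=1)  # source-domain forward
test_out = masker(torch.randn(8, 512))             # unseen-domain forward
```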
arXiv Detail & Related papers (2020-08-28T20:39:51Z)
- Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN [117.80737222754306]
We present a novel universal object detector called Universal-RCNN.
We first generate a global semantic pool by integrating the high-level semantic representations of all categories.
An Intra-Domain Reasoning Module learns and propagates a sparse graph representation within one dataset, guided by a spatial-aware GCN.
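As a simplified sketch of graph reasoning over the semantic pool (the paper's module is additionally spatial-aware), category nodes below exchange information through one GCN-style propagation step over a learned adjacency; all module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class IntraDomainReasoning(nn.Module):
    """Sketch of graph reasoning over a global semantic pool: category
    nodes exchange information in one GCN-style propagation step over a
    learned adjacency (the paper's module is additionally spatial-aware)."""

    def __init__(self, num_categories, dim):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(num_categories, num_categories))
        self.transform = nn.Linear(dim, dim)

    def forward(self, semantic_pool):
        # semantic_pool: (num_categories, dim) high-level category embeddings
        adj = torch.softmax(self.adj_logits, dim=1)      # learned graph weights
        propagated = adj @ self.transform(semantic_pool)
        return torch.relu(semantic_pool + propagated)    # residual update

reason = IntraDomainReasoning(num_categories=80, dim=256)
pool = torch.randn(80, 256)   # global semantic pool of category embeddings
refined = reason(pool)        # (80, 256)
```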
arXiv Detail & Related papers (2020-02-18T07:57:45Z)