Domain-robust VQA with diverse datasets and methods but no target labels
- URL: http://arxiv.org/abs/2103.15974v1
- Date: Mon, 29 Mar 2021 22:24:50 GMT
- Title: Domain-robust VQA with diverse datasets and methods but no target labels
- Authors: Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa
- Abstract summary: Domain adaptation for VQA differs from adaptation for object recognition due to additional complexity.
To tackle these challenges, we first quantify domain shifts between popular VQA datasets.
We also construct synthetic shifts in the image and question domains separately.
- Score: 34.331228652254566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The observation that computer vision methods overfit to dataset specifics has
inspired diverse attempts to make object recognition models robust to domain
shifts. However, similar work on domain-robust visual question answering
methods is very limited. Domain adaptation for VQA differs from adaptation for
object recognition due to additional complexity: VQA models handle multimodal
inputs, methods contain multiple steps with diverse modules resulting in
complex optimization, and answer spaces in different datasets are vastly
different. To tackle these challenges, we first quantify domain shifts between
popular VQA datasets, in both visual and textual space. To disentangle shifts
between datasets arising from different modalities, we also construct synthetic
shifts in the image and question domains separately. Second, we test the
robustness of different families of VQA methods (classic two-stream,
transformer, and neuro-symbolic methods) to these shifts. Third, we test the
applicability of existing domain adaptation methods and devise a new one to
bridge VQA domain gaps, adjusted to specific VQA models. To emulate the setting
of real-world generalization, we focus on unsupervised domain adaptation and
the open-ended classification task formulation.
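The setting above assumes no target labels at all. The abstract does not spell out the adaptation objective here, but a standard way to bridge such a gap without target labels is to penalize the distance between source and target feature distributions, e.g. with a maximum mean discrepancy (MMD) term. The sketch below is a generic illustration of that idea, not the paper's method; all names and shapes are illustrative:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source_feats, target_feats, gamma=0.5):
    """Squared maximum mean discrepancy between two feature batches.
    A small value means the two feature distributions are well aligned;
    adding this to the task loss needs no target labels."""
    k_ss = rbf_kernel(source_feats, source_feats, gamma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, gamma).mean()
    k_st = rbf_kernel(source_feats, target_feats, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# A shifted target batch yields a larger discrepancy than a matched one.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 2))       # source features
tgt_near = rng.normal(0.0, 1.0, size=(64, 2))  # same distribution
tgt_far = rng.normal(3.0, 1.0, size=(64, 2))   # shifted distribution
assert mmd2(src, tgt_near) < mmd2(src, tgt_far)
```

In practice such a penalty would be computed on intermediate features of the VQA model and minimized jointly with the supervised loss on the labeled source domain.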
Related papers
- Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z)
- VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization [15.554325659263316]
Visual question answering (VQA) models are designed to demonstrate visual-textual reasoning capabilities.
Existing domain generalization datasets for VQA exhibit a unilateral focus on textual shifts.
We propose VQA-GEN, the first ever multi-modal benchmark dataset for distribution shift generated through a shift induced pipeline.
arXiv Detail & Related papers (2023-11-01T19:43:56Z)
- Multi-Domain Learning with Modulation Adapters [33.54630534228469]
Multi-domain learning aims to handle related tasks, such as image classification across multiple domains, simultaneously.
Modulation Adapters update the convolutional weights of the model in a multiplicative manner for each task.
Our approach yields excellent results, with accuracies that are comparable to or better than those of existing state-of-the-art approaches.
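The multiplicative update described above can be sketched in a few lines: a shared convolution kernel is rescaled by a small per-task modulation tensor, so only the cheap adapters are task-specific. This is a hedged toy illustration of the general idea, not the paper's exact parameterization (shapes and names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# One shared conv kernel: (out_channels, in_channels, k, k).
shared_weight = rng.normal(size=(8, 4, 3, 3))

# Per-task modulation adapters: one multiplicative scale per
# (out_channel, in_channel) pair -- far fewer parameters than a full kernel.
num_tasks = 3
adapters = rng.normal(loc=1.0, scale=0.1, size=(num_tasks, 8, 4, 1, 1))

def task_weight(task_id):
    """Task-specific kernel = shared kernel * that task's adapter
    (broadcast over the spatial dimensions)."""
    return shared_weight * adapters[task_id]

# Each task sees a different effective kernel, while the shared weights
# are stored (and would be trained) only once.
w0, w1 = task_weight(0), task_weight(1)
assert w0.shape == shared_weight.shape
assert not np.allclose(w0, w1)
```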
arXiv Detail & Related papers (2023-07-17T14:40:16Z)
- Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment [59.831917206058435]
Domain adaptive detection aims to improve the generalization of detectors on the target domain.
Recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.
We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv Detail & Related papers (2023-01-01T08:38:07Z)
- QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation [24.39026345750824]
Question answering (QA) has recently shown impressive results for answering questions from customized domains.
Yet, a common challenge is to adapt QA models to an unseen target domain.
We propose a novel self-supervised framework called QADA for QA domain adaptation.
arXiv Detail & Related papers (2022-10-19T19:52:57Z)
- Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains [73.54897096088149]
We propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multi-domains.
The core idea is to augment the input image with style noise from different domains and then reconstruct the image from the embedding of the augmented image.
Experiments on PACS and DomainNet illustrate that DiMAE achieves considerable gains compared with recent state-of-the-art methods.
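The core idea above (style-perturbed input, clean reconstruction target) can be illustrated with a toy data pipeline. Everything below is a crude, hedged stand-in, not the DiMAE architecture: "style noise" is approximated as a random per-channel scale and shift, and the model itself is elided:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_style_noise(image, style_rng):
    """Perturb per-channel statistics (a crude stand-in for 'style' drawn
    from another domain): random per-channel scale and shift."""
    scale = style_rng.uniform(0.5, 1.5, size=(1, 1, image.shape[2]))
    shift = style_rng.uniform(-0.2, 0.2, size=(1, 1, image.shape[2]))
    return image * scale + shift

def reconstruction_loss(decoded, clean_image):
    """DiMAE-style objective: reconstruct the *clean* image from the
    embedding of the style-augmented one (mean squared error)."""
    return float(np.mean((decoded - clean_image) ** 2))

clean = rng.uniform(0.0, 1.0, size=(32, 32, 3))  # H x W x C image
augmented = add_style_noise(clean, rng)

# A real encoder/decoder would map `augmented` back toward `clean`;
# a perfect decoder would drive this loss to zero.
assert reconstruction_loss(clean, clean) == 0.0
assert reconstruction_loss(augmented, clean) > 0.0
```

Training the encoder to undo the injected style is what pushes its embedding toward domain (style) invariance.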
arXiv Detail & Related papers (2022-05-10T09:49:40Z)
- Multi-Granularity Alignment Domain Adaptation for Object Detection [33.32519045960187]
Domain adaptive object detection is challenging due to distinctive data distribution between source domain and target domain.
We propose a unified multi-granularity alignment based object detection framework towards domain-invariant feature learning.
arXiv Detail & Related papers (2022-03-31T09:05:06Z)
- MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs are within two feature spaces.
We propose Multi-Granularity Alignment architecture for Visual Question Answering task (MGA-VQA)
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z)
- Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled target domain by transferring weak knowledge from a bag of source models.
We propose to embed Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
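Domain alignment layers are typically variants of per-domain feature normalization. As a hedged, minimal sketch (not the exact MS-DIAL formulation), one can whiten each domain's features with that domain's own batch statistics so that differently-shifted domains land on comparable distributions:

```python
import numpy as np

def domain_align(features, eps=1e-5):
    """Normalize one domain's feature batch with that domain's own
    mean/variance, like a per-domain batch-norm layer at inference."""
    mean = features.mean(axis=0, keepdims=True)
    var = features.var(axis=0, keepdims=True)
    return (features - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
source_a = rng.normal(0.0, 1.0, size=(256, 8))  # one source domain
source_b = rng.normal(5.0, 2.0, size=(256, 8))  # a strongly shifted domain

aligned_a = domain_align(source_a)
aligned_b = domain_align(source_b)

# After per-domain normalization the two batches share (near-)zero mean,
# despite the large original shift between them.
assert abs(aligned_a.mean()) < 1e-6
assert abs(aligned_b.mean()) < 1e-6
```

Embedding such layers at several depths of the predictor, as the entry describes, applies this correction to intermediate features as well as to the final representation.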
arXiv Detail & Related papers (2021-09-06T18:41:19Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)
- Adversarial Dual Distinct Classifiers for Unsupervised Domain Adaptation [67.83872616307008]
Unsupervised Domain Adaptation (UDA) attempts to recognize unlabeled target samples by building a learning model from a differently-distributed labeled source domain.
In this paper, we propose a novel Adversarial Dual Distinct Classifier Network (AD$^2$CN) to align the source and target domain data distributions simultaneously while matching task-specific category boundaries.
To be specific, a domain-invariant feature generator is exploited to embed the source and target data into a latent common space with the guidance of discriminative cross-domain alignment.
arXiv Detail & Related papers (2020-08-27T01:29:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.