Domain-robust VQA with diverse datasets and methods but no target labels
- URL: http://arxiv.org/abs/2103.15974v1
- Date: Mon, 29 Mar 2021 22:24:50 GMT
- Title: Domain-robust VQA with diverse datasets and methods but no target labels
- Authors: Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa
- Abstract summary: Domain adaptation for VQA differs from adaptation for object recognition due to additional complexity.
To tackle these challenges, we first quantify domain shifts between popular VQA datasets.
We also construct synthetic shifts in the image and question domains separately.
- Score: 34.331228652254566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The observation that computer vision methods overfit to dataset specifics has
inspired diverse attempts to make object recognition models robust to domain
shifts. However, similar work on domain-robust visual question answering
methods is very limited. Domain adaptation for VQA differs from adaptation for
object recognition due to additional complexity: VQA models handle multimodal
inputs, methods contain multiple steps with diverse modules resulting in
complex optimization, and answer spaces in different datasets are vastly
different. To tackle these challenges, we first quantify domain shifts between
popular VQA datasets, in both visual and textual space. To disentangle shifts
between datasets arising from different modalities, we also construct synthetic
shifts in the image and question domains separately. Second, we test the
robustness of different families of VQA methods (classic two-stream,
transformer, and neuro-symbolic methods) to these shifts. Third, we test the
applicability of existing domain adaptation methods and devise a new one to
bridge VQA domain gaps, adjusted to specific VQA models. To emulate the setting
of real-world generalization, we focus on unsupervised domain adaptation and
the open-ended classification task formulation.
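The setting above assumes no target labels at all. The abstract does not spell out the adaptation objective here, but a standard way to bridge such a gap without target labels is to penalize the distance between source and target feature distributions, e.g. with a maximum mean discrepancy (MMD) term. The sketch below is a generic illustration of that idea, not the paper's method; all names and shapes are illustrative:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source_feats, target_feats, gamma=0.5):
    """Squared maximum mean discrepancy between two feature batches.
    A small value means the two feature distributions are well aligned;
    adding this to the task loss needs no target labels."""
    k_ss = rbf_kernel(source_feats, source_feats, gamma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, gamma).mean()
    k_st = rbf_kernel(source_feats, target_feats, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# A shifted target batch yields a larger discrepancy than a matched one.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 2))       # source features
tgt_near = rng.normal(0.0, 1.0, size=(64, 2))  # same distribution
tgt_far = rng.normal(3.0, 1.0, size=(64, 2))   # shifted distribution
assert mmd2(src, tgt_near) < mmd2(src, tgt_far)
```

In practice such a penalty would be computed on intermediate features of the VQA model and minimized jointly with the supervised loss on the labeled source domain.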
Related papers
- Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z)
- VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization [15.554325659263316]
Visual question answering (VQA) models are designed to demonstrate visual-textual reasoning capabilities.
Existing domain generalization datasets for VQA exhibit a unilateral focus on textual shifts.
We propose VQA-GEN, the first ever multi-modal benchmark dataset for distribution shift generated through a shift induced pipeline.
arXiv Detail & Related papers (2023-11-01T19:43:56Z)
- Multi-Domain Learning with Modulation Adapters [33.54630534228469]
Multi-domain learning aims to handle related tasks, such as image classification across multiple domains, simultaneously.
Modulation Adapters update the convolutional weights of the model in a multiplicative manner for each task.
Our approach yields excellent results, with accuracies that are comparable to or better than those of existing state-of-the-art approaches.
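The multiplicative update described above can be sketched in a few lines: a shared convolution kernel is rescaled by a small per-task modulation tensor, so only the cheap adapters are task-specific. This is a hedged toy illustration of the general idea, not the paper's exact parameterization (shapes and names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# One shared conv kernel: (out_channels, in_channels, k, k).
shared_weight = rng.normal(size=(8, 4, 3, 3))

# Per-task modulation adapters: one multiplicative scale per
# (out_channel, in_channel) pair -- far fewer parameters than a full kernel.
num_tasks = 3
adapters = rng.normal(loc=1.0, scale=0.1, size=(num_tasks, 8, 4, 1, 1))

def task_weight(task_id):
    """Task-specific kernel = shared kernel * that task's adapter
    (broadcast over the spatial dimensions)."""
    return shared_weight * adapters[task_id]

# Each task sees a different effective kernel, while the shared weights
# are stored (and would be trained) only once.
w0, w1 = task_weight(0), task_weight(1)
assert w0.shape == shared_weight.shape
assert not np.allclose(w0, w1)
```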
arXiv Detail & Related papers (2023-07-17T14:40:16Z)
- Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment [59.831917206058435]
Domain adaptive detection aims to improve the generalization of detectors on the target domain.
Recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.
We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv Detail & Related papers (2023-01-01T08:38:07Z)
- QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation [24.39026345750824]
Question answering (QA) has recently shown impressive results for answering questions from customized domains.
Yet, a common challenge is to adapt QA models to an unseen target domain.
We propose a novel self-supervised framework called QADA for QA domain adaptation.
arXiv Detail & Related papers (2022-10-19T19:52:57Z)
- Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains [73.54897096088149]
We propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multi-domains.
The core idea is to augment the input image with style noise from different domains and then reconstruct the image from the embedding of the augmented image.
Experiments on PACS and DomainNet illustrate that DiMAE achieves considerable gains compared with recent state-of-the-art methods.
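The core idea above (style-perturbed input, clean reconstruction target) can be illustrated with a toy data pipeline. Everything below is a crude, hedged stand-in, not the DiMAE architecture: "style noise" is approximated as a random per-channel scale and shift, and the model itself is elided:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_style_noise(image, style_rng):
    """Perturb per-channel statistics (a crude stand-in for 'style' drawn
    from another domain): random per-channel scale and shift."""
    scale = style_rng.uniform(0.5, 1.5, size=(1, 1, image.shape[2]))
    shift = style_rng.uniform(-0.2, 0.2, size=(1, 1, image.shape[2]))
    return image * scale + shift

def reconstruction_loss(decoded, clean_image):
    """DiMAE-style objective: reconstruct the *clean* image from the
    embedding of the style-augmented one (mean squared error)."""
    return float(np.mean((decoded - clean_image) ** 2))

clean = rng.uniform(0.0, 1.0, size=(32, 32, 3))  # H x W x C image
augmented = add_style_noise(clean, rng)

# A real encoder/decoder would map `augmented` back toward `clean`;
# a perfect decoder would drive this loss to zero.
assert reconstruction_loss(clean, clean) == 0.0
assert reconstruction_loss(augmented, clean) > 0.0
```

Training the encoder to undo the injected style is what pushes its embedding toward domain (style) invariance.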
arXiv Detail & Related papers (2022-05-10T09:49:40Z)
- Multi-Granularity Alignment Domain Adaptation for Object Detection [33.32519045960187]
Domain adaptive object detection is challenging due to distinctive data distribution between source domain and target domain.
We propose a unified multi-granularity alignment based object detection framework towards domain-invariant feature learning.
arXiv Detail & Related papers (2022-03-31T09:05:06Z)
- MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs are within two feature spaces.
We propose Multi-Granularity Alignment architecture for Visual Question Answering task (MGA-VQA)
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z)
- Improving Transferability of Domain Adaptation Networks Through Domain Alignment Layers [1.3766148734487902]
Multi-source unsupervised domain adaptation (MSDA) aims at learning a predictor for an unlabeled target domain by transferring weak knowledge from a bag of source models.
We propose to embed Multi-Source version of DomaIn Alignment Layers (MS-DIAL) at different levels of the predictor.
Our approach can improve state-of-the-art MSDA methods, yielding relative gains of up to +30.64% on their classification accuracies.
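Domain alignment layers are typically variants of per-domain feature normalization. As a hedged, minimal sketch (not the exact MS-DIAL formulation), one can whiten each domain's features with that domain's own batch statistics so that differently-shifted domains land on comparable distributions:

```python
import numpy as np

def domain_align(features, eps=1e-5):
    """Normalize one domain's feature batch with that domain's own
    mean/variance, like a per-domain batch-norm layer at inference."""
    mean = features.mean(axis=0, keepdims=True)
    var = features.var(axis=0, keepdims=True)
    return (features - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
source_a = rng.normal(0.0, 1.0, size=(256, 8))  # one source domain
source_b = rng.normal(5.0, 2.0, size=(256, 8))  # a strongly shifted domain

aligned_a = domain_align(source_a)
aligned_b = domain_align(source_b)

# After per-domain normalization the two batches share (near-)zero mean,
# despite the large original shift between them.
assert abs(aligned_a.mean()) < 1e-6
assert abs(aligned_b.mean()) < 1e-6
```

Embedding such layers at several depths of the predictor, as the entry describes, applies this correction to intermediate features as well as to the final representation.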
arXiv Detail & Related papers (2021-09-06T18:41:19Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)
- Adversarial Dual Distinct Classifiers for Unsupervised Domain Adaptation [67.83872616307008]
Unsupervised Domain Adaptation (UDA) attempts to recognize unlabeled target samples by building a learning model from a differently-distributed labeled source domain.
In this paper, we propose a novel Adversarial Dual Distinct Classifier Network (AD$^2$CN) to align the source and target domain data distributions simultaneously while matching task-specific category boundaries.
To be specific, a domain-invariant feature generator is exploited to embed the source and target data into a latent common space with the guidance of discriminative cross-domain alignment.
arXiv Detail & Related papers (2020-08-27T01:29:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.