Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
- URL: http://arxiv.org/abs/2201.09639v1
- Date: Mon, 24 Jan 2022 12:42:30 GMT
- Title: Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
- Authors: Arjun R. Akula
- Abstract summary: Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image.
We are working on a visual question generation (VQG) module that facilitates the automatic generation of out-of-distribution (OOD) shifts, which aid in systematically evaluating the cross-dataset adaptation capabilities of VQA models.
- Score: 7.995360025953931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual question answering (VQA) is the multi-modal task of answering natural
language questions about an input image. Through cross-dataset adaptation
methods, it is possible to transfer knowledge from a source dataset with many
training samples to a target dataset where the training set is limited. If a VQA
model trained on one dataset's training set fails to adapt to another, it is hard
to identify the underlying cause of the domain mismatch, as there could exist a
multitude of reasons, such as image distribution mismatch and question
distribution mismatch. At UCLA, we are working on a visual question generation
(VQG) module that facilitates the automatic generation of out-of-distribution
(OOD) shifts to aid in systematically evaluating the cross-dataset adaptation
capabilities of VQA models.
Related papers
- VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization [15.554325659263316]
Visual question answering (VQA) models are designed to demonstrate visual-textual reasoning capabilities.
Existing domain generalization datasets for VQA exhibit a unilateral focus on textual shifts.
We propose VQA-GEN, the first ever multi-modal benchmark dataset for distribution shift generated through a shift induced pipeline.
Date: 2023-11-01T19:43:56Z
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
Date: 2023-10-17T02:38:09Z
- From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data [27.160303686163164]
Visual question answering (VQA) for remote sensing scenes has great potential in intelligent human-computer interaction systems.
No object annotations are available in RSVQA datasets, which makes it difficult for models to exploit informative region representation.
There are questions with clearly different difficulty levels for each image in the RSVQA task.
A multi-level visual feature learning method is proposed to jointly extract language-guided holistic and regional image features.
Date: 2022-05-06T11:37:00Z
- Domain-robust VQA with diverse datasets and methods but no target labels [34.331228652254566]
Domain adaptation for VQA differs from adaptation for object recognition due to additional complexity.
To tackle these challenges, we first quantify domain shifts between popular VQA datasets.
We also construct synthetic shifts in the image and question domains separately.
Date: 2021-03-29T22:24:50Z
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering [58.30291671877342]
We present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input.
MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement.
Date: 2020-09-18T00:22:54Z
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
Date: 2020-04-30T09:10:57Z
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
Date: 2020-04-24T17:57:45Z
- Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
Date: 2020-02-27T03:07:41Z
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
Date: 2020-01-22T14:39:28Z