Toward Unsupervised Realistic Visual Question Answering
- URL: http://arxiv.org/abs/2303.05068v1
- Date: Thu, 9 Mar 2023 06:58:29 GMT
- Title: Toward Unsupervised Realistic Visual Question Answering
- Authors: Yuwei Zhang, Chih-Hui Ho, Nuno Vasconcelos
- Abstract summary: We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions with an RoI Mixup procedure that generates more fine-grained pseudo UQs, and with model ensembling to regularize model confidence.
- Score: 70.67698100148414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of realistic VQA (RVQA), where a model has to reject unanswerable
questions (UQs) and answer answerable ones (AQs), is studied. We first point
out 2 drawbacks in current RVQA research, where (1) datasets contain too many
unchallenging UQs and (2) a large number of annotated UQs are required for
training. To resolve the first drawback, we propose a new testing dataset,
RGQA, which combines AQs from an existing VQA dataset with around 29K
human-annotated UQs. These UQs consist of both fine-grained and coarse-grained
image-question pairs generated with 2 approaches: CLIP-based and
Perturbation-based. To address the second drawback, we introduce an
unsupervised training approach. This combines pseudo UQs obtained by randomly
pairing images and questions, with an RoI Mixup procedure to generate more
fine-grained pseudo UQs, and model ensembling to regularize model confidence.
Experiments show that using pseudo UQs significantly outperforms RVQA
baselines. RoI Mixup and model ensembling further increase the gain. Finally,
human evaluation reveals a performance gap between humans and models, showing
that more RVQA research is needed.
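The three unsupervised ingredients described in the abstract (random-pairing pseudo UQs, RoI Mixup, model ensembling) can be sketched in a few lines. The function names, data shapes, and splicing strategy below are illustrative assumptions, not the paper's actual code:

```python
import random

def make_pseudo_uqs(image_ids, questions, n, seed=0):
    """Pseudo unanswerable questions (UQs): pair a sampled question with
    an image it was NOT originally asked about. Assumes image_ids[i] is
    the image that questions[i] belongs to."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        i = rng.randrange(len(image_ids))
        j = rng.randrange(len(questions))
        if i != j:  # mismatched pair -> likely unanswerable
            pairs.append((image_ids[i], questions[j]))
    return pairs

def roi_mixup(rois_a, rois_b, k, seed=0):
    """RoI Mixup (sketch): splice k region features from image B into
    image A's regions, yielding a harder, fine-grained pseudo UQ."""
    rng = random.Random(seed)
    mixed = list(rois_a)
    for idx in rng.sample(range(len(mixed)), k):
        mixed[idx] = rois_b[idx % len(rois_b)]
    return mixed

def ensemble_confidence(member_probs):
    """Model ensembling (sketch): average the answer distributions of
    several independently trained models to regularize confidence."""
    k = len(member_probs)
    return [sum(p[a] for p in member_probs) / k
            for a in range(len(member_probs[0]))]
```

In this reading, a pair is treated as a pseudo UQ whenever the question came from a different image; averaging the ensemble's answer distributions damps any single model's overconfident rejections.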
Related papers
- Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
arXiv Detail & Related papers (2023-10-25T23:23:57Z)
- RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering [87.18962441714976]
We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA).
We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, and find that RoMQA is challenging for all of them.
Our results show that RoMQA provides a quantifiable test for building more robust QA methods.
arXiv Detail & Related papers (2022-10-25T21:39:36Z)
- Knowledge Transfer from Answer Ranking to Answer Generation [97.38378660163414]
We propose to train a GenQA model by transferring knowledge from a trained AS2 model.
We also propose to use the AS2 model prediction scores for loss weighting and score-conditioned input/output shaping.
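A minimal sketch of what score-based loss weighting and score-conditioned input shaping might look like; the function names, bucket count, and prompt format are assumptions for illustration, not the paper's implementation:

```python
def as2_weighted_loss(token_logprobs, as2_score):
    """Loss weighting (sketch): scale a GenQA training example's
    negative log-likelihood by the AS2 model's confidence in its
    (question, answer) pair, so noisy transfer examples count less."""
    nll = -sum(token_logprobs)  # sequence NLL under the GenQA model
    return as2_score * nll

def score_conditioned_input(question, answer, as2_score, n_buckets=5):
    """Input shaping (sketch): prefix the input with a discretized
    AS2-score token so the model can condition on pair quality."""
    bucket = min(int(as2_score * n_buckets), n_buckets - 1)
    return f"<score_{bucket}> question: {question} answer: {answer}"
```

For example, an example the AS2 model scores at 0.5 contributes half the gradient signal of a fully trusted one.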
arXiv Detail & Related papers (2022-10-23T21:51:27Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a (passage, answer) pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
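A toy illustration of the declarative-to-question idea: the paper's pipeline uses dependency parsing, named entity recognition, and semantic role labeling, whereas the `entity_to_question` helper and its entity-type map below are hypothetical simplifications:

```python
def entity_to_question(sentence, entity, entity_type):
    """Replace a recognized named entity with a wh-word chosen by its
    type. Only a sketch: it handles subject-position entities and
    ignores the grammatical restructuring a real system would need."""
    wh = {"PERSON": "Who", "DATE": "When", "GPE": "Where"}.get(entity_type, "What")
    question = sentence.replace(entity, wh, 1).rstrip(".") + "?"
    return question[0].upper() + question[1:]
```

For instance, "Elon Musk founded SpaceX." with the PERSON entity "Elon Musk" yields "Who founded SpaceX?".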
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- IQ-VQA: Intelligent Visual Question Answering [3.09911862091928]
We show that our framework improves consistency of VQA models by 15% on the rule-based dataset.
We also quantitatively show improvement in attention maps which highlights better multi-modal understanding of vision and language.
arXiv Detail & Related papers (2020-07-08T20:41:52Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs, named RefQA.
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data in RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.