'Just because you are right, doesn't mean I am wrong': Overcoming a
Bottleneck in the Development and Evaluation of Open-Ended Visual Question
Answering (VQA) Tasks
- URL: http://arxiv.org/abs/2103.15022v1
- Date: Sun, 28 Mar 2021 00:07:08 GMT
- Title: 'Just because you are right, doesn't mean I am wrong': Overcoming a
Bottleneck in the Development and Evaluation of Open-Ended Visual Question
Answering (VQA) Tasks
- Authors: Man Luo, Shailaja Keyur Sampat, Riley Tallman, Yankai Zeng, Manuha
Vancha, Akarshan Sajja, Chitta Baral
- Abstract summary: GQA is a dataset for real-world visual reasoning and compositional question answering.
Many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but are still semantically meaningful and correct in the given context.
We propose Alternative Answer Sets (AAS) of ground-truth answers to address this limitation; these sets are created automatically using off-the-shelf NLP tools.
- Score: 11.299897008333241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning
and compositional question answering. We found that many answers predicted by
the best vision-language models on the GQA dataset do not match the ground-truth
answer but still are semantically meaningful and correct in the given context.
In fact, this is the case with most existing visual question answering (VQA)
datasets, which assume only one ground-truth answer for each question. To
address this limitation, we propose Alternative Answer Sets (AAS) of
ground-truth answers, which are created automatically using off-the-shelf NLP tools. We
introduce a semantic metric based on AAS and modify top VQA solvers to support
multiple plausible answers for a question. We implement this approach on the
GQA dataset and show performance improvements.
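Below is a minimal sketch of how evaluation against an Alternative Answer Set could look. It is not the authors' implementation: it uses WordNet synonyms via NLTK as a stand-in for the "off-the-shelf NLP tools" mentioned in the abstract, and the function names (build_aas, aas_accuracy) are hypothetical.

```python
# Sketch: AAS-style evaluation where a prediction counts as correct if it
# falls inside an alternative answer set built from the ground-truth answer.
# Assumption: WordNet synonyms approximate the AAS construction.
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")


def build_aas(ground_truth: str) -> set[str]:
    """Build an alternative answer set for one ground-truth answer."""
    aas = {ground_truth.lower()}
    for synset in wn.synsets(ground_truth):
        for lemma in synset.lemmas():
            aas.add(lemma.name().replace("_", " ").lower())
    return aas


def aas_accuracy(predictions: list[str], ground_truths: list[str]) -> float:
    """Score predictions against AAS membership instead of exact match."""
    correct = sum(
        pred.strip().lower() in build_aas(gt)
        for pred, gt in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


# Example: "couch" does not exactly match "sofa", but WordNet lists them as
# synonyms, so the AAS metric accepts it; "man" vs "woman" is still wrong.
print(aas_accuracy(["couch", "man"], ["sofa", "woman"]))  # -> 0.5
```

The point of the sketch is only to illustrate the evaluation idea: exact-match accuracy would score the "couch"/"sofa" case as wrong, whereas a semantic metric over an alternative answer set accepts it.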
Related papers
- Fully Authentic Visual Question Answering Dataset from Online Communities [72.0524198499719]
Visual Question Answering (VQA) entails answering questions about images.
We introduce the first VQA dataset in which all contents originate from an authentic use case.
We characterize this dataset and how it relates to eight mainstream VQA datasets.
arXiv Detail & Related papers (2023-11-27T06:19:00Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
A-OKVQA is a crowdsourced dataset composed of about 25K questions.
We demonstrate the potential of this new dataset through a detailed analysis of its contents.
arXiv Detail & Related papers (2022-06-03T17:52:27Z)
- Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
- VANiLLa : Verbalized Answers in Natural Language at Large Scale [2.9098477555578333]
This dataset consists of over 100k simple questions adapted from the CSQA and SimpleQuestionsWikidata datasets.
The answer sentences in this dataset are syntactically and semantically closer to the question than to the triple fact.
arXiv Detail & Related papers (2021-05-24T16:57:54Z)
- Unsupervised Evaluation for Question Answering with Transformers [46.16837670041594]
We investigate the hidden representations of questions, answers, and contexts in transformer-based QA architectures.
We observe a consistent pattern in the answer representations, which we show can be used to automatically evaluate whether or not a predicted answer is correct.
We are able to predict whether or not a model's answer is correct with 91.37% accuracy on SQuAD, and 80.7% accuracy on SubjQA.
arXiv Detail & Related papers (2020-10-07T07:03:30Z)
- IQ-VQA: Intelligent Visual Question Answering [3.09911862091928]
We show that our framework improves consistency of VQA models by 15% on the rule-based dataset.
We also quantitatively show improvement in attention maps which highlights better multi-modal understanding of vision and language.
arXiv Detail & Related papers (2020-07-08T20:41:52Z)
- Fluent Response Generation for Conversational Question Answering [15.826109118064716]
We propose a method for situating responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses.
We use data augmentation to generate training data for an end-to-end system.
arXiv Detail & Related papers (2020-05-21T04:57:01Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.