Generating Rationales in Visual Question Answering
- URL: http://arxiv.org/abs/2004.02032v1
- Date: Sat, 4 Apr 2020 22:15:35 GMT
- Title: Generating Rationales in Visual Question Answering
- Authors: Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, and Garrison
W. Cottrell
- Abstract summary: We propose a new task of rationale generation for Visual Question Answering (VQA).
We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers.
We train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales.
- Score: 28.45552957339557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation. Essentially, we task a VQA model with generating rationales for the answers it predicts. We use data from the Visual Commonsense Reasoning (VCR) task, as it contains ground-truth rationales along with visual questions and answers. We first investigate commonsense understanding in one of the leading VCR models, ViLBERT, by generating rationales from pretrained weights using a state-of-the-art language model, GPT-2. Next, we seek to jointly train ViLBERT with GPT-2 in an end-to-end fashion with the dual task of predicting the answer in VQA and generating rationales. We show that this kind of training injects commonsense understanding in the VQA model through quantitative and qualitative evaluation metrics.
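A minimal sketch of the dual-task objective described in the abstract, with small placeholder modules standing in for ViLBERT and GPT-2 (an illustration of the joint answer-plus-rationale loss, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTaskVQA(nn.Module):
    """Toy stand-ins: `encoder` plays the role of ViLBERT, `decoder` the role of GPT-2."""
    def __init__(self, hidden=256, num_answers=4, vocab=1000):
        super().__init__()
        self.encoder = nn.Linear(hidden, hidden)           # placeholder multimodal encoder
        self.answer_head = nn.Linear(hidden, num_answers)  # VQA answer classifier
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)  # placeholder language model
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, fused_features, answer_labels, rationale_ids):
        h = torch.tanh(self.encoder(fused_features))       # (B, H) joint image+question features
        answer_logits = self.answer_head(h)                 # (B, num_answers)
        dec_out, _ = self.decoder(self.embed(rationale_ids), h.unsqueeze(0))
        lm_logits = self.lm_head(dec_out)                    # (B, T, vocab)
        ans_loss = F.cross_entropy(answer_logits, answer_labels)
        gen_loss = F.cross_entropy(
            lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),  # predict token t+1 from the prefix
            rationale_ids[:, 1:].reshape(-1),
        )
        return ans_loss + gen_loss                           # summed and backpropagated end to end

model = DualTaskVQA()
loss = model(torch.randn(2, 256), torch.tensor([1, 3]), torch.randint(0, 1000, (2, 12)))
loss.backward()
```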
Related papers
- Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering [2.98667511228225]
ReRe is an encoder-decoder architecture that uses a pre-trained CLIP vision encoder and a pre-trained GPT-2 language model as the decoder.
ReRe outperforms previous methods in VQA accuracy and explanation score, and improves NLE with more persuasive and reliable explanations.
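To make the retrieval-augmented idea concrete, here is a hedged, self-contained sketch (my own simplification, not ReRe's code): an image embedding such as one produced by a CLIP vision encoder retrieves the most similar stored features, which would then be passed to the GPT-2 decoder as additional context.

```python
import torch
import torch.nn.functional as F

def retrieve(image_emb: torch.Tensor, memory: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Return the k memory rows most similar (cosine) to the image embedding."""
    sims = F.cosine_similarity(memory, image_emb.unsqueeze(0), dim=-1)  # (N,)
    return memory[sims.topk(k).indices]                                  # (k, D)

image_emb = torch.randn(512)           # stand-in for a CLIP image feature
memory = torch.randn(1000, 512)        # stand-in for stored reference features
context = retrieve(image_emb, memory)  # would be fed to the decoder as extra context
```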
arXiv Detail & Related papers (2024-08-30T04:39:43Z) - Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
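A short sketch of what a model-driven selective decomposition loop could look like (the interfaces below, `vqa_model` and `decompose`, are hypothetical stand-ins, not the paper's API): decompose and re-ask only when the direct prediction looks unreliable.

```python
def answer_with_selective_decomposition(vqa_model, decompose, image, question, tau=0.5):
    """vqa_model(image, question, context=None) -> (answer, confidence); decompose(question) -> list[str]."""
    answer, confidence = vqa_model(image, question)
    if confidence >= tau:                 # confident direct prediction: skip decomposition
        return answer
    sub_questions = decompose(question)   # model-written sub-questions
    sub_answers = [vqa_model(image, q)[0] for q in sub_questions]
    # second-guess: re-ask the original question with the sub-answers as extra context
    answer, _ = vqa_model(image, question, context=sub_answers)
    return answer
```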
arXiv Detail & Related papers (2023-10-25T23:23:57Z) - Can I Trust Your Answer? Visually Grounded Video Question Answering [88.11169242115416]
We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding.
We construct NExT-GQA -- an extension of NExT-QA with 10.5K temporal grounding labels tied to the original QA pairs.
arXiv Detail & Related papers (2023-09-04T03:06:04Z) - Learning Answer Generation using Supervision from Automatic Question
Answering Evaluators [98.9267570170737]
We propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA).
We evaluate our proposed methods on two academic and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.
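One plausible way to use automatic QA evaluator scores as supervision is to weight the generation loss per example; the sketch below illustrates that general idea and is not the GAVA implementation.

```python
import torch
import torch.nn.functional as F

def evaluator_weighted_loss(lm_logits: torch.Tensor, target_ids: torch.Tensor,
                            evaluator_scores: torch.Tensor) -> torch.Tensor:
    """lm_logits: (B, T, V); target_ids: (B, T); evaluator_scores: (B,) in [0, 1]."""
    per_token = F.cross_entropy(lm_logits.transpose(1, 2), target_ids, reduction="none")  # (B, T)
    per_example = per_token.mean(dim=1)                 # (B,) average token loss per answer
    return (evaluator_scores * per_example).mean()      # down-weight low-scoring answers

loss = evaluator_weighted_loss(torch.randn(4, 10, 100, requires_grad=True),
                               torch.randint(0, 100, (4, 10)),
                               torch.tensor([0.9, 0.2, 0.7, 1.0]))
loss.backward()
```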
arXiv Detail & Related papers (2023-05-24T16:57:04Z) - Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out two drawbacks in current RVQA research: (1) datasets contain too many unchallenging UQs, and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an
arXiv Detail & Related papers (2023-03-09T06:58:29Z) - Towards a Unified Model for Generating Answers and Explanations in
Visual Question Answering [11.754328280233628]
We argue that training explanation models independently of the QA model makes the explanations less grounded and limits performance.
We propose a multitask learning approach towards a Unified Model for more grounded and consistent generation of both Answers and Explanations.
arXiv Detail & Related papers (2023-01-25T19:29:19Z) - Knowledge Transfer from Answer Ranking to Answer Generation [97.38378660163414]
We propose to train a GenQA model by transferring knowledge from a trained AS2 model.
We also propose to use the AS2 model prediction scores for loss weighting and score-conditioned input/output shaping.
arXiv Detail & Related papers (2022-10-23T21:51:27Z) - Improving Unsupervised Question Answering via Summarization-Informed
Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a <passage, answer> pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
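As a rough illustration of that kind of pipeline (a toy heuristic of my own, not the paper's method), one can mask a named entity found by an off-the-shelf NLP toolkit such as spaCy and turn a declarative summary sentence into a cloze-style question:

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

WH = {"PERSON": "who", "ORG": "what organization", "GPE": "what place", "DATE": "what year"}
nlp = spacy.load("en_core_web_sm")

def sentence_to_qa(sentence: str):
    """Turn a declarative sentence into a cloze-style (question, answer) pair, if possible."""
    doc = nlp(sentence)
    for ent in doc.ents:
        wh = WH.get(ent.label_)
        if wh is None:
            continue
        # replace the entity span with a wh-phrase to form the question
        question = sentence[: ent.start_char] + wh + sentence[ent.end_char :]
        return question.rstrip(".") + "?", ent.text
    return None

print(sentence_to_qa("Apple acquired the startup in 2019."))
# e.g. ('Apple acquired the startup in what year?', '2019')
```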
arXiv Detail & Related papers (2021-09-16T13:08:43Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
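A hedged sketch of the iterative refinement idea (the `qa_model` interface below is an assumption, not the paper's code): rerun a trained QA model over the harvested pairs and replace noisy answers with confident model predictions.

```python
def refine_qa_pairs(qa_pairs, qa_model, min_conf=0.8, rounds=2):
    """qa_pairs: iterable of (context, question, answer); qa_model(context, question) -> (answer, confidence)."""
    data = list(qa_pairs)
    for _ in range(rounds):
        refined = []
        for context, question, answer in data:
            pred, conf = qa_model(context, question)
            # keep the model's answer when it is confident, otherwise keep the harvested one
            refined.append((context, question, pred if conf >= min_conf else answer))
        data = refined
    return data
```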
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.