Co-VQA : Answering by Interactive Sub Question Sequence
- URL: http://arxiv.org/abs/2204.00879v1
- Date: Sat, 2 Apr 2022 15:09:16 GMT
- Title: Co-VQA : Answering by Interactive Sub Question Sequence
- Authors: Ruonan Wang, Yuxi Qian, Fangxiang Feng, Xiaojie Wang and Huixing Jiang
- Abstract summary: This paper proposes a conversation-based VQA framework, which consists of three components: Questioner, Oracle, and Answerer.
To perform supervised learning for each model, we introduce a well-designed method to build an SQS for each question on the VQA 2.0 and VQA-CP v2 datasets.
- Score: 18.476819557695087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing approaches to Visual Question Answering (VQA) answer questions
directly. People, however, usually decompose a complex question into a sequence
of simple sub-questions and obtain the answer to the original question only
after answering the sub-question sequence (SQS). Simulating this process, this
paper proposes a conversation-based VQA (Co-VQA) framework, which consists of
three components: Questioner, Oracle, and Answerer. The Questioner raises sub-
questions using an extended HRED model, and the Oracle answers them one by one.
An Adaptive Chain Visual Reasoning Model (ACVRM) is also proposed for the
Answerer, in which each question-answer pair is used to update the visual
representation sequentially. To perform supervised learning for each model, we
introduce a well-designed method to build an SQS for each question on the VQA
2.0 and VQA-CP v2 datasets. Experimental results show that our method achieves
state-of-the-art performance on VQA-CP v2. Further analyses show that SQSs help
build direct semantic connections between questions and images, provide
question-adaptive, variable-length reasoning chains, and offer explicit
interpretability as well as error traceability.
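The Questioner-Oracle-Answerer loop described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the real Questioner is an extended HRED model and the real Answerer is the ACVRM; the function and parameter names here are assumptions made for the sketch.

```python
def co_vqa_answer(question, image_repr, questioner, oracle, answerer, max_steps=5):
    """Iteratively raise sub-questions, answer them one by one, and update
    the visual representation before answering the original question."""
    history = []
    for _ in range(max_steps):
        sub_q = questioner(question, history)
        if sub_q is None:  # Questioner decides the sub-question chain is complete
            break
        sub_a = oracle(sub_q, image_repr)
        # ACVRM-style step: each (sub-question, answer) pair refines the visual state
        image_repr = answerer.update(image_repr, sub_q, sub_a)
        history.append((sub_q, sub_a))
    return answerer.predict(question, image_repr, history)
```

Because the Questioner controls when the chain stops, each question gets a variable-length reasoning chain, matching the adaptivity claim in the abstract.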
Related papers
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity)
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out two drawbacks in current RVQA research: (1) datasets contain too many unchallenging UQs, and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an …
arXiv Detail & Related papers (2023-03-09T06:58:29Z)
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a ⟨passage, answer⟩ pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
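The declarative-to-question rewrite in this pipeline can be illustrated with a toy example. A real system applies dependency parsing, named entity recognition, and semantic role labeling; here a single regex stands in for NER, and all names are invented for the sketch.

```python
import re

# Crude stand-in for NER: a two-word capitalized name at the start of the sentence.
PERSON = re.compile(r"^(?P<ent>[A-Z][a-z]+ [A-Z][a-z]+) (?P<rest>.+)\.$")

def sentence_to_qa(sentence):
    """Turn a declarative sentence like 'Marie Curie discovered polonium.'
    into a (question, answer) pair by replacing the subject entity with 'Who'."""
    m = PERSON.match(sentence)
    if not m:
        return None  # pattern not recognized; a real pipeline would have more rules
    return ("Who " + m.group("rest") + "?", m.group("ent"))
```

The resulting (question, answer) pairs, paired with their source passages, are what the end-to-end QG model is trained on.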
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
- Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions [27.807568245576718]
We introduce ViQAR (Visual Question Answering and Reasoning), wherein a model must generate the complete answer and a rationale that seeks to justify the generated answer.
We show that our model generates strong answers and rationales through qualitative and quantitative evaluation, as well as through a human Turing Test.
arXiv Detail & Related papers (2020-10-24T09:44:50Z)
- Hierarchical Deep Multi-modal Network for Medical Visual Question Answering [25.633660028022195]
We propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries.
We integrate the QS model into the hierarchical deep multi-modal neural network to generate proper answers to queries related to medical images.
arXiv Detail & Related papers (2020-09-27T07:24:41Z)
- Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering [24.395733613284534]
We propose a novel VQA model that utilizes the question-type prior information to improve VQA.
Solid experiments on two benchmark datasets, i.e., VQA 2.0 and TDIUC, indicate that the proposed method yields the best performance against the most competitive approaches.
arXiv Detail & Related papers (2020-09-23T12:54:34Z)
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose One-to-N Unsupervised Sequence transduction (ONUS), an algorithm that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
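The one-to-many decomposition that ONUS learns can be shown with a deliberately naive stand-in. ONUS learns this mapping without supervision; the rule-based split below is only a placeholder chosen to make the input/output shape concrete.

```python
def decompose(question):
    """Map one multi-hop question to a list of simpler sub-questions.
    Toy rule: split on ' and ' (a learned model replaces this in ONUS)."""
    body = question.rstrip("?")
    parts = [p.strip() for p in body.split(" and ")]
    return [p + "?" for p in parts]
```

Each sub-question can then be answered by an off-the-shelf single-hop QA model, and the answers recomposed into an answer to the original question.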
arXiv Detail & Related papers (2020-02-22T19:40:35Z)
- CQ-VQA: Visual Question Answering on Categorized Questions [3.0013352260516744]
This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end model to solve the task of visual question answering (VQA)
The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answer search space.
The second level, referred to as the answer predictor (AP), comprises a set of distinct classifiers, one per question category.
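The two-level routing in CQ-VQA reduces to a dispatch pattern, sketched below. The categories and classifiers here are invented toys; in the paper both levels are learned jointly end-to-end.

```python
def cq_vqa(question, categorizer, predictors):
    """Level 1 (QC): classify the question to shrink the answer search space.
    Level 2 (AP): route to the category-specific answer predictor."""
    category = categorizer(question)
    return predictors[category](question)
```

Restricting each AP classifier to one category's answer vocabulary is what shrinks the search space relative to a single flat classifier over all answers.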
arXiv Detail & Related papers (2020-02-17T06:45:29Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT)
We show that SQuINT improves model consistency by 5% and marginally improves performance on reasoning questions in VQA, while also producing better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.