Multiple interaction learning with question-type prior knowledge for
constraining answer search space in visual question answering
- URL: http://arxiv.org/abs/2009.11118v1
- Date: Wed, 23 Sep 2020 12:54:34 GMT
- Title: Multiple interaction learning with question-type prior knowledge for
constraining answer search space in visual question answering
- Authors: Tuong Do, Binh X. Nguyen, Huy Tran, Erman Tjiputra, Quang D. Tran,
Thanh-Toan Do
- Abstract summary: We propose a novel VQA model that utilizes the question-type prior information to improve VQA.
The solid experiments on two benchmark datasets, i.e., VQA 2.0 and TDIUC, indicate that the proposed method yields the best performance with the most competitive approaches.
- Score: 24.395733613284534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different approaches have been proposed to Visual Question Answering (VQA).
However, few works are aware of the behaviors of varying joint modality methods
over question type prior knowledge extracted from data in constraining answer
search space, of which information gives a reliable cue to reason about answers
for questions asked in input images. In this paper, we propose a novel VQA
model that utilizes the question-type prior information to improve VQA by
leveraging the multiple interactions between different joint modality methods
based on their behaviors in answering questions from different types. The solid
experiments on two benchmark datasets, i.e., VQA 2.0 and TDIUC, indicate that
the proposed method yields the best performance with the most competitive
approaches.
Related papers
- Multimodal Reranking for Knowledge-Intensive Visual Question Answering [77.24401833951096]
We introduce a multi-modal reranker to improve the ranking quality of knowledge candidates for answer generation.
Experiments on OK-VQA and A-OKVQA show that multi-modal reranker from distant supervision provides consistent improvements.
arXiv Detail & Related papers (2024-07-17T02:58:52Z) - Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA [19.6585442152102]
We study the Knowledge-Based visual question-answering problem, for which given a question, the models need to ground it into the visual modality to find the answer.
Our study shows that replacing a complex question with several simpler questions helps to extract more relevant information from the image.
arXiv Detail & Related papers (2024-06-27T02:19:38Z) - Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual
Question Answering [32.21000330743921]
We propose a novel framework that endows the model with capabilities of answering more general questions.
Specifically, a well-defined detector is adopted to predict image-question related relation phrases.
The optimal answer is predicted by choosing the supporting fact with the highest score.
arXiv Detail & Related papers (2023-12-20T02:35:18Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Co-VQA : Answering by Interactive Sub Question Sequence [18.476819557695087]
This paper proposes a conversation-based VQA framework, which consists of three components: Questioner, Oracle, and Answerer.
To perform supervised learning for each model, we introduce a well-designed method to build a SQS for each question on VQA 2.0 and VQA-CP v2 datasets.
arXiv Detail & Related papers (2022-04-02T15:09:16Z) - Knowledge-Routed Visual Question Reasoning: Challenges for Deep
Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z) - Few-Shot Complex Knowledge Base Question Answering via Meta
Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB)
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z) - Effective FAQ Retrieval and Question Matching With Unsupervised
Knowledge Injection [10.82418428209551]
We propose a contextual language model for retrieving appropriate answers to frequently asked questions.
We also explore to capitalize on domain-specific topically-relevant relations between words in an unsupervised manner.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task.
arXiv Detail & Related papers (2020-10-27T05:03:34Z) - Match$^2$: A Matching over Matching Model for Similar Question
Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask a same question or different questions sharing similar expressions.
Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the
arXiv Detail & Related papers (2020-06-21T05:59:34Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA)
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named as RefQA)
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.