Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual
Question Answering
- URL: http://arxiv.org/abs/2312.12723v1
- Date: Wed, 20 Dec 2023 02:35:18 GMT
- Title: Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual
Question Answering
- Authors: Chengxiang Yin, Zhengping Che, Kun Wu, Zhiyuan Xu, Jian Tang
- Abstract summary: We propose a novel framework that endows the model with capabilities of answering more general questions.
Specifically, a well-defined detector is adopted to predict image-question related relation phrases.
The optimal answer is predicted by choosing the supporting fact with the highest score.
- Score: 32.21000330743921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Question Answering (VQA) has emerged as one of the most challenging
tasks in artificial intelligence due to its multi-modal nature. However, most
existing VQA methods are incapable of handling Knowledge-based Visual Question
Answering (KB-VQA), which requires external knowledge beyond visible contents
to answer questions about a given image. To address this issue, we propose a
novel framework that endows the model with capabilities of answering more
general questions, and achieves a better exploitation of external knowledge
through generating Multiple Clues for Reasoning with Memory Neural Networks
(MCR-MemNN). Specifically, a well-defined detector is adopted to predict
image-question related relation phrases, each of which delivers two
complementary clues to retrieve the supporting facts from external knowledge
base (KB), which are further encoded into a continuous embedding space using a
content-addressable memory. Afterwards, mutual interactions between
visual-semantic representation and the supporting facts stored in memory are
captured to distill the most relevant information in three modalities (i.e.,
image, question, and KB). Finally, the optimal answer is predicted by choosing
the supporting fact with the highest score. We conduct extensive experiments on
two widely-used benchmarks. The experimental results well justify the
effectiveness of MCR-MemNN, as well as its superiority over other KB-VQA
methods.
Related papers
Err
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.