LaKo: Knowledge-driven Visual Question Answering via Late
Knowledge-to-Text Injection
- URL: http://arxiv.org/abs/2207.12888v1
- Date: Tue, 26 Jul 2022 13:29:51 GMT
- Title: LaKo: Knowledge-driven Visual Question Answering via Late
Knowledge-to-Text Injection
- Authors: Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Yin Fang, Jeff Pan,
Ningyu Zhang, Wen Zhang
- Abstract summary: We propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection.
To effectively incorporate an external KG, we convert triples into text and propose a late injection mechanism.
In evaluations on the OK-VQA dataset, our method achieves state-of-the-art results.
- Score: 30.65373229617201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual question answering (VQA) often requires an understanding of visual
concepts and language semantics, which relies on external knowledge. Most
existing methods exploit pre-trained language models and/or unstructured text,
but the knowledge in these resources is often incomplete and noisy. Some
methods prefer to use knowledge graphs (KGs), which often contain intensive,
structured knowledge, but the research is still quite preliminary. In this
paper, we propose LaKo, a knowledge-driven VQA method via Late
Knowledge-to-text Injection. To effectively incorporate an external KG, we
convert triples into text and propose a late injection mechanism. Finally, we
address VQA as a text generation task with an effective encoder-decoder
paradigm. In evaluations on the OK-VQA dataset, our method achieves
state-of-the-art results.
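As a rough illustration of the late knowledge-to-text idea, the sketch below verbalizes KG triples into plain sentences, encodes each knowledge passage separately together with the question and an image caption, and fuses the encoded passages only at decoding time. The T5 backbone, the verbalization template, and the FiD-style fusion are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of late knowledge-to-text injection for VQA (illustrative;
# not LaKo's released code). Triples are verbalized into sentences, each
# knowledge passage is encoded separately with the question/caption, and the
# encoded passages are concatenated only at decoding time (late fusion).
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

def verbalize(triple):
    """Turn a (subject, relation, object) triple into a plain-text sentence."""
    s, r, o = triple
    return f"{s} {r.replace('_', ' ')} {o}."

def answer(question, caption, triples, model_name="t5-base", k=4):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    prefix = f"question: {question} context: {caption}"
    inputs = [f"{prefix} knowledge: {verbalize(t)}" for t in triples[:k]]

    enc = tok(inputs, return_tensors="pt", padding=True, truncation=True)
    hidden = model.encoder(**enc).last_hidden_state          # (k, L, d)
    fused = hidden.reshape(1, -1, hidden.size(-1))           # late fusion by concatenation
    mask = enc.attention_mask.reshape(1, -1)

    out = model.generate(
        encoder_outputs=BaseModelOutput(last_hidden_state=fused),
        attention_mask=mask,
        max_length=8,
    )
    return tok.decode(out[0], skip_special_tokens=True)

print(answer("What is the umbrella used for?",
             "a person holding an umbrella in the rain",
             [("umbrella", "used_for", "blocking rain")]))
```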
Related papers
- A Simple Baseline for Knowledge-Based Visual Question Answering [78.00758742784532]
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA)
Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline.
Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets.
arXiv Detail & Related papers (2023-10-20T15:08:17Z) - A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z) - Asking for Knowledge: Training RL Agents to Query External Knowledge
Using Language [121.56329458876655]
We introduce two new environments: the grid-world-based Q-BabyAI and the text-based Q-TextWorld.
We propose the "Asking for Knowledge" (AFK) agent, which learns to generate language commands to query for meaningful knowledge.
arXiv Detail & Related papers (2022-05-12T14:20:31Z) - Incorporating Explicit Knowledge in Pre-trained Language Models for
Passage Re-ranking [32.22697200984185]
We propose a novel knowledge graph distillation method and obtain a knowledge meta graph as the bridge between query and passage.
To align both kinds of embeddings in the latent space, we employ a PLM as the text encoder and a graph neural network over the knowledge meta graph as the knowledge encoder.
arXiv Detail & Related papers (2022-04-25T14:07:28Z)
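A minimal sketch of such a dual encoder is given below, assuming a BERT-style PLM for text and a one-layer mean-aggregation GNN for the meta graph with a shared projection dimension; the layer sizes, pooling, and training loss are assumptions rather than the paper's exact design.

```python
# Sketch (not the paper's implementation): a PLM encodes query/passage text,
# a small GNN encodes the knowledge meta graph, and shared projections align
# both embeddings in one latent space (e.g. via a contrastive alignment loss).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SimpleGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # Mean aggregation over neighbours, then a linear update.
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))

class DualEncoder(nn.Module):
    def __init__(self, plm="bert-base-uncased", node_dim=128, out_dim=256):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(plm)
        self.text_enc = AutoModel.from_pretrained(plm)
        self.gnn = SimpleGNNLayer(node_dim)
        self.text_proj = nn.Linear(self.text_enc.config.hidden_size, out_dim)
        self.graph_proj = nn.Linear(node_dim, out_dim)

    def forward(self, texts, node_feats, adj):
        enc = self.tok(texts, return_tensors="pt", padding=True, truncation=True)
        text_vec = self.text_proj(self.text_enc(**enc).last_hidden_state[:, 0])
        graph_vec = self.graph_proj(self.gnn(node_feats, adj).mean(0, keepdim=True))
        return text_vec, graph_vec  # compare/align these in the shared latent space
```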
arXiv Detail & Related papers (2022-04-25T14:07:28Z) - TegTok: Augmenting Text Generation via Task-specific and Open-world
Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
arXiv Detail & Related papers (2022-03-16T10:37:59Z)
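The retrieval-then-injection step on the input side can be sketched as follows; the sentence encoder, the toy knowledge entries, and the "[KNOWLEDGE]" marker are illustrative assumptions, and the output-decoding injection stage is omitted.

```python
# Illustrative sketch of dense retrieval over two knowledge sources followed by
# input-side injection. Encoder choice and index layout are assumptions, not
# TegTok's released code.
import numpy as np
from sentence_transformers import SentenceTransformer

def build_index(entries, encoder):
    return np.stack([encoder.encode(e) for e in entries])   # (N, d) matrix

def retrieve(query, entries, index, encoder, k=3):
    q = encoder.encode(query)
    scores = index @ q                                       # dot-product relevance
    top = np.argsort(-scores)[:k]
    return [entries[i] for i in top]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
task_kb = ["persona: the speaker likes hiking.", "persona: the speaker owns a dog."]
open_kb = ["Hiking is a long, vigorous walk on trails.", "Dogs are domesticated mammals."]

query = "What should I bring for the weekend?"
selected = (retrieve(query, task_kb, build_index(task_kb, encoder), encoder, k=1)
            + retrieve(query, open_kb, build_index(open_kb, encoder), encoder, k=1))
augmented_input = query + " [KNOWLEDGE] " + " ".join(selected)  # fed to the generator
print(augmented_input)
```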
arXiv Detail & Related papers (2022-03-16T10:37:59Z) - Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
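The verbalizer stage, which lets tables and triples share one retrieval index with ordinary text, can be sketched with simple templates; the templates and records below are illustrative and not the UDT-QA implementation.

```python
# Minimal verbalizer sketch: a Wikidata-style triple and a table row are
# flattened into natural-language passages so they can be indexed alongside
# ordinary Wikipedia text for open-domain QA. Templates are illustrative only.
def verbalize_triple(subj, rel, obj):
    return f"{subj} {rel.replace('_', ' ')} {obj}."

def verbalize_table_row(title, header, row):
    cells = "; ".join(f"{h} is {v}" for h, v in zip(header, row))
    return f"In the table '{title}', {cells}."

passages = [
    verbalize_triple("Eiffel Tower", "located_in", "Paris"),
    verbalize_table_row("Tallest structures", ["name", "height"], ["Eiffel Tower", "330 m"]),
]
# `passages` can now join a plain-text retrieval index as an expanded knowledge source.
print(passages)
```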
arXiv Detail & Related papers (2021-10-16T00:11:21Z) - Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question
Answering [16.96751206502189]
Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images.
One dataset that is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold standard knowledge corpus for retrieval.
We propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA.
arXiv Detail & Related papers (2021-09-09T03:21:32Z) - External Knowledge Augmented Text Visual Question Answering [0.6445605125467573]
We propose a framework to extract, filter, and encode knowledge atop a standard multimodal transformer for vision language understanding tasks.
We generate results comparable to the state-of-the-art on two publicly available datasets.
arXiv Detail & Related papers (2021-08-22T13:21:58Z) - KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting in which the knowledge required to answer a question is not given/annotated at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
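One way to picture combining implicit and symbolic knowledge is a late fusion over a shared answer vocabulary, as in the hedged sketch below; the thresholded rule is a simplification for illustration, not KRISP's actual architecture.

```python
# Hedged sketch of combining implicit and symbolic knowledge at answer time:
# a transformer scores the answer vocabulary, a symbolic KB lookup scores the
# same vocabulary, and the final prediction takes the better-supported answer.
import torch

def fuse_answers(implicit_logits, symbolic_scores, answer_vocab, threshold=0.5):
    """implicit_logits, symbolic_scores: tensors of shape (len(answer_vocab),)."""
    implicit_probs = torch.softmax(implicit_logits, dim=-1)
    best_sym = symbolic_scores.argmax().item()
    best_imp = implicit_probs.argmax().item()
    # Prefer the symbolic answer only when the KB evidence is strong and the
    # implicit model is unsure.
    if symbolic_scores[best_sym] > threshold and implicit_probs[best_imp] < threshold:
        return answer_vocab[best_sym]
    return answer_vocab[best_imp]

vocab = ["umbrella", "raincoat", "sunscreen"]
print(fuse_answers(torch.tensor([0.2, 0.1, 0.9]), torch.tensor([0.0, 0.8, 0.1]), vocab))
```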
arXiv Detail & Related papers (2020-12-20T20:13:02Z) - Improving Commonsense Question Answering by Graph-based Iterative
Retrieval over Multiple Knowledge Sources [26.256653692882715]
How to engage commonsense effectively in question answering systems is still under exploration.
We propose a novel question-answering method by integrating ConceptNet, Wikipedia, and the Cambridge Dictionary.
We use a pre-trained language model to encode the question, retrieved knowledge and choices, and propose an answer choice-aware attention mechanism.
arXiv Detail & Related papers (2020-11-05T08:50:43Z)