Knowledge Generation for Zero-shot Knowledge-based VQA
- URL: http://arxiv.org/abs/2402.02541v1
- Date: Sun, 4 Feb 2024 15:41:35 GMT
- Title: Knowledge Generation for Zero-shot Knowledge-based VQA
- Authors: Rui Cao and Jing Jiang
- Abstract summary: Previous solutions to knowledge-based visual question answering (K-VQA) retrieve knowledge from external knowledge bases and use supervised learning to train the K-VQA model.
We propose and test a similar knowledge-generation-based K-VQA method, which first generates knowledge from an LLM and then incorporates the generated knowledge for K-VQA in a zero-shot manner.
- Score: 20.674979268279728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous solutions to knowledge-based visual question answering (K-VQA)
retrieve knowledge from external knowledge bases and use supervised learning to
train the K-VQA model. Recently, pre-trained LLMs have been used as both a
knowledge source and a zero-shot QA model for K-VQA and have demonstrated promising
results. However, these recent methods do not explicitly show the knowledge
needed to answer the questions and thus lack interpretability. Inspired by
recent work on knowledge generation from LLMs for text-based QA, in this work
we propose and test a similar knowledge-generation-based K-VQA method, which
first generates knowledge from an LLM and then incorporates the generated
knowledge for K-VQA in a zero-shot manner. We evaluate our method on two K-VQA
benchmarks and find that it performs better than previous zero-shot K-VQA
methods and that the generated knowledge is generally relevant and helpful.
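As a rough illustration of the two-stage pipeline described in the abstract, the sketch below first asks an LLM to generate short knowledge statements for a question (using an image caption as a textual stand-in for the image) and then prompts the LLM again to answer zero-shot with that knowledge in context. This is a minimal sketch, not the authors' implementation: the `llm` callable, the use of a caption, and the prompt wording are assumptions made for illustration.

```python
from typing import Callable, List

def generate_knowledge(llm: Callable[[str], str], caption: str, question: str,
                       num_statements: int = 3) -> List[str]:
    """Ask the LLM for short knowledge statements relevant to the question.

    `llm` is any text-completion function (prompt -> completion); the prompt
    wording here is illustrative, not taken from the paper.
    """
    prompt = (
        f"Image: {caption}\n"
        f"Question: {question}\n"
        f"List {num_statements} short facts that would help answer the question:\n"
    )
    completion = llm(prompt)
    # Split the completion into individual knowledge statements.
    return [line.strip("- ").strip() for line in completion.splitlines() if line.strip()]

def zero_shot_kvqa(llm: Callable[[str], str], caption: str, question: str) -> str:
    """Answer zero-shot, conditioning the LLM on the generated knowledge."""
    knowledge = generate_knowledge(llm, caption, question)
    prompt = (
        "Knowledge:\n" + "\n".join(f"- {k}" for k in knowledge) + "\n"
        f"Image: {caption}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return llm(prompt).strip()
```

Surfacing the generated knowledge statements in the prompt, rather than leaving them implicit inside the LLM, is what provides the interpretability the abstract emphasizes.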
Related papers
- CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering [33.89497991289916]
We propose a novel rewriting method CoTKR, Chain-of-Thought Enhanced Knowledge Rewriting, for generating reasoning traces and corresponding knowledge in an interleaved manner.
We conduct experiments using various Large Language Models (LLMs) across several Knowledge Graph Question Answering (KGQA) benchmarks.
arXiv Detail & Related papers (2024-09-29T16:08:45Z)
- Knowledge Condensation and Reasoning for Knowledge-based VQA [20.808840633377343]
Recent studies retrieve knowledge passages from external knowledge bases and then use them to answer questions.
We propose two synergistic models: Knowledge Condensation model and Knowledge Reasoning model.
Our method achieves state-of-the-art performance on knowledge-based VQA datasets.
arXiv Detail & Related papers (2024-03-15T06:06:06Z)
- Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering [61.53454387743701]
We propose CPACE, a concept-centric Prompt-bAsed Contrastive Explanation Generation model.
CPACE converts obtained symbolic knowledge into a contrastive explanation for better distinguishing the differences among given candidates.
We conduct a series of experiments on three widely-used question-answering datasets: CSQA, QASC, and OBQA.
arXiv Detail & Related papers (2023-05-14T12:12:24Z)
- Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering [30.858737348472626]
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question.
Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering.
We present a conceptually simple, flexible, and general framework designed to prompt an LLM with answer heuristics for knowledge-based VQA.
arXiv Detail & Related papers (2023-03-03T13:05:15Z)
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge [48.457788853408616]
We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues.
We show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases.
arXiv Detail & Related papers (2022-10-24T22:01:17Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z)
- K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions about images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z)
- Incremental Knowledge Based Question Answering [52.041815783025186]
We propose a new incremental KBQA learning framework that can progressively expand learning capacity as humans do.
Specifically, it comprises a margin-distilled loss and a collaborative selection method, to overcome the catastrophic forgetting problem.
The comprehensive experiments demonstrate its effectiveness and efficiency when working with the evolving knowledge base.
arXiv Detail & Related papers (2021-01-18T09:03:38Z)
- Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation [30.38055266965927]
We investigate how far we can get by exploiting external knowledge for Commonsense Question Answering.
We benchmark knowledge-enhanced CQA using a simple and effective knowledge-to-text transformation framework.
Experiments show that our knowledge-to-text framework is effective and achieves state-of-the-art performance on the CommonsenseQA dataset.
arXiv Detail & Related papers (2021-01-04T04:29:03Z)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge: the setting in which the knowledge required to answer a question is not given or annotated, at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)