Knowledge Generation for Zero-shot Knowledge-based VQA
- URL: http://arxiv.org/abs/2402.02541v1
- Date: Sun, 4 Feb 2024 15:41:35 GMT
- Title: Knowledge Generation for Zero-shot Knowledge-based VQA
- Authors: Rui Cao and Jing Jiang
- Abstract summary: Previous solutions to knowledge-based visual question answering (K-VQA) retrieve knowledge from external knowledge bases and use supervised learning to train the K-VQA model.
We propose and test a similar knowledge-generation-based K-VQA method, which first generates knowledge from an LLM and then incorporates the generated knowledge for K-VQA in a zero-shot manner.
- Score: 20.674979268279728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous solutions to knowledge-based visual question answering (K-VQA)
retrieve knowledge from external knowledge bases and use supervised learning to
train the K-VQA model. Recently, pre-trained LLMs have been used as both a
knowledge source and a zero-shot QA model for K-VQA and have demonstrated promising
results. However, these recent methods do not explicitly show the knowledge
needed to answer the questions and thus lack interpretability. Inspired by
recent work on knowledge generation from LLMs for text-based QA, in this work
we propose and test a similar knowledge-generation-based K-VQA method, which
first generates knowledge from an LLM and then incorporates the generated
knowledge for K-VQA in a zero-shot manner. We evaluate our method on two K-VQA
benchmarks and find that it performs better than previous zero-shot K-VQA
methods and that the generated knowledge is generally relevant and helpful.
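As a rough illustration of the two-stage pipeline described in the abstract, the sketch below first asks an LLM to generate short knowledge statements for a question (using an image caption as a textual stand-in for the image) and then prompts the LLM again to answer zero-shot with that knowledge in context. This is a minimal sketch, not the authors' implementation: the `llm` callable, the use of a caption, and the prompt wording are assumptions made for illustration.

```python
from typing import Callable, List

def generate_knowledge(llm: Callable[[str], str], caption: str, question: str,
                       num_statements: int = 3) -> List[str]:
    """Ask the LLM for short knowledge statements relevant to the question.

    `llm` is any text-completion function (prompt -> completion); the prompt
    wording here is illustrative, not taken from the paper.
    """
    prompt = (
        f"Image: {caption}\n"
        f"Question: {question}\n"
        f"List {num_statements} short facts that would help answer the question:\n"
    )
    completion = llm(prompt)
    # Split the completion into individual knowledge statements.
    return [line.strip("- ").strip() for line in completion.splitlines() if line.strip()]

def zero_shot_kvqa(llm: Callable[[str], str], caption: str, question: str) -> str:
    """Answer zero-shot, conditioning the LLM on the generated knowledge."""
    knowledge = generate_knowledge(llm, caption, question)
    prompt = (
        "Knowledge:\n" + "\n".join(f"- {k}" for k in knowledge) + "\n"
        f"Image: {caption}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return llm(prompt).strip()
```

Surfacing the generated knowledge statements in the prompt, rather than leaving them implicit inside the LLM, is what provides the interpretability the abstract emphasizes.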
Related papers
- CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering [33.89497991289916]
We propose a novel rewriting method CoTKR, Chain-of-Thought Enhanced Knowledge Rewriting, for generating reasoning traces and corresponding knowledge in an interleaved manner.
We conduct experiments using various Large Language Models (LLMs) across several Knowledge Graph Question Answering (KGQA) benchmarks.
arXiv Detail & Related papers (2024-09-29T16:08:45Z)
- Knowledge Condensation and Reasoning for Knowledge-based VQA [20.808840633377343]
Recent studies retrieve knowledge passages from external knowledge bases and then use them to answer questions.
We propose two synergistic models: Knowledge Condensation model and Knowledge Reasoning model.
Our method achieves state-of-the-art performance on knowledge-based VQA datasets.
arXiv Detail & Related papers (2024-03-15T06:06:06Z)
- Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering [61.53454387743701]
We propose CPACE, a concept-centric Prompt-bAsed Contrastive Explanation Generation model.
CPACE converts obtained symbolic knowledge into a contrastive explanation for better distinguishing the differences among given candidates.
We conduct a series of experiments on three widely-used question-answering datasets: CSQA, QASC, and OBQA.
arXiv Detail & Related papers (2023-05-14T12:12:24Z)
- Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering [30.858737348472626]
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question.
Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering.
We present a conceptually simple, flexible, and general framework designed to prompt an LLM with answer heuristics for knowledge-based VQA.
arXiv Detail & Related papers (2023-03-03T13:05:15Z)
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge [48.457788853408616]
We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues.
We show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases.
arXiv Detail & Related papers (2022-10-24T22:01:17Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z)
- K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions about images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z)
- Incremental Knowledge Based Question Answering [52.041815783025186]
We propose a new incremental KBQA learning framework that can progressively expand learning capacity as humans do.
Specifically, it comprises a margin-distilled loss and a collaborative selection method, to overcome the catastrophic forgetting problem.
The comprehensive experiments demonstrate its effectiveness and efficiency when working with the evolving knowledge base.
arXiv Detail & Related papers (2021-01-18T09:03:38Z)
- Benchmarking Knowledge-Enhanced Commonsense Question Answering via Knowledge-to-Text Transformation [30.38055266965927]
We investigate how far we can get by exploiting external knowledge for Commonsense Question Answering.
We benchmark knowledge-enhanced CQA using a simple and effective knowledge-to-text transformation framework.
Experiments show that our knowledge-to-text framework is effective and achieves state-of-the-art performance on the CommonsenseQA dataset.
arXiv Detail & Related papers (2021-01-04T04:29:03Z)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge: the setting in which the knowledge required to answer a question is not given or annotated, at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)