Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA
- URL: http://arxiv.org/abs/2406.12746v3
- Date: Sun, 23 Jun 2024 03:06:42 GMT
- Title: Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA
- Authors: Miaoyu Li, Haoxin Li, Zilin Du, Boyang Li,
- Abstract summary: Knowledge-based Visual Qustion-answering (K-VQA) requires the use of background knowledge beyond what is depicted in the image.
Current zero-shot K-VQA methods usually translate an image to a single type of textual decision context and use a text-based model to answer the question based on it.
We propose Rationale-based Ensemble of Answer Context Tactics (REACT) to achieve a dynamic ensemble of multiple question-answering tactics.
- Score: 8.498145119681437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-based Visual Qustion-answering (K-VQA) necessitates the use of background knowledge beyond what is depicted in the image. Current zero-shot K-VQA methods usually translate an image to a single type of textual decision context and use a text-based model to answer the question based on it, which conflicts with the fact that K-VQA questions often require the combination of multiple question-answering strategies. In light of this, we propose Rationale-based Ensemble of Answer Context Tactics (REACT) to achieve a dynamic ensemble of multiple question-answering tactics, comprising Answer Candidate Generation (ACG) and Rationale-based Strategy Fusion (RSF). In ACG, we generate three distinctive decision contexts to provide different strategies for each question, resulting in the generation of three answer candidates. RSF generates automatic and mechanistic rationales from decision contexts for each candidate, allowing the model to select the correct answer from all candidates. We conduct comprehensive experiments on the OK-VQA and A-OKVQA datasets, and our method significantly outperforms state-of-the-art LLM-based baselines on all datasets.
Related papers
- Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering [45.88079503965459]
Several methods have been proposed to augment large Vision Language Models (VLMs) for Visual Question Answering (VQA) simplicity.
We introduce a novel, explainable multi-agent collaboration framework designed to imitate human-like top-down reasoning.
arXiv Detail & Related papers (2023-11-29T03:10:42Z) - Improving Zero-shot Visual Question Answering via Large Language Models
with Reasoning Question Prompts [22.669502403623166]
We present Reasoning Question Prompts for VQA tasks, which can further activate the potential of Large Language Models.
We generate self-contained questions as reasoning question prompts via an unsupervised question edition module.
Each reasoning question prompt clearly indicates the intent of the original question.
Then, the candidate answers associated with their confidence scores acting as answer integritys are fed into LLMs.
arXiv Detail & Related papers (2023-11-15T15:40:46Z) - Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity)
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z) - An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Consecutive Question Generation via Dynamic Multitask Learning [17.264399861776187]
We propose the task of consecutive question generation (CQG), which generates a set of logically related question-answer pairs to understand a whole passage.
We first examine the four key elements of CQG, and propose a novel dynamic multitask framework with one main task generating a question-answer pair, and four auxiliary tasks generating other elements.
We prove that our strategy can improve question generation significantly and benefit multiple related NLP tasks.
arXiv Detail & Related papers (2022-11-16T11:50:36Z) - PACIFIC: Towards Proactive Conversational Question Answering over
Tabular and Textual Data in Finance [96.06505049126345]
We present a new dataset, named PACIFIC. Compared with existing CQA datasets, PACIFIC exhibits three key features: (i) proactivity, (ii) numerical reasoning, and (iii) hybrid context of tables and text.
A new task is defined accordingly to study Proactive Conversational Question Answering (PCQA), which combines clarification question generation and CQA.
UniPCQA performs multi-task learning over all sub-tasks in PCQA and incorporates a simple ensemble strategy to alleviate the error propagation issue in the multi-task learning by cross-validating top-$k$ sampled Seq2Seq
arXiv Detail & Related papers (2022-10-17T08:06:56Z) - VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks
for Visual Question Answering [79.22069768972207]
We propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations.
Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context.
On two challenging VQA tasks, our method outperforms strong baseline VQA methods by 3.2% on VCR and 4.6% on GQA, suggesting its strength in performing concept-level reasoning.
arXiv Detail & Related papers (2022-05-23T17:55:34Z) - Adaptive Information Seeking for Open-Domain Question Answering [61.39330982757494]
We propose a novel adaptive information-seeking strategy for open-domain question answering, namely AISO.
According to the learned policy, AISO could adaptively select a proper retrieval action to seek the missing evidence at each step.
AISO outperforms all baseline methods with predefined strategies in terms of both retrieval and answer evaluations.
arXiv Detail & Related papers (2021-09-14T15:08:13Z) - Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit
Reasoning Strategies [78.68534915690404]
StrategyQA is a benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy.
We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
arXiv Detail & Related papers (2021-01-06T19:14:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.