Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA
- URL: http://arxiv.org/abs/2406.12746v3
- Date: Sun, 23 Jun 2024 03:06:42 GMT
- Title: Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA
- Authors: Miaoyu Li, Haoxin Li, Zilin Du, Boyang Li,
- Abstract summary: Knowledge-based Visual Qustion-answering (K-VQA) requires the use of background knowledge beyond what is depicted in the image.
Current zero-shot K-VQA methods usually translate an image to a single type of textual decision context and use a text-based model to answer the question based on it.
We propose Rationale-based Ensemble of Answer Context Tactics (REACT) to achieve a dynamic ensemble of multiple question-answering tactics.
- Score: 8.498145119681437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-based Visual Qustion-answering (K-VQA) necessitates the use of background knowledge beyond what is depicted in the image. Current zero-shot K-VQA methods usually translate an image to a single type of textual decision context and use a text-based model to answer the question based on it, which conflicts with the fact that K-VQA questions often require the combination of multiple question-answering strategies. In light of this, we propose Rationale-based Ensemble of Answer Context Tactics (REACT) to achieve a dynamic ensemble of multiple question-answering tactics, comprising Answer Candidate Generation (ACG) and Rationale-based Strategy Fusion (RSF). In ACG, we generate three distinctive decision contexts to provide different strategies for each question, resulting in the generation of three answer candidates. RSF generates automatic and mechanistic rationales from decision contexts for each candidate, allowing the model to select the correct answer from all candidates. We conduct comprehensive experiments on the OK-VQA and A-OKVQA datasets, and our method significantly outperforms state-of-the-art LLM-based baselines on all datasets.
Related papers
- Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - Causal Reasoning through Two Layers of Cognition for Improving
Generalization in Visual Question Answering [28.071906755200043]
Generalization in Visual Question Answering (VQA) requires models to answer questions about images with contexts beyond the training distribution.
We propose Cognitive pathways VQA (CopVQA) improving the multimodal predictions by emphasizing causal reasoning factors.
CopVQA achieves a new state-of-the-art (SOTA) on PathVQA dataset and comparable accuracy to the current SOTA on VQA-CPv2, VQAv2, and VQA RAD, with one-fourth of the model size.
arXiv Detail & Related papers (2023-10-09T05:07:58Z) - Reasoning over Hierarchical Question Decomposition Tree for Explainable
Question Answering [83.74210749046551]
We propose to leverage question decomposing for heterogeneous knowledge integration.
We propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT)
Experiments on complex QA datasets KQA Pro and Musique show that our framework outperforms SOTA methods significantly.
arXiv Detail & Related papers (2023-05-24T11:45:59Z) - Chain-of-Knowledge: Grounding Large Language Models via Dynamic
Knowledge Adapting over Heterogeneous Sources [87.26486246513063]
Chain-of-knowledge (CoK) is a framework that augments large language models.
CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation.
arXiv Detail & Related papers (2023-05-22T17:34:23Z) - Query Enhanced Knowledge-Intensive Conversation via Unsupervised Joint
Modeling [35.27735234588822]
We propose an unsupervised query enhanced approach for knowledge-intensive conversations, namely QKConv.
QKConv is optimized through joint training, which produces the response by exploring multiple candidate queries and leveraging corresponding selected knowledge.
arXiv Detail & Related papers (2022-12-19T16:21:05Z) - Multifaceted Improvements for Conversational Open-Domain Question
Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA)
Firstly, the proposed KL-divergence based regularization is able to lead to a better question understanding for retrieval and answer reading.
Second, the added post-ranker module can push more relevant passages to the top placements and be selected for reader with a two-aspect constrains.
Third, the well designed curriculum learning strategy effectively narrows the gap between the golden passage settings of training and inference, and encourages the reader to find true answer without the golden passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z) - MuKEA: Multimodal Knowledge Extraction and Accumulation for
Knowledge-based Visual Question Answering [23.628740943735167]
We propose MuKEA to represent multimodal knowledge by an explicit triplet to correlate visual objects and fact answers with implicit relations.
By adopting a pre-training and fine-tuning learning strategy, both basic and domain-specific multimodal knowledge are progressively accumulated for answer prediction.
arXiv Detail & Related papers (2022-03-17T07:42:14Z) - ArT: All-round Thinker for Unsupervised Commonsense Question-Answering [54.068032948300655]
We propose an approach of All-round Thinker (ArT) by fully taking association during knowledge generating.
We evaluate it on three commonsense QA benchmarks: COPA, SocialIQA and SCT.
arXiv Detail & Related papers (2021-12-26T18:06:44Z) - Differentiable Open-Ended Commonsense Reasoning [80.94997942571838]
We study open-ended commonsense reasoning (OpenCSR) using as a resource only a corpus of commonsense facts written in natural language.
As an approach to OpenCSR, we propose DrFact, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts.
arXiv Detail & Related papers (2020-10-24T10:07:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.