II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
- URL: http://arxiv.org/abs/2402.11058v3
- Date: Mon, 3 Jun 2024 01:09:38 GMT
- Title: II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
- Authors: Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim
- Abstract summary: We propose II-MMR, a novel idea to identify and improve multi-modal multi-hop reasoning in Visual Question Answering (VQA).
II-MMR takes a VQA question with an image and finds a reasoning path to reach its answer using two novel language prompts.
II-MMR shows its effectiveness across all reasoning cases in both zero-shot and fine-tuning settings.
- Score: 15.65067042725113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Answering (VQA) often involves diverse reasoning scenarios across Vision and Language (V&L). Most prior VQA studies, however, have merely focused on assessing the model's overall accuracy without evaluating it on different reasoning cases. Furthermore, some recent works observe that conventional Chain-of-Thought (CoT) prompting fails to generate effective reasoning for VQA, especially for complex scenarios requiring multi-hop reasoning. In this paper, we propose II-MMR, a novel idea to identify and improve multi-modal multi-hop reasoning in VQA. Specifically, II-MMR takes a VQA question with an image and finds a reasoning path to reach its answer using two novel language prompts: (i) an answer prediction-guided CoT prompt or (ii) a knowledge triplet-guided prompt. II-MMR then analyzes this path to identify different reasoning cases in current VQA benchmarks by estimating how many hops and what types (i.e., visual or beyond-visual) of reasoning are required to answer the question. On popular benchmarks including GQA and A-OKVQA, II-MMR observes that most of their VQA questions are easy to answer, simply demanding "single-hop" reasoning, whereas only a few questions require "multi-hop" reasoning. Moreover, while a recent V&L model struggles with such complex multi-hop reasoning questions even when using the traditional CoT method, II-MMR shows its effectiveness across all reasoning cases in both zero-shot and fine-tuning settings.
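The hop-and-type analysis described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that a reasoning path is a list of (head, relation, tail) knowledge triplets, that each triplet counts as one hop, and that a path counts as "visual" only when every entity it mentions appears in a hypothetical set of objects detected in the image. `Triplet`, `analyze_path`, and `reasoning_case` are hypothetical names; this is not the paper's actual procedure.

```python
from dataclasses import dataclass


# Hypothetical representation of one reasoning step as a knowledge triplet
@dataclass(frozen=True)
class Triplet:
    head: str
    relation: str
    tail: str


def analyze_path(path: list[Triplet], visual_entities: set[str]) -> tuple[int, str]:
    """Return (hop count, reasoning type) for a triplet reasoning path.

    Assumptions (not from the paper): each triplet is one hop, and a path is
    'visual' only if every entity it mentions is among the entities grounded
    in the image; otherwise it is 'beyond-visual'.
    """
    hops = len(path)
    entities = {e for t in path for e in (t.head, t.tail)}
    kind = "visual" if entities <= visual_entities else "beyond-visual"
    return hops, kind


def reasoning_case(path: list[Triplet], visual_entities: set[str]) -> str:
    """Map a path to a coarse label such as 'multi-hop visual'."""
    hops, kind = analyze_path(path, visual_entities)
    prefix = "multi-hop" if hops > 1 else "single-hop"
    return f"{prefix} {kind}"


if __name__ == "__main__":
    seen = {"man", "umbrella", "red"}  # toy set of image-grounded entities
    path = [Triplet("man", "holding", "umbrella"), Triplet("umbrella", "color", "red")]
    print(reasoning_case(path, seen))  # multi-hop visual
```

In this toy setup, a path such as `[Triplet("umbrella", "used for", "rain")]` would be labeled single-hop beyond-visual, because "rain" is not grounded in the image.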
Related papers
- GenDec: A robust generative Question-decomposition method for Multi-hop reasoning [32.12904215053187]
Multi-hop QA involves step-by-step reasoning to answer complex questions.
The multi-hop question answering ability of existing large language models (LLMs) remains underexplored.
It is unclear whether LLMs follow a desired reasoning chain to reach the right final answer.
arXiv Detail & Related papers (2024-02-17T02:21:44Z)
- Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering [28.071906755200043]
Generalization in Visual Question Answering (VQA) requires models to answer questions about images with contexts beyond the training distribution.
We propose Cognitive pathways VQA (CopVQA), which improves multimodal predictions by emphasizing causal reasoning factors.
CopVQA achieves a new state-of-the-art (SOTA) on the PathVQA dataset and accuracy comparable to the current SOTA on VQA-CPv2, VQAv2, and VQA RAD, with one-fourth of the model size.
arXiv Detail & Related papers (2023-10-09T05:07:58Z)
- HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision [118.0818807474809]
This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision.
Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document.
arXiv Detail & Related papers (2023-05-23T16:53:49Z)
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought [53.55653437903948]
We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought.
MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer.
arXiv Detail & Related papers (2023-04-25T17:27:37Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Prompt-based Conservation Learning for Multi-hop Question Answering [11.516763652013005]
Multi-hop question answering requires reasoning over multiple documents to answer a complex question.
Most existing multi-hop QA methods fail to answer a large fraction of sub-questions.
We propose the Prompt-based Conservation Learning framework for multi-hop QA.
arXiv Detail & Related papers (2022-09-14T20:50:46Z)
- Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z)
- Interpretable AMR-Based Question Decomposition for Multi-hop Question Answering [12.35571328854374]
We propose a Question Decomposition method based on Abstract Meaning Representation (QDAMR) for multi-hop QA.
We decompose a multi-hop question into simpler sub-questions and answer them in order.
Experimental results on HotpotQA demonstrate that our approach is competitive for interpretable reasoning.
arXiv Detail & Related papers (2022-06-16T23:46:33Z)
- Ask to Understand: Question Generation for Multi-hop Question Answering [11.626390908264872]
Multi-hop Question Answering (QA) requires the machine to answer complex questions by finding scattered clues and reasoning over multiple documents.
We propose a novel method to complete multi-hop QA from the perspective of Question Generation (QG).
arXiv Detail & Related papers (2022-03-17T04:02:29Z)
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs.
We propose the Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which performs context encoding over multiple hops.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We address multi-hop question generation, which aims to generate relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset HotpotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)
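Several entries above (e.g. QDAMR and Locate Then Ask) decompose a multi-hop question into simpler single-hop sub-questions that are answered in order. A minimal sketch of that sequential answering loop follows. It assumes a QDMR-style `#k` placeholder for referring to the answer of the k-th earlier sub-question, and a toy dictionary standing in for a single-hop QA model; `answer_in_order` and the toy knowledge base are hypothetical and not taken from any of the listed papers.

```python
import re
from typing import Callable


def answer_in_order(sub_questions: list[str], qa: Callable[[str], str]) -> str:
    """Answer decomposed sub-questions sequentially and return the final answer.

    Assumption: a sub-question may reference the answer to the k-th earlier
    sub-question with the placeholder '#k' (1-indexed). This is an
    illustration only, not the QDAMR implementation.
    """
    answers: list[str] = []
    for q in sub_questions:
        # Substitute every '#k' with the answer already produced for sub-question k
        resolved = re.sub(r"#(\d+)", lambda m: answers[int(m.group(1)) - 1], q)
        answers.append(qa(resolved))
    return answers[-1]


if __name__ == "__main__":
    # Toy "single-hop QA model": an exact-match lookup table (hypothetical)
    kb = {
        "Who directed Inception?": "Christopher Nolan",
        "Where was Christopher Nolan born?": "London",
    }
    steps = ["Who directed Inception?", "Where was #1 born?"]
    print(answer_in_order(steps, kb.__getitem__))  # London
```

Because each placeholder is resolved before the sub-question is passed to `qa`, any single-hop model (for example, a retriever-reader pipeline or an LLM call) could be substituted for the dictionary lookup.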