Multimodal Multihop Source Retrieval for Web Question Answering
- URL: http://arxiv.org/abs/2501.04173v1
- Date: Tue, 07 Jan 2025 22:53:56 GMT
- Title: Multimodal Multihop Source Retrieval for Web Question Answering
- Authors: Navya Yarrabelly, Saloni Mittal,
- Abstract summary: This work deals with the challenge of learning and reasoning over multi-modal multi-hop question answering (QA)
We propose a graph reasoning network based on the semantic structure of the sentences to learn multi-source reasoning paths.
- Score: 0.0
- License:
- Abstract: This work deals with the challenge of learning and reasoning over multi-modal multi-hop question answering (QA). We propose a graph reasoning network based on the semantic structure of the sentences to learn multi-source reasoning paths and find the supporting facts across both image and text modalities for answering the question. In this paper, we investigate the importance of graph structure for multi-modal multi-hop question answering. Our analysis is centered on WebQA. We construct a strong baseline model, that finds relevant sources using a pairwise classification task. We establish that, with the proper use of feature representations from pre-trained models, graph structure helps in improving multi-modal multi-hop question answering. We point out that both graph structure and adjacency matrix are task-related prior knowledge, and graph structure can be leveraged to improve the retrieval performance for the task. Experiments and visualized analysis demonstrate that message propagation over graph networks or the entire graph structure can replace massive multimodal transformers with token-wise cross-attention. We demonstrated the applicability of our method and show a performance gain of \textbf{4.6$\%$} retrieval F1score over the transformer baselines, despite being a very light model. We further demonstrated the applicability of our model to a large scale retrieval setting.
Related papers
- Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning [73.2950349728376]
Large language models (LLMs) have demonstrated remarkable success across a wide range of tasks.
However, they still encounter challenges in reasoning tasks that require understanding and inferring relationships between pieces of information.
This challenge is particularly pronounced in tasks involving multi-step processes, such as logical reasoning and multi-hop question answering.
We propose Reasoning with Graphs (RwG) by first constructing explicit graphs from the context.
arXiv Detail & Related papers (2025-01-14T05:18:20Z) - Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering [58.17090503446995]
We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes.
Our method utilizes a graph structured representation to aggregate information about a question and its context.
arXiv Detail & Related papers (2024-06-14T13:28:03Z) - Multimodal Graph Transformer for Multimodal Question Answering [9.292566397511763]
We propose a novel Multimodal Graph Transformer for question answering tasks that requires performing reasoning across multiple modalities.
We introduce a graph-involved plug-and-play quasi-attention mechanism to incorporate multimodal graph information.
We validate the effectiveness of Multimodal Graph Transformer over its Transformer baselines on GQA, VQAv2, and MultiModalQA datasets.
arXiv Detail & Related papers (2023-04-30T21:22:35Z) - Graph Attention with Hierarchies for Multi-hop Question Answering [19.398300844233837]
We present two extensions to the SOTA Graph Neural Network (GNN) based model for HotpotQA.
Experiments on HotpotQA demonstrate the efficiency of the proposed modifications.
arXiv Detail & Related papers (2023-01-27T15:49:50Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - Relational Graph Convolutional Neural Networks for Multihop Reasoning: A
Comparative Study [22.398477810999818]
Multihop Question Answering is a complex task that requires steps of reasoning to find the correct answer.
In this paper we explore a number of RGCN-based Multihop QA models, graph relations, and node embeddings, and empirically explore the influence each on Multihop QA performance on the WikiHop dataset.
arXiv Detail & Related papers (2022-10-12T17:13:30Z) - Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the AS2 task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z) - MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs are within two feature spaces.
We propose Multi-Granularity Alignment architecture for Visual Question Answering task (MGA-VQA)
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z) - Dynamic Semantic Graph Construction and Reasoning for Explainable
Multi-hop Science Question Answering [50.546622625151926]
We propose a new framework to exploit more valid facts while obtaining explainability for multi-hop QA.
Our framework contains three new ideas: (a) tt AMR-SG, an AMR-based Semantic Graph, constructed by candidate fact AMRs to uncover any hop relations among question, answer and multiple facts, (b) a novel path-based fact analytics approach exploiting tt AMR-SG to extract active facts from a large fact pool to answer questions, and (c) a fact-level relation modeling leveraging graph convolution network (GCN) to guide the reasoning process.
arXiv Detail & Related papers (2021-05-25T09:14:55Z) - Is Graph Structure Necessary for Multi-hop Question Answering? [34.189355591677725]
We investigate whether the graph structure is necessary for multi-hop question answering.
Experiments and visualized analysis demonstrate that graph-attention or the entire graph structure can be replaced by self-attention or Transformers.
arXiv Detail & Related papers (2020-04-07T02:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.