Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation
- URL: http://arxiv.org/abs/2212.08632v2
- Date: Mon, 7 Aug 2023 03:02:06 GMT
- Authors: Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang
- Abstract summary: Multi-modal multi-hop question answering involves answering a question by reasoning over multiple input sources from different modalities.
Existing methods often retrieve evidence separately and then use a language model to generate an answer based on the retrieved evidence.
We propose a Structured Knowledge and Unified Retrieval-Generation (SKURG) approach to address these issues.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modal multi-hop question answering involves answering a question by
reasoning over multiple input sources from different modalities. Existing
methods often retrieve evidence separately and then use a language model to
generate an answer based on the retrieved evidence, and thus do not adequately
connect candidates and are unable to model the interdependent relations during
retrieval. Moreover, the pipelined approaches of retrieval and generation might
result in poor generation performance when retrieval performance is low. To
address these issues, we propose a Structured Knowledge and Unified
Retrieval-Generation (SKURG) approach. SKURG employs an Entity-centered Fusion
Encoder to align sources from different modalities using shared entities. It
then uses a unified Retrieval-Generation Decoder to integrate intermediate
retrieval results for answer generation and also adaptively determine the
number of retrieval steps. Extensive experiments on two representative
multi-modal multi-hop QA datasets MultimodalQA and WebQA demonstrate that SKURG
outperforms the state-of-the-art models in both source retrieval and answer
generation performance with fewer parameters. Our code is available at
https://github.com/HITsz-TMG/SKURG.
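The abstract names two mechanisms: aligning sources from different modalities through shared entities, and letting the decoder decide how many retrieval steps to take. The toy sketch below illustrates both ideas under heavy simplifying assumptions and is not SKURG's implementation. The `Source` class, the entity sets, the overlap-plus-bridge scoring rule, and the `threshold` stopping rule are all hypothetical stand-ins for the learned Entity-centered Fusion Encoder and Retrieval-Generation Decoder.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical retrieval candidate: a passage, table, or image with linked entities."""
    sid: str
    modality: str                       # e.g. "text", "table", "image"
    entities: set[str] = field(default_factory=set)


def link_by_shared_entities(sources):
    """Connect every pair of sources that mention at least one common entity."""
    by_entity = defaultdict(set)
    for s in sources:
        for e in s.entities:
            by_entity[e].add(s.sid)
    edges = defaultdict(set)
    for sids in by_entity.values():
        for a in sids:
            edges[a] |= sids - {a}
    return edges


def score(source, query_entities, edges, retrieved):
    """Toy relevance (assumption): entity overlap with the query, plus one point
    per already-retrieved source this source shares an entity with."""
    overlap = len(source.entities & query_entities)
    bridge = sum(1 for r in retrieved if r in edges[source.sid])
    return overlap + bridge


def adaptive_retrieve(question_entities, sources, threshold=1, max_hops=4):
    """Greedily add the best-scoring source each hop; stop when the best remaining
    score drops below `threshold`, so the number of hops is not fixed in advance."""
    edges = link_by_shared_entities(sources)
    retrieved = []
    query = set(question_entities)
    for _ in range(max_hops):
        candidates = [s for s in sources if s.sid not in retrieved]
        if not candidates:
            break
        best = max(candidates, key=lambda s: score(s, query, edges, retrieved))
        if score(best, query, edges, retrieved) < threshold:
            break  # adaptive stop: nothing relevant enough remains
        retrieved.append(best.sid)
        query |= best.entities  # later hops can follow entities found so far
    return retrieved


if __name__ == "__main__":
    docs = [
        Source("img1", "image", {"Eiffel Tower", "Paris"}),
        Source("txt1", "text", {"Paris", "France"}),
        Source("tbl1", "table", {"Tokyo", "Japan"}),
    ]
    print(adaptive_retrieve({"Eiffel Tower"}, docs))  # ['img1', 'txt1']
```

In this example the image is retrieved first because it mentions the question entity, the text passage is reached on the second hop through the shared entity "Paris", and the loop stops before the unrelated table. In SKURG these decisions come from learned representations rather than hand-written counts.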
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs).
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices [91.71951459594074]
Large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning.
Existing methods typically use the Self-Instruct framework to generate instruction-tuning data that improves long-context capability.
We propose the Multi-agent Interactive Multi-hop Generation framework, incorporating a Quality Verification Agent, a Single-hop Question Generation Agent, a Multiple Question Sampling Strategy, and a Multi-hop Question Merger Agent.
Our findings show that our synthetic high-quality long-context instruction data significantly enhances model performance, even surpassing models trained on larger amounts of human
arXiv Detail & Related papers (2024-09-03T13:30:00Z)
- Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering [24.71247954169364]
Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions.
Existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off.
We propose a novel framework, the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), comprising five key modules: Decomposer, Definer, Retriever, Filter, and Summarizer.
arXiv Detail & Related papers (2024-08-20T09:29:31Z)
- Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach [6.549143816134531]
We propose a novel iterative RAG method called ReSP, equipped with a dual-function summarizer.
Experimental results on the multi-hop question-answering datasets HotpotQA and 2WikiMultihopQA demonstrate that our method significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-07-18T02:19:00Z)
- From RAG to RICHES: Retrieval Interlaced with Sequence Generation [3.859418700143553]
We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks.
It retrieves documents by directly decoding their contents, with decoding constrained to the corpus.
We demonstrate the strong performance of RICHES across ODQA tasks including attributed and multi-hop QA.
arXiv Detail & Related papers (2024-06-29T08:16:58Z)
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model, ReViz, that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge.
Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
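The Iter-RetGen entry describes feeding each generation back into the next retrieval round, because a model output hints at what knowledge is still missing. A minimal sketch of that loop follows, assuming a toy word-overlap retriever, a small hypothetical `STOPWORDS` list, and a caller-supplied `generate` function standing in for a language model; none of these names come from the Iter-RetGen paper.

```python
from typing import Callable

# Hypothetical stopword list so the toy retriever matches on content words only.
STOPWORDS = {"the", "is", "in", "of", "a", "an", "where", "which", "what"}


def content_words(text: str) -> set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Return the k passages with the largest content-word overlap with the query.
    sorted() is stable (also with reverse=True), so ties keep corpus order."""
    q = content_words(query)
    ranked = sorted(corpus, key=lambda p: len(q & content_words(p)), reverse=True)
    return ranked[:k]


def iter_retgen(
    question: str,
    corpus: list[str],
    generate: Callable[[str, list[str]], str],
    iterations: int = 2,
    k: int = 2,
):
    """Alternate retrieval and generation: each round's query is the question
    concatenated with the previous round's generated output."""
    output = ""
    history = []
    for _ in range(iterations):
        query = f"{question} {output}".strip()
        docs = retrieve(query, corpus, k)
        output = generate(question, docs)
        history.append((query, docs, output))
    return output, history


if __name__ == "__main__":
    corpus = [
        "The Eiffel Tower is in Paris",
        "Big Ben is in London",
        "Paris is the capital of France",
    ]
    # Stub "generator" (hypothetical): echoes the last word of the top passage.
    stub = lambda question, docs: docs[0].split()[-1]
    answer, rounds = iter_retgen("Where is the Eiffel Tower located", corpus, stub)
    for query, docs, out in rounds:
        print(query, "->", docs)
```

With this corpus, the first round misses the passage about France; appending the first output ("Paris") to the query surfaces that passage in the second round. A real system would use a dense retriever and an LLM, but the control flow of alternating the two is the point of the sketch.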