Enhancing Multi-modal and Multi-hop Question Answering via Structured
Knowledge and Unified Retrieval-Generation
- URL: http://arxiv.org/abs/2212.08632v2
- Date: Mon, 7 Aug 2023 03:02:06 GMT
- Title: Enhancing Multi-modal and Multi-hop Question Answering via Structured
Knowledge and Unified Retrieval-Generation
- Authors: Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang
- Abstract summary: Multi-modal multi-hop question answering involves answering a question by reasoning over multiple input sources from different modalities.
Existing methods often retrieve evidences separately and then use a language model to generate an answer based on the retrieved evidences.
We propose a Structured Knowledge and Unified Retrieval-Generation (SKURG) approach to address these issues.
- Score: 33.56304858796142
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modal multi-hop question answering involves answering a question by
reasoning over multiple input sources from different modalities. Existing
methods often retrieve evidences separately and then use a language model to
generate an answer based on the retrieved evidences, and thus do not adequately
connect candidates and are unable to model the interdependent relations during
retrieval. Moreover, the pipelined approaches of retrieval and generation might
result in poor generation performance when retrieval performance is low. To
address these issues, we propose a Structured Knowledge and Unified
Retrieval-Generation (SKURG) approach. SKURG employs an Entity-centered Fusion
Encoder to align sources from different modalities using shared entities. It
then uses a unified Retrieval-Generation Decoder to integrate intermediate
retrieval results for answer generation and also adaptively determine the
number of retrieval steps. Extensive experiments on two representative
multi-modal multi-hop QA datasets MultimodalQA and WebQA demonstrate that SKURG
outperforms the state-of-the-art models in both source retrieval and answer
generation performance with fewer parameters. Our code is available at
https://github.com/HITsz-TMG/SKURG.
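
The two ideas in the abstract, linking sources through shared entities and letting the model decide how many retrieval steps to take, can be illustrated with a toy sketch. This is not the SKURG implementation: SKURG uses a learned fusion encoder and decoder, whereas the code below substitutes a hand-written entity-overlap score. The names `Source`, `score`, `retrieve_adaptively`, and `stop_threshold` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A candidate evidence source (hypothetical structure, not from the paper)."""
    sid: str
    modality: str        # "text", "image", or "table"
    entities: frozenset  # entity names mentioned in this source


def score(question_entities, retrieved, candidate):
    """Toy relevance: number of candidate entities that also appear in the
    question or in sources retrieved so far (a stand-in for a learned scorer)."""
    context = set(question_entities)
    for src in retrieved:
        context |= src.entities
    return len(context & candidate.entities)


def retrieve_adaptively(question_entities, sources, stop_threshold=1, max_steps=5):
    """Greedily pick one source per step, conditioning on earlier picks.
    Retrieval stops on its own once no remaining source is connected to the
    current context, so the number of hops is not fixed in advance."""
    retrieved = []
    remaining = list(sources)
    for _ in range(max_steps):
        if not remaining:
            break
        best = max(remaining, key=lambda s: score(question_entities, retrieved, s))
        if score(question_entities, retrieved, best) < stop_threshold:
            break  # assumption: a low best score plays the role of a "stop" decision
        retrieved.append(best)
        remaining.remove(best)
    return retrieved


if __name__ == "__main__":
    sources = [
        Source("t1", "text", frozenset({"Eiffel Tower", "Gustave Eiffel"})),
        Source("i1", "image", frozenset({"Gustave Eiffel"})),
        Source("tb1", "table", frozenset({"Louvre", "Paris"})),
    ]
    hops = retrieve_adaptively({"Eiffel Tower"}, sources)
    print([s.sid for s in hops])  # the image is reached only via the text source
```

In this toy run the image source shares no entity with the question and becomes reachable only after the text source is retrieved, which is the kind of interdependence that separate, one-shot retrieval cannot model.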
Related papers
- IRLab@iKAT24: Learned Sparse Retrieval with Multi-aspect LLM Query Generation for Conversational Search [6.974395116689502]
iKAT 2024 focuses on advancing conversational assistants, able to adapt their interaction and responses from personalized user knowledge.
The track incorporates a Personal Textual Knowledge Base (PTKB) alongside Conversational AI tasks, such as passage ranking and response generation.
arXiv Detail & Related papers (2024-11-22T05:18:35Z)
- CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on the CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach [6.549143816134531]
We propose a novel iterative RAG method called ReSP, equipped with a dual-function summarizer.
Experimental results on the multi-hop question-answering datasets HotpotQA and 2WikiMultihopQA demonstrate that our method significantly outperforms the state-of-the-art.
arXiv Detail & Related papers (2024-07-18T02:19:00Z)
- From RAG to RICHES: Retrieval Interlaced with Sequence Generation [3.859418700143553]
We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks.
It retrieves documents by directly decoding their contents, constrained on the corpus.
We demonstrate the strong performance of RICHES across ODQA tasks including attributed and multi-hop QA.
arXiv Detail & Related papers (2024-06-29T08:16:58Z)
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model, "ReViz", that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- Enhancing Retrieval-Augmented Large Language Models with Iterative
Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge.
Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)