LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization
- URL: http://arxiv.org/abs/2406.12494v2
- Date: Thu, 17 Oct 2024 08:09:37 GMT
- Title: LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization
- Authors: Masafumi Enomoto, Kunihiro Takeoka, Kosuke Akimoto, Kiril Gashteovski, Masafumi Oyamada,
- Abstract summary: Open-Domain Multi-Document Summarization (ODMDS) is the task of generating summaries from large document collections in response to user queries.
Traditional retrieve-then-summarize approaches fall short for open-ended queries in ODMDS tasks.
We propose LightPAL, a lightweight passage retrieval method for ODMDS.
- Score: 9.739781953744606
- License:
- Abstract: Open-Domain Multi-Document Summarization (ODMDS) is the task of generating summaries from large document collections in response to user queries. This task is crucial for efficiently addressing diverse information needs from users. Traditional retrieve-then-summarize approaches fall short for open-ended queries in ODMDS tasks. These queries often require broader context than initially retrieved passages provide, making it challenging to retrieve all relevant information in a single search. While iterative retrieval methods has been explored for multi-hop question answering (MQA), it's impractical for ODMDS due to high latency from repeated LLM inference. Accordingly, we propose LightPAL, a lightweight passage retrieval method for ODMDS. LightPAL leverages an LLM to pre-construct a graph representing passage relationships, then employs random walk during retrieval, avoiding iterative LLM inference. Experiments demonstrate that LightPAL outperforms naive sparse and pre-trained dense retrievers in both retrieval and summarization metrics, while achieving higher efficiency compared to iterative MQA approaches.
Related papers
- Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs [23.960451986662996]
This paper proposes a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought reasoning.
We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of distractor text.
arXiv Detail & Related papers (2025-02-18T02:49:40Z) - MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs)
We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks.
We propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers.
arXiv Detail & Related papers (2024-11-04T20:06:34Z) - Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism [2.919891871101241]
Transformers have a quadratic scaling of computational complexity with input size.
Retrieval-augmented generation (RAG) can better handle longer contexts by using a retrieval system.
We introduce a novel approach, Inner Loop Memory Augmented Tree Retrieval (ILM-TR)
arXiv Detail & Related papers (2024-10-11T19:49:05Z) - EfficientRAG: Efficient Retriever for Multi-Hop Question Answering [52.64500643247252]
We introduce EfficientRAG, an efficient retriever for multi-hop question answering.
Experimental results demonstrate that EfficientRAG surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.
arXiv Detail & Related papers (2024-08-08T06:57:49Z) - PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus.
The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z) - ODSum: New Benchmarks for Open Domain Multi-Document Summarization [30.875191848268347]
Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries.
We propose a rule-based method to process query-based document summarization datasets into ODMDS datasets.
arXiv Detail & Related papers (2023-09-16T11:27:34Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z) - Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios.
Our method, the Language language model as Retriever (LameR), is built upon no other neural models but an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z) - Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo- documents by few-shot prompting large language models (LLMs)
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
arXiv Detail & Related papers (2023-03-14T07:27:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.