Exp4Fuse: A Rank Fusion Framework for Enhanced Sparse Retrieval using Large Language Model-based Query Expansion
- URL: http://arxiv.org/abs/2506.04760v1
- Date: Thu, 05 Jun 2025 08:44:34 GMT
- Title: Exp4Fuse: A Rank Fusion Framework for Enhanced Sparse Retrieval using Large Language Model-based Query Expansion
- Authors: Lingyuan Liu, Mengxiang Zhang
- Abstract summary: Large Language Models (LLMs) have shown potential in generating hypothetical documents for query expansion. We introduce a novel fusion ranking framework, Exp4Fuse, which enhances the performance of sparse retrievers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown potential in generating hypothetical documents for query expansion, thereby enhancing information retrieval performance. However, the efficacy of this method is highly dependent on the quality of the generated documents, which often requires complex prompt strategies and the integration of advanced dense retrieval techniques. This can be both costly and computationally intensive. To mitigate these limitations, we explore the use of zero-shot LLM-based query expansion to improve sparse retrieval, particularly for learned sparse retrievers. We introduce a novel fusion ranking framework, Exp4Fuse, which enhances the performance of sparse retrievers through an indirect application of zero-shot LLM-based query expansion. Exp4Fuse operates by simultaneously considering two retrieval routes: one based on the original query and the other on the LLM-augmented query. It then generates two ranked lists using a sparse retriever and fuses them using a modified reciprocal rank fusion method. We conduct extensive evaluations of Exp4Fuse against leading LLM-based query expansion methods and advanced retrieval techniques on three MS MARCO-related datasets and seven low-resource datasets. Experimental results reveal that Exp4Fuse not only surpasses existing LLM-based query expansion methods in enhancing sparse retrievers but also, when combined with advanced sparse retrievers, achieves SOTA results on several benchmarks. This highlights the superior performance and effectiveness of Exp4Fuse in improving query expansion for sparse retrieval.
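The fusion step described in the abstract can be illustrated with a minimal sketch of standard reciprocal rank fusion (RRF) over two ranked lists, one from the original query and one from the LLM-augmented query. The abstract does not specify how Exp4Fuse modifies RRF, so this is the textbook formulation rather than the authors' method; the function name, the `weights` parameter, and the document ids are hypothetical, and `k=60` is only a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, weights=None):
    """Fuse ranked lists of document ids with (optionally weighted) RRF.

    Each document scores sum_i weights[i] / (k + rank_i), where rank_i is
    its 1-based position in list i. Documents absent from a list add nothing.
    NOTE: `weights` is an illustrative assumption, not the paper's modification.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("need exactly one weight per ranked list")

    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked lists from the two retrieval routes.
original_route = ["d1", "d2", "d3"]   # sparse retriever on the original query
expanded_route = ["d3", "d1", "d4"]   # same retriever on the LLM-augmented query
fused = reciprocal_rank_fusion([original_route, expanded_route])
```

Documents that rank well on both routes (here `d1` and `d3`) rise above documents found by only one route, which is the intuition behind fusing the two lists instead of replacing the original query outright.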
Related papers
- Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment [4.21943400140261]
Aligned Query Expansion (AQE) is a novel approach to enhance query expansion for passage retrieval in open-domain question answering. We show that AQE outperforms baseline models for query expansion in both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2025-07-15T07:11:29Z)
- Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
- LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers [24.01783076521377]
Retrieval-Augmented Generation (RAG) is a crucial method for mitigating hallucinations in Large Language Models (LLMs). Existing RAG methods typically employ query rewriting to clarify the user intent and manage multi-hop logic, while using hybrid retrieval to expand search scope. We introduce a high-level searcher that decomposes complex queries into atomic queries, independent of any retriever-specific optimizations. To harness the strengths of sparse retrievers for precise keyword retrieval, we have developed a new sparse searcher that employs Lucene syntax to enhance retrieval accuracy.
arXiv Detail & Related papers (2025-02-25T12:09:16Z)
- ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance [21.777817032607405]
Large language models (LLMs) have demonstrated significant potential in enhancing dense retrieval through query augmentation. In this work, we propose ExpandR, a unified LLM-augmented dense retrieval framework. Experimental results on multiple benchmarks show that ExpandR consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-02-24T11:15:41Z)
- Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method [48.14236175156835]
ARM aims to better align the question with the organization of the data collection by exploring relationships among data objects. It outperforms standard RAG with query decomposition by up to 5.2 pt in execution accuracy and agentic RAG (ReAct) by up to 15.9 pt. It achieves up to 5.5 pt and 19.3 pt higher F1 match scores compared to these approaches.
arXiv Detail & Related papers (2025-01-30T18:07:19Z)
- MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs [78.5013630951288]
This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs). We first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks. Our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR.
arXiv Detail & Related papers (2024-11-04T20:06:34Z)
- Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
- Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios.
Our method, the Language model as Retriever (LameR), is built upon no other neural models but an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)
- Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs).
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
arXiv Detail & Related papers (2023-03-14T07:27:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.