Related papers: Corpus-Steered Query Expansion with Large Language Models

URL: http://arxiv.org/abs/2402.18031v1
Date: Wed, 28 Feb 2024 03:58:58 GMT
Title: Corpus-Steered Query Expansion with Large Language Models
Authors: Yibin Lei, Yu Cao, Tianyi Zhou, Tao Shen, Andrew Yates
Abstract summary: We introduce Corpus-Steered Query Expansion (CSQE) to promote the incorporation of knowledge embedded within the corpus. CSQE utilizes the relevance assessing capability of LLMs to systematically identify pivotal sentences in the initially-retrieved documents. Extensive experiments reveal that CSQE exhibits strong performance without necessitating any training, especially with queries for which LLMs lack knowledge.
Score: 35.64662397095323
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent studies demonstrate that query expansions generated by large language models (LLMs) can considerably enhance information retrieval systems by generating hypothetical documents that answer the queries as expansions. However, challenges arise from misalignments between the expansions and the retrieval corpus, resulting in issues like hallucinations and outdated information due to the limited intrinsic knowledge of LLMs. Inspired by Pseudo Relevance Feedback (PRF), we introduce Corpus-Steered Query Expansion (CSQE) to promote the incorporation of knowledge embedded within the corpus. CSQE utilizes the relevance assessing capability of LLMs to systematically identify pivotal sentences in the initially-retrieved documents. These corpus-originated texts are subsequently used to expand the query together with LLM-knowledge empowered expansions, improving the relevance prediction between the query and the target documents. Extensive experiments reveal that CSQE exhibits strong performance without necessitating any training, especially with queries for which LLMs lack knowledge.
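The pipeline described in the abstract (initial retrieval, LLM-based selection of pivotal sentences, then expansion of the query with both corpus-originated and LLM-generated text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the overlap-based retriever, the helper names (`retrieve`, `split_sentences`, `csqe_style_expand`), and the two stub callables standing in for LLM calls are all hypothetical.

```python
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    """Toy lexical score (an assumption): size of the token multiset intersection."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(doc))).values())


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Return the k highest-scoring documents; ties keep corpus order."""
    return sorted(corpus, key=lambda d: overlap_score(query, d), reverse=True)[:k]


def split_sentences(doc: str) -> List[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def csqe_style_expand(
    query: str,
    corpus: List[str],
    pick_pivotal: Callable[[str, List[str]], List[str]],
    generate_hypothetical: Callable[[str], List[str]],
    k: int = 2,
) -> str:
    """Expand a query with corpus sentences plus LLM-knowledge text.

    pick_pivotal and generate_hypothetical are placeholders for LLM calls;
    their signatures are assumptions made for this sketch.
    """
    pivotal: List[str] = []
    for doc in retrieve(query, corpus, k):           # initially-retrieved documents
        pivotal.extend(pick_pivotal(query, split_sentences(doc)))
    llm_text = generate_hypothetical(query)          # LLM-knowledge expansions
    return " ".join([query, *pivotal, *llm_text])    # expanded query string


# Stubs standing in for the LLM (hypothetical, keyword-based, for illustration only).
def stub_pick(query: str, sentences: List[str]) -> List[str]:
    q = set(tokenize(query))
    return [s for s in sentences if q & set(tokenize(s))]


def stub_generate(query: str) -> List[str]:
    return ["A hypothetical passage answering: " + query]


corpus = [
    "CSQE was introduced in 2024. It selects pivotal sentences from retrieved documents.",
    "Bananas are rich in potassium. They grow in tropical climates.",
    "Query expansion adds terms to a query. Pseudo relevance feedback uses top retrieved documents.",
]
query = "pivotal sentences for query expansion"
expanded = csqe_style_expand(query, corpus, stub_pick, stub_generate, k=2)
print(expanded)
```

In a real system the expanded string would be passed back to the retriever for a second-stage search, and the stubs would be replaced by prompts to an actual LLM; the choice of retriever and prompt format is left open here.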

Related papers

LLM Inference Enhanced by External Knowledge: A Survey [16.319049759753106]
This study explores strategies for using external knowledge to enhance large language models (LLMs). Our comparative analysis highlights the trade-offs among interpretability, scalability, and performance.
arXiv Detail & Related papers (2025-05-30T09:08:51Z)
KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing [64.38243807002878]
We present KnowTrace, an elegant RAG framework to mitigate the context overload in large language models. KnowTrace autonomously traces out desired knowledge triplets to organize a specific knowledge graph relevant to the input question. It consistently surpasses existing methods across three multi-hop question answering benchmarks.
arXiv Detail & Related papers (2025-05-26T17:22:20Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries [5.561044064438963]
Large language models (LLMs) offer an effective alternative to traditional rule-based and statistical methods.
arXiv Detail & Related papers (2025-05-19T04:33:09Z)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG). We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization [59.06663981902496]
Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. We investigate two indispensable characteristics that LLM-based QFS models should harness: Lengthy Document Summarization and Efficiently Fine-grained Query-LLM Alignment. These innovations pave the way for broader application and accessibility in the field of QFS technology.
arXiv Detail & Related papers (2024-07-15T07:14:56Z)
Redefining Information Retrieval of Structured Database via Large Language Models [10.117751707641416]
This paper introduces a novel retrieval augmentation framework called ChatLR. It primarily employs the powerful semantic understanding ability of Large Language Models (LLMs) as retrievers to achieve precise and concise information retrieval. Experimental results demonstrate the effectiveness of ChatLR in addressing user queries, achieving an overall information retrieval accuracy exceeding 98.8%.
arXiv Detail & Related papers (2024-05-09T02:37:53Z)
LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation [6.676337039829463]
We propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES). We leverage the identification of missing information to generate a targeted query that steers the subsequent knowledge retrieval. Extensive experiments conducted on multiple public datasets reveal the superiority of the proposed MIGRES method.
arXiv Detail & Related papers (2024-04-22T09:56:59Z)
Embedding-Informed Adaptive Retrieval-Augmented Generation of Large Language Models [37.02290559379761]
Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks. Motivated by this, Adaptive Retrieval-Augmented Generation (ARAG) studies retrieving only when the knowledge asked by the query is absent in the LLM.
arXiv Detail & Related papers (2024-04-04T15:21:22Z)
Quantitative knowledge retrieval from large language models [4.155711233354597]
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences. This paper explores the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid data analysis tasks.
arXiv Detail & Related papers (2024-02-12T16:32:37Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval [28.906829093158592]
We study the potential of pre-training with Large Language Model(LLM)-based document expansion for dense passage retrieval. Our work shows strong zero-shot and out-of-domain retrieval abilities, making it more widely applicable for retrieval when initializing with no human-labeled data.
arXiv Detail & Related papers (2023-08-16T11:10:43Z)
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well. Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries. We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections. InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.