Related papers: LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries

URL: http://arxiv.org/abs/2505.12694v1
Date: Mon, 19 May 2025 04:33:09 GMT
Title: LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries
Authors: Kenya Abe, Kunihiro Takeoka, Makoto P. Kato, Masafumi Oyamada
Abstract summary: Large language models (LLMs) offer an effective alternative to traditional rule-based and statistical methods.
Score: 5.561044064438963
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Query expansion (QE) enhances retrieval by incorporating relevant terms, with large language models (LLMs) offering an effective alternative to traditional rule-based and statistical methods. However, LLM-based QE suffers from a fundamental limitation: it often fails to generate relevant knowledge, degrading search performance. Prior studies have focused on hallucination, yet its underlying cause (LLM knowledge deficiencies) remains underexplored. This paper systematically examines two failure cases in LLM-based QE: (1) when the LLM lacks query knowledge, leading to incorrect expansions, and (2) when the query is ambiguous, causing biased refinements that narrow search coverage. We conduct controlled experiments across multiple datasets, evaluating the effects of knowledge and query ambiguity on retrieval performance using sparse and dense retrieval models. Our results reveal that LLM-based QE can significantly degrade retrieval effectiveness when knowledge in the LLM is insufficient or query ambiguity is high. We introduce a framework for evaluating QE under these conditions, providing insights into the limitations of LLM-based retrieval augmentation.
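The second failure case in the abstract (an ambiguous query whose expansion commits to one sense) can be made concrete with a toy sparse-retrieval sketch. This is not the paper's experimental setup: it assumes a query2doc-style expansion (the original query repeated, then concatenated with LLM-generated text) scored by a from-scratch Okapi BM25. The names `BM25`, `expand_query`, the three-document corpus, and both "expansions" are hypothetical and hard-coded in place of real LLM output.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace split; real systems use a proper analyzer (stemming, stopwords).
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over a small in-memory corpus."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))  # document frequency per term
        # Lucene-style IDF with +1 inside the log, so it is never negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):  # repeated query terms contribute once per occurrence
            f = tf[t]
            if f == 0:
                continue
            norm = 1 - self.b + self.b * dl / self.avgdl
            total += self.idf[t] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def rank(self, query: str) -> list[int]:
        # Document indices, best first.
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


def expand_query(query: str, expansion: str, repeats: int = 3) -> str:
    # query2doc-style: repeat the original query so its terms are not drowned out
    # by the (usually longer) LLM-generated expansion, then append the expansion.
    return " ".join([query] * repeats + [expansion])


corpus = [
    "the jaguar is a large cat native to the americas",          # 0: animal sense
    "jaguar cars is a british manufacturer of luxury vehicles",  # 1: car-brand sense
    "the leopard is a large cat found in africa and asia",       # 2: related animal
]
bm25 = BM25(corpus)

# Hypothetical expansions for the ambiguous query "jaguar", hard-coded in place of real LLM output.
car_expansion = "british luxury vehicles manufacturer"
animal_expansion = "large cat native americas"

print(bm25.rank(expand_query("jaguar", car_expansion)))     # [1, 0, 2]
print(bm25.rank(expand_query("jaguar", animal_expansion)))  # [0, 1, 2]
```

Whichever sense the expansion commits to is pushed to the top, so if the user meant the other sense, a one-sided expansion narrows what is retrieved. The repeat factor of 3 and the BM25 defaults `k1=1.2`, `b=0.75` are illustrative choices.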

Related papers

LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation [110.610512800947]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Existing studies often treat utility as a generic attribute, ignoring the fact that different LLMs may benefit differently from the same passage.
arXiv Detail & Related papers (2025-10-13T12:57:45Z)
RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMs [18.947344953344995]
Retrieval-Judgment-Exploration (RJE) is a framework that retrieves refined reasoning paths, evaluates their sufficiency, and conditionally explores additional evidence. RJE substantially reduces the number of LLM calls and token usage compared to agent-based methods, yielding significant efficiency improvements.
arXiv Detail & Related papers (2025-09-25T03:56:18Z)
Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling [1.219841051166348]
In this paper, we explore the combined potential of in-context search and test-time scaling on super hard reasoning tasks. We find that by employing advanced in-context search prompting to LLMs augmented with internal scaling, one can achieve transformative performance breakthroughs.
arXiv Detail & Related papers (2025-05-28T12:28:18Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift. We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs [27.362012903540492]
The ability to understand causality significantly impacts the competence of large language models (LLMs) in output explanation and counterfactual reasoning.
arXiv Detail & Related papers (2024-04-09T14:40:08Z)
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model. We employ a proxy model which has far fewer parameters, and take its answers as heuristic answers. Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z)
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process. Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z)
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well. Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries. We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility [37.682136465784254]
We conduct over a million queries to mainstream large language models (LLMs), including ChatGPT, LLaMA, and OPT. We find that ChatGPT is still capable of yielding the correct answer even when the input is polluted at an extreme level. We propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation.
arXiv Detail & Related papers (2023-05-15T15:44:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.