Single-Turn LLM Reformulation Powered Multi-Stage Hybrid Re-Ranking for Tip-of-the-Tongue Known-Item Retrieval
- URL: http://arxiv.org/abs/2602.10321v1
- Date: Tue, 10 Feb 2026 21:59:10 GMT
- Title: Single-Turn LLM Reformulation Powered Multi-Stage Hybrid Re-Ranking for Tip-of-the-Tongue Known-Item Retrieval
- Authors: Debayan Mukhopadhyay, Utshab Kumar Ghosh, Shubham Chatterjee,
- Abstract summary: We propose using a single call to a generic 8B-parameter LLM for query reformulation. Our method is particularly effective where standard Pseudo-Relevance Feedback fails due to poor initial recall. Experiments on 2025 TREC-ToT datasets show that while raw queries yield poor performance, our lightweight pre-retrieval transformation improves Recall by 20.61%.
- Score: 3.976291254896486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieving known items from vague descriptions, Tip-of-the-Tongue (ToT) retrieval, remains a significant challenge. We propose using a single call to a generic 8B-parameter LLM for query reformulation, bridging the gap between ill-formed ToT queries and specific information needs. This method is particularly effective where standard Pseudo-Relevance Feedback fails due to poor initial recall. Crucially, our LLM is not fine-tuned for ToT or specific domains, demonstrating that gains stem from our prompting strategy rather than model specialization. Rewritten queries feed a multi-stage pipeline: sparse retrieval (BM25), dense/late-interaction reranking (Contriever, E5-large-v2, ColBERTv2), monoT5 cross-encoding, and list-wise reranking (Qwen 2.5 72B). Experiments on 2025 TREC-ToT datasets show that while raw queries yield poor performance, our lightweight pre-retrieval transformation improves Recall by 20.61%. Subsequent reranking improves nDCG@10 by 33.88%, MRR by 29.92%, and MAP@10 by 29.98%, offering a cost-effective intervention that unlocks the potential of downstream rankers. Code and data: https://github.com/debayan1405/TREC-TOT-2025
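To make the stage order concrete, the following is a minimal Python sketch of the pipeline described above. It is not the authors' implementation: reformulate, dense_score, cross_score, and listwise_rerank are placeholder callables standing in for the 8B reformulation LLM, the dense/late-interaction models (Contriever, E5-large-v2, ColBERTv2), monoT5, and Qwen 2.5 72B, the rank-bm25 package is assumed for the sparse stage, and the stage cutoffs are illustrative.

```python
# Hedged sketch of the multi-stage ToT pipeline described in the abstract.
# All model calls are reduced to placeholder callables because the exact
# prompts and checkpoints are not reproduced here.
from typing import Callable, List
from rank_bm25 import BM25Okapi  # pip install rank-bm25 (assumed sparse backend)

def tot_pipeline(
    raw_query: str,
    corpus: List[str],
    reformulate: Callable[[str], str],          # stage 0: single LLM call
    dense_score: Callable[[str, str], float],   # stage 2: e.g. Contriever / E5 / ColBERTv2
    cross_score: Callable[[str, str], float],   # stage 3: e.g. monoT5
    listwise_rerank: Callable[[str, List[str]], List[str]],  # stage 4: e.g. Qwen 2.5 72B
    k_sparse: int = 1000,
    k_dense: int = 100,
    k_cross: int = 20,
) -> List[str]:
    # Stage 0: one-shot query reformulation before any retrieval.
    query = reformulate(raw_query)

    # Stage 1: sparse first-pass retrieval with BM25.
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    sparse = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k_sparse]

    # Stage 2: dense / late-interaction re-scoring of the sparse pool.
    dense = sorted(sparse, key=lambda i: dense_score(query, corpus[i]), reverse=True)[:k_dense]

    # Stage 3: cross-encoder re-scoring of the dense pool.
    cross = sorted(dense, key=lambda i: cross_score(query, corpus[i]), reverse=True)[:k_cross]

    # Stage 4: listwise LLM reranking of the final shortlist.
    return listwise_rerank(query, [corpus[i] for i in cross])
```

Each stage only re-scores the pool produced by the previous one, which is why a better reformulated query at stage 0 can unlock the downstream rankers.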
Related papers
- DS@GT at TREC TOT 2025: Bridging Vague Recollection with Fusion Retrieval and Learned Reranking [0.5352699766206809]
We develop a two-stage retrieval system to address the TREC Tip-of-the-Tongue (ToT) task. In the first stage, we employ hybrid retrieval that merges LLM-based retrieval, sparse (BM25), and dense (BGE-M3) retrieval methods. We also introduce topic-aware multi-index dense retrieval that partitions the Wikipedia corpus into 24 topical domains.
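The summary says the three runs are merged but not how; purely as an illustration, the sketch below fuses them with reciprocal rank fusion (RRF), a common choice that the DS@GT system may or may not use.

```python
# Hedged sketch of rank fusion for merging sparse, dense, and LLM-based runs.
# RRF is assumed here for illustration; the paper's actual fusion may differ.
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(runs: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked doc-id lists into one, RRF-style."""
    fused: Dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Example: three hypothetical runs over the same corpus.
bm25_run = ["d3", "d1", "d7"]
dense_run = ["d1", "d3", "d9"]
llm_run = ["d1", "d7", "d3"]
print(reciprocal_rank_fusion([bm25_run, dense_run, llm_run]))
```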
arXiv Detail & Related papers (2026-01-21T23:09:17Z)
- Revisiting Feedback Models for HyDE [49.53124785319461]
HyDE is a method that enriches query representations with LLM-generated hypothetical answer documents. Our experiments show that HyDE's effectiveness can be substantially improved when leveraging feedback algorithms such as Rocchio to extract and weight expansion terms.
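A hedged sketch of the idea: treat the LLM-generated hypothetical documents as pseudo-relevant feedback and pull weighted expansion terms from them. Plain term frequency stands in for the paper's Rocchio weighting, and the example documents are invented for illustration.

```python
# Hedged sketch: HyDE hypothetical documents used as pseudo-relevance feedback,
# with simple term-frequency weighting standing in for the paper's Rocchio scheme.
from collections import Counter
from typing import List

def hyde_expansion_terms(query: str, hypothetical_docs: List[str], n_terms: int = 10) -> str:
    """Append the most frequent non-query terms from the hypothetical docs to the query."""
    query_terms = set(query.lower().split())
    counts = Counter(
        tok for doc in hypothetical_docs
        for tok in doc.lower().split()
        if tok not in query_terms
    )
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)

# Example with invented (illustrative) LLM-generated documents.
docs = ["a 1990s animated film about a lost city beneath the ocean",
        "an animated adventure film set in a sunken city"]
print(hyde_expansion_terms("movie about underwater city", docs))
```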
arXiv Detail & Related papers (2025-11-24T17:50:18Z)
- InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking [3.1125398490785217]
InsertRank is an LLM-based reranker that leverages lexical signals like BM25 scores during reranking to further improve retrieval performance. With DeepSeek-R1, InsertRank achieves a score of 37.5 on the BRIGHT benchmark and 51.1 on the R2MED benchmark, surpassing previous methods.
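A hedged sketch of the prompting idea: expose each candidate's BM25 score inside a listwise reranking prompt. The prompt wording and the call_llm function are assumptions, not InsertRank's exact format.

```python
# Hedged sketch of a listwise reranking prompt that shows BM25 scores to the LLM.
from typing import Callable, List, Tuple

def build_listwise_prompt(query: str, candidates: List[Tuple[str, float]]) -> str:
    """candidates: (passage, bm25_score) pairs in initial retrieval order."""
    lines = [f"Query: {query}",
             "Rank the passages by relevance. BM25 scores are given as a lexical hint.", ""]
    for i, (passage, bm25) in enumerate(candidates, start=1):
        lines.append(f"[{i}] (BM25={bm25:.2f}) {passage}")
    lines.append("")
    lines.append("Output the passage numbers from most to least relevant, comma-separated.")
    return "\n".join(lines)

def insert_rank_style(query: str, candidates: List[Tuple[str, float]],
                      call_llm: Callable[[str], str]) -> List[int]:
    reply = call_llm(build_listwise_prompt(query, candidates))
    # Parse a reply such as "2, 1, 3" back into passage indices.
    return [int(tok) for tok in reply.replace("[", "").replace("]", "").split(",")
            if tok.strip().isdigit()]
```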
arXiv Detail & Related papers (2025-06-17T01:04:45Z)
- Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task. Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
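A minimal sketch of a reason-then-rank prompt in this spirit; the tag names and output format are illustrative assumptions, and the reinforcement-learning training that produces the behaviour is not shown.

```python
# Hedged sketch of a reason-then-rank prompt: the model is asked to reason
# before emitting the ranking. Tags and wording are illustrative only.
from typing import List

def reason_then_rank_prompt(query: str, passages: List[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Query: {query}\n{numbered}\n"
        "First reason step by step inside <think>...</think>, "
        "then output the passage numbers from most to least relevant "
        "inside <answer>...</answer>."
    )
```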
arXiv Detail & Related papers (2025-03-08T03:14:26Z)
- Guiding Retrieval using LLM-based Listwise Rankers [15.3583908068962]
We propose an adaptation of an existing adaptive retrieval method that supports the listwise setting. Specifically, our proposed algorithm merges results from both the initial ranking and feedback documents. We demonstrate that our method can improve nDCG@10 by up to 13.23% and recall by 28.02%, all while keeping the total number of LLM inferences constant and the overhead of the adaptive process minimal.
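A hedged sketch of the adaptive idea: candidate batches alternate between the initial ranking and neighbours of documents the listwise reranker has already promoted, with one LLM call per batch. The neighbour lookup and batch schedule are assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of guiding a listwise reranker with feedback documents.
from typing import Callable, List, Set

def guided_listwise_rerank(
    query: str,
    initial_ranking: List[str],                               # doc ids from first-stage retrieval
    neighbours: Callable[[str], List[str]],                    # corpus-graph lookup (assumed)
    rerank_batch: Callable[[str, List[str]], List[str]],       # one listwise LLM call
    batch_size: int = 20,
    n_batches: int = 5,
) -> List[str]:
    seen: Set[str] = set()
    ranked: List[str] = []
    for step in range(n_batches):
        # Even steps pull from the initial ranking, odd steps from neighbours
        # of already-promoted documents (the feedback documents).
        pool = initial_ranking if step % 2 == 0 else [n for d in ranked[:5] for n in neighbours(d)]
        batch = [d for d in pool if d not in seen][:batch_size]
        if not batch:
            continue
        seen.update(batch)
        # Merge the current top of the list with the new batch; one LLM call per step.
        ranked = rerank_batch(query, ranked[:batch_size] + batch)
    return ranked
```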
arXiv Detail & Related papers (2025-01-15T22:23:53Z)
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
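A hedged sketch of single-token listwise scoring: one forward pass, then the logits at the next-token position over the candidate identifiers are read off as ranking scores. The model (gpt2), prompt, and label tokenization below are placeholders, not the FIRST checkpoint or training recipe.

```python
# Hedged sketch of ranking from first-token logits with a single forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def first_token_rank(prompt: str, labels, model_name: str = "gpt2"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token distribution
    # First sub-token id of each label; tokenization is model-dependent.
    label_ids = [tok.encode(label, add_special_tokens=False)[0] for label in labels]
    scores = {label: logits[i].item() for label, i in zip(labels, label_ids)}
    return sorted(labels, key=scores.get, reverse=True)

# Usage: rank passages A/B/C for a query with one forward pass.
prompt = ("Query: animated film about a sunken city\n"
          "[A] a documentary about coral reefs\n"
          "[B] an animated adventure set in a lost underwater city\n"
          "[C] a cooking show episode\n"
          "The most relevant passage is [")
print(first_token_rank(prompt, ["A", "B", "C"]))
```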
arXiv Detail & Related papers (2024-11-08T12:08:17Z)
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce Learning to Retrieve by Trying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
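A hedged sketch of the "trying" step: sample several candidate search queries, score each by a retrieval reward (recall of known gold documents here), and keep best-versus-worst pairs for preference optimization. sample_query and retrieve are placeholders, and the RL update itself is omitted.

```python
# Hedged sketch of collecting preference pairs from retrieval rewards.
from typing import Callable, List, Set, Tuple

def collect_preference_pairs(
    question: str,
    gold_docs: Set[str],
    sample_query: Callable[[str], str],      # stochastic LLM query generator (placeholder)
    retrieve: Callable[[str], List[str]],    # returns retrieved doc ids (placeholder)
    n_samples: int = 4,
) -> List[Tuple[str, str]]:
    """Return (better_query, worse_query) pairs for preference optimization."""
    scored = []
    for _ in range(n_samples):
        q = sample_query(question)
        recall = len(set(retrieve(q)) & gold_docs) / max(len(gold_docs), 1)
        scored.append((recall, q))
    scored.sort(reverse=True)
    # Only emit a pair if the best sampled query actually beats the worst one.
    return [(scored[0][1], scored[-1][1])] if scored[0][0] > scored[-1][0] else []
```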
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
- Ranked List Truncation for Large Language Model-based Re-Ranking [53.97064615557883]
We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list.
RLT is crucial for re-ranking as it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker.
We reproduce existing RLT methods in the context of re-ranking, especially newly emerged large language model (LLM)-based re-ranking.
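A hedged sketch of where truncation sits in the retrieve-then-re-rank pipeline: cut the candidate list where first-stage scores drop below a fraction of the top score instead of using a fixed top-k. The RLT methods studied in the paper are learned; this score-drop heuristic is only illustrative.

```python
# Hedged sketch of ranked list truncation before an expensive LLM re-ranker.
from typing import List, Tuple

def truncate_for_reranker(run: List[Tuple[str, float]], ratio: float = 0.5,
                          min_k: int = 10, max_k: int = 200) -> List[str]:
    """run: (doc_id, first_stage_score) pairs sorted by score descending."""
    if not run:
        return []
    top = run[0][1]
    keep = [doc for doc, score in run if score >= ratio * top][:max_k]
    if len(keep) < min_k:
        keep = [doc for doc, _ in run[:min_k]]
    return keep

# Example: a steep score drop yields a short list for the re-ranker.
run = [("d1", 32.0), ("d2", 30.5), ("d3", 12.1), ("d4", 11.8), ("d5", 2.0)]
print(truncate_for_reranker(run, min_k=2))   # ['d1', 'd2']
```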
arXiv Detail & Related papers (2024-04-28T13:39:33Z)
- Sequencing Matters: A Generate-Retrieve-Generate Model for Building Conversational Agents [9.191944519634111]
The Georgetown InfoSense group describes its work on solving the challenges presented by TREC iKAT 2023.
Our submitted runs outperform the median runs by a significant margin, exhibiting superior performance in nDCG across various cutoffs and in overall success rate.
Our solution involves the use of Large Language Models (LLMs) for initial answers, answer grounding by BM25, passage quality filtering by logistic regression, and answer generation by LLMs again.
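A hedged sketch of that flow, with placeholder callables for the LLM and the logistic-regression passage filter (whose features are not reproduced here), and rank-bm25 assumed for grounding.

```python
# Hedged sketch of the generate-retrieve-generate flow described above.
from typing import Callable, List
from rank_bm25 import BM25Okapi  # pip install rank-bm25 (assumed)

def generate_retrieve_generate(
    question: str,
    corpus: List[str],
    llm: Callable[[str], str],           # placeholder LLM call
    passage_ok: Callable[[str], bool],   # e.g. a trained logistic-regression filter (placeholder)
    k: int = 5,
) -> str:
    draft = llm(f"Answer briefly: {question}")                         # generate
    bm25 = BM25Okapi([d.split() for d in corpus])
    scores = bm25.get_scores(draft.split())                            # retrieve with the draft answer
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    passages = [corpus[i] for i in top if passage_ok(corpus[i])]       # filter by quality
    context = "\n".join(passages)
    return llm(f"Using only these passages:\n{context}\n\nAnswer: {question}")  # generate again
```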
arXiv Detail & Related papers (2023-11-16T02:37:58Z)
- Mixed-initiative Query Rewriting in Conversational Passage Retrieval [11.644235288057123]
We report our methods and experiments for the TREC Conversational Assistance Track (CAsT) 2022.
We propose a mixed-initiative query rewriting module, which rewrites queries through mixed-initiative interaction between the user and the system.
Experiments on both TREC CAsT 2021 and TREC CAsT 2022 datasets show the effectiveness of our mixed-initiative-based query rewriting (or query reformulation) method.
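A hedged sketch of the mixed-initiative idea: when the current turn looks ambiguous, ask the user a clarifying question and fold the answer into the rewrite. The simple pronoun check stands in for the paper's ambiguity handling, and ask_user / rewrite_llm are placeholders.

```python
# Hedged sketch of mixed-initiative query rewriting in a conversational setting.
from typing import Callable

AMBIGUOUS_TOKENS = {"it", "this", "that", "they", "he", "she", "one"}

def mixed_initiative_rewrite(
    utterance: str,
    history: str,
    ask_user: Callable[[str], str],      # clarifying question to the user (placeholder)
    rewrite_llm: Callable[[str], str],   # query rewriter (placeholder)
) -> str:
    ambiguous = set(utterance.lower().split()) & AMBIGUOUS_TOKENS
    if ambiguous:
        # System takes the initiative and asks what the ambiguous token refers to.
        clarification = ask_user(f"What does '{ambiguous.pop()}' refer to?")
        history = history + " " + clarification
    return rewrite_llm(
        f"Conversation: {history}\nCurrent question: {utterance}\nSelf-contained query:"
    )
```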
arXiv Detail & Related papers (2023-07-17T19:38:40Z)
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting [65.00288634420812]
Pairwise Ranking Prompting (PRP) is a technique to significantly reduce the burden on Large Language Models (LLMs).
Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs.
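A hedged sketch of pairwise ranking prompting: an LLM comparison of two passages drives an ordinary sort. prefer_first is a placeholder for the LLM call, and the paper also considers more efficient aggregation than a full comparison sort.

```python
# Hedged sketch of PRP-style reranking via pairwise LLM comparisons.
from functools import cmp_to_key
from typing import Callable, List

def pairwise_rank(query: str, passages: List[str],
                  prefer_first: Callable[[str, str, str], bool]) -> List[str]:
    """prefer_first(query, a, b) -> True if passage a answers the query better than b."""
    def cmp(a: str, b: str) -> int:
        return -1 if prefer_first(query, a, b) else 1
    return sorted(passages, key=cmp_to_key(cmp))
```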
arXiv Detail & Related papers (2023-06-30T11:32:25Z)