Can Instructed Retrieval Models Really Support Exploration?
- URL: http://arxiv.org/abs/2601.10936v1
- Date: Fri, 16 Jan 2026 01:45:29 GMT
- Title: Can Instructed Retrieval Models Really Support Exploration?
- Authors: Piyush Maheshwari, Sheshera Mysore, Hamed Zamani,
- Abstract summary: Best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches.<n>While users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions.
- Score: 29.8124798158787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploratory searches are characterized by under-specified goals and evolving query intents. In such scenarios, retrieval models that can capture user-specified nuances in query intent and adapt results accordingly are desirable -- instruction-following retrieval models promise such a capability. In this work, we evaluate instructed retrievers for the prevalent yet under-explored application of aspect-conditional seed-guided exploration using an expert-annotated test collection. We evaluate both recent LLMs fine-tuned for instructed retrieval and general-purpose LLMs prompted for ranking with the highly performant Pairwise Ranking Prompting. We find that the best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. However, we also find that instruction following performance, crucial to the user experience of interacting with models, does not mirror ranking relevance improvements and displays insensitivity or counter-intuitive behavior to instructions. Our results indicate that while users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions requiring greater sensitivity to instructions.
Related papers
- Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation.<n>We compiled a large-scale dataset of real-world complex queries from a major search engine.<n> Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z) - SEM: Reinforcement Learning for Search-Efficient Large Language Models [26.075903427834838]
Large Language Models (LLMs) have demonstrated their capabilities not only in reasoning but also in invoking external tools.<n>Existing reinforcement learning approaches often lead to redundant search behaviors, resulting in inefficiencies and over-cost.<n>We propose SEM, a novel post-training reinforcement learning framework that explicitly trains LLMs to optimize search usage.
arXiv Detail & Related papers (2025-05-12T09:45:40Z) - LLM-Driven Usefulness Judgment for Web Search Evaluation [12.10711284043516]
Evaluation is fundamental in optimizing search experiences and supporting diverse user intents in Information Retrieval (IR)<n>Traditional search evaluation methods primarily rely on relevance labels, which assess how well retrieved documents match a user's query.<n>In this paper, we explore an alternative approach: LLM-generated usefulness labels, which incorporate both implicit and explicit user behavior signals to evaluate document usefulness.
arXiv Detail & Related papers (2025-04-19T20:38:09Z) - SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models [88.29990536278167]
We introduce SPaR, a self-play framework integrating tree-search self-refinement to yield valid and comparable preference pairs.<n>Our experiments show that a LLaMA3-8B model, trained over three iterations guided by SPaR, surpasses GPT-4-Turbo on the IFEval benchmark without losing general capabilities.
arXiv Detail & Related papers (2024-12-16T09:47:43Z) - Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models [25.301280441283147]
This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance.<n>We develop a novel retrieval evaluation benchmark spanning six document-level attributes.<n>Our findings indicate that although fine-tuning models on instruction-aware retrieval datasets enhance performance, most models still fall short of instruction compliance.
arXiv Detail & Related papers (2024-10-31T11:47:21Z) - ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
evaluation benchmark includes 3,452 high-quality exclusionary queries.
training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z) - FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions [71.5977045423177]
We study the use of instructions in Information Retrieval systems.
We introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark.
We show that it is possible for IR models to learn to follow complex instructions.
arXiv Detail & Related papers (2024-03-22T14:42:29Z) - INSTRUCTIR: A Benchmark for Instruction Following of Information
Retrieval Models [32.16908034520376]
retrievers often only prioritize query information without delving into the users' intended search context.
We propose a novel benchmark,INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks.
We observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts.
arXiv Detail & Related papers (2024-02-22T06:59:50Z) - I3: Intent-Introspective Retrieval Conditioned on Instructions [83.91776238599824]
I3 is a unified retrieval system that performs Intent-Introspective retrieval across various tasks conditioned on Instructions without task-specific training.
I3 incorporates a pluggable introspector in a parameter-isolated manner to comprehend specific retrieval intents.
It utilizes extensive LLM-generated data to train I3 phase-by-phase, embodying two key designs: progressive structure pruning and drawback-based data refinement.
arXiv Detail & Related papers (2023-08-19T14:17:57Z) - Beyond Semantics: Learning a Behavior Augmented Relevance Model with
Self-supervised Learning [25.356999988217325]
Relevance modeling aims to locate desirable items for corresponding queries.
auxiliary query-item interactions extracted from user historical behavior data could provide hints to reveal users' search intents further.
Our model builds multi-level co-attention for distilling coarse-grained and fine-grained semantic representations from both neighbor and target views.
arXiv Detail & Related papers (2023-08-10T06:52:53Z) - On the Evaluation of Vision-and-Language Navigation Instructions [76.92085026018427]
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
Existing instruction generators have not been comprehensively evaluated.
BLEU, ROUGE, METEOR and CIDEr are ineffective for evaluating grounded navigation instructions.
arXiv Detail & Related papers (2021-01-26T01:03:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.