Related papers: INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models

URL: http://arxiv.org/abs/2402.14334v1
Date: Thu, 22 Feb 2024 06:59:50 GMT
Title: INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
Authors: Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo
Abstract summary: Retrievers often only prioritize query information without delving into the users' intended search context. We propose a novel benchmark, INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks. We observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts.
Score: 32.16908034520376
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the critical need to align search targets with users' intention, retrievers often only prioritize query information without delving into the users' intended search context. Enhancing the capability of retrievers to understand intentions and preferences of users, akin to language model instructions, has the potential to yield more aligned search targets. Prior studies restrict the application of instructions in information retrieval to a task description format, neglecting the broader context of diverse and evolving search scenarios. Furthermore, the prevailing benchmarks utilized for evaluation lack explicit tailoring to assess instruction-following ability, thereby hindering progress in this field. In response to these limitations, we propose a novel benchmark, INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks. Our approach focuses on user-aligned instructions tailored to each query instance, reflecting the diverse characteristics inherent in real-world search scenarios. Through experimental analysis, we observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts. This underscores potential overfitting issues inherent in constructing retrievers trained on existing instruction-aware retrieval datasets.

Related papers

Can Instructed Retrieval Models Really Support Exploration? [29.8124798158787]
Best instructed retrievers improve on ranking relevance compared to instruction-agnostic approaches. While users may benefit from using current instructed retrievers over instruction-agnostic models, they may not benefit from using them for long-running exploratory sessions.
arXiv Detail & Related papers (2026-01-16T01:45:29Z)
Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better understanding of the complicated context. Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z)
Do Retrieval-Augmented Language Models Adapt to Varying User Needs? [28.729041459278587]
This paper introduces a novel evaluation framework that systematically assesses RALMs under three user need cases. By varying both user instructions and the nature of retrieved information, our approach captures the complexities of real-world applications. Our findings highlight the necessity of user-centric evaluations in the development of retrieval-augmented systems.
arXiv Detail & Related papers (2025-02-27T05:39:38Z)
Unsupervised Query Routing for Retrieval Augmented Generation [64.47987041500966]
We introduce a novel unsupervised method that constructs the "upper-bound" response to evaluate the quality of retrieval-augmented responses. This evaluation enables the decision of the most suitable search engine for a given query. By eliminating manual annotations, our approach can automatically process large-scale real user queries and create training data.
arXiv Detail & Related papers (2025-01-14T02:27:06Z)
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models [17.202017214385826]
This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance. We develop a novel retrieval evaluation benchmark spanning six document-level attributes. Our findings reveal that while reranking models generally surpass retrieval models in instruction following, they still face challenges in handling certain attributes.
arXiv Detail & Related papers (2024-10-31T11:47:21Z)
Understanding the User: An Intent-Based Ranking Dataset [2.6145315573431214]
This paper proposes an approach to augmenting such datasets to annotate informative query descriptions. Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries.
arXiv Detail & Related papers (2024-08-30T08:40:59Z)
Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness [56.42192735214931]
Retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries. We show that current retrievers have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives.
arXiv Detail & Related papers (2024-05-04T17:10:00Z)
ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval. The evaluation benchmark includes 3,452 high-quality exclusionary queries. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z)
RAR-b: Reasoning as Retrieval Benchmark [7.275757292756447]
We transform reasoning tasks into retrieval tasks to evaluate reasoning abilities stored in retriever models. Recent decoder-based embedding models show great promise in narrowing the gap. We release Reasoning as Retrieval Benchmark (RAR-b), a holistic suite of tasks and settings to evaluate the reasoning abilities stored in retriever models.
arXiv Detail & Related papers (2024-04-09T14:34:48Z)
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions [71.5977045423177]
We study the use of instructions in Information Retrieval systems. We introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark. We show that it is possible for IR models to learn to follow complex instructions.
arXiv Detail & Related papers (2024-03-22T14:42:29Z)
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users. We present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z)
I3: Intent-Introspective Retrieval Conditioned on Instructions [83.91776238599824]
I3 is a unified retrieval system that performs Intent-Introspective retrieval across various tasks conditioned on Instructions without task-specific training. I3 incorporates a pluggable introspector in a parameter-isolated manner to comprehend specific retrieval intents. It utilizes extensive LLM-generated data to train I3 phase-by-phase, embodying two key designs: progressive structure pruning and drawback-based data refinement.
arXiv Detail & Related papers (2023-08-19T14:17:57Z)
Task-aware Retrieval with Instructions [91.87694020194316]
We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We present TART, a multi-task retrieval system trained on the diverse retrieval tasks with instructions. TART shows strong capabilities to adapt to a new task via instructions and advances the state of the art on two zero-shot retrieval benchmarks.
arXiv Detail & Related papers (2022-11-16T23:13:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.