FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
- URL: http://arxiv.org/abs/2403.15246v3
- Date: Tue, 7 May 2024 14:25:15 GMT
- Title: FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
- Authors: Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini
- Abstract summary: We study the use of instructions in Information Retrieval systems.
We introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark.
We show that it is possible for IR models to learn to follow complex instructions.
- Score: 71.5977045423177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern Language Models (LMs) are capable of following long and complex instructions that enable a large and diverse set of user requests. While Information Retrieval (IR) models use these LMs as the backbone of their architectures, virtually none of them allow users to provide detailed instructions alongside queries, thus limiting their ability to satisfy complex information needs. In this work, we study the use of instructions in IR systems. First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR repurposes detailed instructions -- also known as narratives -- developed for professional assessors to evaluate retrieval systems. In particular, we build our benchmark from three collections curated for shared tasks at the Text REtrieval Conference (TREC). These collections contain hundreds to thousands of labeled documents per query, making them suitable for our exploration. Through this process, we can measure how well IR models follow instructions through a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to correctly use instructions, treating them as basic keywords and struggling to understand long-form information. However, we show that it is possible for IR models to learn to follow complex instructions: our new FollowIR-7B model shows significant improvements after fine-tuning on our training set.
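The pairwise evaluation framework compares a model's ranking for the same query under an original and an altered instruction, so a model that ignores instructions produces no rank change. Below is a minimal sketch of that idea; the `search` callable, the `rank_of` helper, and the reciprocal-rank delta are illustrative assumptions, and the paper defines its own pairwise metric over original vs. altered TREC narratives.

```python
# Minimal sketch of a pairwise instruction-following check, assuming a
# generic search(query, instruction) -> ranked doc-id list. The helper names
# and the reciprocal-rank delta are illustrative, not the paper's metric.
from typing import Callable, List, Sequence, Tuple

def rank_of(ranking: Sequence[str], doc_id: str) -> int:
    """1-based rank of doc_id in a ranked list; len(ranking) + 1 if absent."""
    try:
        return list(ranking).index(doc_id) + 1
    except ValueError:
        return len(ranking) + 1

def instruction_sensitivity(
    search: Callable[[str, str], List[str]],
    cases: Sequence[Tuple[str, str, str, str]],
) -> float:
    """Average signed rank change of a target document when the instruction
    is altered; each case is (query, original_instruction,
    altered_instruction, target_doc_id). A model that ignores instructions
    returns identical rankings and scores exactly 0."""
    deltas = []
    for query, inst_og, inst_new, doc_id in cases:
        r_og = rank_of(search(query, inst_og), doc_id)
        r_new = rank_of(search(query, inst_new), doc_id)
        # Reciprocal-rank delta lies in (-1, 1); positive means the target
        # document moved up after the instruction changed.
        deltas.append(1.0 / r_new - 1.0 / r_og)
    return sum(deltas) / len(deltas)

# Toy usage: a keyword "retriever" whose behavior depends on the instruction.
docs = ["d1: jaguar car review", "d2: jaguar animal habitat"]

def toy_search(query: str, instruction: str) -> List[str]:
    key = "animal" if "animal" in instruction else "car"
    return sorted(docs, key=lambda d: key not in d)

print(instruction_sensitivity(
    toy_search,
    [("jaguar", "Find reviews of the car.", "Find facts about the animal.",
      "d2: jaguar animal habitat")],
))  # 0.5: the animal document rose from rank 2 to rank 1
```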
Related papers
- Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA).
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z) - RNR: Teaching Large Language Models to Follow Roles and Rules [153.6596303205894]
We propose RNR, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions.
This data can then be used to train models that follow complex system prompts.
Our framework significantly improves role and rule following capability in large language models.
arXiv Detail & Related papers (2024-09-10T06:07:32Z) - KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions [63.307317584926146]
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents.
In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer.
We construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain.
arXiv Detail & Related papers (2024-03-06T17:16:44Z) - INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models [32.16908034520376]
Retrievers often prioritize only the query information without delving into the users' intended search context.
We propose a novel benchmark, INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks.
We observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts.
arXiv Detail & Related papers (2024-02-22T06:59:50Z) - INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z) - Task-aware Retrieval with Instructions [91.87694020194316]
We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries.
We present TART, a multi-task retrieval system trained on diverse retrieval tasks with instructions.
TART shows strong capabilities to adapt to a new task via instructions and advances the state of the art on two zero-shot retrieval benchmarks.
arXiv Detail & Related papers (2022-11-16T23:13:22Z)
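The instruction-conditioned retrieval that TART describes can be sketched with a generic bi-encoder: the instruction is prepended to the query before embedding, so the same query can serve different intents. The `encode` stub and the `[SEP]` joiner below are placeholder assumptions, not TART's released checkpoint or preprocessing.

```python
# Sketch of instruction-prepended dense retrieval with a generic bi-encoder.
# encode() is a placeholder for any sentence-embedding model.
import numpy as np

def encode(texts: list[str]) -> np.ndarray:
    """Placeholder embedding model: one row per text. Random vectors keep
    the sketch runnable; swap in any real sentence-embedding model."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), 384))

def retrieve(instruction: str, query: str, docs: list[str], k: int = 3) -> list[str]:
    # The instruction and query become a single encoder input, so the same
    # query can be routed to different documents under different intents.
    q_emb = encode([f"{instruction} [SEP] {query}"])[0]
    d_embs = encode(docs)
    # Rank documents by cosine similarity to the instructed query.
    sims = d_embs @ q_emb / (np.linalg.norm(d_embs, axis=1) * np.linalg.norm(q_emb))
    return [docs[i] for i in np.argsort(-sims)[:k]]

docs = ["A recipe for sourdough bread.", "A PyTorch tutorial on embeddings."]
print(retrieve("Retrieve a cooking recipe relevant to the question.",
               "how to make bread", docs, k=1))
```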