Understanding the User: An Intent-Based Ranking Dataset
- URL: http://arxiv.org/abs/2408.17103v1
- Date: Fri, 30 Aug 2024 08:40:59 GMT
- Title: Understanding the User: An Intent-Based Ranking Dataset
- Authors: Abhijit Anand, Jurek Leonhardt, V Venktesh, Avishek Anand,
- Abstract summary: This paper proposes an approach to augmenting such datasets to annotate informative query descriptions.
Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries.
By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries.
- Score: 2.6145315573431214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets to annotate informative query descriptions, with a focus on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on the accuracy and informativeness of the descriptions. This information can be used as an evaluation set for tasks such as ranking, query rewriting, or others.
Related papers
- QUIDS: Query Intent Generation via Dual Space Modeling [12.572815037915348]
We propose a dual-space model that uses semantic relevance and irrelevance information in the returned documents to explain the understanding of the query intent.
Our methods produce high-quality query intent descriptions, outperforming existing methods for this task, as well as state-of-the-art query-based summarization methods.
arXiv Detail & Related papers (2024-10-16T09:28:58Z) - QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval [12.543590253664492]
We present a novel, interactive system called $textitQueryBuilder$.
It allows a novice, English-speaking user to create queries with a small amount of effort.
It rapidly develops cross-lingual information retrieval queries corresponding to the user's information needs.
arXiv Detail & Related papers (2024-09-07T00:46:58Z) - STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Knowledge Bases.
Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z) - Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs)
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z) - INSTRUCTIR: A Benchmark for Instruction Following of Information
Retrieval Models [32.16908034520376]
retrievers often only prioritize query information without delving into the users' intended search context.
We propose a novel benchmark,INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks.
We observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts.
arXiv Detail & Related papers (2024-02-22T06:59:50Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
query understanding plays a key role in exploring users' search intents and facilitating users to locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z) - Overview of the TREC 2019 Fair Ranking Track [65.15263872493799]
The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers.
This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process.
arXiv Detail & Related papers (2020-03-25T21:34:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.