Surprise: Result List Truncation via Extreme Value Theory
- URL: http://arxiv.org/abs/2010.09797v1
- Date: Mon, 19 Oct 2020 19:15:50 GMT
- Title: Surprise: Result List Truncation via Extreme Value Theory
- Authors: Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins
- Abstract summary: We propose a statistical method that produces interpretable and calibrated relevance scores at query time using nothing more than the ranked scores.
We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets.
- Score: 92.5817701697342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Work in information retrieval has largely been centered around ranking and
relevance: given a query, return some number of results ordered by relevance to
the user. The problem of result list truncation, or where to truncate the
ranked list of results, however, has received less attention despite being
crucial in a variety of applications. Such truncation is a balancing act
between the overall relevance, or usefulness, of the results and the user cost
of processing more results. Result list truncation can be challenging because
relevance scores are often not well-calibrated. This is particularly true in
large-scale IR systems where documents and queries are embedded in the same
metric space and a query's nearest document neighbors are returned during
inference. Here, relevance is inversely proportional to the distance between
the query and candidate document, but what distance constitutes relevance
varies from query to query and changes dynamically as more documents are added
to the index. In this work, we propose Surprise scoring, a statistical method
that leverages the Generalized Pareto distribution that arises in extreme value
theory to produce interpretable and calibrated relevance scores at query time
using nothing more than the ranked scores. We demonstrate its effectiveness on
the result list truncation task across image, text, and IR datasets and compare
it to both classical and recent baselines. We draw connections to hypothesis
testing and $p$-values.
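The abstract's idea can be sketched concretely: treat the top-ranked scores as exceedances over a threshold, fit a Generalized Pareto distribution to them, and report each result's survival probability under that fit as a calibrated, p-value-like "surprise". The sketch below is an illustration only, not the paper's implementation: the function names, the method-of-moments fit, and the fixed tail size are assumptions (the paper may use a different estimator and threshold rule).

```python
import math

def fit_gpd_mom(exceedances):
    """Method-of-moments fit of a Generalized Pareto distribution.

    For GPD(shape xi, scale sigma): mean = sigma/(1-xi) and
    mean^2/var = 1 - 2*xi, which yields the estimators below.
    """
    n = len(exceedances)
    m = sum(exceedances) / n
    v = sum((x - m) ** 2 for x in exceedances) / n
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    return xi, sigma

def gpd_survival(x, xi, sigma):
    """P(X > x) under GPD(xi, sigma); smaller = more surprising."""
    if abs(xi) < 1e-9:                  # xi ~ 0: exponential tail
        return math.exp(-x / sigma)
    arg = 1.0 + xi * x / sigma
    if arg <= 0.0:                      # past the finite endpoint (xi < 0)
        return 0.0
    return arg ** (-1.0 / xi)

def surprise_scores(ranked_scores, tail=20):
    """Per-result surprise from the ranked scores alone.

    ranked_scores: relevance scores sorted in descending order.
    tail: how many top scores to treat as tail exceedances
          over the threshold u = the (tail+1)-th score.
    """
    u = ranked_scores[tail]
    exc = [s - u for s in ranked_scores[:tail]]
    xi, sigma = fit_gpd_mom(exc)
    # Scores at or below the threshold are unsurprising (p = 1).
    return [gpd_survival(s - u, xi, sigma) if s > u else 1.0
            for s in ranked_scores]
```

Truncation then reduces to a hypothesis-test-style cutoff: keep results whose surprise falls below a chosen significance level alpha, and cut the list at the first result above it.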
Related papers
- Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems.
Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers.
It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
arXiv Detail & Related papers (2024-07-01T15:53:29Z) - The Surprising Effectiveness of Rankers Trained on Expanded Queries [4.874071145951159]
We improve the ranking performance of hard queries without compromising the performance of other queries.
We combine relevance scores from the specialized ranker and the base ranker, along with a query performance score estimated for each query.
In our experiments on the DL-Hard dataset, we find that a principled query performance based scoring method offers a significant improvement of up to 25% on the passage ranking task.
arXiv Detail & Related papers (2024-04-03T09:12:22Z) - List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z) - Semantic Equivalence of e-Commerce Queries [6.232692545488813]
This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes.
The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives.
arXiv Detail & Related papers (2023-08-07T18:40:13Z) - Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Online Learning of Optimally Diverse Rankings [63.62764375279861]
We propose an algorithm that efficiently learns the optimal list based on users' feedback only.
We show that after $T$ queries, the regret of LDR scales as $O((N-L)log(T))$ where $N$ is the number of all items.
arXiv Detail & Related papers (2021-09-13T12:13:20Z) - Leveraging semantically similar queries for ranking via combining representations [20.79800117378761]
In data-scarce settings, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function.
One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries.
We describe and explore this phenomenon in the context of the bias-variance trade-off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.
arXiv Detail & Related papers (2021-06-23T18:36:20Z) - Choppy: Cut Transformer For Ranked List Truncation [92.58177016973421]
Choppy is an assumption-free model based on the widely successful Transformer architecture.
We show Choppy improves upon recent state-of-the-art methods.
arXiv Detail & Related papers (2020-04-26T00:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.