Surprise: Result List Truncation via Extreme Value Theory
- URL: http://arxiv.org/abs/2010.09797v1
- Date: Mon, 19 Oct 2020 19:15:50 GMT
- Title: Surprise: Result List Truncation via Extreme Value Theory
- Authors: Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins
- Abstract summary: We propose a statistical method that produces interpretable and calibrated relevance scores at query time using nothing more than the ranked scores.
We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets.
- Score: 92.5817701697342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Work in information retrieval has largely been centered around ranking and
relevance: given a query, return some number of results ordered by relevance to
the user. The problem of result list truncation, or where to truncate the
ranked list of results, however, has received less attention despite being
crucial in a variety of applications. Such truncation is a balancing act
between the overall relevance, or usefulness, of the results and the user cost
of processing more results. Result list truncation can be challenging because
relevance scores are often not well-calibrated. This is particularly true in
large-scale IR systems where documents and queries are embedded in the same
metric space and a query's nearest document neighbors are returned during
inference. Here, relevance is inversely proportional to the distance between
the query and candidate document, but what distance constitutes relevance
varies from query to query and changes dynamically as more documents are added
to the index. In this work, we propose Surprise scoring, a statistical method
that leverages the Generalized Pareto distribution that arises in extreme value
theory to produce interpretable and calibrated relevance scores at query time
using nothing more than the ranked scores. We demonstrate its effectiveness on
the result list truncation task across image, text, and IR datasets and compare
it to both classical and recent baselines. We draw connections to hypothesis
testing and $p$-values.
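The abstract's idea can be sketched concretely: treat the top-ranked scores as exceedances over a threshold, fit a Generalized Pareto distribution to them, and report each result's survival probability under that fit as a calibrated, p-value-like "surprise". The sketch below is an illustration only, not the paper's implementation: the function names, the method-of-moments fit, and the fixed tail size are assumptions (the paper may use a different estimator and threshold rule).

```python
import math

def fit_gpd_mom(exceedances):
    """Method-of-moments fit of a Generalized Pareto distribution.

    For GPD(shape xi, scale sigma): mean = sigma/(1-xi) and
    mean^2/var = 1 - 2*xi, which yields the estimators below.
    """
    n = len(exceedances)
    m = sum(exceedances) / n
    v = sum((x - m) ** 2 for x in exceedances) / n
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    return xi, sigma

def gpd_survival(x, xi, sigma):
    """P(X > x) under GPD(xi, sigma); smaller = more surprising."""
    if abs(xi) < 1e-9:                  # xi ~ 0: exponential tail
        return math.exp(-x / sigma)
    arg = 1.0 + xi * x / sigma
    if arg <= 0.0:                      # past the finite endpoint (xi < 0)
        return 0.0
    return arg ** (-1.0 / xi)

def surprise_scores(ranked_scores, tail=20):
    """Per-result surprise from the ranked scores alone.

    ranked_scores: relevance scores sorted in descending order.
    tail: how many top scores to treat as tail exceedances
          over the threshold u = the (tail+1)-th score.
    """
    u = ranked_scores[tail]
    exc = [s - u for s in ranked_scores[:tail]]
    xi, sigma = fit_gpd_mom(exc)
    # Scores at or below the threshold are unsurprising (p = 1).
    return [gpd_survival(s - u, xi, sigma) if s > u else 1.0
            for s in ranked_scores]
```

Truncation then reduces to a hypothesis-test-style cutoff: keep results whose surprise falls below a chosen significance level alpha, and cut the list at the first result above it.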
Related papers
- Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems.
Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers.
It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
arXiv Detail & Related papers (2024-07-01T15:53:29Z) - The Surprising Effectiveness of Rankers Trained on Expanded Queries [4.874071145951159]
We improve the ranking performance of hard queries without compromising the performance of other queries.
We combine relevance scores from the specialized ranker and the base ranker, along with a query performance score estimated for each query.
In our experiments on the DL-Hard dataset, we find that a principled query performance based scoring method offers a significant improvement of up to 25% on the passage ranking task.
arXiv Detail & Related papers (2024-04-03T09:12:22Z) - List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z) - Semantic Equivalence of e-Commerce Queries [6.232692545488813]
This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes.
The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives.
arXiv Detail & Related papers (2023-08-07T18:40:13Z) - Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Online Learning of Optimally Diverse Rankings [63.62764375279861]
We propose an algorithm that efficiently learns the optimal list based on users' feedback only.
We show that after $T$ queries, the regret of LDR scales as $O((N-L)log(T))$ where $N$ is the number of all items.
arXiv Detail & Related papers (2021-09-13T12:13:20Z) - Leveraging semantically similar queries for ranking via combining representations [20.79800117378761]
In data-scarce settings, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function.
One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries.
We describe and explore this phenomenon in the context of the bias-variance trade-off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.
arXiv Detail & Related papers (2021-06-23T18:36:20Z) - Choppy: Cut Transformer For Ranked List Truncation [92.58177016973421]
Choppy is an assumption-free model based on the widely successful Transformer architecture.
We show Choppy improves upon recent state-of-the-art methods.
arXiv Detail & Related papers (2020-04-26T00:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.