Related papers: Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking

URL: http://arxiv.org/abs/2205.09638v1
Date: Thu, 19 May 2022 16:00:13 GMT
Title: Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking
Authors: Minghan Li, Xinyu Zhang, Ji Xin, Hongyang Zhang, Jimmy Lin
Abstract summary: We propose the concept of certified error control of candidate set pruning for relevance ranking. Our method successfully prunes the first-stage retrieved candidate sets to improve the second-stage reranking speed.
Score: 57.42241521034744
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In information retrieval (IR), candidate set pruning has been commonly used to speed up two-stage relevance ranking. However, such an approach lacks accurate error control and often trades accuracy against computational efficiency empirically, without theoretical guarantees. In this paper, we propose the concept of certified error control of candidate set pruning for relevance ranking, which means that the test error after pruning is guaranteed to be controlled under a user-specified threshold with high probability. Both in-domain and out-of-domain experiments show that our method successfully prunes the first-stage retrieved candidate sets to improve the second-stage reranking speed while satisfying the pre-specified accuracy constraints in both settings. For example, on MS MARCO Passage v1, our method yields an average candidate set size of 27 out of 1,000, which increases the reranking speed by about 37 times, while the MRR@10 is greater than a pre-specified value of 0.38 with about 90% empirical coverage, whereas the empirical baselines fail to provide such a guarantee. Code and data are available at: https://github.com/alexlimh/CEC-Ranking.
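
To make the idea of "error controlled under a user-specified threshold with high probability" concrete, here is a minimal sketch of one generic way to calibrate a pruning cutoff on held-out queries. It is not the authors' algorithm (see the linked repository for that). It assumes a simplified 0/1 loss (the relevant document is lost if its first-stage rank exceeds the cutoff k), a Hoeffding upper bound on the mean loss, and a Bonferroni correction over a small grid of cutoffs. The names `hoeffding_upper_bound`, `calibrate_cutoff`, `relevant_ranks`, and `grid`, and the toy data, are hypothetical.

```python
import math


def hoeffding_upper_bound(losses, delta):
    """Upper bound on the true mean of losses in [0, 1], valid with prob. >= 1 - delta."""
    n = len(losses)
    return sum(losses) / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def calibrate_cutoff(relevant_ranks, grid, alpha, delta):
    """Return the smallest cutoff k in grid whose certified miss rate is <= alpha.

    relevant_ranks: first-stage rank (1-based) of the relevant document per
                    calibration query (assumed data layout).
    grid:           candidate cutoffs to consider.
    alpha:          user-specified error threshold.
    delta:          allowed failure probability of the certificate.
    Returns None if no cutoff in the grid can be certified.
    """
    per_k_delta = delta / len(grid)  # Bonferroni: the bounds hold jointly w.p. >= 1 - delta
    for k in sorted(grid):
        # 0/1 loss: 1 if pruning to the top-k drops the relevant document.
        losses = [1.0 if rank > k else 0.0 for rank in relevant_ranks]
        if hoeffding_upper_bound(losses, per_k_delta) <= alpha:
            return k
    return None


# Toy calibration set of 1,000 queries (made-up ranks, for illustration only).
relevant_ranks = [1] * 600 + [5] * 250 + [30] * 100 + [400] * 50
grid = [1, 10, 50, 100, 1000]

k = calibrate_cutoff(relevant_ranks, grid, alpha=0.1, delta=0.1)
speedup = 1000 / k if k is not None else 1.0  # assumes reranking cost is linear in k
```

In this toy setting the sketch certifies a cutoff of 50 candidates, and under the linear-cost assumption the second stage would run about 20 times faster. The same arithmetic is consistent with the abstract's figures: 1,000 / 27 is roughly 37.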

Related papers

COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
Robust Conformal Prediction with a Single Binary Certificate [58.450154976190795]
Conformal prediction (CP) converts any model's output to prediction sets with a guarantee to cover the true label with (adjustable) high probability. We propose a robust conformal prediction method that produces smaller sets even with significantly fewer MC samples.
arXiv Detail & Related papers (2025-03-07T08:41:53Z)
Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We show that calibration error and refinement error are not minimized simultaneously during training. We introduce a new metric for early stopping and hyperparameter tuning that makes it possible to minimize refinement error during training. Our method integrates seamlessly with any architecture and consistently improves performance across diverse classification tasks.
arXiv Detail & Related papers (2025-01-31T15:03:54Z)
Optimizing Metamorphic Testing: Prioritizing Relations Through Execution Profile Dissimilarity [2.6749261270690434]
An oracle determines whether the output of a program for executed test cases is correct. For machine learning programs, such an oracle is often unavailable or impractical to apply. Prioritizing metamorphic relations (MRs) enhances fault detection effectiveness and improves testing efficiency.
arXiv Detail & Related papers (2024-11-14T04:14:30Z)
A Self-boosted Framework for Calibrated Ranking [7.4291851609176645]
Calibrated Ranking is a scale-calibrated ranking system that pursues accurate ranking quality and calibrated probabilistic predictions simultaneously. Previous methods need to aggregate the full candidate list within a single mini-batch to compute the ranking loss. We propose a Self-Boosted framework for Calibrated Ranking (SBCR).
arXiv Detail & Related papers (2024-06-12T09:00:49Z)
Early Time Classification with Accumulated Accuracy Gap Control [34.77841988415891]
Early time classification algorithms aim to label a stream of features without processing the full input stream. We introduce a statistical framework that can be applied to any sequential classifier, formulating a calibrated stopping rule. We show that our proposed early stopping mechanism reduces the number of timesteps used for classification by up to 94% while achieving rigorous accuracy gap control.
arXiv Detail & Related papers (2024-02-01T18:54:34Z)
Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative. We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy [8.010528849585937]
We derive an (almost) guaranteed upper bound on the error of deep neural networks under distribution shift using unlabeled test data. In particular, our bound requires a simple, intuitive condition which is well justified by prior empirical works. We expect this loss can serve as a drop-in replacement for future methods which require maximizing multiclass disagreement.
arXiv Detail & Related papers (2023-06-01T03:22:15Z)
Accurate and Reliable Methods for 5G UAV Jamming Identification With Calibrated Uncertainty [3.4208659698673127]
Only increasing accuracy without considering uncertainty may negatively impact Deep Neural Network (DNN) decision-making. This paper proposes five combined preprocessing and post-processing methods for time-series binary classification problems.
arXiv Detail & Related papers (2022-11-05T15:04:45Z)
Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A post-hoc approach to compensate for neural networks being wrong is to perform temperature scaling. We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy. We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
Input-Specific Robustness Certification for Randomized Smoothing [76.76115360719837]
We propose Input-Specific Sampling (ISS) acceleration to make robustness certification more cost-effective. ISS can speed up the certification by more than three times at a limited cost of 0.05 certified radius.
arXiv Detail & Related papers (2021-12-21T12:16:03Z)
Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues. We first argue that label shift hurts UQ, by showing degradation in coverage and calibration. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints. We also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets.
arXiv Detail & Related papers (2020-08-21T18:43:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.