AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking
- URL: http://arxiv.org/abs/2505.18512v1
- Date: Sat, 24 May 2025 05:15:49 GMT
- Title: AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking
- Authors: Soyoung Yoon, Gyuwan Kim, Gyu-Hwung Cho, Seung-won Hwang,
- Abstract summary: Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications.<n>We propose AcuRank, an adaptive reranking framework that dynamically adjusts both the amount and target of computation based on uncertainty estimates over document relevance.<n>Results on the TREC-DL and BEIR benchmarks show that our method consistently achieves a superior accuracy-efficiency trade-off and scales better with compute than fixed-computation baselines.
- Score: 25.459771464139855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Listwise reranking with large language models (LLMs) enhances top-ranked results in retrieval-based applications. Due to the limit in context size and high inference cost of long context, reranking is typically performed over a fixed size of small subsets, with the final ranking aggregated from these partial results. This fixed computation disregards query difficulty and document distribution, leading to inefficiencies. We propose AcuRank, an adaptive reranking framework that dynamically adjusts both the amount and target of computation based on uncertainty estimates over document relevance. Using a Bayesian TrueSkill model, we iteratively refine relevance estimates until reaching sufficient confidence levels, and our explicit modeling of ranking uncertainty enables principled control over reranking behavior and avoids unnecessary updates to confident predictions. Results on the TREC-DL and BEIR benchmarks show that our method consistently achieves a superior accuracy-efficiency trade-off and scales better with compute than fixed-computation baselines. These results highlight the effectiveness and generalizability of our method across diverse retrieval tasks and LLM-based reranking models.
Related papers
- Reliable Evaluation Protocol for Low-Precision Retrieval [34.65522226937288]
We propose a more robust retrieval evaluation protocol designed to reduce score variation.<n>It consists of: (1) High-Precision Scoring (HPS), which upcasts the final scoring step to higher precision to resolve tied candidates with minimal computational cost; and (2) Tie-aware Retrieval Metrics (TRM), which report expected scores, range, and bias to quantify order uncertainty of tied candidates.
arXiv Detail & Related papers (2025-08-05T10:27:57Z) - Conformal Information Pursuit for Interactively Guiding Large Language Models [64.39770942422288]
This paper explores sequential querying strategies that aim to minimize the expected number of queries.<n>One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty.<n>We propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets.
arXiv Detail & Related papers (2025-07-04T03:55:39Z) - NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics.<n>We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method.<n> Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z) - Robust and Computation-Aware Gaussian Processes [18.264598332579748]
We introduce Robust Computation-aware Gaussian Process (RCaGP), a novel GP model that combines a principled treatment of approximation-induced uncertainty with robust generalized Bayesian updating.<n>Our model ensures more conservative and reliable uncertainty estimates, a property we rigorously demonstrate.<n> Empirical results confirm that solving these challenges jointly leads to superior performance across both clean and outlier-contaminated settings.
arXiv Detail & Related papers (2025-05-27T12:49:14Z) - Breaking the Lens of the Telescope: Online Relevance Estimation over Large Retrieval Sets [14.494301139974455]
We propose a novel paradigm for re-ranking called online relevance estimation.<n>Online relevance estimation continuously updates relevance estimates for a query throughout the ranking process.<n>We validate our approach on TREC benchmarks under two scenarios: hybrid retrieval and adaptive retrieval.
arXiv Detail & Related papers (2025-04-12T22:05:50Z) - Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Regression-aware Inference with LLMs [52.764328080398805]
We show that an inference strategy can be sub-optimal for common regression and scoring evaluation metrics.
We propose alternate inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed-form from sampled responses.
arXiv Detail & Related papers (2024-03-07T03:24:34Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - Deep Reinforcement Learning at the Edge of the Statistical Precipice [31.178451465925555]
We argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field.
We advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results.
arXiv Detail & Related papers (2021-08-30T14:23:48Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.