One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees
- URL: http://arxiv.org/abs/2505.10160v1
- Date: Thu, 15 May 2025 10:41:16 GMT
- Title: One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees
- Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
- Abstract summary: We introduce the first one-stage Top-$k$ Learning-to-Defer framework. We learn a shared score-based model that selects the $k$ most cost-effective entities (labels or experts) per input. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities (labels or experts) per input. While existing one-stage L2D methods are limited to deferring to a single expert, our approach jointly optimizes prediction and deferral across multiple entities through a single end-to-end objective. We define a cost-sensitive loss and derive a novel convex surrogate that is independent of the cardinality parameter $k$, enabling generalization across Top-$k$ regimes without retraining. Our formulation recovers the Top-1 deferral policy of prior score-based methods as a special case, and we prove that our surrogate is both Bayes-consistent and $\mathcal{H}$-consistent under mild assumptions. We further introduce an adaptive variant, Top-$k(x)$, which dynamically selects the number of consulted entities per input to balance predictive accuracy and consultation cost. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral, while Top-$k(x)$ achieves superior accuracy-cost trade-offs by tailoring allocations to input complexity.
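The deferral rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual training objective or surrogate loss; it only shows, under assumed score and cost arrays, how a shared score-based model could rank entities (labels and experts) by cost-adjusted utility and pick the top $k$, plus a simple thresholding heuristic standing in for the adaptive Top-$k(x)$ idea. All function names and the budget parameter are hypothetical.

```python
import numpy as np

def top_k_defer(scores, costs, k):
    """Select the k most cost-effective entities (labels or experts)
    for one input, ranking by score minus consultation cost.

    scores: shared-model scores, one per entity (labels first, then experts)
    costs:  consultation cost per entity (zero for the model's own labels)
    k:      number of entities to consult
    """
    utility = scores - costs              # cost-adjusted score per entity
    order = np.argsort(utility)[::-1]     # indices sorted by descending utility
    return order[:k].tolist()

def top_k_adaptive(scores, costs, budget):
    """Hypothetical Top-k(x) heuristic: consult every entity whose
    cost-adjusted utility clears a per-input budget threshold."""
    utility = np.sort(scores - costs)[::-1]
    k = int(np.sum(utility > budget))
    return max(k, 1)                      # always consult at least one entity
```

For example, with `scores = [0.9, 0.2, 0.8, 0.5]` and `costs = [0.0, 0.0, 0.3, 0.1]`, `top_k_defer(..., k=2)` consults entities 0 and 2, since their cost-adjusted utilities (0.9 and 0.5) are the two largest.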
Related papers
- Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification
We present a novel framework that combines over-approximation techniques from LiRPA-based approaches with a sampling-based method to compute tight intermediate reachable sets. With negligible computational overhead, $\texttt{PT-LiRPA}$, exploiting the estimated reachable sets, significantly tightens the lower and upper linear bounds of a neural network's output.
arXiv Detail & Related papers (2025-07-07T18:45:53Z) - Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage and two-stage learning scenarios. We derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, the multiple-expert scenario.
arXiv Detail & Related papers (2025-06-25T17:48:58Z) - Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Test-time scaling paradigms have advanced the capabilities of large language models (LLMs) on complex tasks. We study the sample efficiency of various test-time strategies, such as self-consistency, best-of-$n$, and self-correction. A single Transformer architecture can provably solve multiple tasks without prior knowledge of the specific task associated with a user query.
arXiv Detail & Related papers (2025-06-05T17:48:19Z) - Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
We introduce the first framework for Top-$k$ Learning-to-Defer, enabling systems to defer each query to the $k$ most cost-effective experts. We propose Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of experts per query based on input complexity, expert quality, and consultation cost.
arXiv Detail & Related papers (2025-04-17T14:50:40Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimation.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
Inference-time computation offers a powerful axis for scaling the performance of language models. We analyze the performance of inference-time alignment algorithms in terms of (i) response quality and (ii) compute. We introduce $\texttt{InferenceTimePessimism}$, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute.
arXiv Detail & Related papers (2025-03-27T18:00:08Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - Transfer Q Star: Principled Decoding for LLM Alignment
Transfer $Q^*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
arXiv Detail & Related papers (2024-05-30T21:36:12Z) - $i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
Large Language Models (LLMs) can sometimes produce outputs that deviate from human expectations.
We propose a novel framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization.
We show that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators.
arXiv Detail & Related papers (2024-05-24T05:42:11Z) - Top-$k$ Classification and Cardinality-Aware Prediction
We show that comp-sum and constrained losses are supported by $H$-consistency bounds with respect to the top-$k$ loss.
We introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning.
Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification.
arXiv Detail & Related papers (2024-03-28T17:45:03Z) - Budgeted Classification with Rejection: An Evolutionary Method with Multiple Objectives
Budgeted, sequential classifiers (BSCs) process inputs through a sequence of partial feature acquisition and evaluation steps.
This allows for an efficient evaluation of inputs that prevents unneeded feature acquisition.
We propose a problem-specific genetic algorithm to build budgeted, sequential classifiers with confidence-based reject options.
arXiv Detail & Related papers (2022-05-01T22:05:16Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.