One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees
- URL: http://arxiv.org/abs/2505.10160v1
- Date: Thu, 15 May 2025 10:41:16 GMT
- Title: One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees
- Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
- Abstract summary: We introduce the first one-stage Top-$k$ Learning-to-Defer framework. We learn a shared score-based model that selects the $k$ most cost-effective entities (labels or experts) per input. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities (labels or experts) per input. While existing one-stage L2D methods are limited to deferring to a single expert, our approach jointly optimizes prediction and deferral across multiple entities through a single end-to-end objective. We define a cost-sensitive loss and derive a novel convex surrogate that is independent of the cardinality parameter $k$, enabling generalization across Top-$k$ regimes without retraining. Our formulation recovers the Top-1 deferral policy of prior score-based methods as a special case, and we prove that our surrogate is both Bayes-consistent and $\mathcal{H}$-consistent under mild assumptions. We further introduce an adaptive variant, Top-$k(x)$, which dynamically selects the number of consulted entities per input to balance predictive accuracy and consultation cost. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral, while Top-$k(x)$ achieves superior accuracy-cost trade-offs by tailoring allocations to input complexity.
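The deferral rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual training objective or surrogate loss; it only shows, under assumed score and cost arrays, how a shared score-based model could rank entities (labels and experts) by cost-adjusted utility and pick the top $k$, plus a simple thresholding heuristic standing in for the adaptive Top-$k(x)$ idea. All function names and the budget parameter are hypothetical.

```python
import numpy as np

def top_k_defer(scores, costs, k):
    """Select the k most cost-effective entities (labels or experts)
    for one input, ranking by score minus consultation cost.

    scores: shared-model scores, one per entity (labels first, then experts)
    costs:  consultation cost per entity (zero for the model's own labels)
    k:      number of entities to consult
    """
    utility = scores - costs              # cost-adjusted score per entity
    order = np.argsort(utility)[::-1]     # indices sorted by descending utility
    return order[:k].tolist()

def top_k_adaptive(scores, costs, budget):
    """Hypothetical Top-k(x) heuristic: consult every entity whose
    cost-adjusted utility clears a per-input budget threshold."""
    utility = np.sort(scores - costs)[::-1]
    k = int(np.sum(utility > budget))
    return max(k, 1)                      # always consult at least one entity
```

For example, with `scores = [0.9, 0.2, 0.8, 0.5]` and `costs = [0.0, 0.0, 0.3, 0.1]`, `top_k_defer(..., k=2)` consults entities 0 and 2, since their cost-adjusted utilities (0.9 and 0.5) are the two largest.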
Related papers
- Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification
We present a novel framework that combines over-approximation techniques from LiRPA-based approaches with a sampling-based method to compute tight intermediate reachable sets. With negligible computational overhead, $\texttt{PT-LiRPA}$, exploiting the estimated reachable sets, significantly tightens the lower and upper linear bounds of a neural network's output.
arXiv Detail & Related papers (2025-07-07T18:45:53Z) - Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage and two-stage learning scenarios. We derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, the multiple-expert scenario.
arXiv Detail & Related papers (2025-06-25T17:48:58Z) - Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Test-time scaling paradigms have advanced the capabilities of large language models (LLMs) on complex tasks. We study the sample efficiency of various test-time strategies, such as self-consistency, best-of-$n$, and self-correction. A single Transformer architecture can provably solve multiple tasks without prior knowledge of the specific task associated with a user query.
arXiv Detail & Related papers (2025-06-05T17:48:19Z) - Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
We introduce the first framework for Top-$k$ Learning-to-Defer, enabling systems to defer each query to the $k$ most cost-effective experts. We propose Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of experts per query based on input complexity, expert quality, and consultation cost.
arXiv Detail & Related papers (2025-04-17T14:50:40Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimation.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment
Inference-time computation offers a powerful axis for scaling the performance of language models. We analyze the performance of inference-time alignment algorithms in terms of (i) response quality and (ii) compute. We introduce $\texttt{InferenceTimePessimism}$, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute.
arXiv Detail & Related papers (2025-03-27T18:00:08Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - Transfer Q Star: Principled Decoding for LLM Alignment
Transfer $Q^*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
arXiv Detail & Related papers (2024-05-30T21:36:12Z) - $i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization
Large Language Models (LLMs) can sometimes produce outputs that deviate from human expectations.
We propose a novel framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization.
We show that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators.
arXiv Detail & Related papers (2024-05-24T05:42:11Z) - Top-$k$ Classification and Cardinality-Aware Prediction
We show that comp-sum and constrained losses are supported by $H$-consistency bounds with respect to the top-$k$ loss.
We introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning.
Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification.
arXiv Detail & Related papers (2024-03-28T17:45:03Z) - Budgeted Classification with Rejection: An Evolutionary Method with Multiple Objectives
Budgeted, sequential classifiers (BSCs) process inputs through a sequence of partial feature acquisition and evaluation steps.
This allows for an efficient evaluation of inputs that prevents unneeded feature acquisition.
We propose a problem-specific genetic algorithm to build budgeted, sequential classifiers with confidence-based reject options.
arXiv Detail & Related papers (2022-05-01T22:05:16Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.