Related papers: Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts

Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts

URL: http://arxiv.org/abs/2504.12988v3
Date: Thu, 15 May 2025 10:25:18 GMT
Title: Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi,
Abstract summary: We introduce the first framework for Top-$k$ Learning-to-Defer, enabling systems to defer each query to the $k$ most cost-effective experts.<n>We propose Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of experts per query based on input complexity, expert quality, and consultation cost.
Score: 3.6787328174619254
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Although existing Learning-to-Defer (L2D) frameworks support multiple experts, they allocate each query to a single expert, limiting their ability to leverage collective expertise in complex decision-making scenarios. To address this, we introduce the first framework for Top-$k$ Learning-to-Defer, enabling systems to defer each query to the $k$ most cost-effective experts. Our formulation strictly generalizes classical two-stage L2D by supporting multi-expert deferral-a capability absent in prior work. We further propose Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of experts per query based on input complexity, expert quality, and consultation cost. We introduce a novel surrogate loss that is Bayes-consistent, $(\mathcal{R}, \mathcal{G})$-consistent, and independent of the cardinality parameter $k$, enabling efficient reuse across different values of $k$. We show that classical model cascades arise as a special case of our method, situating our framework as a strict generalization of both selective deferral and cascaded inference. Experiments on classification and regression demonstrate that Top-$k$ and Top-$k(x)$ yield improved accuracy--cost trade-offs, establishing a new direction for multi-expert deferral in Learning-to-Defer.

Related papers

Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer [30.389055604165222]
This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees.<n>We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage and two-stage learning scenarios.<n>We derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, multiple-expert scenario.
arXiv Detail & Related papers (2025-06-25T17:48:58Z)
One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees [3.6787328174619254]
We introduce the first one-stage Top-$k$ Learning-to-Defer framework.<n>We learn a shared score-based model that selects the $k$ most cost-effective entities-labels or experts-per input.<n>Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral.
arXiv Detail & Related papers (2025-05-15T10:41:16Z)
Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization [51.562474873972086]
Federated domain generalization (FedDG) aims to learn a globally generalizable model from decentralized clients with heterogeneous data.<n>Recent studies have introduced prompt learning to adapt vision-language models (VLMs) in FedDG by learning a single global prompt.<n>We propose TRIP, a Token-level prompt mixture with parameter-free routing framework for FedDG.
arXiv Detail & Related papers (2025-04-29T11:06:03Z)
Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.<n>We show that the widely used beam search method suffers from unacceptable over-optimism.<n>We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z)
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning [76.10639521319382]
We propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead.
arXiv Detail & Related papers (2025-03-07T18:03:13Z)
ExpertGenQA: Open-ended QA generation in Specialized Domains [9.412082058055823]
ExpertGenQA is a protocol that combines few-shot learning with structured topic and style categorization to generate comprehensive domain-specific QA pairs.<n>We show that ExpertGenQA achieves twice the efficiency of baseline few-shot approaches while maintaining $94.4%$ topic coverage.
arXiv Detail & Related papers (2025-03-04T19:09:48Z)
Pareto Optimal Algorithmic Recourse in Multi-cost Function [0.44938884406455726]
algorithmic recourse aims to identify minimal-cost actions to alter an individual features, thereby obtaining a desired outcome. Most current recourse mechanisms use gradient-based methods that assume cost functions are differentiable, often not applicable in real-world scenarios. This work proposes an algorithmic recourse framework that handles nondifferentiable and discrete multi-cost functions.
arXiv Detail & Related papers (2025-02-11T03:16:08Z)
Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees [3.4289478404209826]
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection.<n>We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions.
arXiv Detail & Related papers (2024-10-21T08:21:00Z)
A Two-Stage Learning-to-Defer Approach for Multi-Task Learning [3.4289478404209826]
We introduce a novel Two-Stage Learning-to-Defer framework for multi-task learning that jointly addresses classification and regression tasks.<n>We validate our framework on two challenging tasks: object detection, where classification and regression are tightly coupled, and electronic health record analysis.
arXiv Detail & Related papers (2024-10-21T07:44:57Z)
Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts. Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards. We analyze a uniform sampling algorithm that results in being minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to the one of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z)
Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking n experts based on their performances on d tasks. We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z)
No-Regret Online Prediction with Strategic Experts [16.54912614895861]
We study a generalization of the online binary prediction with expert advice framework where at each round, the learner is allowed to pick $mgeq 1$ experts from a pool of $K$ experts. We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm's predictions by potentially misreporting their beliefs. Our goal is to design algorithms that satisfy the following two requirements: 1) $textitIncentive-compatible$: Incentivize the experts to report their beliefs truthfully, and 2) $textitNo-regret$: Achieve
arXiv Detail & Related papers (2023-05-24T16:43:21Z)
Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions. We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers. We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We casting online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$m resource constraints. Our framework allows the decision maker to handle its evidence flexibility and costoretic functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z)
$k\ exttt{-experts}$ -- Online Policies and Fundamental Limits [8.84337023214151]
This paper studies the $textttExperts$ problem. The learner selects a subset of $k$ experts from a pool of $N$ experts at each round. The reward obtained by the learner at any round depends on the rewards of the selected experts.
arXiv Detail & Related papers (2021-10-15T06:30:15Z)
Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning. We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline. We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap. We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability. A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance. We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.