Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
- URL: http://arxiv.org/abs/2504.12988v2
- Date: Tue, 22 Apr 2025 07:02:20 GMT
- Title: Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
- Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
- Abstract summary: Learning-to-Defer (L2D) enables decision-making systems to improve reliability by selectively deferring uncertain predictions to more competent agents. We propose Top-$k$ Learning-to-Defer, a generalization of the classical two-stage L2D framework that allocates each query to the $k$ most confident agents instead of a single one. To further enhance flexibility and cost-efficiency, we introduce Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of agents to consult for each query.
- Score: 3.6787328174619254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning-to-Defer (L2D) enables decision-making systems to improve reliability by selectively deferring uncertain predictions to more competent agents. However, most existing approaches focus exclusively on single-agent deferral, which is often inadequate in high-stakes scenarios that require collective expertise. We propose Top-$k$ Learning-to-Defer, a generalization of the classical two-stage L2D framework that allocates each query to the $k$ most confident agents instead of a single one. To further enhance flexibility and cost-efficiency, we introduce Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of agents to consult for each query, based on input complexity, agent competency distributions, and consultation costs. For both settings, we derive a novel surrogate loss and prove that it is Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent, ensuring convergence to the Bayes-optimal allocation. Notably, we show that the well-established model cascades paradigm arises as a restricted instance of our Top-$k$ and Top-$k(x)$ formulations. Extensive experiments across diverse benchmarks demonstrate the effectiveness of our framework on both classification and regression tasks.
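To make the allocation rule concrete, below is a minimal Python sketch of the two deferral policies described in the abstract, assuming a trained gating model that outputs per-agent confidence scores. This is an illustration under stated assumptions, not the authors' implementation: `gate_scores`, `consult_cost`, and the threshold-based stopping rule for Top-$k(x)$ are all hypothetical, whereas the paper learns the allocation through a Bayes-consistent surrogate loss.

```python
import numpy as np

def top_k_defer(gate_scores: np.ndarray, k: int) -> np.ndarray:
    """Top-k L2D: return the indices of the k most confident agents.

    gate_scores: shape (n_agents,), per-agent confidence on query x
    (hypothetical output of a trained two-stage gating model).
    """
    # argsort is ascending; take the last k entries and reverse
    # so the most confident agent comes first.
    return np.argsort(gate_scores)[-k:][::-1]

def top_kx_defer(gate_scores: np.ndarray, consult_cost: float) -> np.ndarray:
    """Adaptive Top-k(x): consult agents in decreasing confidence order,
    stopping once an agent's confidence no longer exceeds the per-agent
    consultation cost (an illustrative stopping rule, not the paper's
    learned criterion)."""
    order = np.argsort(gate_scores)[::-1]      # agents by confidence, descending
    chosen = [order[0]]                        # always consult the top agent
    for idx in order[1:]:
        if gate_scores[idx] < consult_cost:    # marginal benefit too small
            break
        chosen.append(idx)
    return np.array(chosen)

# Usage: four agents (e.g., a base model plus three experts).
scores = np.array([0.10, 0.55, 0.80, 0.30])
print(top_k_defer(scores, k=2))     # -> [2 1]
print(top_kx_defer(scores, 0.25))   # -> [2 1 3]
```

Setting $k=1$ recovers classical single-agent deferral, and a fixed consultation order with early stopping corresponds to the model-cascades special case the abstract mentions.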
Related papers
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning [76.10639521319382]
We propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework.
We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead.
arXiv Detail & Related papers (2025-03-07T18:03:13Z) - Pareto Optimal Algorithmic Recourse in Multi-cost Function [0.44938884406455726]
Algorithmic recourse aims to identify minimal-cost actions that alter an individual's features, thereby obtaining a desired outcome.
Most current recourse mechanisms use gradient-based methods that assume cost functions are differentiable, an assumption that often fails in real-world scenarios.
This work proposes an algorithmic recourse framework that handles nondifferentiable and discrete multi-cost functions.
arXiv Detail & Related papers (2025-02-11T03:16:08Z) - Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees [3.4289478404209826]
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions.
arXiv Detail & Related papers (2024-10-21T08:21:00Z) - A Two-Stage Learning-to-Defer Approach for Multi-Task Learning [3.4289478404209826]
We introduce a novel Two-Stage Learning-to-Defer framework for multi-task learning that jointly addresses classification and regression tasks. We validate our framework on two challenging tasks: object detection, where classification and regression are tightly coupled, and electronic health record analysis.
arXiv Detail & Related papers (2024-10-21T07:44:57Z) - Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We consider online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints.
Our framework allows the decision maker to flexibly handle arbitrary reward and cost functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)