UCB-type Algorithm for Budget-Constrained Expert Learning
- URL: http://arxiv.org/abs/2510.22654v1
- Date: Sun, 26 Oct 2025 12:36:17 GMT
- Title: UCB-type Algorithm for Budget-Constrained Expert Learning
- Authors: Ilgam Latypov, Alexandra Suvorikova, Alexey Kroshnin, Alexander Gasnikov, Yuriy Dorn
- Abstract summary: \algname{M-LCB} is a UCB-style meta-algorithm that provides \emph{anytime} regret guarantees. We show how \algname{M-LCB} extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
- Score: 71.67657715154034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many modern applications, a system must dynamically choose between several adaptive learning algorithms that are trained online. Examples include model selection in streaming environments, switching between trading strategies in finance, and orchestrating multiple contextual bandit or reinforcement learning agents. At each round, a learner must select one predictor among $K$ adaptive experts to make a prediction, while being able to update at most $M \le K$ of them under a fixed training budget. We address this problem in the \emph{stochastic setting} and introduce \algname{M-LCB}, a computationally efficient UCB-style meta-algorithm that provides \emph{anytime regret guarantees}. Its confidence intervals are built directly from realized losses, require no additional optimization, and seamlessly reflect the convergence properties of the underlying experts. If each expert achieves internal regret $\tilde O(T^\alpha)$, then \algname{M-LCB} ensures overall regret bounded by $\tilde O\!\Bigl(\sqrt{\tfrac{KT}{M}} \;+\; (K/M)^{1-\alpha}\,T^\alpha\Bigr)$. To our knowledge, this is the first result establishing regret guarantees when multiple adaptive experts are trained simultaneously under per-round budget constraints. We illustrate the framework with two representative cases: (i) parametric models trained online with stochastic losses, and (ii) experts that are themselves multi-armed bandit algorithms. These examples highlight how \algname{M-LCB} extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
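The abstract specifies the interface (play one of $K$ experts per round, update at most $M$ of them) but gives no pseudocode. Below is a minimal Python sketch of one plausible reading of such a UCB/LCB-style loop; the `OnlineExpert` interface and the Hoeffding-style confidence radius are assumptions for illustration, not the paper's exact loss-based construction.

```python
import math
import random


class OnlineExpert:
    """Hypothetical expert interface assumed by this sketch: the paper only
    requires that an expert can make a prediction, reveal its realized loss,
    and run one internal training step when granted budget."""

    def predict_and_observe_loss(self) -> float:
        raise NotImplementedError

    def update(self) -> None:
        raise NotImplementedError


def m_lcb_sketch(experts, T, M, rng=random):
    """One plausible reading of an M-LCB-style loop: play the expert with the
    lowest loss-based lower confidence bound, then spend the per-round budget
    updating at most M experts. The Hoeffding-style radius below is a generic
    stand-in, not the paper's exact confidence construction."""
    K = len(experts)
    counts = [0] * K        # how many times each expert's loss was observed
    mean_loss = [0.0] * K   # running average of realized losses

    for t in range(1, T + 1):
        def lcb(i):
            if counts[i] == 0:
                return float("-inf")  # play every expert at least once
            radius = math.sqrt(2.0 * math.log(t + 1) / counts[i])
            return mean_loss[i] - radius

        chosen = min(range(K), key=lcb)  # optimism: smallest LCB on loss
        loss = experts[chosen].predict_and_observe_loss()
        counts[chosen] += 1
        mean_loss[chosen] += (loss - mean_loss[chosen]) / counts[chosen]

        # Budgeted training: update the played expert plus up to M - 1
        # others (picked uniformly here; the paper's schedule may differ).
        to_update = {chosen}
        while len(to_update) < min(M, K):
            to_update.add(rng.randrange(K))
        for i in to_update:
            experts[i].update()
```

Minimizing an LCB on losses plays the same optimistic role as maximizing a UCB on rewards; per the abstract, the intervals are built from realized losses alone and remain informative while the experts themselves keep learning.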
Related papers
- Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives [64.16056378603875]
We present two policy learning algorithms for the multi-agent online coordination (MA-OC) problem. The first one, \texttt{MA-SPL}, can achieve the optimal $(\frac{c}{e})$-approximation for the MA-OC problem. The second online algorithm, named \texttt{MA-MPL}, can simultaneously maintain the same approximation ratio.
arXiv Detail & Related papers (2025-09-26T17:16:34Z)
- Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts [6.792743621449621]
We introduce the first framework for Top-$k$ Learning-to-Defer. It allocates queries to the $k$ most cost-effective entities. We also propose Top-$k(x)$ Learning-to-Defer, an adaptive variant that learns the optimal number of experts per query.
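The summary states the allocation rule only at a high level; the toy sketch below illustrates top-$k$ routing by a cost-adjusted score, where `quality` and `cost` are hypothetical per-expert estimates standing in for the paper's learned surrogates.

```python
def route_top_k(quality, cost, k):
    """Toy top-k routing: score each expert by estimated quality minus
    consultation cost and defer to the k best. `quality` and `cost` are
    hypothetical per-expert estimates, not the paper's learned surrogates."""
    scores = [q - c for q, c in zip(quality, cost)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Example: defer to the 2 most cost-effective of 4 experts.
print(route_top_k([0.9, 0.7, 0.8, 0.6], [0.3, 0.1, 0.4, 0.05], k=2))  # [0, 1]
```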
arXiv Detail & Related papers (2025-04-17T14:50:40Z)
- Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
- No-Regret Online Prediction with Strategic Experts [16.54912614895861]
We study a generalization of the online binary prediction with expert advice framework where, at each round, the learner is allowed to pick $m \geq 1$ experts from a pool of $K$ experts.
We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm's predictions by potentially misreporting their beliefs.
Our goal is to design algorithms that satisfy the following two requirements: 1) \textit{Incentive-compatible}: incentivize the experts to report their beliefs truthfully, and 2) \textit{No-regret}: achieve sublinear regret.
arXiv Detail & Related papers (2023-05-24T16:43:21Z)
- Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z)
- When are Local Queries Useful for Robust Learning? [25.832511407411637]
We study learning models where the learner is given more power through the use of local queries.
We give the first distribution-free algorithms that perform robust empirical risk minimization.
We finish by giving robust learning algorithms for halfspaces on $\{0,1\}^n$ and then obtaining robustness guarantees for halfspaces in $\mathbb{R}^n$ against precision-bounded adversaries.
arXiv Detail & Related papers (2022-10-12T11:04:22Z)
- Trading Off Resource Budgets for Improved Regret Bounds [6.85316573653194]
We propose an algorithm called Follow the Perturbed Multiple Leaders (FPML) for adversarial online learning problems.
We show that FPML achieves expected regret $\mathcal{O}\bigl(T^{\frac{1}{B+1}} \ln(N)^{\frac{B}{B+1}}\bigr)$ over time horizon $T$.
We show how FPML can lead to new algorithms for Linear Programming which require stronger oracles.
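The snippet names the method but not its mechanics; below is a short sketch of a follow-the-perturbed-leader rule extended to pick $B$ leaders per round, with exponential perturbations and the scale parameter as assumptions (the paper's exact noise distribution and oracle model may differ).

```python
import random


def fpml_choose_leaders(cum_losses, B, eta=1.0, rng=random):
    """Sketch of a Follow-the-Perturbed-Multiple-Leaders step: perturb each
    expert's cumulative loss with random noise, then follow the B experts
    with the smallest perturbed losses. Exponential noise and the scale eta
    are assumptions, not necessarily the paper's exact construction."""
    perturbed = [loss - eta * rng.expovariate(1.0) for loss in cum_losses]
    return sorted(range(len(perturbed)), key=lambda i: perturbed[i])[:B]


# Example round with N = 5 experts and a leader budget of B = 2.
print(fpml_choose_leaders([3.0, 1.5, 2.2, 4.1, 1.9], B=2))
```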
arXiv Detail & Related papers (2022-10-11T21:13:48Z)
- PDE-Based Optimal Strategy for Unconstrained Online Learning [40.61498562988079]
We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
This is the first parameter-free algorithm with the optimal leading constant.
arXiv Detail & Related papers (2022-01-19T22:21:21Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies that maximizes expected cumulative reward while keeping each policy's expected cost below a given threshold.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)