Trading Off Resource Budgets for Improved Regret Bounds
- URL: http://arxiv.org/abs/2210.05789v1
- Date: Tue, 11 Oct 2022 21:13:48 GMT
- Title: Trading Off Resource Budgets for Improved Regret Bounds
- Authors: Damon Falck and Thomas Orton
- Abstract summary: We propose an algorithm called Follow the Perturbed Multiple Leaders (FPML) for adversarial online learning problems.
We show that FPML achieves expected regret $\mathcal{O}(T^{\frac{1}{B+1}}\ln(N)^{\frac{B}{B+1}})$ over time horizon $T$.
We show how FPML can lead to new algorithms for Linear Programming which require stronger oracles but fewer oracle calls.
- Score: 6.85316573653194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we consider a variant of adversarial online learning where in
each round one picks $B$ out of $N$ arms and incurs cost equal to the
$\textit{minimum}$ of the costs of each arm chosen. We propose an algorithm
called Follow the Perturbed Multiple Leaders (FPML) for this problem, which we
show (by adapting the techniques of Kalai and Vempala [2005]) achieves expected
regret $\mathcal{O}(T^{\frac{1}{B+1}}\ln(N)^{\frac{B}{B+1}})$ over time horizon
$T$ relative to the $\textit{single}$ best arm in hindsight. This introduces a
trade-off between the budget $B$ and the single-best-arm regret, and we proceed
to investigate several applications of this trade-off. First, we observe that
algorithms which use standard regret minimizers as subroutines can sometimes be
adapted by replacing these subroutines with FPML, and we use this to generalize
existing algorithms for Online Submodular Function Maximization [Streeter and
Golovin, 2008] in both the full feedback and semi-bandit feedback settings.
Next, we empirically evaluate our new algorithms on an online black-box
hyperparameter optimization problem. Finally, we show how FPML can lead to new
algorithms for Linear Programming which require stronger oracles at the benefit
of fewer oracle calls.
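To make the algorithm concrete, the following is a minimal sketch of FPML in the full-feedback setting. It follows the Kalai-Vempala-style perturbation idea described above, but the exponential noise distribution, the scale parameter epsilon, and the function name fpml are assumptions of this sketch rather than the paper's exact choices.

```python
import numpy as np

def fpml(costs, B, epsilon, seed=0):
    """Follow the Perturbed Multiple Leaders (sketch, full feedback).

    costs:   (T, N) array of adversarial per-round arm costs in [0, 1].
    B:       budget, i.e. number of arms picked each round.
    epsilon: perturbation scale (the paper would tune this from T, N, B).
    Each round we perturb the cumulative costs, follow the B arms with the
    smallest perturbed totals, and pay the minimum cost among them.
    """
    rng = np.random.default_rng(seed)
    T, N = costs.shape
    cum = np.zeros(N)                             # cumulative observed costs
    total = 0.0
    for t in range(T):
        noise = rng.exponential(scale=1.0 / epsilon, size=N)
        leaders = np.argsort(cum - noise)[:B]     # the B perturbed leaders
        total += costs[t, leaders].min()          # cost incurred this round
        cum += costs[t]                           # full feedback update
    return total
```

Larger budgets $B$ push the bound $\mathcal{O}(T^{\frac{1}{B+1}}\ln(N)^{\frac{B}{B+1}})$ towards an almost constant dependence on $T$, at the price of picking more arms per round.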
Related papers
- UCB-type Algorithm for Budget-Constrained Expert Learning [71.67657715154034]
M-LCB is a UCB-style meta-algorithm that provides anytime regret guarantees. We show how M-LCB extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
arXiv Detail & Related papers (2025-10-26T12:36:17Z) - Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems [18.74680577173648]
We consider a common case of the $m$-set semi-bandit problem, where the learner selects exactly $m$ arms out of $d$ arms in total. We show that FTPL with a Fréchet perturbation also enjoys the near-optimal regret bound $\mathcal{O}(\sqrt{nm}(\sqrt{d\log(d)}+m^{5/6}))$ in the adversarial setting.
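For intuition, one decision step of FTPL with Fréchet perturbations for the $m$-set problem might look like the sketch below. The shape parameter 2, the learning rate eta, and the function name are illustrative; the loss estimator needed under semi-bandit feedback (e.g. geometric resampling) is not shown.

```python
import numpy as np

def ftpl_mset_select(cum_loss_est, m, eta, rng):
    """One FTPL decision for the m-set semi-bandit problem (sketch).

    cum_loss_est: length-d array of estimated cumulative losses per arm.
    m:            number of arms to play this round.
    eta:          learning rate (its tuning is omitted here).
    Fréchet(shape=2) perturbations are drawn via the inverse-CDF transform
    x = (-log u)^(-1/2); the m arms with the smallest perturbed losses win.
    """
    d = len(cum_loss_est)
    u = rng.uniform(size=d)
    perturbation = (-np.log(u)) ** (-0.5)          # Fréchet sample, shape 2
    scores = eta * np.asarray(cum_loss_est) - perturbation
    return np.argsort(scores)[:m]

# Example: pick m = 3 of d = 10 arms given some loss estimates.
rng = np.random.default_rng(1)
print(ftpl_mset_select(rng.random(10), m=3, eta=0.1, rng=rng))
```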
arXiv Detail & Related papers (2025-04-09T22:07:01Z) - Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present an $\textbf{MA-OSMA}$ algorithm that transfers the discrete submodular problem into a continuous optimization problem.
We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution.
Our algorithms significantly improve the $\frac{1}{1+c}$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Improved Projection-free Online Continuous Submodular Maximization [35.324719857218014]
We investigate the problem of online learning with monotone and continuous DR-submodular reward functions.
Previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations.
We propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity.
arXiv Detail & Related papers (2023-05-29T02:54:31Z) - Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs [31.673857053336352]
We propose an algorithm that achieves the near-optimal regret with a switching cost logarithmic in the number of episodes and linear in the time-horizon $H$.
We also show that the "doubling trick" used in ELEANOR-LowSwitching can be further leveraged for generalized linear function approximation.
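As a rough illustration of the doubling trick in a low-switching setting (a generic sketch, not the paper's exact ELEANOR-LowSwitching rule), the policy is recomputed only when the determinant of the accumulated feature covariance matrix has doubled, which can happen at most $O(d \log K)$ times over $K$ episodes:

```python
import numpy as np

def doubling_trick_switches(features, lam=1.0):
    """Generic doubling-trick switching schedule (sketch).

    features: iterable of per-episode feature vectors phi_k, each of shape (d,).
    Returns the episode indices at which the policy would be recomputed:
    a switch happens only when det(V) has doubled since the last switch,
    so the number of switches grows logarithmically rather than linearly.
    """
    features = list(features)
    d = len(features[0])
    V = lam * np.eye(d)
    _, last_logdet = np.linalg.slogdet(V)
    switches = []
    for k, phi in enumerate(features):
        V += np.outer(phi, phi)                     # accumulate covariance
        _, logdet = np.linalg.slogdet(V)
        if logdet >= last_logdet + np.log(2.0):     # det(V) has doubled
            switches.append(k)                      # re-plan at episode k
            last_logdet = logdet
    return switches
```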
arXiv Detail & Related papers (2023-02-24T05:14:27Z) - A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback [27.192028744078282]
We provide a framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular maximization.
arXiv Detail & Related papers (2023-01-30T23:18:06Z) - Towards Painless Policy Optimization for Constrained MDPs [46.12526917024248]
We study policy optimization in an infinite-horizon, $\gamma$-discounted constrained Markov decision process (CMDP).
Our objective is to return a policy that achieves large expected reward with a small constraint violation.
We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms.
arXiv Detail & Related papers (2022-04-11T15:08:09Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MST})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long...
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - New Projection-free Algorithms for Online Convex Optimization with
Adaptive Regret Guarantees [21.30065439295409]
We present new efficient projection-free algorithms for online convex optimization (OCO).
Our algorithms are based on the online gradient descent algorithm with a novel and efficient approach to computing so-called infeasible projections.
We present algorithms which, using overall $O(T)$ calls to the separation oracle, guarantee $O(\sqrt{T})$ adaptive regret and $O(T^{3/4})$ adaptive expected regret.
arXiv Detail & Related papers (2022-02-09T20:56:16Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under
Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
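One round of a C$^2$UCB-style selection can be sketched as follows; the top-$k$ step stands in for the partition-matroid oracle, and the names and confidence multiplier alpha are illustrative rather than the paper's exact construction.

```python
import numpy as np

def c2ucb_select(X, V, b, alpha, k):
    """One round of a C^2UCB-style combinatorial semi-bandit (sketch).

    X:     (n_arms, d) feature matrix of the arms available this round.
    V:     (d, d) regularized Gram matrix  lambda*I + sum_s x_s x_s^T.
    b:     (d,) vector  sum_s r_s x_s  of observed rewards times features.
    alpha: confidence-width multiplier.
    k:     number of arms to play (top-k in place of a matroid oracle).
    """
    theta = np.linalg.solve(V, b)                             # ridge estimate
    V_inv = np.linalg.inv(V)
    widths = np.sqrt(np.einsum('ij,jk,ik->i', X, V_inv, X))   # ||x_i||_{V^-1}
    ucb = X @ theta + alpha * widths                          # optimistic scores
    return np.argsort(-ucb)[:k]                               # play the top k arms
```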
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - A Provably Efficient Algorithm for Linear Markov Decision Process with
Low Switching Cost [53.968049198926444]
We present the first algorithm for linear MDP with a low switching cost.
Our algorithm achieves an $\widetilde{O}\left(\sqrt{d^3H^4K}\right)$ regret bound with a near-optimal $O\left(dH\log K\right)$ global switching cost.
arXiv Detail & Related papers (2021-01-02T18:41:27Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.