Regret Lower Bounds in Multi-agent Multi-armed Bandit
- URL: http://arxiv.org/abs/2308.08046v1
- Date: Tue, 15 Aug 2023 21:20:24 GMT
- Title: Regret Lower Bounds in Multi-agent Multi-armed Bandit
- Authors: Mengfan Xu, Diego Klabjan
- Abstract summary: Multi-armed Bandit motivates methods with provable upper bounds on regret.
We provide the first comprehensive study on regret lower bounds across different settings.
- Score: 14.822625665220068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-armed Bandit motivates methods with provable upper bounds on
regret, and the counterpart lower bounds have also been extensively studied in this
context. Recently, Multi-agent Multi-armed Bandit has gained significant
traction in various domains, where individual clients face bandit problems in a
distributed manner and the objective is the overall system performance,
typically measured by regret. While efficient algorithms with regret upper
bounds have emerged, limited attention has been given to the corresponding
regret lower bounds, except for a recent lower bound for adversarial settings,
which, however, has a gap with the best known upper bounds. To this end, we herein
provide the first comprehensive study on regret lower bounds across different
settings and establish their tightness. Specifically, when the graphs exhibit
good connectivity properties and the rewards are stochastically distributed, we
demonstrate a lower bound of order $\Omega(\log T)$ for instance-dependent bounds
and $\Omega(\sqrt{T})$ for mean-gap independent bounds, both of which are tight. Assuming
adversarial rewards, we establish a lower bound of $\Omega(T^{\frac{2}{3}})$ for
connected graphs, thereby bridging the gap between the lower and upper bound in
the prior work. We also show a linear regret lower bound when the graph is
disconnected. While previous works have explored these settings with upper
bounds, we provide a thorough study on tight lower bounds.
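As a point of reference for the instance-dependent $\Omega(\log T)$ rate above, the classical single-agent Lai-Robbins lower bound states that any consistent policy on Bernoulli arms incurs regret at least $(c + o(1)) \log T$, where $c$ sums $(\mu^* - \mu_i)/\mathrm{kl}(\mu_i, \mu^*)$ over suboptimal arms. A minimal sketch of that constant (helper names are ours, not from the paper, which treats the multi-agent setting):

```python
import math

def bernoulli_kl(p, q):
    """KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from {0, 1} to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """Coefficient c in the classical single-agent regret lower bound
    (c + o(1)) * log T for a Bernoulli bandit with the given arm means."""
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)
```

For the two-arm instance with means 0.5 and 0.4 this evaluates to roughly 4.97, so regret must grow at least like $4.97 \log T$; the paper's contribution is establishing analogously tight rates when many agents interact over a graph.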
Related papers
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints [8.784438985280094]
We study problems as pure exploration in multi-armed bandits with unknown linear constraints.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
Second, we leverage the Lagrangian lower bound and the properties of convexity to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
arXiv Detail & Related papers (2024-10-24T15:26:14Z)
- Best Arm Identification with Minimal Regret [55.831935724659175]
The best arm identification problem elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
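The KL-UCB-style index underlying this algorithm family can be sketched as follows (an illustrative single-index computation of ours, not the paper's Double KL-UCB): for each arm, take the largest mean compatible with the empirical average at a log(t) confidence level, found by bisection since kl(p, ·) is increasing above p.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from {0, 1} to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean_hat, pulls, t, tol=1e-6):
    """Largest q in [mean_hat, 1] with pulls * kl(mean_hat, q) <= log(t),
    found by bisection (kl(mean_hat, .) is increasing on [mean_hat, 1])."""
    level = math.log(t) / pulls
    lo, hi = mean_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```

Fewer pulls yield a wider confidence interval and hence a larger index, which is what drives exploration.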
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity [51.96093077151991]
We derive a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption.
Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate.
arXiv Detail & Related papers (2023-10-27T09:16:58Z)
- Lower Bounds for Learning in Revealing POMDPs [88.23337313766355]
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting.
For multi-step revealing POMDPs, we show that the latent state-space dependence is at least $\Omega(S^{1.5})$ in the sample complexity.
arXiv Detail & Related papers (2023-02-02T18:59:30Z)
- Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm [14.834625066344582]
We study sequential general online regression, also known as sequential probability assignment, under logarithmic loss.
We focus on obtaining tight, often matching, lower and upper bounds for the sequential minimax regret, defined as the excess loss incurred over a class of experts.
arXiv Detail & Related papers (2022-05-07T22:03:00Z)
- Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective [49.17352150219212]
Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL).
We show how to analyze this quantity from the Stochastic Differential Equation (SDE) perspective.
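The FedAvg/Local SGD scheme itself can be sketched in a few lines (our minimal illustration on 1-D quadratic client objectives with equal curvature, not the paper's analysis): each round, every client runs a few local gradient steps from the shared iterate, and the server averages the results.

```python
def fedavg_quadratics(centers, rounds=30, local_steps=5, lr=0.1, x0=10.0):
    """FedAvg / Local SGD on client objectives f_i(x) = (x - c_i)^2 / 2.

    Each round, every client takes `local_steps` gradient steps from the
    shared iterate, and the server averages the local results. With equal
    curvatures the iterates converge to the mean of the client optima.
    """
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for c in centers:
            xi = x
            for _ in range(local_steps):
                xi -= lr * (xi - c)  # gradient of (xi - c)^2 / 2 is (xi - c)
            local_iterates.append(xi)
        x = sum(local_iterates) / len(local_iterates)  # server averaging
    return x
```

With heterogeneous curvatures or noisy gradients the fixed point shifts away from the true minimizer, which is the client-drift phenomenon that sharp FedAvg analyses must account for.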
arXiv Detail & Related papers (2021-11-05T22:16:11Z)
- Unified lower bounds for interactive high-dimensional estimation under information constraints [40.339506154827106]
We provide a unified framework enabling us to derive a variety of (tight) minimax lower bounds for different parametric families of distributions.
Our lower bound framework is versatile and yields "plug-and-play" bounds that are widely applicable to a large range of estimation problems.
arXiv Detail & Related papers (2020-10-13T17:25:19Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.