Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards
- URL: http://arxiv.org/abs/2306.05579v2
- Date: Wed, 18 Oct 2023 00:00:04 GMT
- Title: Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards
- Authors: Mengfan Xu and Diego Klabjan
- Abstract summary: We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaborations.
- Score: 14.822625665220068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a decentralized multi-agent multi-armed bandit problem in which
multiple clients are connected by time-dependent random graphs provided by an
environment. The reward distributions of each arm vary across clients and
rewards are generated independently over time by an environment based on
distributions that include both sub-exponential and sub-gaussian distributions.
Each client pulls an arm and communicates with neighbors based on the graph
provided by the environment. The goal is to minimize the overall regret of the
entire system through collaborations. To this end, we introduce a novel
algorithmic framework, which first provides robust simulation methods for
generating random graphs using rapidly mixing Markov chains or the random graph
model, and then combines an averaging-based consensus approach with a newly
proposed weighting technique and the upper confidence bound to deliver a
UCB-type solution. Our algorithms account for the randomness in the graphs,
removing the conventional double stochasticity assumption, and only require
knowledge of the number of clients at initialization. We derive optimal
instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian
and sub-exponential environments, and a nearly optimal mean-gap-independent
regret upper bound of order $\sqrt{T}\log T$, which is optimal up to a $\log T$ factor.
Importantly, our regret bounds hold with high probability and capture graph
randomness, whereas prior works consider expected regret under stronger graph
assumptions and more restrictive reward distributions.
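To make the framework concrete, below is a minimal, hypothetical sketch (not the authors' exact algorithm) of a decentralized UCB loop with averaging-based consensus over freshly drawn random communication graphs. The number of clients N, arms K, horizon T, the Erdős–Rényi edge probability, and the simple row-normalized (not doubly stochastic) weights are illustrative assumptions; the paper's weighting technique and Markov-chain-based graph generation are not reproduced here.

```python
# Hypothetical illustration only -- names, constants, and the Erdos-Renyi
# graph model are assumptions, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)

N, K, T = 5, 3, 2000                        # clients, arms, horizon (illustrative)
true_means = rng.uniform(0.0, 1.0, (N, K))  # heterogeneous per-client arm means

est = np.zeros((N, K))   # each client's running mean-reward estimates
cnt = np.ones((N, K))    # pull counts (start at 1 to avoid division by zero)

for t in range(1, T + 1):
    # 1) Each client pulls the arm maximizing its local UCB index.
    bonus = np.sqrt(2.0 * np.log(t + 1) / cnt)
    arms = np.argmax(est + bonus, axis=1)
    rewards = rng.normal(true_means[np.arange(N), arms], 0.1)  # sub-Gaussian noise

    # 2) Local incremental-mean update of the pulled arm's statistics.
    idx = (np.arange(N), arms)
    cnt[idx] += 1
    est[idx] += (rewards - est[idx]) / cnt[idx]

    # 3) The environment draws a fresh random communication graph each round
    #    (an Erdos-Renyi graph here, purely for illustration).
    adj = rng.random((N, N)) < 0.5
    adj = np.triu(adj, 1)
    adj = adj | adj.T | np.eye(N, dtype=bool)   # symmetric, with self-loops

    # 4) Averaging-based consensus with simple row-normalized weights;
    #    no doubly stochastic weight matrix is assumed.
    W = adj / adj.sum(axis=1, keepdims=True)
    est = W @ est

print("system-level best arm:", np.argmax(true_means.mean(axis=0)))
print("consensus estimates  :", np.round(est.mean(axis=0), 3))
```

The sketch only conveys the pull/update/gossip loop; in the paper, the graph-generation procedure and the weighting technique are designed so that the stated regret bounds hold with high probability despite the graph randomness.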
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
- Multi-agent Multi-armed Bandit with Fully Heavy-tailed Dynamics [5.54892060291103]
We study decentralized multi-agent multi-armed bandits in fully heavy-tailed settings, where clients communicate over sparse random graphs with heavy-tailed degree distributions.
We are the first to address such fully heavy-tailed scenarios, which capture the dynamics and challenges in communication and inference among multiple clients in real-world systems.
arXiv Detail & Related papers (2025-01-31T15:53:14Z)
- Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness [9.731329071569018]
We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function.
We develop the first algorithm that achieves the optimal regret order with a communication cost that is sublinear in both $N$ and $T$.
arXiv Detail & Related papers (2024-02-20T17:49:10Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game, where one player updates the model and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of the worst-case distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most existing value-based multi-agent reinforcement learning methods only model the expectations of the individual Q-values and of the global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.