Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards
- URL: http://arxiv.org/abs/2306.05579v2
- Date: Wed, 18 Oct 2023 00:00:04 GMT
- Title: Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards
- Authors: Mengfan Xu and Diego Klabjan
- Abstract summary: We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaborations.
- Score: 14.822625665220068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a decentralized multi-agent multi-armed bandit problem in which
multiple clients are connected by time-dependent random graphs provided by an
environment. The reward distributions of each arm vary across clients and
rewards are generated independently over time by an environment based on
distributions that include both sub-exponential and sub-gaussian distributions.
Each client pulls an arm and communicates with neighbors based on the graph
provided by the environment. The goal is to minimize the overall regret of the
entire system through collaborations. To this end, we introduce a novel
algorithmic framework, which first provides robust simulation methods for
generating random graphs using rapidly mixing Markov chains or the random graph
model, and then combines an averaging-based consensus approach with a newly
proposed weighting technique and the upper confidence bound to deliver a
UCB-type solution. Our algorithms account for the randomness in the graphs,
removing the conventional double stochasticity assumption, and only require the
knowledge of the number of clients at initialization. We derive optimal
instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian
and sub-exponential environments, and a nearly optimal mean-gap independent
regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor.
Importantly, our regret bounds hold with high probability and capture graph
randomness, whereas prior works consider expected regret under additional
assumptions and impose more stringent conditions on the reward distributions.
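
As an illustration of the framework described in the abstract, below is a minimal, self-contained sketch of a decentralized UCB-with-consensus loop. It is a toy under stated assumptions, not the authors' algorithm: the Erdos-Renyi graph sampler, the sqrt(2 log t / count) exploration bonus, and the plain row-stochastic averaging step are illustrative stand-ins, and the paper's Markov-chain-based graph simulation and its weighting technique are not reproduced here. Note that the consensus weights are only row stochastic, echoing the abstract's removal of the double stochasticity assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_er_graph(n, p=0.5):
    """Sample a symmetric Erdos-Renyi adjacency matrix with self-loops."""
    upper = np.triu(rng.random((n, n)) < p, 1)
    adj = upper | upper.T
    np.fill_diagonal(adj, True)
    return adj.astype(float)

def run_decentralized_ucb(n_clients=5, n_arms=3, horizon=2000):
    # Heterogeneous rewards: each client has its own mean per arm; system
    # regret is measured against the best arm under the global (averaged) means.
    local_means = rng.random((n_clients, n_arms))
    global_means = local_means.mean(axis=0)
    est = np.zeros((n_clients, n_arms))    # each client's estimate of the global means
    counts = np.ones((n_clients, n_arms))  # pull counts (one fictitious initial pull)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Each client pulls the arm with the highest UCB index.
        ucb = est + np.sqrt(2.0 * np.log(t + 1) / counts)
        arms = ucb.argmax(axis=1)
        rewards = rng.normal(local_means[np.arange(n_clients), arms], 0.1)
        for i in range(n_clients):
            counts[i, arms[i]] += 1
            est[i, arms[i]] += (rewards[i] - est[i, arms[i]]) / counts[i, arms[i]]
        # Averaging-based consensus over a freshly drawn random graph; the
        # weights are row stochastic only (no doubly stochastic assumption).
        adj = sample_er_graph(n_clients)
        weights = adj / adj.sum(axis=1, keepdims=True)
        est = weights @ est
        regret += global_means.max() - global_means[arms].mean()
    return regret

print(run_decentralized_ucb())
```

Sampling a fresh graph at every round mimics the environment-provided time-dependent random graphs; a faithful implementation would replace the Erdos-Renyi draw with the paper's rapidly mixing Markov chain simulation.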
Related papers
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Order-Optimal Regret in Distributed Kernel Bandits using Uniform
Sampling with Shared Randomness [9.731329071569018]
We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function.
We develop the first algorithm that achieves the optimal regret order with a communication cost that is sublinear in both $N$ and $T$.
arXiv Detail & Related papers (2024-02-20T17:49:10Z) - Complexity Analysis of a Countable-armed Bandit Problem [9.163501953373068]
We study the classical problem of minimizing the expected cumulative regret over a horizon of play $n$.
We propose algorithms that achieve a rate-optimal finite-time instance-dependent regret of $\mathcal{O}\left(\log n\right)$ when $K=2$.
While the order of regret and complexity of the problem suggests a great degree of similarity to the classical MAB problem, properties of the performance bounds and salient aspects of algorithm design are quite distinct from the latter.
arXiv Detail & Related papers (2023-01-18T00:53:46Z) - Beyond the Best: Estimating Distribution Functionals in Infinite-Armed
Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
An algorithm whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant is termed almost-optimal.
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential
Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm; a generic Thompson sampling baseline is sketched after this list.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - Rate-Distortion Theoretic Bounds on Generalization Error for Distributed
Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping other clients' algorithms un-compressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z) - DQMIX: A Distributional Perspective on Multi-Agent Reinforcement
Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most existing value-based multi-agent reinforcement learning methods model only the expectations of individual Q-values and the global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
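
As referenced in the Thompson sampling entry above, here is a generic Bernoulli Thompson sampling baseline with Beta posteriors (the Beta-Bernoulli model is a one-dimensional exponential family). This is a standard textbook sketch, not the Expulli sampler proposed in that paper.

```python
import numpy as np

def thompson_bernoulli(true_means, horizon=5000, seed=0):
    """Minimal Bernoulli Thompson sampling with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # 1 + observed successes per arm
    beta = np.ones(k)   # 1 + observed failures per arm
    regret = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)  # one posterior sample per arm
        arm = int(theta.argmax())      # play the arm with the largest sample
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += max(true_means) - true_means[arm]
    return regret

print(thompson_bernoulli([0.3, 0.5, 0.7]))
```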
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.