Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards
- URL: http://arxiv.org/abs/2306.05579v2
- Date: Wed, 18 Oct 2023 00:00:04 GMT
- Title: Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards
- Authors: Mengfan Xu and Diego Klabjan
- Abstract summary: We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaborations.
- Score: 14.822625665220068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a decentralized multi-agent multi-armed bandit problem in which
multiple clients are connected by time-dependent random graphs provided by an
environment. The reward distributions of each arm vary across clients and
rewards are generated independently over time by an environment based on
distributions that include both sub-exponential and sub-gaussian distributions.
Each client pulls an arm and communicates with neighbors based on the graph
provided by the environment. The goal is to minimize the overall regret of the
entire system through collaborations. To this end, we introduce a novel
algorithmic framework, which first provides robust simulation methods for
generating random graphs using rapidly mixing Markov chains or the random graph
model, and then combines an averaging-based consensus approach with a newly
proposed weighting technique and the upper confidence bound to deliver a
UCB-type solution. Our algorithms account for the randomness in the graphs,
removing the conventional double stochasticity assumption, and only require
knowledge of the number of clients at initialization. We derive optimal
instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian
and sub-exponential environments, and a nearly optimal mean-gap-independent
regret upper bound of order $\sqrt{T}\log T$, which is optimal up to a $\log T$ factor.
Importantly, our regret bounds hold with high probability and capture graph
randomness, whereas prior works consider expected regret under stronger graph
assumptions and more restrictive reward distributions.
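To make the framework concrete, below is a minimal, hypothetical sketch (not the authors' exact algorithm) of a decentralized UCB loop with averaging-based consensus over freshly drawn random communication graphs. The number of clients N, arms K, horizon T, the Erdős–Rényi edge probability, and the simple row-normalized (not doubly stochastic) weights are illustrative assumptions; the paper's weighting technique and Markov-chain-based graph generation are not reproduced here.

```python
# Hypothetical illustration only -- names, constants, and the Erdos-Renyi
# graph model are assumptions, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)

N, K, T = 5, 3, 2000                        # clients, arms, horizon (illustrative)
true_means = rng.uniform(0.0, 1.0, (N, K))  # heterogeneous per-client arm means

est = np.zeros((N, K))   # each client's running mean-reward estimates
cnt = np.ones((N, K))    # pull counts (start at 1 to avoid division by zero)

for t in range(1, T + 1):
    # 1) Each client pulls the arm maximizing its local UCB index.
    bonus = np.sqrt(2.0 * np.log(t + 1) / cnt)
    arms = np.argmax(est + bonus, axis=1)
    rewards = rng.normal(true_means[np.arange(N), arms], 0.1)  # sub-Gaussian noise

    # 2) Local incremental-mean update of the pulled arm's statistics.
    idx = (np.arange(N), arms)
    cnt[idx] += 1
    est[idx] += (rewards - est[idx]) / cnt[idx]

    # 3) The environment draws a fresh random communication graph each round
    #    (an Erdos-Renyi graph here, purely for illustration).
    adj = rng.random((N, N)) < 0.5
    adj = np.triu(adj, 1)
    adj = adj | adj.T | np.eye(N, dtype=bool)   # symmetric, with self-loops

    # 4) Averaging-based consensus with simple row-normalized weights;
    #    no doubly stochastic weight matrix is assumed.
    W = adj / adj.sum(axis=1, keepdims=True)
    est = W @ est

print("system-level best arm:", np.argmax(true_means.mean(axis=0)))
print("consensus estimates  :", np.round(est.mean(axis=0), 3))
```

The sketch only conveys the pull/update/gossip loop; in the paper, the graph-generation procedure and the weighting technique are designed so that the stated regret bounds hold with high probability despite the graph randomness.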
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
- Multi-agent Multi-armed Bandit with Fully Heavy-tailed Dynamics [5.54892060291103]
We study decentralized multi-agent multi-armed bandits in fully heavy-tailed settings, where clients communicate over sparse random graphs with heavy-tailed degree distributions.
We are the first to address such fully heavy-tailed scenarios, which capture the dynamics and challenges in communication and inference among multiple clients in real-world systems.
arXiv Detail & Related papers (2025-01-31T15:53:14Z)
- Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness [9.731329071569018]
We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function.
We develop the first algorithm that achieves the optimal regret order with a communication cost that is sublinear in both $N$ and $T$.
arXiv Detail & Related papers (2024-02-20T17:49:10Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples required in each round from $m$ to 1, we cast GDRO as a two-player game, where one player updates the model and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of the worst-case distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- Beyond the Best: Estimating Distribution Functionals in Infinite-Armed Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning [9.00236182523638]
In this paper, we use tools from rate-distortion theory to establish new upper bounds on the generalization error of statistical distributed learning algorithms.
The bounds depend on the compressibility of each client's algorithm while keeping the other clients' algorithms uncompressed.
arXiv Detail & Related papers (2022-06-06T13:21:52Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most existing value-based multi-agent reinforcement learning methods only model the expectations of the individual Q-values and of the global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.