Fair Multi-Agent Bandits
- URL: http://arxiv.org/abs/2306.04498v2
- Date: Sun, 4 Feb 2024 23:42:43 GMT
- Title: Fair Multi-Agent Bandits
- Authors: Amir Leshem
- Abstract summary: We provide an algorithm with regret $O\left(N^3 \log \frac{B}{\Delta} f(\log T) \log T \right)$, where $f(t)$ is any function diverging to infinity with $t$.
This significantly improves previous results, which had the same upper bound on the regret of order $O(f(\log T) \log T)$ but an exponential dependence on the number of agents.
- Score: 14.614647884175657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of fair multi-agent multi-arm bandit
learning when agents do not communicate with each other, except for collision
information, which is provided to agents accessing the same arm simultaneously. We
provide an algorithm with regret $O\left(N^3 \log \frac{B}{\Delta} f(\log T)
\log T \right)$ (assuming bounded rewards, with unknown bound), where $f(t)$ is
any function diverging to infinity with $t$. This significantly improves
previous results which had the same upper bound on the regret of order
$O(f(\log T) \log T )$ but an exponential dependence on the number of agents.
The result is attained by using a distributed auction algorithm to learn the
sample-optimal matching and a novel order-statistics-based regret analysis.
Simulation results present the dependence of the regret on $\log T$.
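To make the matching step concrete, below is a minimal, centralized sketch of a Bertsekas-style auction that assigns agents to distinct arms given estimated mean rewards. It illustrates the general auction idea only, not the paper's distributed algorithm; the function name, the `eps` bidding increment, and the assumption that there are at least as many arms as agents are ours.

```python
import numpy as np

def auction_matching(values, eps=1e-3):
    """Assign each agent to a distinct arm via a Bertsekas-style auction.

    values[i, j]: agent i's estimated mean reward for arm j
    (assumes at least as many arms as agents).
    Returns assignment[i] = arm held by agent i.
    """
    n_agents, n_arms = values.shape
    prices = np.zeros(n_arms)
    owner = np.full(n_arms, -1)          # arm -> agent, -1 means free
    assignment = np.full(n_agents, -1)   # agent -> arm
    unassigned = list(range(n_agents))

    while unassigned:
        i = unassigned.pop()
        net = values[i] - prices         # net utility of each arm for agent i
        j = int(np.argmax(net))          # most profitable arm
        best = net[j]
        net[j] = -np.inf
        second = net.max() if n_arms > 1 else best - eps
        prices[j] += best - second + eps # bid up the price of arm j
        if owner[j] != -1:               # evict the previous holder
            assignment[owner[j]] = -1
            unassigned.append(owner[j])
        owner[j] = i
        assignment[i] = j
    return assignment
```

For example, `auction_matching(np.array([[1.0, 0.2], [0.9, 0.8]]))` returns `[0, 1]`; with `eps` small relative to the gaps in estimated rewards, the final matching is within $N\epsilon$ of the optimal assignment.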
Related papers
- Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
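As a toy illustration of this full-bandit feedback model (the names and the noise level are our assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.uniform(size=10)   # hidden per-arm means (hypothetical)

def pull_subset(arms):
    """One noisy reward for the whole subset, with no per-arm
    breakdown, as in the feedback model described above."""
    return float(mu[list(arms)].sum() + rng.normal(scale=0.1))

print(pull_subset({0, 3, 7}))  # a single scalar observation
```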
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
- Refined Sample Complexity for Markov Games with Independent Linear Function Approximation [49.5660193419984]
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL).
This paper first refines the AVLPR framework of Wang et al. (2023), with the insight of designing pessimistic estimates of the sub-optimality gap.
We give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.
arXiv Detail & Related papers (2024-02-11T01:51:15Z)
- Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis [5.02063914741425]
We formulate the multi-agent graph bandit problem as a multi-agent extension of the graph bandit problem introduced by Zhang, Johansson, and Li.
We propose an Upper Confidence Bound (UCB)-based learning algorithm, Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by $O(\gamma N \log(T)[\sqrt{KT} + DK])$.
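For reference, the generic per-arm index that UCB-style algorithms of this family compute looks like the sketch below; the exact bonus and constants in Multi-G-UCB may differ.

```python
import numpy as np

def ucb_index(mean_est, counts, t):
    """Standard UCB1-style index: empirical mean plus an exploration
    bonus that shrinks as an arm is sampled more often."""
    return mean_est + np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
```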
arXiv Detail & Related papers (2024-01-18T21:36:17Z)
- Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs [11.024467775280193]
We study the multi-agent multi-armed bandit (MAMAB) problem, where $m$ agents are factored into $\rho$ overlapping groups.
We propose an efficient variant of MATS, the $\epsilon$-exploring Multi-Agent Thompson Sampling ($\epsilon$-MATS) algorithm.
We prove that $\epsilon$-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size.
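A single-agent caricature of the $\epsilon$-exploring idea is sketched below; the multi-agent coordination in $\epsilon$-MATS is more involved, and the Beta-Bernoulli posterior here is our assumption.

```python
import numpy as np

def eps_ts_step(alpha, beta, eps, rng):
    """With probability eps, draw a posterior sample per arm (explore);
    otherwise act on the posterior mean (exploit).
    alpha, beta: Beta posterior parameters, one entry per arm."""
    if rng.random() < eps:
        scores = rng.beta(alpha, beta)
    else:
        scores = alpha / (alpha + beta)
    return int(np.argmax(scores))
```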
arXiv Detail & Related papers (2023-12-24T21:41:01Z)
- Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks [93.00280593719513]
We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches.
Specifically, each batch collects data according to a policy that depends on previous batches, and the rewards are revealed only at the end of the batch.
Our algorithm achieves regret bounds comparable to those in the fully sequential setting with only $\mathcal{O}(\log T)$ batches.
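A minimal sketch of this batched interaction protocol follows, using a toy explore-then-greedy policy of our own rather than the paper's high-dimensional algorithm.

```python
import numpy as np

def batched_bandit(pull, n_arms, T, n_batches, rng):
    """pull(arm) -> reward. The policy is frozen at each batch
    boundary, and feedback is processed only when the batch ends,
    matching the protocol described above."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    edges = np.linspace(0, T, n_batches + 1, dtype=int)
    for b in range(n_batches):
        if counts.min() == 0:                      # still exploring
            choose = lambda: int(rng.integers(n_arms))
        else:                                      # commit to empirical best
            best = int(np.argmax(sums / counts))
            choose = lambda: best
        pending = []
        for _ in range(edges[b + 1] - edges[b]):
            a = choose()
            pending.append((a, pull(a)))
        for a, r in pending:                       # rewards revealed at batch end
            counts[a] += 1
            sums[a] += r
    return sums / np.maximum(counts, 1)
```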
arXiv Detail & Related papers (2023-11-22T06:06:54Z)
- Robust Multi-Agent Bandits Over Undirected Graphs [26.26185074977412]
We consider a multi-agent multi-armed bandit setting in which honest agents collaborate over a network to minimize regret.
We show the situation is murkier beyond the case of a complete graph.
We propose a new algorithm for which the $i$-th agent has regret $O((d_{\text{mal}}(i) + K/n)\log(T)/\Delta)$ on any connected and undirected graph.
arXiv Detail & Related papers (2022-02-28T20:21:55Z)
- Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed bandit problem.
We propose two learning algorithms, one based on upper confidence bounds (UCB) and one based on active arm elimination (AAE).
We prove that both algorithms achieve order-optimal regret of $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm.
arXiv Detail & Related papers (2022-01-23T20:04:15Z)
- Mixability made efficient: Fast online multiclass logistic regression [68.8204255655161]
We show that mixability can be a powerful tool to obtain algorithms with optimal regret.
However, the resulting methods often suffer from high computational complexity, which has reduced their practical applicability.
arXiv Detail & Related papers (2021-10-08T08:22:05Z)
- Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z)
- Thresholded Lasso Bandit [70.17389393497125]
The Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
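A two-step sketch of the estimate-then-threshold idea, using scikit-learn as a stand-in (the paper's estimator and tuning are more careful, and `lam`/`tau` are hypothetical values):

```python
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso(X, y, lam=0.1, tau=0.05):
    """Fit a Lasso to (contexts X, rewards y), then zero out small
    coefficients to recover a sparse support estimate."""
    theta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    keep = np.abs(theta) >= tau
    return np.where(keep, theta, 0.0), np.flatnonzero(keep)
```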
arXiv Detail & Related papers (2020-10-22T19:14:37Z)