Fair Algorithms for Multi-Agent Multi-Armed Bandits
- URL: http://arxiv.org/abs/2007.06699v2
- Date: Wed, 24 Feb 2021 02:43:57 GMT
- Title: Fair Algorithms for Multi-Agent Multi-Armed Bandits
- Authors: Safwan Hossain, Evi Micha, Nisarg Shah
- Abstract summary: We propose a multi-agent variant of the classical multi-armed bandit problem.
The goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally.
We show that multi-agent variants of three classic multi-armed bandit algorithms achieve sublinear regret.
- Score: 29.68201160277817
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a multi-agent variant of the classical multi-armed bandit problem,
in which there are $N$ agents and $K$ arms, and pulling an arm generates a
(possibly different) stochastic reward for each agent. Unlike the classical
multi-armed bandit problem, the goal is not to learn the "best arm"; indeed,
each agent may perceive a different arm to be the best for her personally.
Instead, we seek to learn a fair distribution over the arms. Drawing on a long
line of research in economics and computer science, we use the Nash social
welfare as our notion of fairness. We design multi-agent variants of three
classic multi-armed bandit algorithms and show that they achieve sublinear
regret, which is now measured in terms of the lost Nash social welfare.
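The paper's algorithms learn a fair *distribution* over arms with regret measured in lost Nash social welfare. As a much-simplified illustration of the objective (not the authors' algorithm), the toy sketch below greedily pulls the single arm whose empirical per-agent mean rewards have the highest product, with $\epsilon$-greedy exploration; all function names and the Bernoulli reward model are assumptions for the example.

```python
import math
import random

def nsw(values):
    # Nash social welfare objective: product of the agents' utilities.
    return math.prod(values)

def fair_epsilon_greedy(true_means, horizon=5000, eps=0.1, seed=0):
    """Toy N-agent, K-arm bandit: each pull yields a (possibly different)
    Bernoulli reward for every agent. Greedily pulls the arm maximizing the
    *empirical* Nash social welfare, exploring with probability eps.
    This is an illustrative simplification, not the paper's method, which
    learns a distribution over arms."""
    rng = random.Random(seed)
    n_agents, n_arms = len(true_means), len(true_means[0])
    sums = [[0.0] * n_arms for _ in range(n_agents)]
    counts = [0] * n_arms
    for _ in range(horizon):
        if 0 in counts:
            arm = counts.index(0)          # pull each arm once first
        elif rng.random() < eps:
            arm = rng.randrange(n_arms)    # explore uniformly
        else:                              # exploit: highest empirical NSW
            arm = max(range(n_arms),
                      key=lambda k: nsw([sums[i][k] / counts[k]
                                         for i in range(n_agents)]))
        counts[arm] += 1
        for i in range(n_agents):          # per-agent Bernoulli reward
            sums[i][arm] += 1.0 if rng.random() < true_means[i][arm] else 0.0
    return counts

# Two agents disagree on the "best" arm; arm 1 is the fair compromise
# (NSW 0.7*0.7 = 0.49 vs. 0.9*0.2 = 0.18 for either extreme arm).
means = [[0.9, 0.7, 0.2],   # agent 0 prefers arm 0
         [0.2, 0.7, 0.9]]   # agent 1 prefers arm 2
counts = fair_epsilon_greedy(means)
best = max(range(3), key=counts.__getitem__)
```

Even though neither agent ranks arm 1 first, it maximizes the product of utilities, so the greedy rule concentrates pulls on it, which is exactly the intuition behind measuring regret in Nash social welfare rather than per-agent reward.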
Related papers
- Best Arm Identification in Batched Multi-armed Bandit Problems [3.908867017036043]
Recently, the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches.
We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification.
We demonstrate through numerical studies that the algorithm also performs well compared to certain UCB-type and Thompson sampling methods.
arXiv Detail & Related papers (2023-12-21T14:16:38Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Doubly Adversarial Federated Bandits [7.23389716633927]
We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network.
Our algorithm gives a positive answer to an open question posed in Cesa-Bianchi et al.
arXiv Detail & Related papers (2023-01-22T22:36:43Z) - Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Modelling Cournot Games as Multi-agent Multi-armed Bandits [4.751331778201811]
We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting for modeling repeated Cournot oligopoly games.
We find that an $\epsilon$-greedy approach offers a more viable learning mechanism than other traditional MAB approaches.
We propose two novel approaches that take advantage of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
arXiv Detail & Related papers (2022-01-01T22:02:47Z) - Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting in which, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Multi-Armed Bandits with Dependent Arms [18.81667618369821]
We study a variant of the classical multi-armed bandit problem (MABP) which we call Multi-Armed Bandits with Dependent Arms.
Multiple arms are grouped together to form a cluster, and the reward distributions of arms belonging to the same cluster are known functions of an unknown parameter that is a characteristic of the cluster.
We develop learning algorithms based on the UCB principle that appropriately utilize these additional side observations while managing the exploration-exploitation trade-off.
arXiv Detail & Related papers (2020-10-13T14:00:19Z) - Generic Outlier Detection in Multi-Armed Bandit [44.11480686973274]
We propose a novel pulling algorithm named GOLD to identify such generic outlier arms.
In experiments conducted on both synthetic and real-world data sets, the proposed algorithm achieves 98% accuracy.
arXiv Detail & Related papers (2020-07-14T18:42:44Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
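Several entries above (the batched, dependent-arms, and transfer papers) build on the UCB principle. As background, a minimal single-agent UCB1 sketch on Bernoulli arms; this is the textbook algorithm, illustrative only, and not the specific method of any paper listed here.

```python
import math
import random

def ucb1(means, horizon=3000, seed=1):
    """Classic UCB1: pull the arm maximizing the empirical mean plus a
    sqrt(2 ln t / n_k) confidence bonus, after pulling each arm once."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if 0 in counts:
            arm = counts.index(0)          # initialization round
        else:
            arm = max(range(n_arms),
                      key=lambda k: sums[k] / counts[k]
                      + math.sqrt(2.0 * math.log(t) / counts[k]))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
    return counts

# The optimistic bonus shrinks as an arm is pulled more, so pulls of
# suboptimal arms grow only logarithmically in the horizon.
pulls = ucb1([0.2, 0.5, 0.8])
```

The confidence bonus is what the dependent-arms paper above tightens: side observations from cluster-mates shrink an arm's uncertainty faster than its own pull count alone would.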
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.