Dominate or Delete: Decentralized Competing Bandits in Serial
Dictatorship
- URL: http://arxiv.org/abs/2006.15166v2
- Date: Fri, 12 Mar 2021 20:12:13 GMT
- Title: Dominate or Delete: Decentralized Competing Bandits in Serial
Dictatorship
- Authors: Abishek Sankararaman, Soumya Basu, Karthik Abinav Sankararaman
- Abstract summary: We study a two-sided matching market where the demand side agents have unknown and heterogeneous valuation over the supply side (arms).
We design the first decentralized algorithm -- UCB with Decentralized Dominant-arm Deletion (UCB-D3), for the agents.
We prove both, a new regret lower bound for the decentralized serial dictatorship model, and that UCB-D3 is order optimal.
- Score: 16.883188358641398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online learning in a two-sided matching market, with demand side agents
continuously competing to be matched with supply side (arms), abstracts the
complex interactions under partial information on matching platforms (e.g.
UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a
two-sided matching market where the demand side agents have unknown and
heterogeneous valuation over the supply side (arms), while the arms have known
uniform preference over the demand side (agents). We design the first
decentralized algorithm -- UCB with Decentralized Dominant-arm Deletion
(UCB-D3), for the agents, that does not require any knowledge of reward gaps or
time horizon. UCB-D3 works in phases, where in each phase, agents delete
dominated arms -- the arms preferred by higher ranked agents, and play
only from the non-dominated arms according to the UCB. At the end of the phase,
agents broadcast in a decentralized fashion, their estimated preferred arms
through pure exploitation. We prove both, a new regret lower bound for
the decentralized serial dictatorship model, and that UCB-D3 is order optimal.
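The phase structure described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' implementation. The function name `ucb_d3_sketch`, the doubling phase lengths, the Bernoulli rewards, the UCB1-style index, and the rule that a blocked agent records no sample are all assumptions made for illustration. The end-of-phase broadcast is also simplified: each agent's most-played arm of the phase is written to a shared list, where the paper has agents communicate in a decentralized way through their arm choices.

```python
import math
import random


def ucb_d3_sketch(means, n_phases=12, seed=0):
    """Simplified UCB-D3-style simulation; returns each agent's last announced arm.

    means[j][a] is the (hypothetical) mean reward of arm a for agent j.
    Agent 0 has the highest rank; every arm prefers lower-indexed agents.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(means), len(means[0])
    assert n_arms >= n_agents, "each agent needs at least one non-dominated arm"

    pulls = [[0] * n_arms for _ in range(n_agents)]     # successful matches
    totals = [[0.0] * n_arms for _ in range(n_agents)]  # summed rewards
    announced = [None] * n_agents  # arms broadcast at the end of the last phase
    t = 0

    def ucb_index(agent, arm):
        n = pulls[agent][arm]
        if n == 0:
            return math.inf  # unexplored arms are tried first
        return totals[agent][arm] / n + math.sqrt(2.0 * math.log(t) / n)

    for phase in range(n_phases):
        # Deletion: agent j drops the arms announced by agents ranked above it.
        active = []
        for j in range(n_agents):
            dominated = {announced[i] for i in range(j)}
            active.append([a for a in range(n_arms) if a not in dominated])
        phase_plays = [[0] * n_arms for _ in range(n_agents)]

        for _ in range(2 ** phase):  # assumed doubling phase length
            t += 1
            choices = [max(active[j], key=lambda a, j=j: ucb_index(j, a))
                       for j in range(n_agents)]
            taken = set()
            for j, arm in enumerate(choices):  # resolve collisions in rank order
                phase_plays[j][arm] += 1
                if arm in taken:
                    continue  # blocked by a higher-ranked agent: no sample
                taken.add(arm)
                reward = 1.0 if rng.random() < means[j][arm] else 0.0
                pulls[j][arm] += 1
                totals[j][arm] += reward

        # Simplified broadcast: announce the most-played arm of this phase.
        announced = [max(range(n_arms), key=lambda a, j=j: phase_plays[j][a])
                     for j in range(n_agents)]
    return announced


# Hypothetical instance: agents 0 and 1 both like arm 0 best.
example_means = [[0.9, 0.3, 0.1], [0.9, 0.1, 0.8], [0.8, 0.7, 0.3]]
print(ucb_d3_sketch(example_means))
```

With well-separated means like these, the agents should typically settle on the serial-dictatorship match: agent 0 keeps arm 0, agent 1 falls back to arm 2, and agent 2 is left with arm 1.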
Related papers
- Competing Bandits in Decentralized Large Contextual Matching Markets [13.313881962771777]
We study decentralized learning in two-sided matching markets where the demand side (aka players or agents) competes for a 'large' supply side (aka arms).
Our proposed algorithms achieve instance-dependent logarithmic regret, scaling independently of the number of arms, $K$.
arXiv Detail & Related papers (2024-11-18T18:08:05Z)
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
arXiv Detail & Related papers (2023-10-11T09:09:50Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Decentralized Competing Bandits in Non-Stationary Matching Markets [46.13741000158631]
We introduce the framework of decentralized two-sided matching markets under non-stationary (dynamic) environments.
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB).
We characterize this forced exploration and obtain sub-linear (logarithmic) regret of DNCB.
arXiv Detail & Related papers (2022-05-31T21:05:30Z)
- Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z)
- Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z)
- Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
arXiv Detail & Related papers (2020-10-20T19:14:20Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits [6.437761597996503]
We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB).
We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm.
We show that both algorithms achieve group performance close to the performance of a central fusion center.
arXiv Detail & Related papers (2020-03-03T03:20:44Z)