Dominate or Delete: Decentralized Competing Bandits in Serial
Dictatorship
- URL: http://arxiv.org/abs/2006.15166v2
- Date: Fri, 12 Mar 2021 20:12:13 GMT
- Title: Dominate or Delete: Decentralized Competing Bandits in Serial
Dictatorship
- Authors: Abishek Sankararaman, Soumya Basu, Karthik Abinav Sankararaman
- Abstract summary: We study a two-sided matching market where the demand side agents have unknown and heterogeneous valuations over the supply side (arms).
We design the first decentralized algorithm for the agents: UCB with Decentralized Dominant-arm Deletion (UCB-D3).
We prove a new regret lower bound for the decentralized serial dictatorship model and show that UCB-D3 is order optimal.
- Score: 16.883188358641398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online learning in a two-sided matching market, with demand side agents
continuously competing to be matched with supply side (arms), abstracts the
complex interactions under partial information on matching platforms (e.g.
UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a
two-sided matching market where the demand side agents have unknown and
heterogeneous valuations over the supply side (arms), while the arms have a
known uniform preference over the demand side (agents). We design the first
decentralized algorithm for the agents, UCB with Decentralized Dominant-arm
Deletion (UCB-D3), which requires no knowledge of reward gaps or the time
horizon. UCB-D3 works in phases: in each phase, agents delete "dominated arms"
(the arms preferred by higher-ranked agents) and play only the non-dominated
arms according to UCB. At the end of each phase, agents broadcast their
estimated preferred arms in a decentralized fashion through pure exploitation.
We prove a new regret lower bound for the decentralized serial dictatorship
model and show that UCB-D3 is order optimal.
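The phased delete-and-play structure described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical rendering for a single agent, assuming stochastic rewards and a known agent rank; the `ucb_index` function, the phase-length schedule, and the broadcast convention here are illustrative assumptions, not the paper's exact specification.

```python
import math

def ucb_index(total, count, t):
    # Standard UCB1 index; an illustrative choice, not necessarily the
    # exact confidence width used in UCB-D3.
    if count == 0:
        return float("inf")
    return total / count + math.sqrt(2.0 * math.log(t) / count)

def run_phase(active_arms, pull, phase_len, stats, t0):
    """One UCB-D3-style phase for a single agent.

    active_arms: arms not dominated (i.e., not deleted because a
                 higher-ranked agent claimed them)
    pull(arm):   returns a stochastic reward for the chosen arm
    stats:       arm -> [reward_sum, pull_count], carried across phases
    t0:          global time at the start of this phase
    """
    for step in range(phase_len):
        t = t0 + step + 1
        arm = max(active_arms,
                  key=lambda a: ucb_index(stats[a][0], stats[a][1], t))
        reward = pull(arm)
        stats[arm][0] += reward
        stats[arm][1] += 1
    # End of phase: commit to the empirically best non-dominated arm and
    # "broadcast" it via pure exploitation (playing it repeatedly), so
    # lower-ranked agents can observe it and delete it next phase.
    return max(active_arms,
               key=lambda a: stats[a][0] / max(stats[a][1], 1))
```

In this sketch, a lower-ranked agent would remove every arm broadcast by higher-ranked agents from its `active_arms` set before its next phase; that removal is the dominated-arm deletion step.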
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values (a generic version of such a step is sketched after this entry).
arXiv Detail & Related papers (2023-10-11T09:09:50Z)
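The mixing-plus-truncation step summarized above admits a simple generic illustration: a coordinate-wise trimmed mean over neighbors' per-arm estimates. This is a sketch of one plausible such rule, assuming each agent holds a vector of mean-reward estimates and tolerates up to `f` Byzantine neighbors; the paper's exact truncation rule may differ.

```python
def trimmed_mix(own_est, neighbor_ests, f):
    """Robust information mixing: for each arm, drop the f smallest and
    f largest neighbor estimates, then average the survivors together
    with our own estimate. A generic trimmed-mean rule, not necessarily
    the paper's exact update.
    """
    mixed = []
    for k, own in enumerate(own_est):
        vals = sorted(est[k] for est in neighbor_ests)
        # Keep only the middle values; if too few neighbors, trust only
        # our own estimate.
        kept = vals[f:len(vals) - f] if len(vals) > 2 * f else []
        mixed.append((own + sum(kept)) / (1 + len(kept)))
    return mixed
```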
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a stochastic and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Decentralized Competing Bandits in Non-Stationary Matching Markets [46.13741000158631]
We introduce the framework of decentralized two-sided matching markets under non-stationary (dynamic) environments.
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB).
We characterize this "forced exploration" and obtain sub-linear (logarithmic) regret for DNCB.
arXiv Detail & Related papers (2022-05-31T21:05:30Z)
- Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings: (a) when the agents wish to keep their communication private with respect to the action sequence, and (b) when some agents can be Byzantine.
We provide upper-confidence-bound algorithms that obtain optimal regret while being (a) differentially private and (b) robust to Byzantine agents.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z)
- Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z)
- Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z)
- Bayesian Algorithms for Decentralized Stochastic Bandits [12.350564981588063]
We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network.
In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps.
The goal is to minimize cumulative regret averaged over the entire network.
arXiv Detail & Related papers (2020-10-20T19:14:20Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale, general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Distributed Cooperative Decision Making in Multi-agent Multi-armed
Bandits [6.437761597996503]
We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB).
We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of the mean reward at each arm (a generic consensus update is sketched after this entry).
We show that both algorithms achieve group performance close to the performance of a central fusion center.
arXiv Detail & Related papers (2020-03-03T03:20:44Z)
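The consensus-based estimation idea in the entry above can be sketched generically: each agent folds a fresh observation into a local running mean, then averages its estimates with those of its network neighbors via a row-stochastic mixing matrix `W`. This is a minimal running-consensus sketch under those assumptions, not the paper's exact update.

```python
import numpy as np

def consensus_round(estimates, counts, W, observations):
    """One round of generic consensus-based cooperative estimation.

    estimates:    (N, K) array of per-agent mean-reward estimates
    counts:       (N, K) array of per-agent pull counts
    W:            (N, N) row-stochastic mixing matrix (network weights)
    observations: dict mapping agent -> (arm, reward) for this round
    """
    # Local update: incorporate each agent's fresh sample into its
    # running mean for the pulled arm.
    for agent, (arm, reward) in observations.items():
        counts[agent, arm] += 1
        estimates[agent, arm] += (reward - estimates[agent, arm]) / counts[agent, arm]
    # Mixing step: every agent averages estimates over its neighbors.
    return W @ estimates, counts
```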