Decentralized Competing Bandits in Non-Stationary Matching Markets
- URL: http://arxiv.org/abs/2206.00120v1
- Date: Tue, 31 May 2022 21:05:30 GMT
- Title: Decentralized Competing Bandits in Non-Stationary Matching Markets
- Authors: Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi
and Arya Mazumdar
- Abstract summary: We introduce a framework for decentralized two-sided matching markets under non-stationary (dynamic) environments.
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB).
We characterize this *forced exploration* and obtain sub-linear (logarithmic) regret for DNCB.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding complex dynamics of two-sided online matching markets, where
the demand-side agents compete to match with the supply-side (arms), has
recently received substantial interest. To that end, in this paper, we
introduce the framework of decentralized two-sided matching markets under
non-stationary (dynamic) environments. We adhere to the serial dictatorship
setting, where the demand-side agents have unknown and different preferences
over the supply-side (arms), but the arms have fixed and known preference over
the agents. We propose and analyze a decentralized and asynchronous learning
algorithm, namely Decentralized Non-stationary Competing Bandits
(\texttt{DNCB}), where the agents play (restrictive) successive elimination
type learning algorithms to learn their preference over the arms. The
complexity in understanding such a system stems from the fact that the
competing bandits choose their actions in an asynchronous fashion, and the
lower ranked agents only get to learn from a set of arms, not \emph{dominated}
by the higher ranked agents, which leads to \emph{forced exploration}. With
carefully defined complexity parameters, we characterize this \emph{forced
exploration} and obtain sub-linear (logarithmic) regret of \texttt{DNCB}.
Furthermore, we validate our theoretical findings via experiments.
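The abstract describes agents running (restrictive) successive-elimination-type learning over the arms. As a rough single-agent illustration of that primitive only (the function name, confidence radius, and parameters below are our own assumptions for a sketch, not the paper's DNCB algorithm, which is decentralized, asynchronous, and non-stationary):

```python
import math
import random

def successive_elimination(means, horizon, delta=0.05, seed=0):
    """Illustrative successive elimination on Bernoulli arms.

    `means` holds the true arm means (unknown to the learner, who only
    observes sampled rewards). An arm is eliminated once its upper
    confidence bound falls below the best lower confidence bound.
    """
    rng = random.Random(seed)
    k = len(means)
    active = set(range(k))
    counts = [0] * k
    sums = [0.0] * k
    t = 0
    while t < horizon and len(active) > 1:
        # Pull every surviving arm once per round.
        for arm in list(active):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            t += 1

        # Hoeffding-style confidence radius after n pulls.
        def radius(n):
            return math.sqrt(math.log(2 * k * horizon / delta) / (2 * n))

        best_lcb = max(sums[a] / counts[a] - radius(counts[a]) for a in active)
        active = {a for a in active
                  if sums[a] / counts[a] + radius(counts[a]) >= best_lcb}
    # Return the surviving arm with the best empirical mean.
    return max(active, key=lambda a: sums[a] / counts[a])
```

In the paper's setting, lower-ranked agents would additionally restrict this loop to arms not dominated by higher-ranked agents, which is the source of the forced exploration the authors analyze.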
Related papers
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a (possibly unknown) policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games
to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Decentralized, Communication- and Coordination-free Learning in
Structured Matching Markets [2.9833943723592764]
We study the problem of online learning in competitive settings in the context of two-sided matching markets.
We propose a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match.
arXiv Detail & Related papers (2022-06-06T04:08:04Z) - Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be Byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially private and (b) tolerant to Byzantine agents.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
arXiv Detail & Related papers (2022-05-27T18:03:54Z) - Learning Equilibria in Matching Markets from Bandit Feedback [139.29934476625488]
We develop a framework and algorithms for learning stable market outcomes under uncertainty.
Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
arXiv Detail & Related papers (2021-08-19T17:59:28Z) - Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z) - Learning Strategies in Decentralized Matching Markets under Uncertain
Preferences [91.3755431537592]
We study the problem of decision-making in the setting of a scarcity of shared resources when the preferences of agents are unknown a priori.
Our approach is based on the representation of preferences in a reproducing kernel Hilbert space.
We derive optimal strategies that maximize agents' expected payoffs.
arXiv Detail & Related papers (2020-10-29T03:08:22Z) - Dominate or Delete: Decentralized Competing Bandits in Serial
Dictatorship [16.883188358641398]
We study a two-sided matching market where the demand-side agents have unknown and heterogeneous valuations over the supply side (arms).
We design the first decentralized algorithm -- UCB with Decentralized Dominant-arm Deletion (UCB-D3), for the agents.
We prove both a new regret lower bound for the decentralized serial dictatorship model and that UCB-D3 is order-optimal.
arXiv Detail & Related papers (2020-06-26T18:44:06Z) - Contextual Blocking Bandits [35.235375147227124]
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards.
Playing an arm blocks it (across all contexts) for a fixed and known number of future time steps.
We propose a UCB-based variant of the full-information algorithm that guarantees an $\mathcal{O}(\log T)$ regret w.r.t. an $\alpha$-regret strategy over $T$ time steps, matching the $\Omega(\log T)$ lower bound.
arXiv Detail & Related papers (2020-03-06T20:34:42Z)
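Several of the entries above (the contextual blocking-bandit variant, UCB-D3, and the cooperative decision-making algorithms) build on upper-confidence-bound index policies. For reference, a minimal UCB1 sketch on Bernoulli arms; this is the generic textbook primitive, not the specific algorithm of any listed paper, and all names and parameters are illustrative:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1: play the arm maximizing empirical mean plus the
    exploration bonus sqrt(2 ln t / n). Returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
    return counts
```

The bonus term shrinks as an arm accumulates pulls, so suboptimal arms are sampled only $O(\log T)$ times, which is the source of the logarithmic regret rates quoted throughout this list.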
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.