Bandit Learning in Decentralized Matching Markets
- URL: http://arxiv.org/abs/2012.07348v3
- Date: Mon, 15 Feb 2021 03:55:05 GMT
- Title: Bandit Learning in Decentralized Matching Markets
- Authors: Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan
- Abstract summary: We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
- Score: 82.39061186055775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study two-sided matching markets in which one side of the market (the
players) does not have a priori knowledge about its preferences for the other
side (the arms) and is required to learn its preferences from experience. Also,
we assume the players have no direct means of communication. This model extends
the standard stochastic multi-armed bandit framework to a decentralized
multiple player setting with competition. We introduce a new algorithm for this
setting that, over a time horizon $T$, attains $\mathcal{O}(\log(T))$ stable
regret when preferences of the arms over players are shared, and
$\mathcal{O}(\log(T)^2)$ regret when there are no assumptions on the
preferences on either side. Moreover, in the setting where a single player may
deviate, we show that the algorithm is incentive compatible whenever the arms'
preferences are shared, but not necessarily so when preferences are fully
general.
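To make the setting concrete, below is a minimal Python sketch of one round of decentralized bandit matching in this spirit: each player proposes to the arm with the highest UCB index, each arm accepts its most-preferred proposer, and rejected players receive no reward. This is an illustrative sketch under our own naming (`Player`, `play_round`, `arm_prefs`, `true_means`), not the paper's algorithm, which adds further machinery (e.g., restricting proposals based on the previous round's outcome) to obtain the stated regret bounds.

```python
import math
import random

class Player:
    """One player maintaining UCB indices over the arms (illustrative only)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def ucb(self, arm, t):
        if self.counts[arm] == 0:
            return float("inf")  # force each arm to be tried once
        bonus = math.sqrt(2.0 * math.log(t + 1) / self.counts[arm])
        return self.means[arm] + bonus

    def propose(self, t):
        return max(range(len(self.counts)), key=lambda a: self.ucb(a, t))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def play_round(players, arm_prefs, true_means, t):
    """One decentralized round: players propose, arms resolve conflicts.

    arm_prefs[a][p] is the rank of player p in arm a's preference list
    (lower = preferred). Rejected players get no reward and no feedback,
    which is what makes this a bandit problem with competition.
    """
    proposals = {}
    for p, player in enumerate(players):
        proposals.setdefault(player.propose(t), []).append(p)
    for arm, proposers in proposals.items():
        winner = min(proposers, key=lambda p: arm_prefs[arm][p])
        reward = random.gauss(true_means[winner][arm], 1.0)
        players[winner].update(arm, reward)
```

For instance, with two players and three arms, `play_round(players, arm_prefs, true_means, t)` can be called for `t = 0, ..., T-1`, where `arm_prefs[a]` ranks the players from arm `a`'s perspective.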
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the stochastic arrival of requests to each arm and the policy for allocating requests to players.
The challenge is to design a distributed learning algorithm such that players select arms according to the optimal arm-pulling profile.
We design an iterative distributed algorithm which guarantees that players arrive at a consensus on the optimal arm-pulling profile in only $M$ rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets [7.512199306943756]
We study a decentralized two-sided matching market, where the demand side (players) competes to match with the supply side (arms).
We propose a multi-phase explore-then-commit algorithm, namely epoch-based CA-ETC (collision-avoidance explore-then-commit), for this problem.
We show that for an initial epoch length of $T_\circ$ and subsequent epoch lengths of $2^{l/\gamma} T_\circ$, CA-ETC yields a player-optimal expected regret bound; a schematic sketch of the epoch structure follows.
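As an illustration of that epoch structure (not the full decentralized algorithm), here is a hedged single-player Python sketch; the function name, the `pull` interface, and the even explore/commit split within each epoch are our assumptions:

```python
def ca_etc_like_schedule(n_arms, T0, gamma, horizon, pull):
    """Single-player sketch of an epoch-based explore-then-commit schedule.

    Epoch l has length T0 * 2**(l / gamma) with gamma in (0, 1): the first
    half explores round-robin, the second half commits to the empirical
    best arm. `pull(arm)` stands in for one market interaction and returns
    a reward; the multi-player collision-avoidance logic of the actual
    algorithm is deliberately omitted.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    t, l = 0, 0
    while t < horizon:
        length = int(T0 * 2 ** (l / gamma))
        explore = min(length // 2, horizon - t)
        for i in range(explore):  # exploration phase: round-robin pulls
            arm = i % n_arms
            r = pull(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
        t += explore
        best = max(range(n_arms), key=lambda a: means[a])
        commit = min(length - explore, horizon - t)
        for _ in range(commit):  # commit phase: exploit the empirical best
            pull(best)
        t += commit
        l += 1
```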
arXiv Detail & Related papers (2024-08-16T12:06:09Z) - Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility [12.05174711411919]
Two-sided matching markets have been widely studied in the literature due to their rich applications.
We first extend an existing algorithm for the one-to-one setting to this more general setting and show it achieves a near-optimal bound for player-optimal regret.
We propose the adaptively explore-then-deferred-acceptance (AETDA) algorithm for the responsiveness setting and derive an upper bound on its player-optimal stable regret.
arXiv Detail & Related papers (2024-01-03T03:45:35Z) - Player-optimal Stable Regret for Bandit Learning in Matching Markets [12.54215178882448]
We show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms.
Our work significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions.
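Reading that gap condition literally, it admits the following formalization (a sketch under assumed notation, not verbatim from the paper):

```latex
% Hedged formalization; the notation \mu_{p,(k)} (player p's k-th largest
% mean reward among the K arms) and N (the number of players) is ours.
\[
  \Delta \;=\; \min_{p \in [N]} \ \min_{1 \le k \le N}
    \bigl( \mu_{p,(k)} - \mu_{p,(k+1)} \bigr)
\]
% i.e., the smallest gap between consecutively ranked arms within each
% player's top N+1 arms, which yields the stated O(K \log T / \Delta^2) bound.
```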
arXiv Detail & Related papers (2023-07-20T14:10:33Z) - Competing Bandits in Time Varying Matching Markets [1.1470070927586014]
We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match.
We propose the Restart Competing Bandits (RCB) algorithm, which uses a simple restart strategy to handle the non-stationarity; a minimal sketch of the restart idea follows.
We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of $\widetilde{\mathcal{O}}(L_T^{1/2} T^{1/2})$, where $L_T$ is the number of changes in the underlying preferences of the agents.
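The sketch below illustrates only the restart scaffolding; all names (`run_with_restarts`, `make_learner`, `play_one_round`, `block_len`) are our assumptions, not the paper's API:

```python
def run_with_restarts(make_learner, play_one_round, horizon, block_len):
    """Minimal sketch of the restart idea for non-stationary markets.

    Every `block_len` rounds the learner is rebuilt from scratch, so
    estimates from an outdated preference regime are discarded. Here
    `play_one_round` abstracts the competing-bandits matching subroutine
    that the restarts wrap.
    """
    learner = None
    for t in range(horizon):
        if t % block_len == 0:
            learner = make_learner()  # restart: drop all accumulated history
        play_one_round(learner, t)    # one matching-market interaction
```

How `block_len` is tuned, presumably as a function of the horizon $T$ and the number of changes $L_T$, is the crux of the analysis and is not shown here.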
arXiv Detail & Related papers (2022-10-21T02:36:57Z) - Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Decentralized Competing Bandits in Non-Stationary Matching Markets [46.13741000158631]
We introduce the framework of decentralized two-sided matching markets under non-stationary (dynamic) environments.
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB).
We characterize this forced exploration and obtain a sub-linear (logarithmic) regret bound for DNCB.
arXiv Detail & Related papers (2022-05-31T21:05:30Z) - Learning Equilibria in Matching Markets from Bandit Feedback [139.29934476625488]
We develop a framework and algorithms for learning stable market outcomes under uncertainty.
Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
arXiv Detail & Related papers (2021-08-19T17:59:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.