Regret, stability, and fairness in matching markets with bandit learners
- URL: http://arxiv.org/abs/2102.06246v1
- Date: Thu, 11 Feb 2021 20:18:12 GMT
- Title: Regret, stability, and fairness in matching markets with bandit learners
- Authors: Sarah H. Cen and Devavrat Shah
- Abstract summary: We consider the two-sided matching market with bandit learners.
We show that it is possible to simultaneously guarantee four desiderata: (1) incentive compatibility, i.e., stability, (2) low regret, i.e., $O(\log(T))$ optimal regret, (3) fairness in the distribution of regret among agents, and (4) high social welfare.
- Score: 22.76773950247833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the two-sided matching market with bandit learners. In the
standard matching problem, users and providers are matched to ensure incentive
compatibility via the notion of stability. However, contrary to the core
assumption of the matching problem, users and providers do not know their true
preferences a priori and must learn them. To relax this assumption, recent
works propose to blend the matching and multi-armed bandit problems. They
establish that it is possible to assign matchings that are stable (i.e.,
incentive-compatible) at every time step while also allowing agents to learn
enough so that the system converges to matchings that are stable under the
agents' true preferences. However, while some agents may incur low regret under
these matchings, others can incur high regret -- specifically, $\Omega(T)$
optimal regret where $T$ is the time horizon. In this work, we incorporate
costs and transfers in the two-sided matching market with bandit learners in
order to faithfully model competition between agents. We prove that, under our
framework, it is possible to simultaneously guarantee four desiderata: (1)
incentive compatibility, i.e., stability, (2) low regret, i.e., $O(\log(T))$
optimal regret, (3) fairness in the distribution of regret among agents, and
(4) high social welfare.
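The abstract names the guarantees but not the mechanism. As a rough illustration of the general recipe it belongs to (optimistic preference estimates fed into a transfer-compatible matching), here is a minimal sketch, assuming a centralized platform, Gaussian reward noise, and transferable utility so that stable outcomes maximize total surplus (as in the Shapley-Shubik assignment game); the name `ucb_matching` and all parameters are hypothetical, and the sketch tracks only welfare regret, not the per-agent fairness guarantee.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ucb_matching(true_surplus, T, noise=0.1, seed=0):
    """Hypothetical sketch: each round, match users to providers by
    maximizing optimistic (UCB) estimates of pairwise match surplus.
    With transfers, maximum-weight matchings coincide with stable
    outcomes (Shapley-Shubik assignment game), so optimism here
    targets stability while preferences are still being learned."""
    rng = np.random.default_rng(seed)
    means = np.zeros(true_surplus.shape)
    counts = np.zeros(true_surplus.shape)
    # Benchmark: best total surplus under the true (unknown) means.
    r, c = linear_sum_assignment(-true_surplus)
    opt = true_surplus[r, c].sum()
    welfare_regret = 0.0
    for t in range(1, T + 1):
        bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1.0))
        r, c = linear_sum_assignment(-(means + bonus))  # max-weight matching
        for i, j in zip(r, c):
            reward = true_surplus[i, j] + rng.normal(0.0, noise)
            counts[i, j] += 1
            means[i, j] += (reward - means[i, j]) / counts[i, j]
        welfare_regret += opt - true_surplus[r, c].sum()
    return welfare_regret
```

On a toy market, e.g. `ucb_matching(np.array([[1.0, 0.2], [0.3, 0.9]]), T=2000)`, cumulative welfare regret grows roughly logarithmically in $T$, in the spirit of the $O(\log(T))$ guarantee above.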
Related papers
- Putting Gale & Shapley to Work: Guaranteeing Stability Through Learning [14.448192914855674]
Two-sided matching markets describe a class of problems wherein participants from one side of the market must be matched to those from the other side according to their preferences.
We exploit the structure of stable solutions to devise learning algorithms that are more likely to reach stable outcomes; a textbook deferred-acceptance baseline is sketched below.
arXiv Detail & Related papers (2024-10-06T06:47:53Z)
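For reference, the full-information baseline that the entry's learning algorithms build on; this is the classical Gale-Shapley deferred-acceptance routine, not the paper's method.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Textbook Gale-Shapley deferred acceptance (proposer-optimal).
    proposer_prefs[p] and receiver_prefs[r] are complete ranked lists
    of the other side's indices, most preferred first."""
    n = len(proposer_prefs)
    # rank[r][p] = how receiver r ranks proposer p (lower is better)
    rank = [{p: k for k, p in enumerate(prefs)} for prefs in receiver_prefs]
    next_pick = [0] * n        # next index in each proposer's list
    holds = [None] * n         # receiver -> proposer currently held
    free = list(range(n))
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_pick[p]]
        next_pick[p] += 1
        if holds[r] is None:
            holds[r] = p
        elif rank[r][p] < rank[r][holds[r]]:
            free.append(holds[r])   # displaced proposer re-enters
            holds[r] = p
        else:
            free.append(p)          # rejected; tries next choice later
    return {holds[r]: r for r in range(n)}
```

For example, `deferred_acceptance([[0, 1], [0, 1]], [[1, 0], [0, 1]])` returns `{1: 0, 0: 1}`: both proposers want receiver 0, who prefers proposer 1, so proposer 0 settles for receiver 1.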
- Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility [12.05174711411919]
Two-sided matching markets have been widely studied in the literature due to their rich applications.
We first extend an existing algorithm for the one-to-one setting to this more general setting and show it achieves a near-optimal bound for player-optimal regret.
We propose the adaptively explore-then-deferred-acceptance (AETDA) algorithm for the responsiveness setting and derive an upper bound on player-optimal stable regret; a plain explore-then-commit sketch follows below.
arXiv Detail & Related papers (2024-01-03T03:45:35Z)
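AETDA's adaptive phase lengths are the entry's contribution and are not reproduced here. For contrast, a plain explore-then-commit sketch for the one-to-one case; the names, noise model, and commit rule are all assumptions.

```python
import numpy as np

def explore_then_commit(true_means, explore_rounds, seed=0):
    """Hypothetical sketch: players sample every arm `explore_rounds`
    times on a round-robin schedule (offset by player index so that,
    with at most as many players as arms, no two players collide),
    then commit using empirical means. The commit step is a serial
    dictatorship, a stand-in for deferred acceptance that suffices
    when all arms rank players identically."""
    rng = np.random.default_rng(seed)
    n_players, n_arms = true_means.shape
    est = np.zeros(true_means.shape)
    for t in range(explore_rounds * n_arms):
        for p in range(n_players):
            a = (p + t) % n_arms    # distinct offsets -> no collisions
            est[p, a] += (true_means[p, a] + rng.normal()) / explore_rounds
    taken, match = set(), {}
    for p in range(n_players):      # commit greedily by empirical mean
        for a in np.argsort(-est[p]):
            if a not in taken:
                taken.add(int(a))
                match[p] = int(a)
                break
    return match
```

The fixed `explore_rounds` budget is exactly the tuning burden that adaptive schemes like AETDA aim to remove.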
- Player-optimal Stable Regret for Bandit Learning in Matching Markets [12.54215178882448]
We show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms.
Our work significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions.
arXiv Detail & Related papers (2023-07-20T14:10:33Z)
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z)
- Decentralized Competing Bandits in Non-Stationary Matching Markets [46.13741000158631]
We introduce a framework for decentralized two-sided matching markets in non-stationary (dynamic) environments.
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB).
We characterize the resulting forced exploration and obtain sub-linear (logarithmic) regret for DNCB; a stationary competing-bandits sketch follows the entry.
arXiv Detail & Related papers (2022-05-31T21:05:30Z)
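The collision mechanics behind this line of work can be shown in a stationary toy model. A minimal sketch, assuming arms break ties via fixed preference lists; the non-stationary setting DNCB targets would additionally need sliding-window or restarted estimates, which are omitted here.

```python
import numpy as np

def competing_ucb(true_means, arm_pref, T, seed=0):
    """Sketch of competing bandits (stationary case, not DNCB itself):
    players run UCB independently; when several players pull the same
    arm, only the arm's most-preferred bidder is matched and observes
    a reward, while the others collide and learn nothing that round.
    arm_pref[a] is arm a's ranked list of player indices."""
    rng = np.random.default_rng(seed)
    n_players, n_arms = true_means.shape
    means = np.zeros((n_players, n_arms))
    counts = np.zeros((n_players, n_arms))
    for t in range(1, T + 1):
        ucb = means + np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1.0))
        pulls = ucb.argmax(axis=1)                # each player's bid
        for a in set(pulls.tolist()):
            bidders = [p for p in range(n_players) if pulls[p] == a]
            winner = min(bidders, key=arm_pref[a].index)  # arm's favorite
            reward = true_means[winner, a] + rng.normal()
            counts[winner, a] += 1
            means[winner, a] += (reward - means[winner, a]) / counts[winner, a]
    return means
```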
- Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets [151.03738099494765]
We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market.
We propose a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching.
We prove that the algorithm achieves sublinear regret.
arXiv Detail & Related papers (2022-03-07T19:51:25Z)
- Learning Equilibria in Matching Markets from Bandit Feedback [139.29934476625488]
We develop a framework and algorithms for learning stable market outcomes under uncertainty.
Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
arXiv Detail & Related papers (2021-08-19T17:59:28Z)
- Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z)
- Learning Strategies in Decentralized Matching Markets under Uncertain Preferences [91.3755431537592]
We study the problem of decision-making in the setting of a scarcity of shared resources when the preferences of agents are unknown a priori.
Our approach is based on the representation of preferences in a reproducing kernel Hilbert space; a toy kernel-ridge sketch follows the entry.
We derive optimal strategies that maximize agents' expected payoffs.
arXiv Detail & Related papers (2020-10-29T03:08:22Z)
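To make the RKHS idea in the last entry concrete, a toy sketch, assuming a Gaussian kernel; the function and its parameters are illustrative, not the paper's estimator or strategy.

```python
import numpy as np

def kernel_ridge_preference(X, y, lam=0.1, gamma=1.0):
    """Toy RKHS preference model: fit expected match payoff as a
    function of match features via kernel ridge regression with a
    Gaussian kernel, and return a predictor for new feature vectors.
    Illustrative only; not the paper's estimator."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    K = k(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # (K + lam*I) a = y
    return lambda Z: k(Z, X) @ alpha
```

Fitting on observed (feature, payoff) pairs and scoring candidate matches with the returned predictor illustrates the kind of preference representation the entry alludes to.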