Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility
- URL: http://arxiv.org/abs/2401.01528v2
- Date: Sun, 2 Jun 2024 03:26:45 GMT
- Title: Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility
- Authors: Fang Kong, Shuai Li
- Abstract summary: Two-sided matching markets have been widely studied in the literature due to their rich applications.
We first extend an existing algorithm for the one-to-one setting to this more general setting and show it achieves a near-optimal bound for player-optimal regret.
We propose the adaptively explore-then-deferred-acceptance (AETDA) algorithm for the responsiveness setting and derive an upper bound on its player-optimal stable regret.
- Score: 12.05174711411919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-sided matching markets have been widely studied in the literature due to their rich applications. Since participants are usually uncertain about their preferences, online algorithms have recently been adopted to learn them through iterative interactions. An existing work initiates the study of this problem in a many-to-one setting with responsiveness. However, its results are far from optimal and lack guarantees of incentive compatibility. We first extend an existing algorithm for the one-to-one setting to this more general setting and show that it achieves a near-optimal bound for player-optimal regret. Nevertheless, due to the substantial requirement for collaboration, a single player's deviation could lead to a large increase in its own cumulative reward and a linear regret for others. In this paper, we aim to improve the regret bound in many-to-one markets while ensuring incentive compatibility. We first propose the adaptively explore-then-deferred-acceptance (AETDA) algorithm for the responsiveness setting and derive an upper bound on its player-optimal stable regret while demonstrating its guarantee of incentive compatibility. To the best of our knowledge, it constitutes the first polynomial player-optimal guarantee in matching markets that offers such robust assurances without requiring knowledge of $\Delta$, where $\Delta$ is a preference gap among players and arms. We also consider the broader class of substitutable preferences, one of the most general conditions that ensures the existence of a stable matching and covers responsiveness. We devise an online DA (ODA) algorithm and establish an upper bound on the player-pessimal stable regret in this setting.
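The AETDA and ODA procedures themselves are not reproduced in this summary, but the explore-then-deferred-acceptance idea they refine can be illustrated with a minimal sketch: players first estimate their preferences over arms from bandit feedback, and a player-proposing deferred acceptance (Gale-Shapley) is then run on the estimated rankings. The Python sketch below uses hypothetical names (`explore_then_da`, `deferred_acceptance`) and a fixed uniform exploration budget, so it should be read as an illustration of the general recipe rather than the paper's algorithm.

```python
# Minimal sketch of an explore-then-deferred-acceptance pipeline for bandit
# matching markets. Simplified illustration (uniform exploration, Gaussian
# rewards, hypothetical names), NOT the paper's AETDA: AETDA explores
# adaptively and carries incentive-compatibility guarantees this sketch
# does not attempt to reproduce.
import random
from collections import defaultdict


def deferred_acceptance(player_rankings, arm_rankings, capacities):
    """Player-proposing deferred acceptance where each arm holds up to
    capacities[a] players (many-to-one with responsive preferences)."""
    free = list(player_rankings)                 # players who still propose
    next_prop = {p: 0 for p in player_rankings}  # next index in p's list
    held = defaultdict(list)                     # arm -> tentatively held players
    while free:
        p = free.pop()
        if next_prop[p] >= len(player_rankings[p]):
            continue                             # p exhausted its list, stays unmatched
        a = player_rankings[p][next_prop[p]]
        next_prop[p] += 1
        held[a].append(p)
        held[a].sort(key=arm_rankings[a].index)  # arm's most preferred first
        while len(held[a]) > capacities[a]:
            free.append(held[a].pop())           # reject least preferred over capacity
    return {p: a for a, ps in held.items() for p in ps}


def explore_then_da(true_means, arm_rankings, capacities, explore_rounds):
    """Each player samples every arm explore_rounds times, ranks arms by the
    empirical means, then the match is fixed by deferred acceptance."""
    players, arms = list(true_means), list(arm_rankings)
    estimates = {
        p: {a: sum(random.gauss(true_means[p][a], 1.0)
                   for _ in range(explore_rounds)) / explore_rounds
            for a in arms}
        for p in players
    }
    player_rankings = {p: sorted(arms, key=lambda a: -estimates[p][a]) for p in players}
    return deferred_acceptance(player_rankings, arm_rankings, capacities)
```

With two players, two unit-capacity arms, and well-separated true means, a large enough exploration budget makes the empirical rankings coincide with the true ones and the returned match is the true stable matching; AETDA's contribution is to decide adaptively how much exploration suffices without knowing the gap $\Delta$, while keeping deviations unprofitable.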
Related papers
- Putting Gale & Shapley to Work: Guaranteeing Stability Through Learning [14.448192914855674]
Two-sided matching markets describe a class of problems wherein participants from one side of the market must be matched to those from the other side according to their preferences.
We exploit the structure of stable solutions to devise algorithms that improve the likelihood of finding stable solutions.
arXiv Detail & Related papers (2024-10-06T06:47:53Z) - Incentive-compatible Bandits: Importance Weighting No More [14.344759978208957]
We study the problem of incentive-compatible online learning with bandit feedback.
In this work we propose the first incentive-compatible algorithms that enjoy $O(\sqrt{KT})$ regret bounds.
arXiv Detail & Related papers (2024-05-10T13:57:13Z) - Player-optimal Stable Regret for Bandit Learning in Matching Markets [12.54215178882448]
We show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap among the first $N+1$ ranked arms.
Our work significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions.
arXiv Detail & Related papers (2023-07-20T14:10:33Z) - Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints [13.069703665055084]
We propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints.
The presence of mixed quota and complementary preferences constraints can lead to instability in the matching process.
We formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling algorithm.
arXiv Detail & Related papers (2023-01-24T18:54:29Z) - Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Learning Equilibria in Matching Markets from Bandit Feedback [139.29934476625488]
We develop a framework and algorithms for learning stable market outcomes under uncertainty.
Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
arXiv Detail & Related papers (2021-08-19T17:59:28Z) - Regret, stability, and fairness in matching markets with bandit learners [22.76773950247833]
We consider the two-sided matching market with bandit learners.
We show that it is possible to simultaneously guarantee four desiderata: (1) incentive compatibility, i.e., stability, (2) low regret, i.e., $O(\log T)$ optimal regret, (3) fairness in the distribution of regret among agents, and (4) high social welfare.
arXiv Detail & Related papers (2021-02-11T20:18:12Z) - Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$.
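One possible formalization of this criterion (an assumption on our part; the paper's definition may be more general) is to count only the gaps exceeding $\epsilon$, with $a_t$ the action at round $t$ and $\Delta_a$ the suboptimality gap of arm $a$:
$$R_\epsilon(T) = \sum_{t=1}^{T} \Delta_{a_t}\,\mathbb{1}\{\Delta_{a_t} > \epsilon\}.$$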
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)