Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets
- URL: http://arxiv.org/abs/2408.08690v1
- Date: Fri, 16 Aug 2024 12:06:09 GMT
- Title: Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets
- Authors: Tejas Pagare, Avishek Ghosh
- Abstract summary: We study a decentralized two-sided matching market, where the demand side (players) competes to match with the supply side (arms).
We propose a multi-phase explore-then-commit-type algorithm, namely epoch-based CA-ETC (collision-avoidance explore-then-commit; \texttt{CA-ETC} in short), for this problem.
We show that for an initial epoch length of $T_{\circ}$ and subsequent epoch lengths of $2^{l/\gamma} T_{\circ}$, \texttt{CA-ETC} yields a player-optimal expected regret.
- Score: 7.512199306943756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online learning in decentralized two-sided matching markets, where the demand side (players) competes to match with the supply side (arms), has received substantial interest because it abstracts out the complex interactions in matching platforms (e.g., UpWork, TaskRabbit). However, past works assume that each arm knows its preference ranking over the players (one-sided learning), and that each player aims to learn its preferences over the arms through successive interactions. Moreover, several (impractical) assumptions are usually made for theoretical tractability, such as a broadcast player-arm match (Liu et al., 2020; 2021; Kong & Li, 2023) or serial dictatorship (Sankararaman et al., 2021; Basu et al., 2021; Ghosh et al., 2022). In this paper, we study a decentralized two-sided matching market where we do not assume that the preference rankings over the players are known to the arms a priori. Furthermore, we do not place any structural assumptions on the problem. We propose a multi-phase explore-then-commit-type algorithm, namely epoch-based CA-ETC (collision-avoidance explore-then-commit; \texttt{CA-ETC} in short), for this problem; it does not require any communication across agents (players and arms) and is hence decentralized. We show that for an initial epoch length of $T_{\circ}$ and subsequent epoch lengths of $2^{l/\gamma} T_{\circ}$ (for the $l$-th epoch, with $\gamma \in (0,1)$ an input parameter to the algorithm), \texttt{CA-ETC} yields a player-optimal expected regret of $\mathcal{O}\left(T_{\circ} \left(\frac{K \log T}{T_{\circ} \Delta^2}\right)^{1/\gamma} + T_{\circ} \left(\frac{T}{T_{\circ}}\right)^{\gamma}\right)$ for the $i$-th player, where $T$ is the learning horizon, $K$ is the number of arms, and $\Delta$ is an appropriately defined problem gap. Furthermore, we propose a blackboard-communication-based baseline achieving logarithmic regret in $T$.
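The abstract pins down only the epoch schedule (an initial epoch of length $T_{\circ}$, with the $l$-th epoch lasting $2^{l/\gamma} T_{\circ}$) and the requirement that exploration avoid collisions without any communication. Below is a minimal, hypothetical Python sketch of how a single player might run such an epoch-based explore-then-commit loop; the round-robin collision-avoidance rule, the half/half explore-commit split within each epoch, and the commit-to-empirical-best step are illustrative assumptions, not the paper's actual subroutines.

```python
def ca_etc_player(player_idx, num_players, K, T, T0, gamma, pull):
    """Hypothetical single-player sketch of an epoch-based
    explore-then-commit loop using the epoch schedule from the
    abstract: the l-th epoch lasts roughly 2^(l / gamma) * T0 rounds.

    `pull(arm)` plays an arm and returns (reward, collided); every
    subroutine other than the epoch lengths is an illustrative
    assumption, not the paper's CA-ETC.
    """
    assert num_players <= K  # at least as many arms as players (assumption)
    means = [0.0] * K   # empirical mean reward per arm
    counts = [0] * K    # number of collision-free pulls per arm
    t, l = 0, 0
    while t < T:
        epoch_len = min(int(2 ** (l / gamma) * T0), T - t)
        explore_len = epoch_len // 2  # assumed split: explore, then commit
        # Exploration: staggered round-robin, so players with distinct
        # offsets never request the same arm in the same round (assumption).
        for s in range(explore_len):
            arm = (s + player_idx) % K
            reward, collided = pull(arm)
            if not collided:
                counts[arm] += 1
                means[arm] += (reward - means[arm]) / counts[arm]
            t += 1
        # Commit: play the empirically best arm for the rest of the epoch.
        best = max(range(K), key=lambda a: means[a])
        for _ in range(epoch_len - explore_len):
            pull(best)
            t += 1
        l += 1
```

Because $\gamma \in (0,1)$ makes the epoch lengths grow geometrically, later (and longer) epochs are dominated by the commit phase, so the relative cost of exploration shrinks as the horizon grows; this is the usual appeal of explore-then-commit schedules of this shape when the gap $\Delta$ is unknown.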
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix.
We prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue.
arXiv Detail & Related papers (2024-06-15T13:26:17Z) - $\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
We show that an optimistic follow-the-regularized-leader algorithm can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations.
arXiv Detail & Related papers (2024-02-02T20:40:27Z) - Constant or logarithmic regret in asynchronous multiplayer bandits [22.376992460581594]
A multiplayer bandits problem was first tackled with an explore-then-commit (ETC) algorithm.
We introduce Cautious Greedy, an algorithm that yields constant instance-dependent regret if the optimal policy assigns at least one player on each arm.
arXiv Detail & Related papers (2023-05-31T09:35:03Z) - Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games
with Bandit Feedback [49.1061436241109]
We focus on developing an algorithm that is uncoupled, convergent, and rational, with non-asymptotic convergence rates.
Our algorithm is related to [Chen et al., 2021, Cen et al., 2021] and also builds on the entropy regularization technique.
arXiv Detail & Related papers (2023-03-05T18:08:54Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms which learn the optimal policy by playing against itself without any direct supervision.
We propose an optimistic variant of the \emph{Nash Q-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new \emph{Nash V-learning} algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.