No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling
- URL: http://arxiv.org/abs/2405.17463v1
- Date: Thu, 23 May 2024 08:21:48 GMT
- Title: No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling
- Authors: Ningyuan Chen, Xuefeng Gao, Yi Xiong,
- Abstract summary: We show that when the players use Thompson sampling, the game dynamics converge to the Nash equilibrium under a mild assumption on the payoff matrices.
algorithmic collusion doesn't arise in this case despite the fact that the players do not intentionally deploy competitive strategies.
- Score: 10.376707874029561
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of the existence of each other and use multi-armed bandit algorithms to choose the actions, which is referred to as the ``blindfolded game'' in this paper. We show that when the players use Thompson sampling, the game dynamics converges to the Nash equilibrium under a mild assumption on the payoff matrices. Therefore, algorithmic collusion doesn't arise in this case despite the fact that the players do not intentionally deploy competitive strategies. To prove the convergence result, we find that the framework developed in stochastic approximation doesn't apply, because of the sporadic and infrequent updates of the inferior actions and the lack of Lipschitz continuity. We develop a novel sample-path-wise approach to show the convergence.
Related papers
- Convex Markov Games: A Framework for Fairness, Imitation, and Creativity in Multi-Agent Learning [31.958202912400925]
We introduce the class of convex Markov games that allow general convex preferences over occupancy measures.
Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist under strict convexity.
Our experiments imitate human choices in ultimatum games, reveal novel solutions to the repeated prisoner's dilemma, and find fair solutions in a repeated asymmetric coordination game.
arXiv Detail & Related papers (2024-10-22T00:55:04Z) - Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z) - Scalable and Independent Learning of Nash Equilibrium Policies in
$n$-Player Stochastic Games with Unknown Independent Chains [1.0878040851638]
We study games with independent chains and unknown transition matrices.
In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players.
We propose a fully decentralized mirror descent algorithm to learn an $epsilon$-NE policy.
arXiv Detail & Related papers (2023-12-04T03:04:09Z) - Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Finding mixed-strategy equilibria of continuous-action games without
gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $widetildemathcalO(sqrtK)$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - No-regret learning for repeated non-cooperative games with lossy bandits [5.437709019435926]
We study the asynchronous online learning strategy of the players to adaptively adjust the next actions for minimizing the long-term regret loss.
The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb)
arXiv Detail & Related papers (2022-05-14T05:02:56Z) - Mixed Nash Equilibria in the Adversarial Examples Game [18.181826693937776]
This paper tackles the problem of adversarial examples from a game theoretic point of view.
We study the open question of the existence of mixed Nash equilibria in the zero-sum game formed by the attacker and the classifier.
arXiv Detail & Related papers (2021-02-13T11:47:20Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non- adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - From Poincar\'e Recurrence to Convergence in Imperfect Information
Games: Finding Equilibrium via Regularization [49.368421783733815]
We show how adapting the reward can give strong convergence guarantees in monotone games.
We also show how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium.
arXiv Detail & Related papers (2020-02-19T21:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.