Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents
- URL: http://arxiv.org/abs/2505.22909v1
- Date: Wed, 28 May 2025 22:18:35 GMT
- Title: Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents
- Authors: Cristian Chica, Yinglong Guo, Gilad Lerman
- Abstract summary: We provide the first theoretical explanation for this behavior in infinite repeated games. We show that when the game admits both a one-stage Nash equilibrium price and a collusive-enabling price, firms learn to consistently charge supracompetitive prices.
- Score: 9.053163124987535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing experimental evidence that $Q$-learning agents may learn to charge supracompetitive prices. We provide the first theoretical explanation for this behavior in infinite repeated games. Firms update their pricing policies based solely on observed profits, without computing equilibrium strategies. We show that when the game admits both a one-stage Nash equilibrium price and a collusive-enabling price, and when the $Q$-function satisfies certain inequalities at the end of experimentation, firms learn to consistently charge supracompetitive prices. We introduce a new class of one-memory subgame perfect equilibria (SPEs) and provide conditions under which learned behavior is supported by naive collusion, grim trigger policies, or increasing strategies. Naive collusion does not constitute an SPE unless the collusive-enabling price is a one-stage Nash equilibrium, whereas grim trigger policies can.
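For readers less familiar with the learning dynamics being studied, below is a minimal sketch of two one-memory Q-learning firms repeatedly setting prices and updating only from their own observed profits, as the abstract describes. The linear demand function, price grid, exploration schedule, and learning parameters are illustrative assumptions for this sketch, not the specification used in the paper.

```python
import numpy as np

# Minimal sketch: two Q-learning firms in an infinitely repeated pricing game.
# Demand, price grid, and learning parameters are illustrative assumptions.

rng = np.random.default_rng(0)

prices = np.linspace(1.0, 2.0, 5)   # discrete price grid
n = len(prices)
alpha, gamma = 0.1, 0.95            # learning rate, discount factor
T = 200_000                         # number of repeated-game periods


def profit(p_own, p_rival):
    """Linear-demand duopoly profit (illustrative): static Nash price ~1.33, joint-monopoly price 2.0."""
    demand = max(0.0, 2.0 - p_own + 0.5 * p_rival)
    return p_own * demand


# One-memory state: the pair of prices charged in the previous period.
# Q[i][own_last, rival_last, action] is firm i's action-value estimate.
Q = [np.zeros((n, n, n)) for _ in range(2)]
state = (0, 0)

for t in range(T):
    eps = np.exp(-1e-4 * t)         # exploration decays over time
    actions = []
    for i in range(2):
        own_last, rival_last = state[i], state[1 - i]
        if rng.random() < eps:
            actions.append(int(rng.integers(n)))
        else:
            actions.append(int(np.argmax(Q[i][own_last, rival_last])))

    rewards = (profit(prices[actions[0]], prices[actions[1]]),
               profit(prices[actions[1]], prices[actions[0]]))
    next_state = (actions[0], actions[1])

    # Each firm updates its Q-function from its own observed profit only,
    # without computing equilibrium strategies.
    for i in range(2):
        own_last, rival_last = state[i], state[1 - i]
        own_next, rival_next = next_state[i], next_state[1 - i]
        td_target = rewards[i] + gamma * Q[i][own_next, rival_next].max()
        Q[i][own_last, rival_last, actions[i]] += alpha * (
            td_target - Q[i][own_last, rival_last, actions[i]])

    state = next_state

print("long-run prices:", prices[state[0]], prices[state[1]])
```

Whether such simulated firms settle on supracompetitive prices depends on the exploration schedule and the demand environment; the paper's contribution is to characterize, through conditions on the Q-function at the end of experimentation, when the learned behavior is supported by one-memory SPEs such as grim trigger policies.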
Related papers
- Revenue Maximization Under Sequential Price Competition Via The Estimation Of s-Concave Demand Functions [24.586053819490985]
We consider price competition among multiple sellers over a selling horizon of $T$ periods. Sellers simultaneously post their prices and observe only their own demand, which is unobservable to competitors. We show that when all sellers employ our policy, their prices converge at a rate of $O(T^{-1/7})$ to the Nash equilibrium prices that sellers would reach if they were fully informed.
arXiv Detail & Related papers (2025-03-20T22:51:03Z) - Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
We show that when a pure strategy Nash equilibrium exists, $c$ becomes zero, leading to an optimal instance-dependent regret bound. Our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample complexity.
arXiv Detail & Related papers (2025-02-24T20:20:06Z) - On Tractable $\Phi$-Equilibria in Non-Concave Games [53.212133025684224]
We study tractable $\Phi$-equilibria in non-concave games. We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria.
arXiv Detail & Related papers (2024-03-13T01:51:30Z) - Tacit algorithmic collusion in deep reinforcement learning guided price competition: A study using EV charge pricing game [0.0]
Players in pricing games with complex structures are increasingly adopting artificial intelligence (AI) aided learning algorithms.
Recent studies of games in canonical forms have shown contrasting claims ranging from none to a high level of tacit collusion.
We consider a practical game where EV charging hubs compete by dynamically varying their prices.
Results from our numerical case study yield collusion index values between 0.14 and 0.45, suggesting a low to moderate level of collusion.
arXiv Detail & Related papers (2024-01-25T16:51:52Z) - A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - $2 \times 2$ Zero-Sum Games with Commitments and Noisy Observations [1.9654639120238482]
The equilibrium of a $2 \times 2$ zero-sum game is shown to always exist.
Observing the actions of the leader is shown to be either beneficial or immaterial for the follower.
The payoff at the equilibrium of this game is upper bounded by the payoff at the Stackelberg equilibrium (SE) in pure strategies.
arXiv Detail & Related papers (2022-11-03T10:56:00Z) - Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains [2.132096006921048]
We consider a class of $n$-player games in which players have their own internal state/action spaces while they are coupled through their payoff functions.
For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is intractable.
We develop algorithms based on dual averaging and dual mirror descent, which converge to the set of $\epsilon$-NE policies.
arXiv Detail & Related papers (2022-01-28T16:27:21Z) - Understanding algorithmic collusion with experience replay [0.0]
In an infinitely repeated pricing game, pricing algorithms based on artificial intelligence (Q-learning) may consistently learn to charge supra-competitive prices.
Although concerns about algorithmic collusion have arisen, little is known about the underlying factors.
arXiv Detail & Related papers (2021-02-18T03:28:41Z) - On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting, non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z) - No-regret learning and mixed Nash equilibria: They do not mix [64.37511607254115]
We study the dynamics of "follow-the-regularized-leader" (FTRL).
We show that any Nash equilibrium which is not strict cannot be stable and attracting under FTRL.
This result has significant implications for predicting the outcome of a learning process.
arXiv Detail & Related papers (2020-10-19T13:49:06Z) - Dropout as a Regularizer of Interaction Effects [76.84531978621143]
Dropout is a regularizer against higher-order interactions.
We prove this perspective analytically and empirically.
We also find that it is difficult to obtain the same selective pressure against high-order interactions.
arXiv Detail & Related papers (2020-07-02T01:11:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.