Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
- URL: http://arxiv.org/abs/2602.21620v1
- Date: Wed, 25 Feb 2026 06:32:23 GMT
- Title: Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
- Authors: Arnab Maiti, Junyan Liu, Kevin Jamieson, Lillian J. Ratliff
- Abstract summary: We study the discrete Bertrand pricing game with a non-increasing demand function. We analyze a repeated-game model in which firms set prices using no-regret learners.
- Score: 16.475146305668442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
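The game described in the abstract is concrete enough to simulate. The sketch below (not from the paper) implements the one-shot payoff rule — lowest price takes the whole demand, ties split it equally — and runs Hedge (multiplicative weights), a standard no-external-regret learner, for each firm under full information. The demand function, learning rate, and horizon are illustrative assumptions, not values from the paper.

```python
import numpy as np

def bertrand_payoffs(prices, demand):
    """Payoffs in the one-shot discrete Bertrand game: the lowest price
    captures the entire demand D(p); ties split the demand equally."""
    p_min = min(prices)
    winners = [i for i, p in enumerate(prices) if p == p_min]
    revenue = p_min * demand(p_min) / len(winners)
    return [revenue if i in winners else 0.0 for i in range(len(prices))]

def hedge_simulation(n=2, k=10, T=2000, eta=0.1, seed=0):
    """Each of the n firms runs Hedge over the price grid {1/k, ..., 1}."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, k + 1) / k            # price grid {1/k, 2/k, ..., 1}
    demand = lambda p: 1.0 - 0.5 * p          # an assumed non-increasing demand
    weights = np.ones((n, k))
    for _ in range(T):
        probs = weights / weights.sum(axis=1, keepdims=True)
        choices = [rng.choice(k, p=probs[i]) for i in range(n)]
        for i in range(n):
            # full-information Hedge: evaluate every counterfactual price
            others = [grid[c] for j, c in enumerate(choices) if j != i]
            payoffs = np.array([
                bertrand_payoffs([grid[a]] + others, demand)[0]
                for a in range(k)
            ])
            weights[i] *= np.exp(eta * payoffs)
    return weights / weights.sum(axis=1, keepdims=True)

probs = hedge_simulation()
print(probs.round(3))  # each row: a firm's final price distribution
```

Inspecting where the final distributions concentrate gives a quick empirical feel for the question the paper studies: whether no-external-regret dynamics settle near the competitive low price or sustain higher-price outcomes.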
Related papers
- Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation [54.274028560515454]
We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. We introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret. We show how to adaptively restart this algorithm with an appropriate parameter in response to the potential non-stationarity of the opponent.
arXiv Detail & Related papers (2026-02-06T21:25:54Z) - Adversarial Learning in Games with Bandit Feedback: Logarithmic Pure-Strategy Maximin Regret [64.73231630190121]
Learning to play zero-sum games is a fundamental problem in game theory and machine learning. We investigate adversarial learning in zero-sum games under bandit feedback, aiming to minimize the deficit against the maximin pure strategy. We show that the Tsallis-INF algorithm achieves $O(c \log T)$ instance-dependent regret with a game-dependent parameter $c$.
arXiv Detail & Related papers (2026-02-06T03:26:01Z) - Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents [9.053163124987535]
We provide the first theoretical explanation for this behavior in infinite repeated games. We show that when the game admits both a one-stage Nash equilibrium price and a collusive-enabling price, firms learn to consistently charge supracompetitive prices.
arXiv Detail & Related papers (2025-05-28T22:18:35Z) - Revenue Maximization Under Sequential Price Competition Via The Estimation Of s-Concave Demand Functions [24.776097647623754]
We propose a dynamic pricing policy that uses semi-parametric least-squares estimation. We show that when the sellers employ our policy, their prices converge at a rate of $O(T^{-1/7})$ to the Nash equilibrium prices that sellers would reach if they were fully informed.
arXiv Detail & Related papers (2025-03-20T22:51:03Z) - Swap Regret and Correlated Equilibria Beyond Normal-Form Games [62.01542145970044]
We present a new variant of swap regret for polytope games that we call "profile swap regret". Although we show profile swap regret is NP-hard to compute given a transcript of play, we show it is nonetheless possible to design efficient learning algorithms that guarantee at most $O(\sqrt{T})$ profile swap regret.
arXiv Detail & Related papers (2025-02-27T16:16:26Z) - Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
We show that when a pure strategy Nash equilibrium exists, $c$ becomes zero, leading to an optimal instance-dependent regret bound. Our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample complexity.
arXiv Detail & Related papers (2025-02-24T20:20:06Z) - Market Making without Regret [15.588799679661637]
We consider a sequential decision-making setting where, at every round $t$, a market maker posts a bid price $B_t$ and an ask price $A_t$ to an incoming trader. If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs. We characterize the maker's regret with respect to the best fixed choice of bid and ask pairs.
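The trade rule in this summary can be rendered directly in code. The sketch below is illustrative only; the function name and return convention are hypothetical, not from the paper.

```python
def trade(bid, ask, valuation):
    """One round of the interaction described above: the incoming trader
    sells to the maker if their valuation is below the bid, buys from the
    maker if it is above the ask, and otherwise no trade occurs."""
    if valuation < bid:
        return ("sell", bid)   # trader sells at the bid price B_t
    if valuation > ask:
        return ("buy", ask)    # trader buys at the ask price A_t
    return (None, None)        # valuation in [bid, ask]: no trade

print(trade(0.4, 0.6, 0.3))  # → ('sell', 0.4)
```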
arXiv Detail & Related papers (2024-11-21T10:13:55Z) - Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
We prove lower bounds for computing a near-optimal $T$-sparse CCE.
In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in time.
arXiv Detail & Related papers (2024-11-04T00:34:56Z) - From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces [26.81775688975904]
We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria.
arXiv Detail & Related papers (2023-10-30T17:50:29Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - No-regret learning and mixed Nash equilibria: They do not mix [64.37511607254115]
We study the dynamics of "follow-the-regularized-leader" (FTRL) learning in games.
We show that any Nash equilibrium which is not strict cannot be stable and attracting under FTRL.
This result has significant implications for predicting the outcome of a learning process.
arXiv Detail & Related papers (2020-10-19T13:49:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.