Understanding algorithmic collusion with experience replay
- URL: http://arxiv.org/abs/2102.09139v1
- Date: Thu, 18 Feb 2021 03:28:41 GMT
- Title: Understanding algorithmic collusion with experience replay
- Authors: Bingyan Han
- Abstract summary: In an infinitely repeated pricing game, pricing algorithms based on artificial intelligence (Q-learning) may consistently learn to charge supra-competitive prices.
Although concerns about algorithmic collusion have arisen, little is known about the underlying factors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In an infinitely repeated pricing game, pricing algorithms based on artificial intelligence (Q-learning) may consistently learn to charge supra-competitive prices even without communication. Although concerns about algorithmic collusion have arisen, little is known about the underlying factors. In this work, we experimentally analyze the dynamics of Q-learning algorithms under three variants of experience replay. Algorithmic collusion still has roots in human preferences: randomizing experience yields prices close to the static Bertrand equilibrium, while higher prices are easily restored by favoring the latest experience. Moreover, relative performance concerns also stabilize collusion. Finally, we investigate scenarios with heterogeneous agents and test robustness to various factors.
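The abstract describes tabular Q-learning sellers whose replay scheme determines whether collusion persists, but it does not spell out the demand system, price grid, or the paper's three replay variants. Below is a minimal sketch under assumed parameters: two Q-learners on a ten-point price grid with a hypothetical linear demand curve, contrasting uniform random replay with a recency-weighted sampler that favors the latest experience. `QReplayAgent`, `profit`, and every constant are illustrative assumptions, not the paper's specification.

```python
import random

# A minimal sketch, not the paper's implementation: the abstract does not give
# the demand system, price grid, or the three replay variants, so everything
# below is an illustrative assumption.

PRICES = [0.1 * k for k in range(1, 11)]  # assumed ten-point price grid
COST = 0.1                                # assumed marginal cost
N = len(PRICES)

def profit(p_own, p_rival):
    """Assumed linear duopoly demand: pricing below the rival wins demand."""
    demand = max(0.0, 1.0 - p_own + 0.5 * p_rival)
    return (p_own - COST) * demand

class QReplayAgent:
    """Tabular Q-learner; the state is the rival's last price index."""
    def __init__(self, variant, alpha=0.15, gamma=0.95, eps=0.05,
                 capacity=200, batch=8):
        self.q = [[0.0] * N for _ in range(N)]   # q[state][action]
        self.buffer = []                          # (s, a, r, s') transitions
        self.variant = variant                    # "uniform" or "recent"
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.capacity, self.batch = capacity, batch

    def act(self, state):
        if random.random() < self.eps:            # epsilon-greedy exploration
            return random.randrange(N)
        return max(range(N), key=self.q[state].__getitem__)

    def store(self, transition):
        self.buffer.append(transition)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)

    def replay(self):
        if self.variant == "uniform":             # randomized experience
            sample = random.choices(self.buffer, k=self.batch)
        else:                                     # favor the latest experience
            weights = range(1, len(self.buffer) + 1)
            sample = random.choices(self.buffer, weights=weights, k=self.batch)
        for s, a, r, s2 in sample:
            target = r + self.gamma * max(self.q[s2])
            self.q[s][a] += self.alpha * (target - self.q[s][a])

def simulate(variant, periods=20000, seed=0):
    random.seed(seed)
    agents = [QReplayAgent(variant), QReplayAgent(variant)]
    state = [random.randrange(N), random.randrange(N)]
    for _ in range(periods):
        acts = [agents[i].act(state[i]) for i in range(2)]
        for i in range(2):
            r = profit(PRICES[acts[i]], PRICES[acts[1 - i]])
            agents[i].store((state[i], acts[i], r, acts[1 - i]))
            agents[i].replay()
        state = [acts[1], acts[0]]                # each observes rival's price
    return [PRICES[a] for a in acts]

if __name__ == "__main__":
    for variant in ("uniform", "recent"):
        print(variant, simulate(variant))
```

Under the abstract's finding, one would expect the "uniform" run to settle near the competitive (Bertrand) end of the grid and the "recent" run to sustain higher prices; with these toy parameters the exact levels will of course differ from the paper's.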
Related papers
- Naive Algorithmic Collusion: When Do Bandit Learners Cooperate and When Do They Compete? [0.0]
Algorithmic agents are used in a variety of competitive decision settings.
We study the behavior of multi-armed bandit machine learning algorithms used in situations where agents are competing.
We show that these context-free bandits, with no knowledge of opponents' choices or outcomes, will still consistently learn collusive behavior.
arXiv Detail & Related papers (2024-11-25T16:58:07Z)
- Artificial Intelligence and Algorithmic Price Collusion in Two-sided Markets [9.053163124987535]
We examine how AI agents using Q-learning engage in tacit collusion in two-sided markets.
Our experiments reveal that AI-driven platforms achieve higher collusion levels compared to Bertrand competition.
Increased network externalities significantly enhance collusion, suggesting AI algorithms exploit them to maximize profits.
arXiv Detail & Related papers (2024-07-04T17:57:56Z)
- By Fair Means or Foul: Quantifying Collusion in a Market Simulation with Deep Reinforcement Learning [1.5249435285717095]
This research employs an experimental oligopoly model of repeated price competition.
We investigate the strategies and emerging pricing patterns developed by the agents, which may lead to a collusive outcome.
Our findings indicate that RL-based AI agents converge to a collusive state characterized by the charging of supracompetitive prices.
arXiv Detail & Related papers (2024-06-04T15:35:08Z)
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Tacit algorithmic collusion in deep reinforcement learning guided price competition: A study using EV charge pricing game [0.0]
Players in pricing games with complex structures are increasingly adopting artificial intelligence (AI) aided learning algorithms.
Recent studies of games in canonical forms have shown contrasting claims ranging from none to a high level of tacit collusion.
We consider a practical game where EV charging hubs compete by dynamically varying their prices.
Results from our numerical case study yield collusion index values between 0.14 and 0.45, suggesting a low to moderate level of collusion.
arXiv Detail & Related papers (2024-01-25T16:51:52Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Online Search With Best-Price and Query-Based Predictions [2.3204178451683264]
We study learning-augmented algorithms in which there is a potentially erroneous prediction concerning the input.
We provide experimental results on data obtained from stock exchange markets.
arXiv Detail & Related papers (2021-12-02T20:18:37Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- Revisiting Smoothed Online Learning [70.09792747315323]
We investigate the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm which simply minimizes the weighted sum of the hitting cost and the switching cost.
arXiv Detail & Related papers (2021-02-13T14:15:55Z)
- Adversarial Attacks on Linear Contextual Bandits [87.08004581867537]
Malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior.
We show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps.
We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context.
arXiv Detail & Related papers (2020-02-10T15:04:09Z)
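To make the attack summarized in the last entry concrete, here is a hedged toy. The paper analyzes linear contextual bandits with a sublinear corruption budget; this sketch deliberately swaps in a plain epsilon-greedy multi-armed bandit, so only the mechanism (reward poisoning steering the learner toward a target arm) is shown, not the paper's construction or its $T - o(T)$ guarantee. All names and constants are assumptions.

```python
import random

# Toy reward-poisoning attack: the adversary degrades the observed reward of
# every non-target arm, so a suboptimal target arm looks best to the learner.
# Simplified epsilon-greedy bandit, not the paper's linear contextual setting.

K, T, TARGET = 5, 5000, 3
TRUE_MEANS = [0.9, 0.7, 0.8, 0.2, 0.6]    # the target arm is actually worst

def poison(reward, arm):
    """Adversary degrades every non-target arm's observed reward."""
    return reward if arm == TARGET else reward - 1.0

def run(attacked, seed=0):
    random.seed(seed)
    counts, sums, target_pulls = [0] * K, [0.0] * K, 0
    for t in range(T):
        if 0 in counts:                     # pull each arm once first
            arm = counts.index(0)
        elif random.random() < 0.05:        # epsilon-greedy exploration
            arm = random.randrange(K)
        else:                               # exploit highest empirical mean
            arm = max(range(K), key=lambda a: sums[a] / counts[a])
        reward = TRUE_MEANS[arm] + random.gauss(0.0, 0.1)
        if attacked:
            reward = poison(reward, arm)
        counts[arm] += 1
        sums[arm] += reward
        target_pulls += (arm == TARGET)
    return target_pulls / T

print("target-arm share, clean:   ", run(False))
print("target-arm share, attacked:", run(True))
```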
This list is automatically generated from the titles and abstracts of the papers on this site.