Modelling Cournot Games as Multi-agent Multi-armed Bandits
- URL: http://arxiv.org/abs/2201.01182v1
- Date: Sat, 1 Jan 2022 22:02:47 GMT
- Title: Modelling Cournot Games as Multi-agent Multi-armed Bandits
- Authors: Kshitija Taywade, Brent Harrison, Adib Bagh
- Abstract summary: We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting for modeling repeated Cournot oligopoly games.
We find that an $\epsilon$-greedy approach offers a more viable learning mechanism than other traditional MAB approaches.
We propose two novel approaches that take advantage of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
- Score: 4.751331778201811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting
for modeling repeated Cournot oligopoly games, where the firms acting as agents
choose from the set of arms representing production quantity (a discrete
value). Agents interact with separate and independent bandit problems. In this
formulation, each agent makes sequential choices among arms to maximize its own
reward. Agents do not have any information about the environment; they can only
see their own rewards after taking an action. However, the market demand is a
stationary function of total industry output, and random entry or exit from the
market is not allowed. Given these assumptions, we found that an
$\epsilon$-greedy approach offers a more viable learning mechanism than other
traditional MAB approaches, as it does not require any additional knowledge of
the system to operate. We also propose two novel approaches that take advantage
of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
These new approaches help firms to focus on more profitable actions by
eliminating less profitable choices and hence are designed to optimize the
exploration. We use computer simulations to study the emergence of various
equilibria in the outcomes and do the empirical analysis of joint cumulative
regrets.
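The formulation above can be sketched in a minimal simulation: each firm runs an independent $\epsilon$-greedy bandit over discrete production quantities and observes only its own profit. The linear inverse demand $P(Q) = a - bQ$, zero marginal cost, and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import random

def simulate(n_firms=2, quantities=range(0, 51), a=100.0, b=1.0,
             epsilon=0.1, rounds=20000, seed=0):
    """Repeated Cournot game where each firm is an independent
    epsilon-greedy bandit over discrete production quantities."""
    rng = random.Random(seed)
    arms = list(quantities)
    # Per-firm running statistics: estimated mean profit and pull count per arm.
    values = [[0.0] * len(arms) for _ in range(n_firms)]
    counts = [[0] * len(arms) for _ in range(n_firms)]
    for _ in range(rounds):
        choices = []
        for i in range(n_firms):
            if rng.random() < epsilon:  # explore a random quantity
                choices.append(rng.randrange(len(arms)))
            else:                       # exploit the current best estimate
                choices.append(max(range(len(arms)), key=lambda k: values[i][k]))
        total_q = sum(arms[c] for c in choices)
        price = max(a - b * total_q, 0.0)  # stationary linear inverse demand
        for i, c in enumerate(choices):
            reward = price * arms[c]       # profit, assuming zero cost
            counts[i][c] += 1
            # Incremental mean update; each firm sees only its own reward.
            values[i][c] += (reward - values[i][c]) / counts[i][c]
    # Return each firm's greedy quantity after learning.
    return [arms[max(range(len(arms)), key=lambda k: values[i][k])]
            for i in range(n_firms)]

print(simulate())
```

Under these assumptions, the symmetric Cournot-Nash quantity for two firms is $a/(3b) \approx 33$; the simulation lets one check how close independent $\epsilon$-greedy play lands to that benchmark, and the paper's $\epsilon$-greedy+HL/EL variants would additionally prune less profitable regions of this ordered arm set.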
Related papers
- Competing Bandits in Decentralized Large Contextual Matching Markets [13.313881962771777]
We study decentralized learning in two-sided matching markets where the demand side (aka players or agents) competes for a large supply side (aka arms).
Our proposed algorithms achieve instance-dependent logarithmic regret, scaling independently of the number of arms, $K$.
arXiv Detail & Related papers (2024-11-18T18:08:05Z)
- Survival Multiarmed Bandits with Bootstrapping Methods [0.0]
The Survival Multiarmed Bandits (S-MAB) problem is an extension which constrains an agent to a budget related to observed rewards.
This paper presents a framework that addresses such a dual goal using an objective function balanced by a ruin aversion component.
arXiv Detail & Related papers (2024-10-21T20:21:10Z)
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Principal-Agent Reward Shaping in MDPs [50.914110302917756]
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our results cover trees and deterministic decision processes with a finite horizon.
arXiv Detail & Related papers (2023-12-30T18:30:44Z)
- Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [57.627352949446625]
We consider a variant of the multi-armed bandit problem.
Specifically, the arms are strategic agents who can improve their rewards or absorb them.
We identify a class of MAB algorithms which satisfy a collection of properties and show that they lead to mechanisms that incentivize top level performance at equilibrium.
arXiv Detail & Related papers (2023-12-13T06:54:49Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z)
- Multi-Agent Neural Rewriter for Vehicle Routing with Limited Disclosure of Costs [65.23158435596518]
We solve the multi-vehicle routing problem as a team Markov game with partially observable costs.
Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions.
arXiv Detail & Related papers (2022-06-13T09:17:40Z)
- Using Non-Stationary Bandits for Learning in Repeated Cournot Games with Non-Stationary Demand [11.935419090901524]
In this paper, we model repeated Cournot games with non-stationary demand.
The set of arms/actions that an agent can choose from represents discrete production quantities.
We propose a novel algorithm, 'Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy', which is loosely based on the well-known $\epsilon$-greedy approach.
arXiv Detail & Related papers (2022-01-03T05:51:47Z)
- Incentivized Bandit Learning with Self-Reinforcing User Preferences [9.233886766950054]
We investigate a new multi-armed bandit (MAB) online learning model that considers real-world phenomena in many recommender systems.
We propose two MAB policies termed "At-Least-$n$ Explore-Then-Commit" and "UCB-List"
We prove that both policies achieve $O(\log T)$ expected regret with $O(\log T)$ expected payment over a time horizon $T$.
arXiv Detail & Related papers (2021-05-19T01:06:32Z)
- Value Variance Minimization for Learning Approximate Equilibrium in Aggregation Systems [8.140037969280716]
In this paper, we consider the problem of learning approximate equilibrium solutions (win-win) in aggregation systems so that individuals have an incentive to remain in the aggregation system.
arXiv Detail & Related papers (2020-03-16T10:02:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.