Using Non-Stationary Bandits for Learning in Repeated Cournot Games with
Non-Stationary Demand
- URL: http://arxiv.org/abs/2201.00486v1
- Date: Mon, 3 Jan 2022 05:51:47 GMT
- Title: Using Non-Stationary Bandits for Learning in Repeated Cournot Games with
Non-Stationary Demand
- Authors: Kshitija Taywade, Brent Harrison, Judy Goldsmith
- Abstract summary: In this paper, we model repeated Cournot games with non-stationary demand.
The set of arms/actions that an agent can choose from represents discrete production quantities.
We propose a novel algorithm 'Adaptive with Weighted Exploration (AWE) $\epsilon$-greedy' which is remotely based on the well-known $\epsilon$-greedy approach.
- Score: 11.935419090901524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many past attempts at modeling repeated Cournot games assume that demand is
stationary. This does not align with real-world scenarios in which market
demands can evolve over a product's lifetime for a myriad of reasons. In this
paper, we model repeated Cournot games with non-stationary demand such that
firms/agents face separate instances of non-stationary multi-armed bandit
problem. The set of arms/actions that an agent can choose from represents
discrete production quantities; here, the action space is ordered. Agents are
independent and autonomous, and cannot observe anything from the environment;
they can only see their own rewards after taking an action, and only work
towards maximizing these rewards. We propose a novel algorithm 'Adaptive with
Weighted Exploration (AWE) $\epsilon$-greedy' which is remotely based on the
well-known $\epsilon$-greedy approach. This algorithm detects and quantifies
changes in rewards due to varying market demand and varies learning rate and
exploration rate in proportion to the degree of changes in demand, thus
enabling agents to better identify new optimal actions. For efficient
exploration, it also deploys a mechanism for weighing actions that takes
advantage of the ordered action space. We use simulations to study the
emergence of various equilibria in the market. In addition, we study the
scalability of our approach in terms of the number of agents in the system and
the size of the action space. We consider both symmetric and asymmetric firms
in our models. We find that, using our proposed method, agents are able to
swiftly change their course of action according to changes in demand, and that
they also engage in collusive behavior in many simulations.
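The abstract describes the AWE $\epsilon$-greedy mechanism only at a high level: detect and quantify reward changes caused by shifting demand, scale the exploration rate and learning rate with the magnitude of the detected change, and weight exploratory choices using the ordered action space. The sketch below is a minimal, hypothetical illustration of that idea under assumptions not taken from the paper: a sinusoidally drifting linear inverse demand, a sliding-window change statistic, inverse-distance exploration weights, and all parameter names are placeholders, not the authors' specification.

```python
# Illustrative sketch (assumptions, not the paper's exact algorithm): an
# AWE-style epsilon-greedy agent in a repeated Cournot game with
# non-stationary demand. Each agent sees only its own reward.
import numpy as np

rng = np.random.default_rng(0)

class NonStationaryCournotEnv:
    """Two symmetric firms, linear inverse demand with a drifting intercept (assumed form)."""
    def __init__(self, quantities, b=1.0, cost=1.0):
        self.quantities = np.asarray(quantities, dtype=float)  # ordered action space
        self.b, self.cost = b, cost
        self.t = 0

    def demand_intercept(self):
        # Demand slowly oscillates over the product's lifetime (assumption).
        return 30.0 + 10.0 * np.sin(2 * np.pi * self.t / 500.0)

    def step(self, actions):
        q = self.quantities[actions]                       # chosen production quantities
        price = max(self.demand_intercept() - self.b * q.sum(), 0.0)
        self.t += 1
        return (price - self.cost) * q                     # per-firm profit (reward)

class AWEEpsilonGreedy:
    """Epsilon-greedy with change-driven epsilon/learning rate and weighted exploration."""
    def __init__(self, n_arms, eps_min=0.02, eps_max=0.5,
                 alpha_min=0.05, alpha_max=0.9, window=50):
        self.q = np.zeros(n_arms)                          # reward estimate per arm
        self.recent = []                                   # sliding window of recent rewards
        self.window = window
        self.eps_min, self.eps_max = eps_min, eps_max
        self.alpha_min, self.alpha_max = alpha_min, alpha_max

    def _change_magnitude(self):
        # Quantify non-stationarity as the normalised gap between the two halves
        # of the recent-reward window (an assumed, simple change statistic).
        if len(self.recent) < self.window:
            return 1.0
        half = self.window // 2
        old, new = np.mean(self.recent[:half]), np.mean(self.recent[half:])
        return float(np.clip(abs(new - old) / max(abs(old), 1e-8), 0.0, 1.0))

    def select(self):
        change = self._change_magnitude()
        # Exploration and learning rate grow with the detected degree of change.
        self.eps = self.eps_min + change * (self.eps_max - self.eps_min)
        self.alpha = self.alpha_min + change * (self.alpha_max - self.alpha_min)
        best = int(np.argmax(self.q))
        if rng.random() < self.eps:
            # Weighted exploration: arms closer to the current best in the
            # ordered action space are sampled more often.
            dist = np.abs(np.arange(len(self.q)) - best)
            weights = 1.0 / (1.0 + dist)
            return int(rng.choice(len(self.q), p=weights / weights.sum()))
        return best

    def update(self, arm, reward):
        self.q[arm] += self.alpha * (reward - self.q[arm])
        self.recent.append(reward)
        if len(self.recent) > self.window:
            self.recent.pop(0)

# Usage: two independent agents, each observing only its own reward.
env = NonStationaryCournotEnv(quantities=np.linspace(0.0, 10.0, 21))
agents = [AWEEpsilonGreedy(n_arms=21) for _ in range(2)]
for _ in range(2000):
    actions = [a.select() for a in agents]
    rewards = env.step(np.array(actions))
    for agent, act, r in zip(agents, actions, rewards):
        agent.update(act, r)
print("greedy quantity per agent:",
      [float(env.quantities[int(np.argmax(a.q))]) for a in agents])
```

In this toy setup, the change statistic spikes when the demand intercept drifts, which temporarily raises both the exploration rate and the learning rate so that the agents re-identify profitable quantities; the inverse-distance weights keep most exploratory plays near the current best arm, mirroring the paper's use of the ordered action space.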
Related papers
- On Imperfect Recall in Multi-Agent Influence Diagrams [57.21088266396761]
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks.
We show how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium.
We also describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
arXiv Detail & Related papers (2023-07-11T07:08:34Z) - A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z) - Towards Multi-Agent Reinforcement Learning driven Over-The-Counter
Market Simulations [16.48389671789281]
We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market.
By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors.
We show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption.
arXiv Detail & Related papers (2022-10-13T17:06:08Z) - Stochastic Market Games [10.979093424231532]
We propose to utilize market forces to provide incentives for agents to become cooperative.
As demonstrated in an iterated version of the Prisoner's Dilemma, the proposed market formulation can change the dynamics of the game.
We empirically find that the presence of markets can improve both the overall result and agent individual returns via their trading activities.
arXiv Detail & Related papers (2022-07-15T10:37:16Z) - Dynamic Memory for Interpretable Sequential Optimisation [0.0]
We present a solution to handling non-stationarity that is suitable for deployment at scale.
We develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory.
We describe the architecture of a large-scale deployment of automatic optimisation-as-a-service.
arXiv Detail & Related papers (2022-06-28T12:29:13Z) - Uplifting Bandits [23.262188897812475]
We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.
This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers.
We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline.
arXiv Detail & Related papers (2022-06-08T18:00:56Z) - Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2022-05-27T02:21:04Z) - Modelling Cournot Games as Multi-agent Multi-armed Bandits [4.751331778201811]
We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting for modeling repeated Cournot oligopoly games.
We find that an $\epsilon$-greedy approach offers a more viable learning mechanism than other traditional MAB approaches.
We propose two novel approaches that take advantage of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
arXiv Detail & Related papers (2022-01-01T22:02:47Z) - Reinforcement Learning in Reward-Mixing MDPs [74.41782017817808]
We consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP).
We learn an $\epsilon$-optimal policy after exploring $\tilde{O}(\mathrm{poly}(H, \epsilon^{-1}) \cdot S^2 A^2)$ episodes, where $H$ is the time horizon and $S, A$ are the numbers of states and actions respectively.
arXiv Detail & Related papers (2021-10-07T18:55:49Z) - Robust Allocations with Diversity Constraints [65.3799850959513]
We show that the Nash Welfare rule that maximizes product of agent values is uniquely positioned to be robust when diversity constraints are introduced.
We also show that the guarantees achieved by Nash Welfare are nearly optimal within a widely studied class of allocation rules.
arXiv Detail & Related papers (2021-09-30T11:09:31Z) - Markov Decision Process modeled with Bandits for Sequential Decision
Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z)