Computational Performance of Deep Reinforcement Learning to find Nash
Equilibria
- URL: http://arxiv.org/abs/2104.12895v1
- Date: Mon, 26 Apr 2021 22:14:17 GMT
- Title: Computational Performance of Deep Reinforcement Learning to find Nash
Equilibria
- Authors: Christoph Graf, Viktor Zobernig, Johannes Schmidt, Claude Klöckl
- Abstract summary: We use a deep reinforcement learning algorithm to learn Nash equilibria in a setting where firms compete in prices.
Although the algorithm is model-free, a large set of parameters is utilized in its various steps.
We find parameter choices that can reach convergence rates of up to 99%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We test the performance of deep deterministic policy gradient (DDPG),
a deep reinforcement learning algorithm that handles continuous state and action
spaces, in learning Nash equilibria in a setting where firms compete in prices.
Such algorithms are typically considered model-free because they require neither
transition probability functions (as in, e.g., Markov games) nor predefined
functional forms. Despite being model-free, the algorithm uses a large set of
parameters in its various steps, e.g., learning rates, memory buffers, state-space
dimensioning, normalizations, and noise decay rates. The purpose of this work is
to systematically test the effect of these parameter configurations on convergence
to the analytically derived Bertrand equilibrium. We find parameter choices that
can reach convergence rates of up to 99%. The reliable convergence may make the
method a useful tool for studying the strategic behavior of firms even in more
complex settings.
Keywords: Bertrand Equilibrium, Competition in Uniform Price Auctions, Deep
Deterministic Policy Gradient Algorithm, Parameter Sensitivity Analysis
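As a concrete illustration of the parameter sensitivity analysis described in the abstract, the following Python sketch sets up a hypothetical DDPG hyperparameter grid over the parameter classes named above (learning rates, memory buffer size, noise decay, state normalization) together with a minimal homogeneous-good Bertrand duopoly whose analytical Nash equilibrium is pricing at marginal cost. All names, values, and the market model are illustrative assumptions, not the paper's actual settings or demand specification.

    import itertools

    # Hypothetical DDPG hyperparameter grid over the parameter classes named in
    # the abstract; the values are illustrative, not the paper's.
    PARAM_GRID = {
        "actor_lr":      [1e-4, 1e-3],        # learning rates
        "critic_lr":     [1e-3, 1e-2],
        "buffer_size":   [10_000, 100_000],   # replay memory buffer
        "noise_decay":   [0.995, 0.999],      # exploration noise decay rate
        "normalize_obs": [True, False],       # state normalization on/off
    }

    class BertrandDuopoly:
        """One-shot homogeneous-good Bertrand duopoly with constant marginal cost.
        The cheaper firm serves the whole (unit) demand; ties split it. The
        analytical Nash equilibrium is both firms pricing at marginal cost."""

        def __init__(self, marginal_cost=1.0, demand=1.0):
            self.c = marginal_cost
            self.d = demand
            self.nash_price = marginal_cost    # analytical benchmark

        def profits(self, p1, p2):
            if p1 < p2:
                return (p1 - self.c) * self.d, 0.0
            if p2 < p1:
                return 0.0, (p2 - self.c) * self.d
            share = self.d / 2.0
            return (p1 - self.c) * share, (p2 - self.c) * share

    def converged(final_prices, benchmark, tol=0.05):
        """A run counts as converged if both prices end within tol of the benchmark."""
        return all(abs(p - benchmark) <= tol for p in final_prices)

    # Enumerate all configurations of the (hypothetical) sweep.
    configs = [dict(zip(PARAM_GRID, values))
               for values in itertools.product(*PARAM_GRID.values())]
    print(len(configs), "configurations to evaluate")

A convergence rate such as the 99% reported above would then be the share of independent training runs, per configuration, for which converged(...) holds.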
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
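To make the gradient-free idea concrete, the snippet below uses the textbook two-point zeroth-order gradient estimator on a simple convex quadratic; this is a generic illustration, not the paper's accelerated method under high-order smoothness.

    import numpy as np

    def two_point_grad(f, x, delta=1e-3, rng=np.random.default_rng(0)):
        """Textbook two-point zeroth-order gradient estimate of f at x."""
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)                 # random unit direction
        return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

    # Toy convex objective f(x) = ||x||^2, minimized with gradient-free descent.
    f = lambda x: float(x @ x)
    x = np.ones(5)
    for _ in range(2000):
        x = x - 0.01 * two_point_grad(f, x)
    print(f(x))   # approaches 0 without ever evaluating a true gradient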
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
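The "learn a stochastic policy, deploy its deterministic version" pattern referred to above can be sketched as follows; the linear-Gaussian policy and the fixed noise level are illustrative assumptions, not the paper's tuning procedure.

    import numpy as np

    class GaussianPolicy:
        """Linear-Gaussian policy: a ~ N(theta @ s, sigma^2)."""

        def __init__(self, dim, sigma=0.5, seed=0):
            self.theta = np.zeros(dim)
            self.sigma = sigma                 # exploration level used while learning
            self.rng = np.random.default_rng(seed)

        def sample(self, s):
            # Stochastic action drawn during training (exploration).
            return float(self.theta @ s) + self.sigma * self.rng.standard_normal()

        def deterministic(self, s):
            # Deterministic version deployed after training: the mean action.
            return float(self.theta @ s)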
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Improved Sample Complexity Bounds for Distributionally Robust
Reinforcement Learning [3.222802562733787]
We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment.
We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for the uncertainty sets specified by four different divergences.
We show that our algorithm achieves $\tilde{\mathcal{O}}(|\mathcal{S}\mathcal{A}| H^{5})$ sample complexity, which is uniformly better than the existing results.
arXiv Detail & Related papers (2023-03-05T21:47:08Z) - Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning
with Parameter Convergence [18.412945308419033]
We investigate the global convergence of natural policy gradient approximation variants in multi-agent learning.
We propose an algorithm for several standard multi-agent learning scenarios.
arXiv Detail & Related papers (2022-10-23T18:27:04Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Gradient play in stochastic games: stationary points, convergence, and
sample complexity [6.97785632069611]
We study the performance of the gradient play algorithm for stochastic games (SGs).
We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting.
For a subclass of SGs called Markov potential games, we design a sample-based reinforcement learning algorithm.
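Gradient play itself is straightforward to illustrate: each player independently ascends the gradient of its own expected payoff. The two-player coordination game below is a generic sketch of that dynamic, not the paper's sample-based algorithm for Markov potential games.

    import numpy as np

    # Coordination game (a potential game): both players prefer matching actions.
    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])   # player 1's payoffs
    B = A.copy()                 # player 2's payoffs (identical interests)

    p = np.array([0.6, 0.4])     # player 1's mixed strategy
    q = np.array([0.4, 0.6])     # player 2's mixed strategy
    lr = 0.1

    def project_simplex_2(v):
        """Project a 2-vector onto the probability simplex."""
        t = np.clip(v[0] - v[1] + 1.0, 0.0, 2.0) / 2.0
        return np.array([t, 1.0 - t])

    for _ in range(500):
        # Each player ascends the gradient of its own expected payoff.
        grad_p = A @ q           # gradient of p^T A q with respect to p
        grad_q = B.T @ p         # gradient of p^T B q with respect to q
        p = project_simplex_2(p + lr * grad_p)
        q = project_simplex_2(q + lr * grad_q)

    print(p, q)   # expected to approach a pure Nash equilibrium of the game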
arXiv Detail & Related papers (2021-06-01T03:03:45Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
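As a rough illustration of worst-case dynamic programming, the tabular sketch below performs a robust Bellman backup over a small finite set of candidate transition models; the paper's Robust Fitted Value Iteration instead operates on a compact continuous state domain with its own uncertainty set.

    import numpy as np

    def robust_value_iteration(P_set, R, gamma=0.95, iters=200):
        """Tabular robust value iteration: worst case over a finite set of
        candidate transition models P_set (each of shape [S, A, S])."""
        S, A = R.shape
        V = np.zeros(S)
        for _ in range(iters):
            # Usual Bellman backup Q[s, a] = R + gamma * P V for each model.
            Q_per_model = [R + gamma * P @ V for P in P_set]
            # Robust backup: best action against the worst-case model.
            Q_worst = np.min(Q_per_model, axis=0)
            V = Q_worst.max(axis=1)
        return V, Q_worst.argmax(axis=1)   # robust value and greedy policy

    # Toy 2-state, 2-action MDP with nominal and perturbed candidate dynamics.
    R = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
    P_nominal = np.array([[[0.9, 0.1], [0.1, 0.9]],
                          [[0.8, 0.2], [0.2, 0.8]]])
    P_perturbed = 0.5 * P_nominal + 0.5 * 0.5   # mix with uniform transitions
    V, pi = robust_value_iteration([P_nominal, P_perturbed], R)
    print(V, pi)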
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Adversarial Robustness Guarantees for Gaussian Processes [22.403365399119107]
Gaussian processes (GPs) enable principled computation of model uncertainty, making them attractive for safety-critical applications.
We present a framework to analyse adversarial robustness of GPs, defined as invariance of the model's decision to bounded perturbations.
We develop a branch-and-bound scheme to refine the bounds and show, for any $\epsilon > 0$, that our algorithm is guaranteed to converge to values $\epsilon$-close to the actual values in finitely many iterations.
arXiv Detail & Related papers (2021-04-07T15:14:56Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision
Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
We compute unbiased estimates of the gradient of the value function, which we use as ascent directions to update the policy.
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL).
We devise an optimization method, called Kalman Optimization for Value Approximation (KOVA).
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
arXiv Detail & Related papers (2020-02-17T13:30:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.