A Deep Reinforcement Learning Approach for Finding Non-Exploitable
Strategies in Two-Player Atari Games
- URL: http://arxiv.org/abs/2207.08894v1
- Date: Mon, 18 Jul 2022 19:07:56 GMT
- Title: A Deep Reinforcement Learning Approach for Finding Non-Exploitable
Strategies in Two-Player Atari Games
- Authors: Zihan Ding, Dijia Su, Qinghua Liu, Chi Jin
- Abstract summary: This paper proposes novel, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games.
Our objective is to find the Nash Equilibrium policies, which are free from exploitation by adversarial opponents.
- Score: 35.35717637660101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes novel, end-to-end deep reinforcement learning algorithms
for learning two-player zero-sum Markov games. Our objective is to find the
Nash Equilibrium policies, which are free from exploitation by adversarial
opponents. Distinct from prior efforts on finding Nash equilibria in
extensive-form games such as Poker, which feature tree-structured transition
dynamics and discrete state space, this paper focuses on Markov games with
general transition dynamics and continuous state space. We propose (1) Nash DQN
algorithm, which integrates DQN with a Nash finding subroutine for the joint
value functions; and (2) Nash DQN Exploiter algorithm, which additionally
adopts an exploiter for guiding the agent's exploration. Our algorithms are the
practical variants of theoretical algorithms which are guaranteed to converge
to Nash equilibria in the basic tabular setting. Experimental evaluation on
both tabular examples and two-player Atari games demonstrates the robustness of
the proposed algorithms against adversarial opponents, as well as their
advantageous performance over existing methods.
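As a concrete illustration of the Nash-finding subroutine described above, the following is a minimal sketch, assuming NumPy and SciPy: at each state, the joint Q-values induce a zero-sum matrix game, and its Nash value replaces the single-agent max in the Bellman target. The function names (`solve_zero_sum_matrix_game`, `nash_q_target`), the discount `gamma`, and the use of a linear program are illustrative assumptions, not the authors' released implementation.
```python
import numpy as np
from scipy.optimize import linprog


def solve_zero_sum_matrix_game(payoff):
    """Return (row_strategy, game_value) for a zero-sum matrix game.

    payoff[i, j] is the reward of the row (maximizing) player when the row
    player picks action i and the column player picks action j. The Nash
    value comes from the standard LP: maximize v s.t. x^T A e_j >= v for all j.
    """
    m, n = payoff.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                    # linprog minimizes, so minimize -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])  # v - sum_i x_i * A[i, j] <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                               # the mixed strategy x sums to 1
    b_eq = np.ones(1)
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]


def nash_q_target(q_next, reward, done, gamma=0.99):
    """One-step Bellman target using the Nash value of the next-state stage game.

    q_next[i, j] is the joint Q-value Q(s', a1=i, a2=j) from a (hypothetical)
    target network; the Nash value replaces the usual single-agent max.
    """
    if done:
        return reward
    _, nash_value = solve_zero_sum_matrix_game(q_next)
    return reward + gamma * nash_value
```
In the Nash DQN Exploiter variant, an additional exploiter network would be trained to best-respond to the main agent's current policy, supplying the exploration signal; that component is omitted from this sketch.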
Related papers
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability [11.793922711718645]
We consider decentralized learning for zero-sum games, where players only observe their own information and do not observe the actions or payoffs of the opponent.
Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions.
Our contribution is a rational and convergent algorithm that uses Tsallis-entropy regularization within a value-iteration-based scheme.
arXiv Detail & Related papers (2023-12-13T09:31:30Z)
- Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z)
- Efficiently Computing Nash Equilibria in Adversarial Team Markov Games [19.717850955051837]
We introduce a class of games in which a team of identically interested players competes against an adversarial player.
This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games.
Our main contribution is the first algorithm for computing stationary $\epsilon$-approximate Nash equilibria in adversarial team Markov games.
arXiv Detail & Related papers (2022-08-03T16:41:01Z)
- Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE).
We show that computing NE for this class of games is $\textit{hard}$ for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z)
- Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty".
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z)
- Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games [37.70703888365849]
We study infinite-horizon discounted two-player zero-sum Markov games.
We develop a decentralized algorithm that converges to the set of Nash equilibria under self-play.
arXiv Detail & Related papers (2021-02-08T21:45:56Z)
- Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap (a minimal duality-gap sketch appears after this list).
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
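To make the "non-exploitable" objective and the duality gap mentioned above concrete, here is a minimal sketch, assuming NumPy, of how exploitability can be measured for a pair of mixed strategies in a zero-sum matrix game; the `duality_gap` helper and the matching-pennies example are illustrative, not code from any of the listed papers.
```python
import numpy as np


def duality_gap(payoff, x, y):
    """Exploitability (duality gap) of the strategy pair (x, y) in a zero-sum matrix game.

    payoff[i, j] is the row player's reward; x and y are mixed strategies.
    The gap is zero exactly at a Nash equilibrium and positive otherwise.
    """
    best_row_response = np.max(payoff @ y)   # row player's best deviation against y
    best_col_response = np.min(x @ payoff)   # column player's best deviation against x
    return float(best_row_response - best_col_response)


# Example: matching pennies. The uniform strategy pair is the Nash equilibrium
# (gap 0); a pure row strategy against the uniform column strategy is exploitable.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
pure = np.array([1.0, 0.0])
print(duality_gap(A, uniform, uniform))  # 0.0
print(duality_gap(A, pure, uniform))     # 1.0
```
A zero gap means neither player can gain by deviating, which is the sense in which Nash equilibrium policies are free from exploitation by adversarial opponents.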