Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium
- URL: http://arxiv.org/abs/2208.05363v1
- Date: Wed, 10 Aug 2022 14:21:54 GMT
- Title: Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium
- Authors: Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan
- Abstract summary: We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider learning Nash equilibria in two-player zero-sum Markov Games with
nonlinear function approximation, where the action-value function is
approximated by a function in a Reproducing Kernel Hilbert Space (RKHS). The
key challenge is how to do exploration in the high-dimensional function space.
We propose a novel online learning algorithm to find a Nash equilibrium by
minimizing the duality gap. At the core of our algorithms are upper and lower
confidence bounds that are derived based on the principle of optimism in the
face of uncertainty. We prove that our algorithm is able to attain an
$O(\sqrt{T})$ regret with polynomial computational complexity, under very mild
assumptions on the reward function and the underlying dynamics of the Markov
Games. We also propose several extensions of our algorithm, including an
algorithm with Bernstein-type bonus that can achieve a tighter regret bound,
and another algorithm for model misspecification that can be applied to neural
function approximation.
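For reference, the duality-gap criterion mentioned above can be stated in the standard episodic, finite-horizon formulation used in this line of work (this paraphrase and notation are ours, not a verbatim excerpt from the paper): for a policy pair $(\pi, \nu)$ of the max- and min-player,
$\mathrm{Gap}(\pi, \nu) = V_1^{\dagger, \nu}(s_1) - V_1^{\pi, \dagger}(s_1) \ge 0$,
where $\dagger$ denotes a best response and $V_1(s_1)$ is the value at the initial state. The regret over $T$ episodes is $\mathrm{Regret}(T) = \sum_{t=1}^{T} \mathrm{Gap}(\pi^t, \nu^t)$, so the claimed $O(\sqrt{T})$ bound means the average duality gap decays at rate $O(1/\sqrt{T})$. The optimistic and pessimistic estimates satisfy $\underline{Q}_h \le Q_h^{*} \le \overline{Q}_h$ for all steps $h$ with high probability, which allows the per-episode gap to be upper-bounded by the confidence width $\overline{V}_1(s_1) - \underline{V}_1(s_1)$.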
Related papers
- Two-Timescale Q-Learning with Function Approximation in Zero-Sum
Stochastic Games [31.554420227087043]
We propose a two-timescale smoothed $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players.
In two-timescale $Q$-learning, the fast-timescale iterates are updated in the spirit of gradient descent, while each slow-timescale iterate is updated as a convex combination of its previous value and the latest fast-timescale iterate (see the schematic sketched below).
The key novelty lies in constructing a valid Lyapunov function that captures the evolution of the slow-timescale iterates.
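As an illustrative schematic of this two-timescale structure (our notation and step sizes, not the paper's), the fast iterate is updated as $q_{t+1} = q_t + \alpha_t\, g_t(q_t, \bar{q}_t)$, where $g_t$ is a gradient-descent-style correction built from observed payoffs, while the slow iterate is updated as the convex combination $\bar{q}_{t+1} = (1 - \beta_t)\,\bar{q}_t + \beta_t\, q_{t+1}$ with $\beta_t \ll \alpha_t$, so that $\bar{q}_t$ tracks a slowly averaged version of the fast-timescale iterates.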
arXiv Detail & Related papers (2023-12-08T08:39:36Z)
- Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
A key novelty of the algorithm is that the reference value functions are updated as a pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
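Concretely, the hidden low-rank structure referred to above is usually written as a factorization of the transition kernel (standard low-rank MDP/Markov-game notation, not quoted from that paper): $P_h(s' \mid s, a, b) = \langle \phi_h(s, a, b), \mu_h(s') \rangle$, where the joint-action feature map $\phi_h$ is an unknown non-linear representation that has to be learned from data and $\mu_h$ is an unknown family of measures over next states; this differs from linear Markov games, in which $\phi_h$ is given to the learner in advance.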
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games [16.09467599829253]
We study the problem of finding a Nash equilibrium in a two-player zero-sum Markov game.
Our main contribution is to show that, under proper choices of the regularization parameter, the regularized gradient descent ascent dynamics converge to the Nash equilibrium of the original unregularized problem.
arXiv Detail & Related papers (2022-05-27T03:24:12Z)
- Towards General Function Approximation in Zero-Sum Markov Games [126.58493169301012]
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves.
Provably efficient algorithms for both decoupled and coordinated settings are developed.
arXiv Detail & Related papers (2021-07-30T15:25:13Z)
- Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty".
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.