Gap-Dependent Bounds for Two-Player Markov Games
- URL: http://arxiv.org/abs/2107.00685v1
- Date: Thu, 1 Jul 2021 18:25:07 GMT
- Title: Gap-Dependent Bounds for Two-Player Markov Games
- Authors: Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du
- Abstract summary: We analyze the cumulative regret when running the Nash Q-learning algorithm on 2-player turn-based stochastic Markov games (2-TBSG).
This bound matches the theoretical lower bound only up to a logarithmic term.
We extend the conclusion to the discounted game setting with infinite horizon and propose a similar gap-dependent logarithmic regret bound.
- Score: 122.20541282562363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As one of the most popular methods in the field of reinforcement learning,
Q-learning has received increasing attention. Recently, there have been more
theoretical works on the regret bound of algorithms that belong to the
Q-learning class in different settings. In this paper, we analyze the cumulative regret when running the Nash Q-learning algorithm on 2-player turn-based stochastic Markov games (2-TBSG), and propose the very first gap-dependent logarithmic upper bounds in the episodic tabular setting. This bound
matches the theoretical lower bound only up to a logarithmic term. Furthermore,
we extend the conclusion to the discounted game setting with infinite horizon
and propose a similar gap-dependent logarithmic regret bound. Also, under the linear MDP assumption, we obtain another logarithmic regret bound for 2-TBSG, in both centralized and independent settings.
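As a reading aid, here is a minimal tabular sketch (in Python, with illustrative names and constants) of the kind of Nash Q-learning backup analyzed for turn-based games: because only one player moves in each state, the stage Nash value reduces to a max or a min over the acting player's actions, and gap-dependent bounds of this type typically scale with sums of (log T)/gap terms, where the gap at a state is the value difference between the acting player's best and second-best actions. The paper's actual algorithm additionally uses optimistic exploration bonuses and a tuned learning-rate schedule, which are omitted here.

```python
import numpy as np

# Minimal sketch of a Nash Q-learning style backup for a two-player
# turn-based game (discounted variant for brevity). Illustrative only:
# optimism bonuses and learning-rate schedules are omitted.

def nash_value(q_row, player):
    """Stage 'Nash' value of a state: in a turn-based game only one
    player acts there, so it is a max (player 0) or a min (player 1)."""
    return q_row.max() if player == 0 else q_row.min()

def q_backup(Q, s, a, r, s_next, owner, alpha=0.1, gamma=0.99):
    """One sample-based update after observing the transition (s, a, r, s_next)."""
    target = r + gamma * nash_value(Q[s_next], owner[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Toy run: 3 states, 2 actions; owner[s] says which player moves at s.
rng = np.random.default_rng(0)
Q = np.zeros((3, 2))
owner = np.array([0, 1, 0])
for _ in range(1000):
    s, a, s_next = rng.integers(3), rng.integers(2), rng.integers(3)
    q_backup(Q, s, a, rng.normal(), s_next, owner)   # random stand-in data

# The gap behind gap-dependent bounds: per state, the value difference
# between the acting player's best and second-best actions.
gaps = np.array([Q[s].max() - np.sort(Q[s])[-2] if owner[s] == 0
                 else np.sort(Q[s])[1] - Q[s].min()
                 for s in range(3)])
print(gaps)
```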
Related papers
- Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning [0.0]
Successive Over-Relaxation (SOR) Q-learning, which introduces a relaxation factor to speed up convergence, has two major limitations.
We propose a sample-based, model-free double SOR Q-learning algorithm.
The proposed algorithm is extended to large-scale problems using deep RL.
arXiv Detail & Related papers (2024-09-10T09:23:03Z)
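As a hedged illustration of the relaxation idea in the entry above, the sketch below shows one commonly cited form of the single-estimator SOR Q-learning update, in which a relaxation factor w blends the usual bootstrapped target with the current state's own greedy value; the paper's double-estimator variant and its deep RL extension are not reproduced, and all names and constants are illustrative.

```python
import numpy as np

def sor_q_update(Q, s, a, r, s_next, w=1.2, alpha=0.1, gamma=0.99):
    """One SOR-style Q-learning update (illustrative sketch).

    w = 1 recovers the ordinary Q-learning target; w > 1 'over-relaxes'
    by mixing in the greedy value of the current state, which is the
    mechanism the SOR literature uses to speed up convergence."""
    standard_target = r + gamma * Q[s_next].max()
    sor_target = w * standard_target + (1.0 - w) * Q[s].max()
    Q[s, a] += alpha * (sor_target - Q[s, a])
    return Q

Q = np.zeros((4, 2))                       # toy 4-state, 2-action table
Q = sor_q_update(Q, s=0, a=1, r=1.0, s_next=2)
```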
- Online Learning and Solving Infinite Games with an ERM Oracle [20.1330044382824]
We propose an algorithm for the online binary classification setting that relies solely on ERM oracle calls.
We show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting.
Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
arXiv Detail & Related papers (2023-07-04T12:51:21Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
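For intuition on the optimistic gradient dynamics discussed above, here is a minimal sketch (with made-up parameters) of optimistic gradient descent-ascent on an unconstrained bilinear zero-sum game f(x, y) = x^T A y; the extra "twice the current gradient minus the previous gradient" term is what distinguishes the optimistic update from plain gradient descent-ascent.

```python
import numpy as np

# Optimistic gradient descent-ascent on f(x, y) = x^T A y.  Plain
# simultaneous gradient descent-ascent cycles on bilinear games; the
# optimistic correction term is what stabilises the dynamic.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
x, y = rng.normal(size=5), rng.normal(size=5)
gx_prev, gy_prev = A @ y, A.T @ x          # gradients at the initial point
eta = 0.05                                 # small step size (illustrative)

for _ in range(2000):
    gx, gy = A @ y, A.T @ x                # current gradients
    x = x - eta * (2 * gx - gx_prev)       # descent step with optimism
    y = y + eta * (2 * gy - gy_prev)       # ascent step with optimism
    gx_prev, gy_prev = gx, gy

# Gradient norms act as a crude equilibrium gap here: they should shrink
# toward the saddle point of this unconstrained game for a small enough eta.
print(np.linalg.norm(A @ y), np.linalg.norm(A.T @ x))
```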
- Decentralized model-free reinforcement learning in stochastic games with average-reward objective [1.9852463786440127]
We show that our algorithm achieves both sublinear high probability regret of order $T^{3/4}$ and sublinear expected regret of order $T^{2/3}$.
Our algorithm enjoys a low computational complexity and low memory space requirement compared to the previous works.
arXiv Detail & Related papers (2023-01-13T15:59:53Z)
- Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z)
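Duality-gap minimization, mentioned in the entry above, has a simple finite-dimensional analogue. The snippet below is an illustrative sketch, not the paper's kernelized algorithm: it computes the duality gap of a mixed-strategy pair in a zero-sum matrix game, a quantity that is nonnegative and equals zero exactly at a Nash equilibrium.

```python
import numpy as np

def duality_gap(A, x, y):
    """Duality gap of mixed strategies (x, y) in a zero-sum matrix game
    where the row player maximises x^T A y; it is nonnegative and is
    zero exactly when (x, y) is a Nash equilibrium."""
    return (A @ y).max() - (x @ A).min()

# Matching pennies: uniform play is the unique equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(duality_gap(A, np.array([0.5, 0.5]), np.array([0.5, 0.5])))   # 0.0
print(duality_gap(A, np.array([1.0, 0.0]), np.array([0.5, 0.5])))   # 1.0
```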
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
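One standard device for keeping the number of policy switches logarithmic, in the spirit of the entry above (though not necessarily the paper's construction), is to re-plan only when some visitation count reaches a power of two; the sketch below shows just that trigger, with hypothetical class and method names.

```python
from collections import defaultdict

class DoublingSwitcher:
    """Hypothetical trigger: re-compute the policy only when some
    (state, action) visit count reaches a power of two, so each counter
    fires at most O(log T) times over T steps."""

    def __init__(self):
        self.counts = defaultdict(int)

    def observe(self, s, a):
        self.counts[(s, a)] += 1
        n = self.counts[(s, a)]
        return (n & (n - 1)) == 0          # True at n = 1, 2, 4, 8, ...

switcher = DoublingSwitcher()
switches = sum(switcher.observe(0, 0) for _ in range(1000))
print(switches)                            # 10 re-plans over 1000 visits
```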
- Damped Online Newton Step for Portfolio Selection [96.0297968061824]
We revisit the classic online portfolio selection problem, where at each round a learner selects a portfolio, i.e., a distribution over a set of assets, to allocate its wealth.
Existing algorithms that achieve a logarithmic regret for this problem have per-round time and space complexities that scale with the total number of rounds.
We present the first practical online portfolio selection algorithm with a logarithmic regret and whose per-round time and space complexities depend only logarithmically on the horizon.
arXiv Detail & Related papers (2022-02-15T17:01:55Z)
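For context on the entry above, regret in online portfolio selection is measured in log-wealth against the best fixed rebalanced portfolio in hindsight. The snippet below only evaluates that objective on made-up price data; it is not the damped Online Newton Step itself.

```python
import numpy as np

def log_wealth(portfolio, price_relatives):
    """Cumulative log-wealth of a fixed rebalanced portfolio;
    price_relatives[t, i] is asset i's price ratio in round t."""
    return np.sum(np.log(price_relatives @ portfolio))

X = np.array([[1.02, 0.98],                # made-up two-asset market
              [0.97, 1.05],
              [1.01, 1.00]])
played = np.array([0.5, 0.5])              # learner's (fixed) allocation
grid = np.linspace(0.0, 1.0, 1001)
best_in_hindsight = max(log_wealth(np.array([b, 1.0 - b]), X) for b in grid)
print(best_in_hindsight - log_wealth(played, X))   # regret, always >= 0
```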
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
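The optimistic multiplicative-weights update named in the entry above is compact enough to state in a few lines. The sketch below shows one common single-sequence form with placeholder losses and learning rate; the correlated-equilibrium machinery of the paper is not reproduced.

```python
import numpy as np

def omwu_step(p, loss, prev_loss, eta=0.1):
    """One optimistic multiplicative-weights update on the simplex:
    weights are tilted by 2 * current loss - previous loss, so the most
    recent loss doubles as a prediction of the next one."""
    logits = np.log(p) - eta * (2 * loss - prev_loss)
    w = np.exp(logits - logits.max())      # numerically stable exponentiation
    return w / w.sum()

p, prev = np.full(3, 1 / 3), np.zeros(3)   # uniform start over 3 actions
for loss in (np.array([1.0, 0.0, 0.5]), np.array([0.2, 0.6, 0.1])):
    p = omwu_step(p, loss, prev)
    prev = loss
print(p)
```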
- Near-Linear Time Algorithm with Near-Logarithmic Regret Per Switch for Mixable/Exp-Concave Losses [0.0]
We study the online optimization of mixable loss functions with logarithmic static regret in a dynamic environment.
For the first time in the literature, we show that it is also possible to achieve near-logarithmic regret per switch with sub-polynomial complexity per time.
arXiv Detail & Related papers (2021-09-28T15:06:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.