Mutation-Driven Follow the Regularized Leader for Last-Iterate
Convergence in Zero-Sum Games
- URL: http://arxiv.org/abs/2206.09254v1
- Date: Sat, 18 Jun 2022 17:32:07 GMT
- Title: Mutation-Driven Follow the Regularized Leader for Last-Iterate
Convergence in Zero-Sum Games
- Authors: Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki
- Abstract summary: Follow the Regularized Leader (FTRL) is guaranteed to converge to a Nash equilibrium when time-averaging the strategies.
We propose mutant FTRL (M-FTRL), an algorithm that introduces mutation for the perturbation of action probabilities.
- Score: 8.347058637480506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we consider a variant of the Follow the Regularized Leader
(FTRL) dynamics in two-player zero-sum games. FTRL is guaranteed to converge to
a Nash equilibrium when time-averaging the strategies, while many of its
variants suffer from limit cycling behavior, i.e., they lack a last-iterate
convergence guarantee. To address this, we propose mutant FTRL (M-FTRL), an
algorithm that introduces mutation for the perturbation of action
probabilities. We then investigate the continuous-time dynamics of M-FTRL and
provide strong convergence guarantees toward stationary points that
approximate Nash equilibria under full-information feedback. Furthermore, our
simulation demonstrates that M-FTRL can enjoy faster convergence rates than
FTRL and optimistic FTRL under full-information feedback and surprisingly
exhibits clear convergence under bandit feedback.
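To make the mutation idea concrete, the sketch below is a minimal, hypothetical illustration in Python. It relies on the standard fact that continuous-time FTRL with an entropic regularizer coincides with the replicator dynamics, and adds a textbook mutation term mu * (ref - x), which is one natural reading of "mutation as a perturbation of action probabilities." The payoff matrix, mutation rate, reference strategy, and step sizes are assumptions made for illustration; the paper's exact M-FTRL dynamics may differ.

```python
import numpy as np

# Illustrative sketch only, not the authors' exact formulation. Continuous-time
# entropic FTRL is equivalent to the replicator dynamics; adding a mutation
# term mu * (ref - x) gives the replicator-mutator dynamics.
# Game: a 2x2 zero-sum game whose unique Nash equilibrium is x* = y* = (0.4, 0.6).
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])          # payoffs to player 1 (player 2 receives -A)

mu = 0.05                            # mutation rate (assumed value)
ref = np.full(2, 0.5)                # reference strategy targeted by mutation
dt = 0.01                            # Euler step for the continuous-time dynamics
T = 200_000                          # number of Euler steps

x = np.array([0.9, 0.1])             # player 1's strategy, started off-equilibrium
y = np.array([0.9, 0.1])             # player 2's strategy

def replicator_mutator(p, payoff, mu, ref):
    """One drift evaluation: replicator term plus mutation toward `ref`."""
    avg = p @ payoff
    return p * (payoff - avg) + mu * (ref - p)

for _ in range(T):
    dx = replicator_mutator(x, A @ y, mu, ref)
    dy = replicator_mutator(y, -A.T @ x, mu, ref)
    x = np.clip(x + dt * dx, 1e-12, None); x /= x.sum()   # stay on the simplex
    y = np.clip(y + dt * dy, 1e-12, None); y /= y.sum()

print("last iterate:", x, y)         # expected: close to, but not exactly, (0.4, 0.6)
```

With mu = 0 the trajectory cycles around the equilibrium (the limit-cycling issue described above); with mu > 0 the last iterate should settle at a stationary point close to, but biased away from, the exact equilibrium, matching the abstract's "stationary points that approximate Nash equilibria."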
Related papers
- The Power of Perturbation under Sampling in Solving Extensive-Form Games [56.013335390600524]
This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games.
Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium.
We show that, in the last-iterate sense, the sampled FTRL consistently outperforms the non-sampled FTRL.
arXiv Detail & Related papers (2025-01-28T00:29:38Z) - Continual SFT Matches Multimodal RLHF with Negative Supervision [32.784161582943874]
Multimodal RLHF usually happens after the supervised finetuning (SFT) stage to continually improve vision-language models' (VLMs) comprehension.
Conventional wisdom holds that it is superior to continual SFT during this preference alignment stage.
We propose a novel negative supervised finetuning (nSFT) approach that fully exploits the information contained in this negative supervision.
arXiv Detail & Related papers (2024-11-22T08:48:30Z) - More Benefits of Being Distributional: Second-Order Bounds for
Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z) - Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations [98.5802673062712]
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
arXiv Detail & Related papers (2023-07-22T12:10:04Z) - Generalized Implicit Follow-The-Regularized-Leader [15.974402990630402]
Generalized implicit FTRL can recover known algorithms, such as FTRL with linearized losses and implicit FTRL.
We show the flexibility of the framework by proving that some known algorithms, like the Mirror-Prox updates, are instantiations of the generalized implicit FTRL.
arXiv Detail & Related papers (2023-05-31T21:39:52Z) - ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for
Last-Iterate Convergence in Constrained MDPs [31.663072540757643]
Reinforcement Learning (RL) has been applied to real-world problems with increasing success.
We introduce Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD).
arXiv Detail & Related papers (2023-02-02T18:05:27Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method (a simplified sketch of this update on a one-shot matrix game appears after this list).
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z) - Maximum Entropy RL (Provably) Solves Some Robust RL Problems [94.80212602202518]
We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
arXiv Detail & Related papers (2021-03-10T18:45:48Z) - Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic
Mean-Field Games [14.209473797379667]
We study large-population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs).
Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon.
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG.
arXiv Detail & Related papers (2020-09-09T15:17:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.