The Power of Perturbation under Sampling in Solving Extensive-Form Games
- URL: http://arxiv.org/abs/2501.16600v1
- Date: Tue, 28 Jan 2025 00:29:38 GMT
- Title: The Power of Perturbation under Sampling in Solving Extensive-Form Games
- Authors: Wataru Masaka, Mitsuki Sakamoto, Kenshi Abe, Kaito Ariu, Tuomas Sandholm, Atsushi Iwasaki
- Abstract summary: This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games.
Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium.
We show that in the last-iterate sense, the perturbed FTRL consistently outperforms the non-perturbed FTRL.
- Score: 56.013335390600524
- Abstract: This paper investigates how perturbation does and does not improve the Follow-the-Regularized-Leader (FTRL) algorithm in imperfect-information extensive-form games. Perturbing the expected payoffs guarantees that the FTRL dynamics reach an approximate equilibrium, and proper adjustments of the magnitude of the perturbation lead to a Nash equilibrium (\textit{last-iterate convergence}). This approach is robust even when payoffs are estimated using sampling -- as is the case for large games -- while the optimistic approach often becomes unstable. Building upon those insights, we first develop a general framework for perturbed FTRL algorithms under \textit{sampling}. We then empirically show that in the last-iterate sense, the perturbed FTRL consistently outperforms the non-perturbed FTRL. We further identify a divergence function that reduces the variance of the estimates for perturbed payoffs, with which it significantly outperforms the prior algorithms on Leduc poker (whose structure is more asymmetric in a sense than that of the other benchmark games) and consistently exhibits smooth convergence behavior on all the benchmark games.
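To make the payoff-perturbation idea concrete, below is a minimal sketch (not the paper's implementation) of payoff-perturbed FTRL with entropy regularization on a two-player zero-sum matrix game. The game matrix, step size `eta`, perturbation strength `mu`, and uniform anchor strategies are illustrative assumptions; the paper addresses imperfect-information extensive-form games with sampled payoff estimates, which this toy example omits.

```python
# Minimal sketch: payoff-perturbed FTRL with entropy regularization
# (multiplicative weights) on a two-player zero-sum matrix game.
# The perturbation adds a -mu * (strategy - anchor) term to each payoff
# gradient, pulling play toward an anchor strategy; this is the kind of
# perturbation that the abstract says stabilizes last-iterate behavior.
# All concrete choices (game, mu, eta, uniform anchors) are illustrative.
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def perturbed_ftrl(A, mu=0.1, eta=0.1, iters=5000):
    """Player 1 maximizes x^T A y, player 2 minimizes it."""
    n, m = A.shape
    anchor_x, anchor_y = np.ones(n) / n, np.ones(m) / m  # uniform anchors
    cum_x, cum_y = np.zeros(n), np.zeros(m)              # cumulative perturbed payoffs
    x, y = softmax(cum_x), softmax(cum_y)
    for _ in range(iters):
        # Perturbed payoff gradients (squared-L2 divergence toward the anchor)
        gx = A @ y - mu * (x - anchor_x)
        gy = -A.T @ x - mu * (y - anchor_y)
        cum_x += eta * gx
        cum_y += eta * gy
        x, y = softmax(cum_x), softmax(cum_y)            # FTRL with entropy regularizer
    return x, y

if __name__ == "__main__":
    # Biased rock-paper-scissors; its unique equilibrium is interior.
    A = np.array([[ 0.0, -1.0,  2.0],
                  [ 1.0,  0.0, -2.0],
                  [-2.0,  2.0,  0.0]])
    x, y = perturbed_ftrl(A)
    exploitability = (A @ y).max() - (A.T @ x).min()
    print("x:", np.round(x, 3), "y:", np.round(y, 3),
          "exploitability:", float(exploitability))
```

Setting `mu = 0` recovers plain FTRL, whose last iterate typically cycles around the equilibrium instead of converging; the perturbation trades a small bias toward the anchor for a convergent last iterate, matching the trade-off described in the abstract.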
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3 H^3 T} + d^2 H^2 \mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations [98.5802673062712]
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
arXiv Detail & Related papers (2023-07-22T12:10:04Z) - Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games [8.347058637480506]
Follow the Regularized Leader (FTRL) is guaranteed to converge to a Nash equilibrium when time-averaging the strategies.
We propose mutant FTRL (M-FTRL), an algorithm that introduces mutation for the perturbation of action probabilities.
arXiv Detail & Related papers (2022-06-18T17:32:07Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on-par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability [48.62272919754204]
We study generalization in Bayesian RL under the probably approximately correct (PAC) framework.
Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense.
arXiv Detail & Related papers (2021-09-24T07:48:34Z) - Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations [7.776010676090131]
State observations that an agent observes may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
arXiv Detail & Related papers (2021-09-17T22:37:39Z) - Towards Tractable Optimism in Model-Based Reinforcement Learning [37.51073590932658]
To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism), but not by so much that it is inaccurate (estimation error).
We re-interpret these scalable optimistic model-based algorithms as solving a tractable noise augmented MDP.
We show that if this error is reduced, optimistic model-based RL algorithms can match state-of-the-art performance in continuous control problems.
arXiv Detail & Related papers (2020-06-21T20:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.