Recursive Reasoning in Minimax Games: A Level $k$ Gradient Play Method
- URL: http://arxiv.org/abs/2210.16482v1
- Date: Sat, 29 Oct 2022 03:43:59 GMT
- Title: Recursive Reasoning in Minimax Games: A Level $k$ Gradient Play Method
- Authors: Zichu Liu, Lacra Pavel
- Abstract summary: Generative adversarial networks (GANs) are notoriously challenging to train.
We propose a novel recursive reasoning algorithm: Level $k$ Gradient Play (Lv.$k$ GP).
In contrast to many existing algorithms, our algorithm does not require sophisticated heuristics or curvature information.
We achieve an FID of 10.17 for unconditional image generation within 30 hours, allowing GAN training on common computational resources to reach state-of-the-art performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the success of generative adversarial networks (GANs) in generating
visually appealing images, they are notoriously challenging to train. In order
to stabilize the learning dynamics in minimax games, we propose a novel
recursive reasoning algorithm: Level $k$ Gradient Play (Lv.$k$ GP) algorithm.
In contrast to many existing algorithms, our algorithm does not require
sophisticated heuristics or curvature information. We show that as $k$
increases, Lv.$k$ GP converges asymptotically towards an accurate estimation of
players' future strategy. Moreover, we justify that Lv.$\infty$ GP naturally
generalizes a line of provably convergent game dynamics which rely on
predictive updates. Furthermore, we provide its local convergence property in
nonconvex-nonconcave zero-sum games and global convergence in bilinear and
quadratic games. By combining Lv.$k$ GP with Adam optimizer, our algorithm
shows a clear advantage in terms of performance and computational overhead
compared to other methods. Using a single Nvidia RTX3090 GPU and 30 times fewer
parameters than BigGAN on CIFAR-10, we achieve an FID of 10.17 for
unconditional image generation within 30 hours, allowing GAN training on common
computational resources to reach state-of-the-art performance.
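As a concrete reading of the recursion described in the abstract, here is a minimal numpy sketch of a level-$k$ predictive update on a toy bilinear game, where plain simultaneous gradient descent-ascent is known to diverge. The step size, the random game, and all names are illustrative assumptions; the paper's actual training recipe pairs Lv.$k$ GP with Adam for GANs.

```python
import numpy as np

def lvk_gp_step(x, y, grad_x, grad_y, k, eta):
    """One Level-k Gradient Play step for min_x max_y f(x, y).

    Each player recursively re-estimates the opponent's next iterate
    (level 1 = plain simultaneous gradient descent-ascent) and then
    steps against the previous level's estimate of the opponent's move.
    """
    # Level-1 estimates: plain simultaneous gradient descent-ascent.
    x_est, y_est = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
    # Levels 2..k: respond to the previous level's estimate while
    # keeping one's own current iterate fixed.
    for _ in range(k - 1):
        x_est, y_est = (x - eta * grad_x(x, y_est),
                        y + eta * grad_y(x_est, y))
    return x_est, y_est

# Toy bilinear game f(x, y) = x^T A y with equilibrium at (0, 0).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
grad_x = lambda x, y: A @ y      # partial derivative of f w.r.t. x
grad_y = lambda x, y: A.T @ x    # partial derivative of f w.r.t. y

x, y = rng.standard_normal(4), rng.standard_normal(4)
for t in range(2000):
    x, y = lvk_gp_step(x, y, grad_x, grad_y, k=6, eta=0.1)
    if t % 500 == 0:
        # Distance to the (0, 0) saddle; with k = 1 (plain GDA) it grows.
        print(t, np.linalg.norm(np.concatenate([x, y])))
```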
Related papers
- Last-Iterate Convergence Properties of Regret-Matching Algorithms in
Games [72.43065138581589]
We study the last-iterate convergence properties of various popular variants of RM$+$.
We show numerically that several practical variants, including simultaneous RM$+$, alternating RM$+$, and predictive RM$+$, lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
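For context, here is a hedged sketch of plain (simultaneous) RM$+$ in a random zero-sum matrix game; it is not the paper's smoothed variants, and the game, horizon, and names are illustrative assumptions. Comparing last-iterate and average-iterate gaps illustrates why last-iterate behavior is the question studied.

```python
import numpy as np

def rm_plus_update(Q, u, x):
    """One Regret Matching+ update: accumulate truncated regrets and
    play actions proportionally to them."""
    Q = np.maximum(Q + (u - u @ x), 0.0)
    s = Q.sum()
    x_next = Q / s if s > 0 else np.full_like(Q, 1.0 / Q.size)
    return Q, x_next

# Simultaneous RM+ in a zero-sum matrix game: the row player receives
# x^T A y, the column player receives -x^T A y.
rng = np.random.default_rng(1)
A = rng.uniform(-1.0, 1.0, size=(3, 3))
x, y = np.full(3, 1.0 / 3), np.full(3, 1.0 / 3)
Qx, Qy = np.zeros(3), np.zeros(3)
x_avg, y_avg = np.zeros(3), np.zeros(3)
T = 5000
for t in range(T):
    ux, uy = A @ y, -A.T @ x              # both players see current opponent play
    Qx, x = rm_plus_update(Qx, ux, x)
    Qy, y = rm_plus_update(Qy, uy, y)
    x_avg += x; y_avg += y
x_avg /= T; y_avg /= T
gap = lambda x, y: (A @ y).max() - (A.T @ x).min()   # exploitability / duality gap
print("last-iterate gap:", gap(x, y), "| average-iterate gap:", gap(x_avg, y_avg))
```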
arXiv Detail & Related papers (2023-11-01T17:34:58Z) - Doubly Optimal No-Regret Learning in Monotone Games [10.760195409078591]
We propose the first doubly optimal no-regret learning algorithm for smooth monotone games.
Our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(1/T)$ last-iterate convergence rate to a Nash equilibrium.
arXiv Detail & Related papers (2023-01-30T17:55:53Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
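The lifted space and the particular regularizer are specific to that paper; as a purely generic illustration of the optimistic follow-the-regularized-leader template it instantiates, here is a hedged sketch with an entropic regularizer over the simplex (i.e., optimistic multiplicative weights). The game, step size, and names are assumptions for illustration only.

```python
import numpy as np

def optimistic_ftrl(U, m, eta):
    """Optimistic FTRL on the simplex with an entropic regularizer:
    best-respond to the cumulative utilities U plus a prediction m of
    the next utility (here simply the most recent utility)."""
    z = eta * (U + m)
    z = z - z.max()                      # stabilize the softmax
    p = np.exp(z)
    return p / p.sum()

# Both players of a random zero-sum matrix game run the dynamics.
rng = np.random.default_rng(2)
A = rng.uniform(-1.0, 1.0, size=(4, 4))
Ux, Uy = np.zeros(4), np.zeros(4)
ux, uy = np.zeros(4), np.zeros(4)        # last observed utilities, used as predictions
eta = 0.5
for t in range(3000):
    x = optimistic_ftrl(Ux, ux, eta)
    y = optimistic_ftrl(Uy, uy, eta)
    ux, uy = A @ y, -A.T @ x
    Ux += ux; Uy += uy
print("duality gap of the final strategies:", (A @ y).max() - (A.T @ x).min())
```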
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Policy Optimization for Markov Games: Unified Framework and Faster
Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov Games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE)
arXiv Detail & Related papers (2022-06-06T14:23:13Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
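The paper addresses model-free RL with general value function approximation; as a much-reduced illustration of the "perturb the training data with i.i.d. scalar noise" idea it builds on, here is a hedged linear-bandit sketch. The function names, noise scale, and bandit setting are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def perturbed_ridge_fit(Phi, targets, sigma, lam, rng):
    """Ridge regression on targets perturbed with i.i.d. Gaussian noise.
    Randomly perturbing the training data is what drives exploration
    in RLSVI-style randomized value estimation."""
    noisy = targets + sigma * rng.standard_normal(len(targets))
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ noisy)

# Toy linear bandit stand-in for the RL setting: act greedily w.r.t. a
# value estimate refit each round on noise-perturbed past rewards.
rng = np.random.default_rng(3)
d, n_actions = 5, 20
features = rng.standard_normal((n_actions, d))
w_star = rng.standard_normal(d)
Phi, targets = [], []
for t in range(500):
    if Phi:
        w = perturbed_ridge_fit(np.array(Phi), np.array(targets),
                                sigma=0.5, lam=1.0, rng=rng)
        a = int(np.argmax(features @ w))
    else:
        a = int(rng.integers(n_actions))
    reward = features[a] @ w_star + 0.1 * rng.standard_normal()
    Phi.append(features[a]); targets.append(reward)
print("true best arm:", int(np.argmax(features @ w_star)), "| last arm pulled:", a)
```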
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Tight last-iterate convergence rates for no-regret learning in
multi-player games [31.604950465209335]
We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ in smooth monotone games.
We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case.
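For reference, a minimal sketch of the optimistic gradient update on a toy smooth monotone (bilinear) game; the game, step size, and variable names are illustrative assumptions, and this sketch does not reproduce the paper's rate analysis.

```python
import numpy as np

def og_step(z, g_curr, g_prev, eta):
    """Optimistic gradient update with a constant step size: step along
    the current operator value, corrected by the previous one."""
    return z - eta * (2.0 * g_curr - g_prev)

# Smooth monotone example: bilinear saddle f(x, y) = x^T A y with the
# (monotone) game operator F(x, y) = (A y, -A^T x).
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
F = lambda z: np.concatenate([A @ z[3:], -A.T @ z[:3]])

z = rng.standard_normal(6)
g_prev = F(z)
for t in range(5000):
    g_curr = F(z)
    z = og_step(z, g_curr, g_prev, eta=0.05)
    g_prev = g_curr
print("last-iterate distance to the (0, 0) equilibrium:", np.linalg.norm(z))
```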
arXiv Detail & Related papers (2020-10-26T17:06:19Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $T$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)