Taming GANs with Lookahead-Minmax
- URL: http://arxiv.org/abs/2006.14567v3
- Date: Wed, 23 Jun 2021 17:54:03 GMT
- Title: Taming GANs with Lookahead-Minmax
- Authors: Tatjana Chavdarova, Matteo Pagliardini, Sebastian U. Stich, Francois
Fleuret, Martin Jaggi
- Abstract summary: Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient.
Using 30-fold fewer parameters and 16-fold smaller minibatches we outperform the reported performance of the class-dependent BigGAN on CIFAR-10 by obtaining FID of 12.19 without using the class labels.
- Score: 63.90038365274479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks are notoriously challenging to train. The
underlying minmax optimization is highly susceptible to the variance of the
stochastic gradient and the rotational component of the associated game vector
field. To tackle these challenges, we propose Lookahead-minmax, which extends the
Lookahead algorithm, originally developed only for single-objective minimization,
to minmax optimization. The backtracking step of Lookahead-minmax naturally handles the
rotational game dynamics, a property which was identified to be key for
enabling gradient ascent descent methods to converge on challenging examples
often analyzed in the literature. Moreover, it implicitly handles high variance
without using large mini-batches, which are known to be essential for reaching
state-of-the-art performance. Experimental results on MNIST, SVHN, CIFAR-10, and
ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam
or extragradient, in terms of performance and improved stability, for
negligible memory and computational cost. Using 30-fold fewer parameters and
16-fold smaller minibatches, we outperform the reported performance of the
class-dependent BigGAN on CIFAR-10 by obtaining FID of 12.19 without using the
class labels, bringing state-of-the-art GAN training within reach of common
computational resources.
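For intuition, below is a minimal, self-contained sketch of the Lookahead-minmax idea on the classic bilinear game min_x max_y x*y, where plain simultaneous gradient descent-ascent (GDA) spirals away from the equilibrium. The toy objective, the GDA inner optimizer, and the hyperparameters (eta, k, alpha) are illustrative assumptions, not the paper's reference implementation, which wraps the same outer loop around Adam or extragradient on GAN losses.

```python
# Illustrative sketch only: Lookahead-minmax wrapping plain simultaneous
# gradient descent-ascent (GDA) on the toy bilinear game f(x, y) = x * y.
import numpy as np

def gda_step(x, y, eta):
    """One simultaneous GDA step on f(x, y) = x * y (min over x, max over y)."""
    return x - eta * y, y + eta * x  # df/dx = y, df/dy = x

def lookahead_minmax(x, y, eta=0.1, k=5, alpha=0.5, outer_steps=200):
    """Run k fast GDA steps, then pull the slow iterate a fraction alpha of the
    way toward the fast iterate (the backtracking step), for both players."""
    for _ in range(outer_steps):
        x_fast, y_fast = x, y
        for _ in range(k):
            x_fast, y_fast = gda_step(x_fast, y_fast, eta)
        x += alpha * (x_fast - x)  # backtracking step, player 1 (min)
        y += alpha * (y_fast - y)  # backtracking step, player 2 (max)
    return x, y

if __name__ == "__main__":
    # Plain GDA spirals outward on this rotational game ...
    x, y = 1.0, 1.0
    for _ in range(1000):
        x, y = gda_step(x, y, 0.1)
    print("plain GDA, distance to (0, 0):       ", np.hypot(x, y))
    # ... while the same updates wrapped in Lookahead-minmax shrink toward (0, 0).
    print("Lookahead-minmax, distance to (0, 0):", np.hypot(*lookahead_minmax(1.0, 1.0)))
```

The outer loop is optimizer-agnostic, which is how the combinations with Adam and extragradient mentioned in the abstract arise: the inner gda_step can be replaced by any base update.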
Related papers
- Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks.
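For reference on the "standard SAM" baseline mentioned above, here is a generic one-step sketch of sharpness-aware minimization; the toy quadratic loss and the hyperparameters are illustrative assumptions, and this is not the FGSAM variant proposed in that paper.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One standard SAM update: perturb the weights by radius rho along the
    normalized gradient (worst-case ascent direction), then descend using the
    gradient evaluated at the perturbed weights."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # sharpness-seeking perturbation
    return w - lr * grad_fn(w + eps)             # descend on the perturbed gradient

# Toy usage: minimize 0.5 * ||w||^2, whose gradient is w (illustrative only).
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn=lambda v: v)
print(w)  # approaches the minimizer at the origin
```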
arXiv Detail & Related papers (2024-10-22T09:33:29Z) - Towards Sharper Risk Bounds for Minimax Problems [23.380477456114118]
Minimax problems have achieved success in machine learning areas such as adversarial training, robust optimization, and reinforcement learning.
For theoretical analysis, current optimal excess risk bounds are composed of the generalization error and exhibit 1/n rates in the strongly-convex-strongly-concave (SC-SC) setting.
We analyze popular algorithms such as empirical saddle point, gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA).
We derive bounds that are n times faster than existing results for minimax problems.
arXiv Detail & Related papers (2024-10-11T03:50:23Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches without relying on the gradient approximation used by DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - Structured Directional Pruning via Perturbation Orthogonal Projection [13.704348351073147]
A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by the optimizer (e.g., SGD).
We propose the structured directional pruning based on projecting the perturbations onto the flat minimum valley.
Experiments show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining.
arXiv Detail & Related papers (2021-07-12T11:35:47Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
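For context, the basic extra-gradient update that such algorithms build on is sketched below on a toy bilinear game; this is the generic method, not the paper's variance-reduced formulation for Wasserstein DRSL, and the step size and objective are illustrative assumptions.

```python
import numpy as np

def extragradient_step(x, y, eta=0.1):
    """One extra-gradient step on the toy game f(x, y) = x * y: an extrapolation
    step, then the actual update using gradients at the extrapolated point."""
    x_half, y_half = x - eta * y, y + eta * x  # extrapolation (prediction) step
    return x - eta * y_half, y + eta * x_half  # update with extrapolated gradients

x, y = 1.0, 1.0
for _ in range(500):
    x, y = extragradient_step(x, y)
print(np.hypot(x, y))  # shrinks toward the equilibrium (0, 0), unlike plain GDA
```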
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Sparse Attention with Linear Units [60.399814410157425]
We introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU.
Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
Our analysis shows that ReLA delivers high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment.
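A bare-bones sketch of the stated core change (ReLU in place of softmax over attention scores) is given below; masking, normalization, and other details of the full ReLA method are omitted, and the shapes are illustrative.

```python
import numpy as np

def rectified_linear_attention(Q, K, V):
    """Scaled dot-product attention with the softmax replaced by a ReLU, so many
    attention weights are exactly zero (i.e., the attention is sparse)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # (n_queries, n_keys)
    weights = np.maximum(scores, 0.0)  # ReLU instead of softmax
    return weights @ V                 # weighted combination of values

# Toy usage with random queries/keys/values (illustrative shapes only).
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(rectified_linear_attention(Q, K, V).shape)  # (4, 8)
```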
arXiv Detail & Related papers (2021-04-14T17:52:38Z) - Direct-Search for a Class of Stochastic Min-Max Problems [0.0]
We investigate the use of derivative-free direct-search methods that access the objective only through an oracle.
We prove convergence of this technique under mild assumptions.
Our analysis is the first to address the convergence of a direct-search method for minmax objectives in a stochastic setting.
arXiv Detail & Related papers (2021-02-22T22:23:58Z) - Avoiding local minima in variational quantum eigensolvers with the
natural gradient optimizer [0.0]
We compare the BFGS optimizer, ADAM, and Natural Gradient Descent (NatGrad) in the context of Variational Quantum Eigensolvers (VQEs).
We analyze their performance on the QAOA ansatz for the Transverse Field Ising Model (TFIM) as well as on overparametrized circuits with the ability to break the symmetry of the Hamiltonian.
arXiv Detail & Related papers (2020-04-30T10:09:58Z)