Taming GANs with Lookahead-Minmax
- URL: http://arxiv.org/abs/2006.14567v3
- Date: Wed, 23 Jun 2021 17:54:03 GMT
- Title: Taming GANs with Lookahead-Minmax
- Authors: Tatjana Chavdarova, Matteo Pagliardini, Sebastian U. Stich, Francois
Fleuret, Martin Jaggi
- Abstract summary: Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient.
Using 30-fold fewer parameters and 16-fold smaller minibatches we outperform the reported performance of the class-dependent BigGAN on CIFAR-10 by obtaining FID of 12.19 without using the class labels.
- Score: 63.90038365274479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks are notoriously challenging to train. The
underlying minmax optimization is highly susceptible to the variance of the
stochastic gradient and the rotational component of the associated game vector
field. To tackle these challenges, we propose Lookahead-minmax, which extends the
Lookahead algorithm, originally developed only for single-objective minimization,
to minmax optimization. The backtracking step of Lookahead-minmax naturally handles the
rotational game dynamics, a property which was identified to be key for
enabling gradient ascent descent methods to converge on challenging examples
often analyzed in the literature. Moreover, it implicitly handles high variance
without using large mini-batches, which are known to be essential for reaching
state-of-the-art performance. Experimental results on MNIST, SVHN, CIFAR-10, and
ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam
or extragradient, in terms of performance and improved stability, for
negligible memory and computational cost. Using 30-fold fewer parameters and
16-fold smaller minibatches, we outperform the reported performance of the
class-dependent BigGAN on CIFAR-10 by obtaining FID of 12.19 without using the
class labels, bringing state-of-the-art GAN training within reach of common
computational resources.
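For intuition, below is a minimal, self-contained sketch of the Lookahead-minmax idea on the classic bilinear game min_x max_y x*y, where plain simultaneous gradient descent-ascent (GDA) spirals away from the equilibrium. The toy objective, the GDA inner optimizer, and the hyperparameters (eta, k, alpha) are illustrative assumptions, not the paper's reference implementation, which wraps the same outer loop around Adam or extragradient on GAN losses.

```python
# Illustrative sketch only: Lookahead-minmax wrapping plain simultaneous
# gradient descent-ascent (GDA) on the toy bilinear game f(x, y) = x * y.
import numpy as np

def gda_step(x, y, eta):
    """One simultaneous GDA step on f(x, y) = x * y (min over x, max over y)."""
    return x - eta * y, y + eta * x  # df/dx = y, df/dy = x

def lookahead_minmax(x, y, eta=0.1, k=5, alpha=0.5, outer_steps=200):
    """Run k fast GDA steps, then pull the slow iterate a fraction alpha of the
    way toward the fast iterate (the backtracking step), for both players."""
    for _ in range(outer_steps):
        x_fast, y_fast = x, y
        for _ in range(k):
            x_fast, y_fast = gda_step(x_fast, y_fast, eta)
        x += alpha * (x_fast - x)  # backtracking step, player 1 (min)
        y += alpha * (y_fast - y)  # backtracking step, player 2 (max)
    return x, y

if __name__ == "__main__":
    # Plain GDA spirals outward on this rotational game ...
    x, y = 1.0, 1.0
    for _ in range(1000):
        x, y = gda_step(x, y, 0.1)
    print("plain GDA, distance to (0, 0):       ", np.hypot(x, y))
    # ... while the same updates wrapped in Lookahead-minmax shrink toward (0, 0).
    print("Lookahead-minmax, distance to (0, 0):", np.hypot(*lookahead_minmax(1.0, 1.0)))
```

The outer loop is optimizer-agnostic, which is how the combinations with Adam and extragradient mentioned in the abstract arise: the inner gda_step can be replaced by any base update.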
Related papers
- Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks.
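For reference on the "standard SAM" baseline mentioned above, here is a generic one-step sketch of sharpness-aware minimization; the toy quadratic loss and the hyperparameters are illustrative assumptions, and this is not the FGSAM variant proposed in that paper.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One standard SAM update: perturb the weights by radius rho along the
    normalized gradient (worst-case ascent direction), then descend using the
    gradient evaluated at the perturbed weights."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # sharpness-seeking perturbation
    return w - lr * grad_fn(w + eps)             # descend on the perturbed gradient

# Toy usage: minimize 0.5 * ||w||^2, whose gradient is w (illustrative only).
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn=lambda v: v)
print(w)  # approaches the minimizer at the origin
```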
arXiv Detail & Related papers (2024-10-22T09:33:29Z) - Towards Sharper Risk Bounds for Minimax Problems [23.380477456114118]
Minimax problems have achieved success in machine learning areas such as adversarial training, robust optimization, and reinforcement learning.
For theoretical analysis, current optimal excess risk bounds are composed of the generalization error and exhibit 1/n rates in the strongly-convex-strongly-concave (SC-SC) setting.
We analyze popular algorithms such as empirical saddle point, gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA).
We derive bounds that are n times faster than existing results for minimax problems.
arXiv Detail & Related papers (2024-10-11T03:50:23Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches without relying on the gradient approximation used by DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - Structured Directional Pruning via Perturbation Orthogonal Projection [13.704348351073147]
A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by the optimizer (e.g., SGD).
We propose the structured directional pruning based on projecting the perturbations onto the flat minimum valley.
Experiments show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining.
arXiv Detail & Related papers (2021-07-12T11:35:47Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
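For context, the basic extra-gradient update that such algorithms build on is sketched below on a toy bilinear game; this is the generic method, not the paper's variance-reduced formulation for Wasserstein DRSL, and the step size and objective are illustrative assumptions.

```python
import numpy as np

def extragradient_step(x, y, eta=0.1):
    """One extra-gradient step on the toy game f(x, y) = x * y: an extrapolation
    step, then the actual update using gradients at the extrapolated point."""
    x_half, y_half = x - eta * y, y + eta * x  # extrapolation (prediction) step
    return x - eta * y_half, y + eta * x_half  # update with extrapolated gradients

x, y = 1.0, 1.0
for _ in range(500):
    x, y = extragradient_step(x, y)
print(np.hypot(x, y))  # shrinks toward the equilibrium (0, 0), unlike plain GDA
```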
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Sparse Attention with Linear Units [60.399814410157425]
We introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU.
Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
Our analysis shows that ReLA delivers high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment.
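A bare-bones sketch of the stated core change (ReLU in place of softmax over attention scores) is given below; masking, normalization, and other details of the full ReLA method are omitted, and the shapes are illustrative.

```python
import numpy as np

def rectified_linear_attention(Q, K, V):
    """Scaled dot-product attention with the softmax replaced by a ReLU, so many
    attention weights are exactly zero (i.e., the attention is sparse)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # (n_queries, n_keys)
    weights = np.maximum(scores, 0.0)  # ReLU instead of softmax
    return weights @ V                 # weighted combination of values

# Toy usage with random queries/keys/values (illustrative shapes only).
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(rectified_linear_attention(Q, K, V).shape)  # (4, 8)
```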
arXiv Detail & Related papers (2021-04-14T17:52:38Z) - Direct-Search for a Class of Stochastic Min-Max Problems [0.0]
We investigate the use of derivative-free direct-search methods that access the objective only through an oracle.
We prove convergence of this technique under mild assumptions.
Our analysis is the first to address the convergence of a direct-search method for minmax objectives in a stochastic setting.
arXiv Detail & Related papers (2021-02-22T22:23:58Z) - Avoiding local minima in variational quantum eigensolvers with the
natural gradient optimizer [0.0]
We compare the BFGS optimizer, ADAM, and Natural Gradient Descent (NatGrad) in the context of Variational Quantum Eigensolvers (VQEs).
We analyze their performance on the QAOA ansatz for the Transverse Field Ising Model (TFIM) as well as on overparametrized circuits with the ability to break the symmetry of the Hamiltonian.
arXiv Detail & Related papers (2020-04-30T10:09:58Z)