Train simultaneously, generalize better: Stability of gradient-based
minimax learners
- URL: http://arxiv.org/abs/2010.12561v1
- Date: Fri, 23 Oct 2020 17:44:43 GMT
- Title: Train simultaneously, generalize better: Stability of gradient-based
minimax learners
- Authors: Farzan Farnia, Asuman Ozdaglar
- Abstract summary: We show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model, under both convex-concave and non-convex non-concave minimax settings.
We discuss several numerical results indicating the role of optimization algorithms in the generalization of learned minimax models.
- Score: 12.691047660244331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of minimax learning problems of generative adversarial networks
(GANs) has been observed to depend on the minimax optimization algorithm used
for their training. This dependence is commonly attributed to the convergence
speed and robustness properties of the underlying optimization algorithm. In
this paper, we show that the optimization algorithm also plays a key role in
the generalization performance of the trained minimax model. To this end, we
analyze the generalization properties of standard gradient descent ascent (GDA)
and proximal point method (PPM) algorithms through the lens of algorithmic
stability under both convex concave and non-convex non-concave minimax
settings. While the GDA algorithm is not guaranteed to have a vanishing excess
risk in convex concave problems, we show the PPM algorithm enjoys a bounded
excess risk in the same setup. For non-convex non-concave problems, we compare
the generalization performance of stochastic GDA and GDmax algorithms where the
latter fully solves the maximization subproblem at every iteration. Our
generalization analysis suggests the superiority of GDA provided that the
minimization and maximization subproblems are solved simultaneously with
similar learning rates. We discuss several numerical results indicating the
role of optimization algorithms in the generalization of the learned minimax
models.
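As a concrete illustration of the comparison above between simultaneous GDA and GDmax, the sketch below contrasts the two update rules on a toy smooth minimax objective whose inner maximizer has a closed form. The objective, step size, and iteration count are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the paper's code): simultaneous GDA vs. GDmax on the toy
# objective f(x, y) = x*y - 0.5*y**2 + 0.5*mu*x**2, whose inner maximizer is
# y*(x) = x and whose saddle point is (0, 0).  All constants are assumptions.
mu = 0.1     # curvature of the minimization variable (assumed)
eta = 0.05   # shared learning rate for both players (assumed)
T = 200      # number of iterations (assumed)

def grad_x(x, y):
    return y + mu * x      # df/dx

def grad_y(x, y):
    return x - y           # df/dy

def gda(x, y):
    """Simultaneous GDA: both players take a gradient step at every iteration."""
    for _ in range(T):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def gdmax(x, y):
    """GDmax: fully solve the inner maximization (here y*(x) = x) every iteration."""
    for _ in range(T):
        y = x                          # exact maximizer of f(x, .) for this toy f
        x = x - eta * grad_x(x, y)
    return x, y

print("GDA   ->", gda(1.0, -1.0))   # both runs approach the saddle point (0, 0)
print("GDmax ->", gdmax(1.0, -1.0))
```

The paper's non-convex non-concave analysis concerns the generalization (stability) of these two schemes rather than their convergence; the toy run above only shows how the update rules differ.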
Related papers
- Towards Sharper Risk Bounds for Minimax Problems [23.380477456114118]
Minimax problems have achieved success in machine learning areas such as adversarial training, robust optimization, and reinforcement learning.
In theoretical analysis, current optimal excess risk bounds are composed of the generalization error and present 1/n-rates in the strongly-convex-strongly-concave (SC-SC) setting.
We analyze some popular algorithms such as empirical saddle point (ESP), gradient descent ascent (GDA), and stochastic gradient descent ascent (SGDA).
We derive excess risk bounds that are n times faster than existing results for minimax problems.
arXiv Detail & Related papers (2024-10-11T03:50:23Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization of D-SGDA.
We also balance the optimization error with the generalization error to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Smoothed Analysis of Sequential Probability Assignment [16.090378928208885]
We study information-theoretically optimal minimax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator oracle.
Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning.
arXiv Detail & Related papers (2023-03-08T19:25:57Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
Alternating gradient descent ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications.
In this paper, we develop a single-loop fast alternating gradient-descent-ascent algorithm to solve nonconvex minimax optimization problems.
arXiv Detail & Related papers (2021-12-22T04:33:27Z) - A proximal-proximal majorization-minimization algorithm for nonconvex
tuning-free robust regression problems [4.261680642170457]
We introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems.
Our proposed algorithm outperforms the existing state-of-the-art algorithms.
arXiv Detail & Related papers (2021-06-25T15:07:13Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by the recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep neural networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a stochastic hard thresholding algorithm for AUC maximization in distributed classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Global Convergence and Variance-Reduced Optimization for a Class of
Nonconvex-Nonconcave Minimax Problems [39.13924898972611]
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
GDA algorithms with a constant step size can potentially diverge even in the convex setting (see the sketch after this list).
The AGDA algorithm converges globally at a sublinear rate.
arXiv Detail & Related papers (2020-02-22T04:20:37Z)
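The last entry above notes that GDA with a constant step size can diverge even in the convex setting. The sketch below makes that concrete on a bilinear objective; the objective, step size, and iteration count are illustrative assumptions and are not taken from the cited paper.

```python
# Minimal sketch (assumed setup, not the cited paper's code): GDA with a
# constant step size on the convex-concave bilinear objective f(x, y) = x*y,
# whose unique saddle point is (0, 0).
from math import hypot, sqrt

eta = 0.2          # constant step size (assumed)
x, y = 1.0, 1.0    # initial point (assumed)
T = 200            # number of iterations (assumed)

for _ in range(T):
    gx, gy = y, x                      # df/dx = y,  df/dy = x
    x, y = x - eta * gx, y + eta * gy  # simultaneous GDA step

# Each step multiplies the distance to (0, 0) by sqrt(1 + eta**2) > 1, so the
# iterates spiral outward instead of converging.
print(hypot(x, y))                        # observed norm after T steps (~71 here)
print(sqrt(2) * (1 + eta**2) ** (T / 2))  # matches the closed-form growth factor
```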
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.