Related papers: SDEs for Minimax Optimization


URL: http://arxiv.org/abs/2402.12508v1
Date: Mon, 19 Feb 2024 20:18:29 GMT
Title: SDEs for Minimax Optimization
Authors: Enea Monzio Compagnoni, Antonio Orvieto, Hans Kersting, Frank Norbert Proske, Aurelien Lucchi
Abstract summary: In this paper, we pioneer the use of stochastic differential equations (SDEs) to analyze and compare Minimax optimizers. Our SDE models for Stochastic Gradient Descent-Ascent, Stochastic Extragradient, and Stochastic Hamiltonian Gradient Descent are provable approximations of their algorithmic counterparts. This perspective also allows for a unified and simplified analysis strategy based on the principles of Itô calculus.
Score: 11.290653315174382
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Minimax optimization problems have attracted a lot of attention over the past few years, with applications ranging from economics to machine learning. While advanced optimization methods exist for such problems, characterizing their dynamics in stochastic scenarios remains notably challenging. In this paper, we pioneer the use of stochastic differential equations (SDEs) to analyze and compare Minimax optimizers. Our SDE models for Stochastic Gradient Descent-Ascent, Stochastic Extragradient, and Stochastic Hamiltonian Gradient Descent are provable approximations of their algorithmic counterparts, clearly showcasing the interplay between hyperparameters, implicit regularization, and implicit curvature-induced noise. This perspective also allows for a unified and simplified analysis strategy based on the principles of Itô calculus. Finally, our approach facilitates the derivation of convergence conditions and closed-form solutions for the dynamics in simplified settings, unveiling further insights into the behavior of different optimizers.
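The abstract names the three optimizers but not their update rules. As a rough, hypothetical illustration (textbook discrete updates, not the paper's SDE models), the sketch below runs Stochastic Gradient Descent-Ascent, Stochastic Extragradient, and Stochastic Hamiltonian Gradient Descent on the toy bilinear game f(x, y) = xy. The game, the Gaussian-noise oracle `grad`, the step size, and all function names are assumptions made for this example.

```python
import numpy as np

# Toy bilinear game f(x, y) = x * y (assumption): x minimizes, y maximizes,
# and the unique saddle point is (0, 0).
def grad(z, rng, sigma):
    """Hypothetical noisy oracle: (df/dx, df/dy) plus Gaussian noise of scale sigma."""
    x, y = z
    return np.array([y, x]) + sigma * rng.standard_normal(2)


def sgda_step(z, eta, rng, sigma):
    # Simultaneous step: descent in x, ascent in y.
    gx, gy = grad(z, rng, sigma)
    return z + eta * np.array([-gx, gy])


def seg_step(z, eta, rng, sigma):
    # Extrapolate with one noisy gradient, then step from z using a fresh
    # noisy gradient evaluated at the extrapolated point.
    gx, gy = grad(z, rng, sigma)
    z_half = z + eta * np.array([-gx, gy])
    gx, gy = grad(z_half, rng, sigma)
    return z + eta * np.array([-gx, gy])


def shgd_step(z, eta, rng, sigma):
    # Descent on the Hamiltonian H(z) = 0.5 * ||grad f(z)||^2. For this specific
    # game H = 0.5 * (x**2 + y**2), so grad H(z) = z; noise is added the same way.
    return z - eta * (z + sigma * rng.standard_normal(2))


def run(step, n_steps=500, eta=0.05, sigma=0.0, seed=0):
    # Returns the distance to the saddle point after n_steps iterations.
    rng = np.random.default_rng(seed)
    z = np.array([1.0, 1.0])
    for _ in range(n_steps):
        z = step(z, eta, rng, sigma)
    return float(np.linalg.norm(z))


for name, step in [("SGDA", sgda_step), ("SEG", seg_step), ("SHGD", shgd_step)]:
    print(name, run(step), run(step, sigma=0.1))
```

With `sigma=0` the SGDA iterate spirals away from the saddle point (0, 0) while SEG and SHGD contract toward it, the kind of qualitative difference between optimizers that continuous-time models are meant to explain. Setting `sigma > 0` adds gradient noise whose interaction with the step size is what the stochastic analysis targets.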

Related papers

Expanding the Chaos: Neural Operator for Stochastic (Partial) Differential Equations [65.80144621950981]
We build on Wiener chaos expansions (WCE) to design neural operator (NO) architectures for SPDEs and SDEs. We show that WCE-based neural operators provide a practical and scalable way to learn SDE/SPDE solution operators.
arXiv (2026-01-03T00:59:25Z)
Understanding Lookahead Dynamics Through Laplace Transform [4.204990010424083]
We introduce a frequency-domain framework for convergence analysis of hyperparameters in games. We transform the discrete-time dynamics of bilinear games into the frequency domain to derive precise convergence criteria. Empirical validation in discrete-time settings demonstrates the effectiveness of our approach.
arXiv (2025-06-16T17:20:40Z)
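The summary does not spell out the criteria. A much simpler eigenvalue check (not the paper's Laplace-transform framework) already shows why Lookahead's hyperparameters matter on bilinear games. On the toy game f(x, y) = xy, writing w = x + iy turns one simultaneous GDA step with step size eta into multiplication by 1 + i*eta, so a Lookahead cycle with k fast steps and slow step size alpha multiplies the slow iterate by (1 - alpha) + alpha * (1 + i*eta)^k. The game and all function names below are hypothetical.

```python
# Toy bilinear game f(x, y) = x * y with simultaneous GDA (assumption).
# With w = x + i*y, one step x <- x - eta*y, y <- y + eta*x is w <- (1 + i*eta) * w.
def gda_factor(eta):
    # Per-step growth of |w| under plain GDA; always > 1 for eta > 0.
    return abs(1 + 1j * eta)


def lookahead_gda_factor(eta, k, alpha):
    # Lookahead: k fast GDA steps, then move the slow iterate a fraction alpha
    # toward the fast one: w_slow <- ((1 - alpha) + alpha * (1 + i*eta)**k) * w_slow.
    return abs((1 - alpha) + alpha * (1 + 1j * eta) ** k)


def simulate_lookahead(eta, k, alpha, cycles, x=1.0, y=1.0):
    # Direct simulation in (x, y) coordinates to cross-check the closed-form factor.
    for _ in range(cycles):
        fx, fy = x, y
        for _ in range(k):
            fx, fy = fx - eta * fy, fy + eta * fx  # simultaneous GDA update
        x, y = x + alpha * (fx - x), y + alpha * (fy - y)
    return (x * x + y * y) ** 0.5


print(gda_factor(0.1))                     # > 1: plain GDA spirals outward
print(lookahead_gda_factor(0.1, 20, 0.5))  # < 1: Lookahead-GDA contracts for these settings
```

A cycle contracts exactly when the modulus is below 1, so sweeping `k` and `alpha` in `lookahead_gda_factor` gives a crude map of which hyperparameter settings converge on this particular game.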
Scalable Min-Max Optimization via Primal-Dual Exact Pareto Optimization [66.51747366239299]
We propose a smooth variant of the min-max problem based on the augmented Lagrangian. The proposed algorithm scales better with the number of objectives than subgradient-based strategies.
arXiv (2025-03-16T11:05:51Z)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning. These problems are often formalized as Bi-Level Optimizations (BLO). We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv (2024-10-14T12:10:06Z)
Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles [23.648702140754967]
We consider optimization when one only has access to biased oracles and must obtain objective estimates with low bias. We show that biased gradient methods can reduce variance in the non-varied regime. We also show that conditional optimization methods significantly improve the best-known complexities in the literature for conditional optimization and risk optimization.
arXiv (2024-08-20T17:56:16Z)
Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv (2024-04-16T20:01:10Z)
Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent [1.534667887016089]
We propose a class of differential equations that approximate the dynamics of general optimization methods more closely than the original gradient flow. We study the stability of the modified equation in the case of coordinate descent.
arXiv (2023-09-05T09:39:56Z)
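As a hedged illustration of the modified-equation idea, shown for plain deterministic gradient descent on a 1-D quadratic rather than the stochastic coordinate descent the paper studies: the gradient flow reproduces one gradient-descent step of size h only up to an O(h^2) error, whereas the modified equation dx/dt = -f'(x) - (h/4) d/dx (f'(x))^2 reproduces it up to O(h^3). The quadratic f(x) = a x^2 / 2 and the function names are assumptions.

```python
import math

# Quadratic f(x) = a * x**2 / 2 (assumption); every map below is linear in x,
# so it is enough to compare the multiplicative factors.
def gd_step_factor(a, h):
    # One gradient-descent step: x <- x - h * a * x.
    return 1 - a * h


def gradient_flow_factor(a, h):
    # Exact solution of dx/dt = -f'(x) = -a * x after time h.
    return math.exp(-a * h)


def modified_flow_factor(a, h):
    # Modified equation dx/dt = -f'(x) - (h/4) * d/dx (f'(x))**2 = -a * (1 + a*h/2) * x,
    # solved exactly over time h.
    return math.exp(-a * h * (1 + a * h / 2))


a, h = 1.0, 0.1
print(abs(gd_step_factor(a, h) - gradient_flow_factor(a, h)))  # O(h^2) mismatch
print(abs(gd_step_factor(a, h) - modified_flow_factor(a, h)))  # O(h^3) mismatch
```

Halving `h` shrinks the second mismatch by roughly a factor of 8, consistent with the O(h^3) local error of the modified equation.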
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-convex optimization problems. In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv (2022-03-30T11:39:00Z)
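For reference, a minimal sketch of the plain stochastic gradient Langevin dynamics (SGLD) update that such variants build on, theta <- theta - eta * g(theta) + sqrt(2 * eta / beta) * xi with xi standard Gaussian, which targets the density proportional to exp(-beta * f). The convex test function f(theta) = theta^2 / 2, the hypothetical `noisy_grad` oracle, and all parameter values are assumptions chosen so the result is easy to check; neither variance-reduction scheme from the paper is implemented.

```python
import numpy as np


def sgld_chains(grad_estimate, n_chains=2000, n_steps=1000, eta=0.01, beta=1.0, seed=0):
    """Plain SGLD run on many independent chains at once (vectorized over chains)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_chains)
    for _ in range(n_steps):
        g = grad_estimate(theta, rng)  # stochastic gradient estimate
        # Gradient step plus injected Gaussian noise of variance 2 * eta / beta.
        theta = theta - eta * g + np.sqrt(2 * eta / beta) * rng.standard_normal(n_chains)
    return theta


def noisy_grad(theta, rng, sigma=1.0):
    # Hypothetical unbiased stochastic gradient of f(theta) = theta**2 / 2.
    return theta + sigma * rng.standard_normal(theta.shape)


samples = sgld_chains(noisy_grad)
print(samples.mean(), samples.var())
```

With beta = 1 the target is the standard normal, so the sample mean should be near 0 and the sample variance near 1; the small excess variance comes from the finite step size and the gradient noise, which is the error source variance reduction aims to shrink.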
Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations [83.3201889218775]
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) when derived naively. However, the convergence properties of these methods are qualitatively different, even on simple bilinear games. We adopt a framework studied in fluid dynamics to design differential equation models for several saddle-point optimization methods.
arXiv (2021-12-27T18:31:34Z)
Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications. Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv (2021-04-27T16:56:09Z)
Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk on a large dataset. We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
arXiv (2020-11-13T20:50:36Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope. Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly smooth optimization domains.
arXiv (2020-06-11T17:43:19Z)
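The rate above is stated for the gradient of the Moreau envelope e_lam f(x) = min_u { f(u) + (x - u)^2 / (2 lam) }, whose gradient equals (x - prox_{lam f}(x)) / lam when f is rho-weakly convex and lam < 1/rho. A minimal sketch for f(x) = |x| (convex, hence weakly convex), where the proximal operator is soft-thresholding; the choice of f and the function names are assumptions for illustration.

```python
def prox_abs(x, lam):
    # Proximal operator of lam * |.|: soft-thresholding toward 0 by lam.
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_grad_abs(x, lam):
    # Gradient of the Moreau envelope of f = |.|: (x - prox_{lam f}(x)) / lam.
    return (x - prox_abs(x, lam)) / lam


for x in [3.0, 0.5, 0.0]:
    print(x, moreau_grad_abs(x, 1.0))
```

Near the minimizer the envelope gradient shrinks (at x = 0.5 with lam = 1 it is 0.5, while |x| has derivative 1 there), which is why its norm serves as a stationarity measure for nonsmooth weakly convex problems.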
Geometry, Computation, and Optimality in Stochastic Optimization [24.154336772159745]
We study computational and statistical consequences of problem geometry in stochastic and online optimization. By focusing on constraint set and gradient geometry, we characterize the problem families for which stochastic- and adaptive-gradient methods are (minimax) optimal.
arXiv (2019-09-23T16:14:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.