Uniform Stability for First-Order Empirical Risk Minimization
- URL: http://arxiv.org/abs/2207.08257v1
- Date: Sun, 17 Jul 2022 18:53:50 GMT
- Title: Uniform Stability for First-Order Empirical Risk Minimization
- Authors: Amit Attia and Tomer Koren
- Abstract summary: We consider the problem of designing uniformly stable first-order optimization algorithms for empirical risk minimization.
For Euclidean geometry, we suggest a black-box conversion which given a smooth optimization algorithm, produces a uniformly stable version of the algorithm while maintaining its convergence rate up to logarithmic factors.
For more general geometries, we develop a variant of Mirror Descent for smooth optimization with convergence rate $\widetilde{O}(1/T)$ and uniform stability $O(T/n)$, leaving open the question of devising a general conversion method as in the Euclidean case.
- Score: 33.593203156666746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of designing uniformly stable first-order
optimization algorithms for empirical risk minimization. Uniform stability is
often used to obtain generalization error bounds for optimization algorithms,
and we are interested in a general approach to achieve it. For Euclidean
geometry, we suggest a black-box conversion which given a smooth optimization
algorithm, produces a uniformly stable version of the algorithm while
maintaining its convergence rate up to logarithmic factors. Using this
reduction we obtain a (nearly) optimal algorithm for smooth optimization with
convergence rate $\widetilde{O}(1/T^2)$ and uniform stability $O(T^2/n)$,
resolving an open problem of Chen et al. (2018); Attia and Koren (2021). For
more general geometries, we develop a variant of Mirror Descent for smooth
optimization with convergence rate $\widetilde{O}(1/T)$ and uniform stability
$O(T/n)$, leaving open the question of devising a general conversion method as
in the Euclidean case.
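For context, uniform stability in the sense used above is standard and can be stated as follows; this is background (the usual Bousquet-Elisseeff-style notion), not the paper's specific construction:

```latex
% Uniform stability (standard notion): a (possibly randomized) algorithm A
% is epsilon-uniformly stable if, for every pair of datasets S, S' of size n
% differing in a single example, and every point z,
\sup_{z} \; \mathbb{E}\big[\, f(A(S), z) - f(A(S'), z) \,\big] \;\le\; \epsilon .
```

The abstract does not spell out the stabilized Mirror Descent variant, but a minimal sketch of generic Mirror Descent conveys the base scheme; the entropic mirror map over the simplex, the step size, and all function names below are illustrative assumptions:

```python
# Minimal sketch of generic Mirror Descent with the negative-entropy mirror
# map on the probability simplex. This is textbook background, NOT the
# paper's stabilized variant; step size and problem are illustrative.
import numpy as np

def mirror_descent_simplex(grad, x0, steps, eta):
    """Entropic mirror descent: x <- normalize(x * exp(-eta * grad(x)))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        w = x * np.exp(-eta * grad(x))  # mirror step (multiplicative weights)
        x = w / w.sum()                 # Bregman projection onto the simplex
    return x

# Example: minimize the smooth f(x) = 0.5 * ||x - c||^2 over the simplex;
# the minimizer is x* = c when c itself lies on the simplex.
c = np.array([0.2, 0.5, 0.3])
x = mirror_descent_simplex(lambda x: x - c, np.ones(3) / 3, steps=500, eta=0.5)
```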
Related papers
- A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm called ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
Experimental results on GDRO and partial Area Under the ROC Curve for cFCCO demonstrate the promising performance of our algorithm.
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Strictly Low Rank Constraint Optimization -- An Asymptotically $\mathcal{O}(\frac{1}{t^2})$ Method [5.770309971945476]
We propose a class of non-convex and non-smooth problems with rank regularization to promote sparsity in the optimal solution.
We show that our algorithms are able to achieve an asymptotic convergence rate of $O(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth convex problems.
arXiv Detail & Related papers (2023-07-04T16:55:41Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy (the standard entropy-regularized formulation such methods target is sketched after this list).
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Adaptive extra-gradient methods for min-max optimization and games [35.02879452114223]
We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations.
Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not.
It converges to an $\varepsilon$-optimal solution within $\mathcal{O}(1/\varepsilon)$ iterations in smooth problems, and within $\mathcal{O}(1/\varepsilon^2)$ iterations in non-smooth ones (a generic extra-gradient step is sketched after this list).
arXiv Detail & Related papers (2020-10-22T22:54:54Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems that exhibit a two-level structure (a generic formulation is given after this list).
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope (defined after this list).
Our analysis works with a mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms, both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
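Background for the entropy-regularized optimal transport entry above: the standard regularized formulation that first-order OT solvers target is the following, where $C$ is the cost matrix, $\mu, \nu$ the two marginals, and $\eta > 0$ the regularization strength (all notation ours, not taken from that paper's abstract):

```latex
% Standard entropy-regularized optimal transport (background sketch,
% not the paper's specific extragradient formulation):
\min_{P \ge 0} \;\; \langle C, P \rangle - \eta\, H(P)
\quad \text{s.t.} \quad P \mathbf{1} = \mu, \;\; P^{\top} \mathbf{1} = \nu,
\qquad H(P) = -\textstyle\sum_{i,j} P_{ij} \log P_{ij}.
```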
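Background for the adaptive extra-gradient entry above: a generic (non-adaptive, fixed step size) extra-gradient step for $\min_x \max_y f(x,y)$ looks as follows; the paper's adaptive step-size rule is not reproduced here, and all names are illustrative:

```python
# Generic extra-gradient step for min_x max_y f(x, y): take an extrapolation
# step, then update using gradients evaluated at the extrapolated point.
# Background sketch only; the paper's adaptive step sizes are not shown.
def extragradient_step(x, y, grad_x, grad_y, eta):
    x_half = x - eta * grad_x(x, y)           # extrapolation (descent in x)
    y_half = y + eta * grad_y(x, y)           # extrapolation (ascent in y)
    x_new = x - eta * grad_x(x_half, y_half)  # update with look-ahead gradient
    y_new = y + eta * grad_y(x_half, y_half)
    return x_new, y_new

# Example: bilinear saddle f(x, y) = x * y with equilibrium (0, 0); plain
# gradient descent-ascent would spiral, while extra-gradient converges.
x, y = 1.0, 1.0
for _ in range(200):
    x, y = extragradient_step(x, y, lambda a, b: b, lambda a, b: a, eta=0.1)
```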
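Background for the bilevel-optimization entry above: the two-level structure mentioned in its summary is conventionally written as an outer problem constrained by the solution of an inner problem (standard form; symbols ours, not TTSA-specific):

```latex
% Generic bilevel problem: an outer objective f evaluated at the
% solution y*(x) of an inner minimization in y:
\min_{x} \; f\big(x, y^{*}(x)\big)
\quad \text{s.t.} \quad y^{*}(x) \in \operatorname*{arg\,min}_{y} \; g(x, y).
```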
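Background for the weakly convex entry above: the Moreau envelope whose gradient norm serves as the stationarity measure is the standard one (symbols ours):

```latex
% Moreau envelope of f with parameter lambda > 0; the rate quoted above
% bounds || \nabla f_\lambda || at the iterates:
f_{\lambda}(x) \;=\; \min_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\, \|y - x\|^{2} \Big\}.
```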