Non-convex online learning via algorithmic equivalence
- URL: http://arxiv.org/abs/2205.15235v1
- Date: Mon, 30 May 2022 16:50:34 GMT
- Title: Non-convex online learning via algorithmic equivalence
- Authors: Udaya Ghai, Zhou Lu, Elad Hazan
- Abstract summary: An algorithmic equivalence technique between non-convex gradient descent and convex mirror descent is presented.
Our analysis is based on a new and simple algorithmic equivalence method, which yields an $O(T^{\frac{2}{3}})$ regret bound.
- Score: 30.038975353298117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study an algorithmic equivalence technique between nonconvex gradient
descent and convex mirror descent. We start by looking at a harder problem of
regret minimization in online non-convex optimization. We show that under
certain geometric and smoothness conditions, online gradient descent applied to
non-convex functions is an approximation of online mirror descent applied to
convex functions under reparameterization. In continuous time, the gradient
flow with this reparameterization was shown to be exactly equivalent to
continuous-time mirror descent by Amid and Warmuth 2020, but theory for the
analogous discrete time algorithms is left as an open problem. We prove an
$O(T^{\frac{2}{3}})$ regret bound for non-convex online gradient descent in
this setting, answering this open problem. Our analysis is based on a new and
simple algorithmic equivalence method.
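As a concrete illustration of the reparameterization described above, here is a minimal numerical sketch (not the paper's algorithm or analysis): online gradient descent on u with w = u ⊙ u, which is non-convex in u, is compared to online mirror descent with the entropy mirror map (unnormalized exponentiated gradient) on w. The linear losses, the step sizes, and the first-order step-size matching eta_md = 4 * eta_gd are illustrative assumptions.

```python
# Minimal sketch: OGD on u with w = u * u (non-convex in u) approximately
# tracks OMD with the entropy mirror map (unnormalized exponentiated gradient)
# on w. Losses and step sizes are illustrative assumptions, not the paper's
# exact construction.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 200
eta_gd = 0.01                 # step size for OGD on u
eta_md = 4 * eta_gd           # matching step size for OMD on w (first-order match)

u = np.ones(d)                # OGD iterate (reparameterized, w = u * u)
w = np.ones(d)                # OMD iterate (convex formulation)

for t in range(T):
    c = rng.uniform(-1.0, 1.0, size=d)      # linear loss  l_t(w) = <c, w>
    # OGD on the non-convex composite loss l_t(u * u); its gradient is 2 * u * c
    u = u - eta_gd * 2.0 * u * c
    # OMD with entropy mirror map (unnormalized exponentiated gradient)
    w = w * np.exp(-eta_md * c)

print("max gap between w and u*u after T rounds:", np.max(np.abs(w - u * u)))
```

For small step sizes the two trajectories stay close, which is the flavor of approximation the abstract refers to; the paper's discrete-time regret analysis is, of course, more delicate.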
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly non-convex functions.
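As an illustration of this square-root reformulation, the sketch below writes a non-negative loss as f(x) = r(x)^2 with r = sqrt(f) and takes a damped (Levenberg-Marquardt-style) step on the scalar residual r; the resulting adaptive stepsize is derived here for illustration only and is not claimed to be the paper's stepsize rule. The toy least-squares losses and the damping constant lam are assumptions.

```python
# A hedged sketch of the square-root reformulation: write a non-negative loss
# f(x) = r(x)^2 with r = sqrt(f), and take a damped Gauss-Newton step on the
# scalar residual r. The adaptive stepsize below is derived for illustration
# only and is not claimed to be the paper's exact rule.
import numpy as np

def lm_sqrt_step(x, f_val, grad, lam=1.0):
    """One damped Gauss-Newton step on r = sqrt(f); reduces to a gradient step
    with an adaptive, non-negative stepsize 2 f / (4 lam f + ||grad||^2)."""
    step = 2.0 * f_val / (4.0 * lam * f_val + np.dot(grad, grad) + 1e-12)
    return x - step * grad

# Toy stochastic problem: non-negative per-sample losses f_i(x) = (a_i.x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
x = np.zeros(5)
for t in range(500):
    i = rng.integers(100)
    resid = A[i] @ x - b[i]
    x = lm_sqrt_step(x, resid**2, 2.0 * resid * A[i])
print("final mean loss:", np.mean((A @ x - b)**2))
```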
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function at the points it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
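A minimal sketch of the linear-interpolation (lookahead-style) scheme this summary describes: run a few steps of a base optimizer, then interpolate linearly between the previous iterate and the result. The bilinear toy game, the simultaneous gradient descent-ascent base optimizer, the inner step count, and the interpolation weight are illustrative assumptions, not the paper's setup.

```python
# Sketch of a lookahead-style linear interpolation step: run k steps of a base
# optimizer, then interpolate back toward the previous iterate. All choices
# below (game, base optimizer, k, weight) are illustrative assumptions.
import numpy as np

def gda_steps(x, y, eta=0.1, k=20):
    """k steps of simultaneous GDA on f(x, y) = x * y (diverges on its own)."""
    for _ in range(k):
        x, y = x - eta * y, y + eta * x
    return x, y

x, y = 1.0, 1.0
lam = 0.5                          # interpolation weight
for t in range(50):
    xk, yk = gda_steps(x, y)       # base-optimizer trajectory
    x = (1 - lam) * x + lam * xk   # linear interpolation (Krasnoselskii-Mann step)
    y = (1 - lam) * y + lam * yk
print("distance to the equilibrium (0, 0):", np.hypot(x, y))
```

On this toy game, plain gradient descent-ascent spirals outward, while the interpolated iterates contract toward the equilibrium, which is the stabilizing effect the summary alludes to.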
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - A gradient estimator via L1-randomization for online zero-order
optimization with two point feedback [93.57603470949266]
We present a novel gradient estimator based on two function evaluations and randomization over the $\ell_1$-sphere.
We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise.
We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem.
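A sketch, under stated assumptions, of a two-point gradient estimator with randomization over the $\ell_1$-sphere in the spirit of this summary; the scaling constant, smoothing radius, sampling routine, and test function are illustrative choices rather than the paper's precise estimator or tuning.

```python
# Hedged sketch of a two-point gradient estimator with l1-sphere randomization.
# Constants, smoothing radius, and the test function are assumptions.
import numpy as np

def l1_sphere_sample(d, rng):
    """Uniform sample on the l1-sphere: a uniform point on the simplex
    (normalized exponentials) with independent random signs."""
    e = rng.exponential(size=d)
    return rng.choice([-1.0, 1.0], size=d) * e / e.sum()

def two_point_l1_grad(f, x, h, rng):
    """Estimate grad f(x) from the two function values f(x + h*z), f(x - h*z)."""
    z = l1_sphere_sample(x.size, rng)
    return (x.size / (2.0 * h)) * (f(x + h * z) - f(x - h * z)) * np.sign(z)

# Tiny usage example: zeroth-order gradient descent on a smooth convex quadratic.
rng = np.random.default_rng(2)
f = lambda x: 0.5 * np.sum(x**2)
x = np.ones(4)
for t in range(2000):
    x = x - 0.05 * two_point_l1_grad(f, x, h=1e-3, rng=rng)
print("f(x) after zeroth-order descent:", f(x))
```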
arXiv Detail & Related papers (2022-05-27T11:23:57Z) - Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization
under Infinite Noise Variance [17.199063087458907]
We quantify the convergence rate of the Mirror Descent algorithm with a class of uniformly convex mirror maps.
This algorithm does not require any explicit gradient clipping or normalization.
We complement our results with information-theoretic lower bounds showing that no other algorithm using only first-order oracles can achieve improved rates.
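A minimal sketch of stochastic mirror descent with a uniformly convex mirror map of the form psi(x) = (1/q)||x||_q^q with q >= 2, meant only to indicate the algorithm family this summary refers to; the heavy-tailed noise model, the exponent q, the step-size schedule, and the objective are assumptions.

```python
# Sketch of stochastic mirror descent with a uniformly convex mirror map
# psi(x) = (1/q) * ||x||_q^q, q >= 2. Noise model, q, step sizes, and the
# objective are illustrative assumptions, not the paper's exact setup.
import numpy as np

def mirror_step(x, g, eta, q=3.0):
    """One mirror descent step: map to the dual via grad psi, take a gradient
    step there, and map back via the inverse of grad psi."""
    dual = np.sign(x) * np.abs(x) ** (q - 1) - eta * g
    return np.sign(dual) * np.abs(dual) ** (1.0 / (q - 1))

rng = np.random.default_rng(3)
x, x_avg = np.ones(5), np.zeros(5)
T = 5000
for t in range(T):
    noise = rng.standard_t(df=1.5, size=5)   # infinite-variance gradient noise
    g = x + noise                            # noisy gradient of f(x) = ||x||^2 / 2
    x = mirror_step(x, g, eta=0.5 / np.sqrt(t + 1))
    x_avg += x / T
print("objective at the averaged iterate:", 0.5 * np.sum(x_avg**2))
```

Note that no gradient clipping or normalization is applied here; the heavy-tailed spikes are absorbed by the dual map, which is the qualitative point of the summary above.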
arXiv Detail & Related papers (2022-02-23T17:08:40Z) - Nearly Minimax-Optimal Rates for Noisy Sparse Phase Retrieval via
Early-Stopped Mirror Descent [29.206451882562867]
This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval.
We find that a simple algorithm, which does not rely on explicit regularization or thresholding steps to promote sparsity, achieves nearly minimax-optimal rates.
arXiv Detail & Related papers (2021-05-08T11:22:19Z) - A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval [24.17778927729799]
We analyze continuous-time mirror descent applied to sparse phase retrieval.
It is the problem of recovering sparse signals from a set of magnitude-only measurements.
We provide a convergence analysis of the algorithm for this problem.
arXiv Detail & Related papers (2020-10-20T10:03:44Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze an optimistic Adagrad variant for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones can be observed empirically in GAN training.
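A minimal sketch of an optimistic gradient update on a toy bilinear min-max problem, to indicate the family of methods this summary discusses; the analyzed variant additionally uses Adagrad-style adaptive scaling, which this sketch omits, and the game, step size, and iteration count are assumptions.

```python
# Sketch of an optimistic gradient step on a toy bilinear min-max problem.
# The paper's variant adds Adagrad-style adaptive scaling, omitted here;
# the game, step size, and iteration count are illustrative assumptions.
import numpy as np

eta = 0.1
x, y = 1.0, 1.0                     # min over x, max over y of f(x, y) = x * y
gx_prev, gy_prev = 0.0, 0.0         # previous gradients (the "optimism" term)
for t in range(2000):
    gx, gy = y, x                   # grad_x f = y, grad_y f = x
    x, y = x - 2 * eta * gx + eta * gx_prev, y + 2 * eta * gy - eta * gy_prev
    gx_prev, gy_prev = gx, gy
print("distance to the saddle point (0, 0):", np.hypot(x, y))
```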
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - Geometry, Computation, and Optimality in Stochastic Optimization [24.154336772159745]
We study computational and statistical consequences of problem geometry in stochastic and online optimization.
By focusing on constraint set and gradient geometry, we characterize the problem families for which stochastic- and adaptive-gradient methods are (minimax) optimal.
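As a generic example of the adaptive-gradient methods whose optimality is characterized here, the sketch below runs a diagonal AdaGrad-style update on a toy objective with sparse stochastic gradients; the objective, the sparsity pattern, and all hyperparameters are illustrative assumptions.

```python
# Sketch of a diagonal adaptive-gradient (AdaGrad-style) update on a toy
# objective with sparse stochastic gradients. All choices are assumptions.
import numpy as np

rng = np.random.default_rng(5)
d = 20
x, accum = np.ones(d), np.zeros(d)
eta, eps = 0.5, 1e-8
for t in range(2000):
    # Sparse stochastic gradient of f(x) = 0.5 * ||x||^2: one random coordinate
    i = rng.integers(d)
    g = np.zeros(d); g[i] = d * x[i]            # unbiased: E[g] = x
    accum += g ** 2                             # per-coordinate accumulation
    x = x - eta * g / (np.sqrt(accum) + eps)    # adaptive per-coordinate step
print("f(x) after AdaGrad-style updates:", 0.5 * np.sum(x**2))
```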
arXiv Detail & Related papers (2019-09-23T16:14:26Z)