Geometry, Computation, and Optimality in Stochastic Optimization
- URL: http://arxiv.org/abs/1909.10455v3
- Date: Fri, 11 Oct 2024 20:10:05 GMT
- Title: Geometry, Computation, and Optimality in Stochastic Optimization
- Authors: Chen Cheng, Daniel Levy, John C. Duchi
- Abstract summary: We study computational and statistical consequences of problem geometry in stochastic and online optimization.
By focusing on constraint set and gradient geometry, we characterize the problem families for which stochastic- and adaptive-gradient methods are (minimax) optimal.
- Score: 24.154336772159745
- License:
- Abstract: We study computational and statistical consequences of problem geometry in stochastic and online optimization. By focusing on constraint set and gradient geometry, we characterize the problem families for which stochastic- and adaptive-gradient methods are (minimax) optimal and, conversely, when nonlinear updates -- such as those mirror descent employs -- are necessary for optimal convergence. When the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We provide quantitative converses showing that the ``distance'' of the underlying constraints from quadratic convexity determines the sub-optimality of subgradient methods. These results apply, for example, to any $\ell_p$-ball for $p < 2$, and the computation/accuracy tradeoffs they demonstrate exhibit a striking analogy to those in Gaussian sequence models.
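To make the dichotomy concrete, here is a minimal sketch (not the paper's experiments; the toy quadratic objective, simplex constraint set, step sizes, and AdaGrad-style preconditioner are all assumptions) contrasting a diagonally preconditioned stochastic gradient step with the kind of nonlinear mirror-descent (exponentiated-gradient) update the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
x_star = rng.dirichlet(np.ones(d))                # target point on the simplex (assumed)


def stochastic_grad(x):
    """Noisy gradient of the toy objective f(x) = 0.5 * ||x - x_star||^2."""
    return (x - x_star) + 0.01 * rng.standard_normal(d)


def diag_precond_sgd_step(x, g, accum, eta=0.5):
    """AdaGrad-style diagonally preconditioned stochastic gradient step."""
    accum += g ** 2                                # running sum of squared gradients
    x = x - eta * g / (np.sqrt(accum) + 1e-12)
    x = np.clip(x, 1e-12, None)                    # crude feasibility fix: clip ...
    return x / x.sum(), accum                      # ... and renormalize onto the simplex


def mirror_descent_step(x, g, eta=0.5):
    """Entropic mirror descent (exponentiated gradient): a nonlinear update."""
    w = x * np.exp(-eta * g)
    return w / w.sum()


x_sgd = np.ones(d) / d
x_md = np.ones(d) / d
accum = np.zeros(d)
for _ in range(500):
    x_sgd, accum = diag_precond_sgd_step(x_sgd, stochastic_grad(x_sgd), accum)
    x_md = mirror_descent_step(x_md, stochastic_grad(x_md))

print("preconditioned SGD  ||x - x*||_1:", np.linalg.norm(x_sgd - x_star, 1))
print("mirror descent      ||x - x*||_1:", np.linalg.norm(x_md - x_star, 1))
```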
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing a quadratic approximation of the objective subject to the linearized constraints.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
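As a rough illustration of the two step types just described (a sketch on assumed toy data, not the authors' implementation), the gradient step can be obtained from a quadratic model with linearized equality constraints via a KKT solve, and the eigen step from the most negative eigenvalue of the reduced Hessian:

```python
import numpy as np


def tr_sqp_steps(g, H, c, J, radius=1.0):
    """Sketch of the two step types: a gradient-type step from a quadratic model
    with linearized equality constraints, and an eigen step along negative
    curvature of the reduced Hessian.  All problem data are assumed given."""
    n, m = H.shape[0], J.shape[0]
    # Gradient step: solve min_d g^T d + 0.5 d^T H d  s.t.  J d + c = 0 via its KKT system.
    K = np.block([[H, J.T], [J, np.zeros((m, m))]])
    d_grad = np.linalg.solve(K, -np.concatenate([g, c]))[:n]
    if np.linalg.norm(d_grad) > radius:            # enforce the trust region by scaling
        d_grad *= radius / np.linalg.norm(d_grad)

    # Eigen step: negative-curvature direction of Z^T H Z, with Z a basis of null(J).
    Z = np.linalg.svd(J)[2][m:].T                  # orthonormal null-space basis via SVD
    evals, evecs = np.linalg.eigh(Z.T @ H @ Z)
    d_eig = radius * (Z @ evecs[:, 0]) if evals[0] < 0 else None
    return d_grad, d_eig


# toy problem data (assumed purely for illustration)
rng = np.random.default_rng(1)
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2
g, c, J = rng.standard_normal(5), rng.standard_normal(2), rng.standard_normal((2, 5))
print(tr_sqp_steps(g, H, c, J))
```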
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order optimization of nonconvex composite objectives, focusing on the dependence on dimensionality.
This is achieved by exploiting the low-dimensional structure of the decision set using the mirror descent method with an entropy-like function.
To improve the gradient estimate, we replace the classic sampling method with one based on Rademacher perturbations and show that the mini-batch method copes with non-Euclidean geometry.
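A minimal sketch of these ingredients (the smooth test function, smoothing radius, batch size, and simplex decision set are assumptions, not the paper's setting): a mini-batched two-point estimator with Rademacher perturbations feeding an entropic mirror-descent update.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
f = lambda x: 0.5 * np.sum((A @ x) ** 2)          # assumed smooth test objective


def zo_gradient(x, mu=1e-3, batch=8):
    """Mini-batch two-point zeroth-order estimate with Rademacher directions."""
    g = np.zeros_like(x)
    for _ in range(batch):
        u = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / batch


def entropic_md_step(x, g, eta=0.05):
    """Mirror descent with an entropy-like distance-generating function (simplex)."""
    w = x * np.exp(-eta * g)
    return w / w.sum()


x = np.ones(d) / d                                 # start at the simplex center
for t in range(300):
    x = entropic_md_step(x, zo_gradient(x))
print("final objective:", f(x))
```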
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive
Step Size [29.15132344744801]
We establish local convergence for stochastic gradient descent with adaptive step size for problems such as matrix inversion.
We show that these first-order optimization methods can achieve sub-linear or linear convergence.
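Purely for illustration, and not the paper's scheme: the sketch below runs stochastic gradient descent on a least-squares formulation of matrix inversion with a Polyak-type adaptive step size (both the formulation and the step-size rule are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned matrix (assumed)
X = np.zeros((n, n))                               # iterate approximating A^{-1}

for t in range(2000):
    i = rng.integers(n)                            # sample one column of the identity
    e_i = np.eye(n)[:, i]
    r = A @ X[:, i] - e_i                          # residual of f_i(X) = 0.5 * ||A x_i - e_i||^2
    g = A.T @ r                                    # stochastic gradient w.r.t. column i
    step = 0.5 * (r @ r) / (g @ g + 1e-12)         # Polyak-type adaptive step (an assumption)
    X[:, i] -= step * g

print("||A X - I||_F =", np.linalg.norm(A @ X - np.eye(n)))
```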
arXiv Detail & Related papers (2021-12-30T00:50:30Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Accelerated, Optimal, and Parallel: Some Results on Model-Based
Stochastic Optimization [33.71051480619541]
We extend the Approximate-Proximal Point (aProx) family of model-based methods for solving convex optimization problems.
We provide non-asymptotic convergence guarantees and an acceleration scheme that achieves linear speedup in the minibatch size.
We show improved convergence rates and matching lower bounds, identifying new fundamental constants for "interpolation" problems.
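For context, a minimal sketch of a truncated model-based (aProx-style) update on a toy interpolation problem; the least-squares loss and step-size cap are assumptions, and the accelerated and parallel variants the paper develops are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                                     # interpolation: zero optimal loss (assumed)


def truncated_model_step(x, i, alpha):
    """aProx-style truncated step: minimize max(linear model, 0) + (1/2a)||y - x||^2."""
    loss = 0.5 * (A[i] @ x - b[i]) ** 2            # f_i(x) >= 0
    g = (A[i] @ x - b[i]) * A[i]                   # gradient of f_i at x
    step = min(alpha, loss / (g @ g + 1e-12))      # closed form of the truncated model update
    return x - step * g


x = np.zeros(d)
for k in range(3000):
    x = truncated_model_step(x, rng.integers(n), alpha=1.0)
print("distance to solution:", np.linalg.norm(x - x_star))
```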
arXiv Detail & Related papers (2021-01-07T18:58:39Z) - Constrained and Composite Optimization via Adaptive Sampling Methods [3.4219044933964944]
The motivation for this paper stems from the desire to develop an adaptive sampling method for solving constrained optimization problems.
The method proposed in this paper is a proximal gradient method that can also be applied to the composite optimization problem min f(x) + h(x), where h is convex (but not necessarily differentiable).
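A minimal proximal-gradient sketch for min f(x) + h(x) with $h(x) = \lambda \|x\|_1$ (soft-thresholding prox); the geometric growth of the sample size below is a stand-in for adaptive sampling, not the paper's rule.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 30, 1000
A = rng.standard_normal((n, d))
x_true = np.zeros(d); x_true[:5] = 1.0
b = A @ x_true + 0.1 * rng.standard_normal(n)
lam = 0.1


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


x = np.zeros(d)
eta = n / np.linalg.norm(A, 2) ** 2                # 1/L for f(x) = (1/(2n)) ||Ax - b||^2
batch = 8
for k in range(200):
    idx = rng.integers(n, size=batch)              # sampled mini-batch for the gradient of f
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
    x = soft_threshold(x - eta * g, eta * lam)     # proximal gradient step
    batch = min(n, int(batch * 1.05) + 1)          # grow the sample size (stand-in rule)

print("nonzeros:", np.sum(np.abs(x) > 1e-3), "error:", np.linalg.norm(x - x_true))
```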
arXiv Detail & Related papers (2020-12-31T02:50:39Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
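A sketch of coordinate-wise zeroth-order estimation with importance sampling over coordinates (the quadratic test function and the score-proportional sampling rule are assumptions, not the proposed framework).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30
w = np.linspace(1.0, 10.0, d)                      # uneven coordinate-wise curvature (assumed)
f = lambda x: 0.5 * np.sum(w * x ** 2)             # black-box objective for illustration

x = np.ones(d)
scores = np.ones(d)                                # running importance score per coordinate
mu, eta = 1e-4, 0.05
for t in range(4000):
    p = (scores + 1e-3) / np.sum(scores + 1e-3)    # coordinate importance-sampling distribution
    j = rng.choice(d, p=p)
    e_j = np.zeros(d); e_j[j] = mu
    gj = (f(x + e_j) - f(x - e_j)) / (2 * mu)      # finite-difference estimate of df/dx_j
    x[j] -= eta * gj                               # zeroth-order coordinate update
    scores[j] = 0.9 * scores[j] + 0.1 * abs(gj)    # refresh the sampled coordinate's score

print("f(x) after ZO coordinate descent:", f(x))
```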
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
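For reference, the Moreau envelope of the (weakly convex) objective and the near-stationarity measure behind this rate are the standard ones:

```latex
% Moreau envelope with parameter \lambda > 0 and its gradient,
% used as the stationarity measure (standard definitions, stated for context):
f_\lambda(x) = \min_{y} \Big\{ f(y) + \tfrac{1}{2\lambda} \|y - x\|^2 \Big\},
\qquad
\nabla f_\lambda(x) = \tfrac{1}{\lambda} \big( x - \operatorname{prox}_{\lambda f}(x) \big).
```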
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic
Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform zeroth-order stochastic gradient descent and its design variants.
arXiv Detail & Related papers (2020-05-19T07:44:52Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze an optimistic variant of adaptive gradient algorithms for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
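A minimal sketch of an optimistic, adaptively scaled gradient update on a toy strongly-convex-strongly-concave quadratic game (the game, step size, and AdaGrad-style scaling are assumptions; this is not the paper's exact algorithm).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
# toy game (assumed): f(x, y) = 0.5||x||^2 + x^T A y - 0.5||y||^2, saddle point at (0, 0)
grad_x = lambda x, y: x + A @ y                    # gradient for the min player
grad_y = lambda x, y: A.T @ x - y                  # gradient for the max player

x, y = np.ones(d), np.ones(d)
gx_prev, gy_prev = np.zeros(d), np.zeros(d)
sx, sy = np.zeros(d), np.zeros(d)                  # AdaGrad-style accumulators
eta = 0.1
for t in range(5000):
    gx, gy = grad_x(x, y), grad_y(x, y)
    sx += gx ** 2; sy += gy ** 2
    # optimistic update: extrapolate with 2*g_t - g_{t-1}, scaled per coordinate
    x -= eta * (2 * gx - gx_prev) / (np.sqrt(sx) + 1e-8)
    y += eta * (2 * gy - gy_prev) / (np.sqrt(sy) + 1e-8)
    gx_prev, gy_prev = gx, gy

print("distance to the saddle point (0, 0):", np.linalg.norm(x), np.linalg.norm(y))
```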
arXiv Detail & Related papers (2019-12-26T22:10:10Z)