Super-efficiency of automatic differentiation for functions defined as a
minimum
- URL: http://arxiv.org/abs/2002.03722v1
- Date: Mon, 10 Feb 2020 13:23:01 GMT
- Title: Super-efficiency of automatic differentiation for functions defined as a
minimum
- Authors: Pierre Ablin, Gabriel Peyré and Thomas Moreau
- Abstract summary: In min-min optimization, one has to compute the gradient of a function defined as a minimum.
We study the error made by the analytic and automatic-differentiation estimators of this gradient as a function of the optimization error.
- Score: 16.02151272607889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In min-min optimization or max-min optimization, one has to compute the
gradient of a function defined as a minimum. In most cases, the minimum has no
closed form, and an approximation is obtained via an iterative algorithm. There
are two usual ways of estimating the gradient of the function: using either an
analytic formula obtained by assuming exactness of the approximation, or
automatic differentiation through the algorithm. In this paper, we study the
asymptotic error made by these estimators as a function of the optimization
error. We find that the error of the automatic estimator is close to the square
of the error of the analytic estimator, reflecting a super-efficiency
phenomenon. The convergence of the automatic estimator greatly depends on the
convergence of the Jacobian of the algorithm. We analyze it for gradient
descent and stochastic gradient descent and derive convergence rates for the
estimators in these cases. Our analysis is backed by numerical experiments on
toy problems and on Wasserstein barycenter computation. Finally, we discuss the
computational complexity of these estimators and give practical guidelines to
choose between them.
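To make the comparison concrete, here is a minimal sketch (not the authors' code) of the two estimators in JAX, on a hypothetical quadratic inner problem; the function g, the matrix A, and the step size are illustrative assumptions, not taken from the paper.
```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a cleaner comparison

# Hypothetical toy inner problem: g(x, z) = 0.5*||z - A x||^2 + 0.5*||z||^2.
# Its minimizer over z has the closed form z*(x) = A x / 2, so the exact
# gradient of f(x) = min_z g(x, z) is available for reference.
A = jnp.array([[2.0, 0.0], [0.0, 0.5]])

def g(x, z):
    return 0.5 * jnp.sum((z - A @ x) ** 2) + 0.5 * jnp.sum(z ** 2)

def inner_gd(x, n_iter, step=0.4):
    # n_iter steps of gradient descent on z -> g(x, z), starting from zero.
    z = jnp.zeros_like(x)
    for _ in range(n_iter):
        z = z - step * jax.grad(g, argnums=1)(x, z)
    return z

def analytic_grad(x, n_iter):
    # "Analytic" estimator: plug the approximate minimizer z_t into the
    # envelope-theorem formula grad f(x) = grad_x g(x, z*(x)), treating
    # z_t as if it were exact (no differentiation through the iterations).
    z = inner_gd(x, n_iter)
    return jax.grad(g, argnums=0)(x, z)

def autodiff_grad(x, n_iter):
    # "Automatic" estimator: differentiate through the whole iterative
    # algorithm, including the dependence of z_t on x.
    return jax.grad(lambda x_: g(x_, inner_gd(x_, n_iter)))(x)

x = jnp.array([1.0, -2.0])
true_grad = jax.grad(lambda x_: g(x_, A @ x_ / 2.0))(x)  # uses the exact minimizer

for t in (1, 3, 5, 10):
    err_analytic = float(jnp.linalg.norm(analytic_grad(x, t) - true_grad))
    err_autodiff = float(jnp.linalg.norm(autodiff_grad(x, t) - true_grad))
    # The automatic error decays roughly like the square of the analytic
    # error, which is the super-efficiency phenomenon studied in the paper.
    print(t, err_analytic, err_autodiff)
```
In this toy example the Jacobian of the gradient-descent iterates converges to the Jacobian of the exact minimizer at the same linear rate as the iterates themselves, which is what gives the automatic estimator its extra order of accuracy.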
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - One-step differentiation of iterative algorithms [7.9495796547433395]
We study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation (a minimal illustrative sketch of this idea appears after this list).
We provide a complete theoretical approximation analysis with specific examples along with its consequences in bilevel optimization.
arXiv Detail & Related papers (2023-05-23T07:32:37Z) - A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random
Perturbations for Stochastic Optimization [10.820943271350442]
We present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples.
We also show that our algorithm avoids unstable equilibria, implying convergence to local minima.
arXiv Detail & Related papers (2022-07-30T18:50:36Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and in terms of function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of the gradient-free minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - The estimation error of general first order methods [12.472245917779754]
We consider two families of estimation problems: high-dimensional regression and low-dimensional matrix estimation.
We derive lower bounds on the error that hold in the high-dimensional asymptotics in which both the number of observations and the number of parameters diverge.
These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms.
arXiv Detail & Related papers (2020-02-28T18:13:47Z) - Stochastic Optimization for Regularized Wasserstein Estimators [10.194798773447879]
We introduce an algorithm to solve a regularized version of the problem of computing Wasserstein estimators, with a time per step which is sublinear in the natural dimensions of the problem.
We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters.
arXiv Detail & Related papers (2020-02-20T12:04:05Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
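As referenced in the one-step differentiation entry above, here is a minimal, self-contained sketch of Jacobian-free (one-step) backpropagation on the same hypothetical quadratic inner problem used after the abstract; the function g, the matrix A, and the step size are illustrative assumptions, not taken from the cited paper. Only the final inner iteration is differentiated, so the cost and memory of the backward pass do not grow with the number of inner iterations.
```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

# Same hypothetical quadratic inner problem as in the sketch after the abstract.
A = jnp.array([[2.0, 0.0], [0.0, 0.5]])
STEP = 0.4

def g(x, z):
    return 0.5 * jnp.sum((z - A @ x) ** 2) + 0.5 * jnp.sum(z ** 2)

def gd_step(x, z):
    # One gradient-descent step on z -> g(x, z).
    return z - STEP * jax.grad(g, argnums=1)(x, z)

def inner_gd(x, n_iter):
    z = jnp.zeros_like(x)
    for _ in range(n_iter):
        z = gd_step(x, z)
    return z

def one_step_grad(x, n_iter):
    # Jacobian-free ("one-step") backpropagation: treat the penultimate
    # iterate as a constant and differentiate only through the last step,
    # so the Jacobian of the full trajectory is never formed.
    z_prev = jax.lax.stop_gradient(inner_gd(x, n_iter - 1))
    return jax.grad(lambda x_: g(x_, gd_step(x_, z_prev)))(x)

x = jnp.array([1.0, -2.0])
true_grad = jax.grad(lambda x_: g(x_, A @ x_ / 2.0))(x)  # exact minimizer z*(x) = A x / 2

for t in (1, 3, 5, 10):
    print(t, float(jnp.linalg.norm(one_step_grad(x, t) - true_grad)))
```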