Convergence Properties of Stochastic Hypergradients
- URL: http://arxiv.org/abs/2011.07122v2
- Date: Mon, 12 Apr 2021 10:48:16 GMT
- Title: Convergence Properties of Stochastic Hypergradients
- Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo
- Abstract summary: We study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using stochastic hypergradients in practice.
- Score: 38.64355126221992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization problems are receiving increasing attention in machine
learning as they provide a natural framework for hyperparameter optimization
and meta-learning. A key step to tackle these problems is the efficient
computation of the gradient of the upper-level objective (hypergradient). In
this work, we study stochastic approximation schemes for the hypergradient,
which are important when the lower-level problem is empirical risk minimization
on a large dataset. The method that we propose is a stochastic variant of the
approximate implicit differentiation approach in (Pedregosa, 2016). We provide
bounds for the mean square error of the hypergradient approximation, under the
assumption that the lower-level problem is accessible only through a stochastic
mapping which is a contraction in expectation. In particular, our main bound is
agnostic to the choice of the two stochastic solvers employed by the procedure.
We provide numerical experiments to support our theoretical analysis and to
show the advantage of using stochastic hypergradients in practice.
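As a concrete illustration of the two-stage scheme described in the abstract (a stochastic solver for the lower-level problem followed by a stochastic solver for the implicit linear system), here is a minimal sketch in plain NumPy. It assumes a toy ridge-regression lower-level problem with a scalar regularization hyperparameter; the data, step sizes, batch size, and iteration counts are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch (not the paper's code): stochastic approximate implicit
# differentiation for the hypergradient on a toy ridge-regression lower level.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a training set for the lower-level ERM, a validation set for the outer loss.
n, m, d = 1000, 200, 20
X_tr, X_val = rng.normal(size=(n, d)), rng.normal(size=(m, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n)
y_val = X_val @ w_true + 0.1 * rng.normal(size=m)

batch = 32  # minibatch size used by both stochastic solvers (illustrative)


def sgd_lower_level(lam, steps=500, lr=0.05):
    """Stage 1: approximately solve the lower-level ERM
    min_w g(w, lam) = (1/2n)||X_tr w - y_tr||^2 + (lam/2)||w||^2  by minibatch SGD."""
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, batch, replace=False)
        grad = X_tr[idx].T @ (X_tr[idx] @ w - y_tr[idx]) / batch + lam * w
        w -= lr * grad
    return w


def stochastic_linear_solver(w, lam, steps=500, lr=0.05):
    """Stage 2: approximately solve  H v = grad_w E(w)  with H = hess_ww g(w, lam),
    using minibatch Hessian-vector products (a stochastic fixed-point iteration
    that is a contraction in expectation for small enough lr)."""
    grad_E = X_val.T @ (X_val @ w - y_val) / m  # gradient of the outer (validation) loss in w
    v = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, batch, replace=False)
        Hv = X_tr[idx].T @ (X_tr[idx] @ v) / batch + lam * v
        v -= lr * (Hv - grad_E)
    return v


def stochastic_hypergradient(lam):
    """Combine the two approximate solutions into a hypergradient estimate:
    grad f(lam) ~ grad_lam E - (hess_{lam w} g)^T v, where grad_lam E = 0 and
    hess_{lam w} g(w, lam) = w for the ridge penalty."""
    w_hat = sgd_lower_level(lam)
    v_hat = stochastic_linear_solver(w_hat, lam)
    return -w_hat @ v_hat


print("estimated hypergradient at lam = 0.1:", stochastic_hypergradient(0.1))
```

Both stages use only minibatch oracles, matching the contraction-in-expectation setting described above; the final estimate is the implicit-differentiation correction term computed from the two approximate solutions.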
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow estimation accuracy to be traded off against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Analyzing Inexact Hypergradients for Bilevel Learning [0.09669369645900441]
We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation.
Our numerical results show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
arXiv Detail & Related papers (2023-01-11T23:54:27Z)
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computation and leads to models that generalize better on the testing set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems [34.68590236021379]
This paper introduces systematic algorithmic stability measures and relates them to the gap between empirical and population gradients.
We show how these results apply to stochastic gradient descent with implicit regularization through step sizes and iteration numbers, and to adaptive gradient descent.
arXiv Detail & Related papers (2022-06-14T18:14:30Z)
- Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems [0.0]
Two-level (bilevel) stochastic optimization formulations have become instrumental in a number of machine learning contexts.
New low-rank bilevel stochastic gradient methods are developed that do not require second-order derivatives.
arXiv Detail & Related papers (2021-10-01T18:20:14Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- On the Iteration Complexity of Hypergradient Computation [38.409444179509705]
In machine learning, the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly.
We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation.
This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best; a minimal sketch of this variant appears after this list.
arXiv Detail & Related papers (2020-06-29T17:32:47Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
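For the conjugate-gradient variant of approximate implicit differentiation highlighted in "On the Iteration Complexity of Hypergradient Computation" above, the following is a minimal deterministic sketch. It assumes a quadratic lower-level problem with a scalar ridge-style hyperparameter; all names and constants are illustrative and not taken from that paper.

```python
# Minimal sketch (not that paper's code): approximate implicit differentiation
# with the linear system solved by conjugate gradient, on an assumed quadratic
# lower-level problem with a scalar ridge-style hyperparameter.
import numpy as np

rng = np.random.default_rng(1)
d = 20
A = rng.normal(size=(2 * d, d))
H = A.T @ A / (2 * d) + 0.1 * np.eye(d)  # stand-in for hess_ww g(w, lam), lam = 0.1
w_hat = rng.normal(size=d)               # pretend the lower-level problem is already solved
b = rng.normal(size=d)                   # stand-in for grad_w E(w_hat)


def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H v = b using only Hessian-vector products hvp(v)."""
    v = np.zeros_like(b)
    r = b - hvp(v)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        v += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return v


v_hat = conjugate_gradient(lambda x: H @ x, b)
# Implicit-differentiation correction for a scalar ridge hyperparameter: -(hess_{lam w} g)^T v = -w^T v.
hypergrad = -w_hat @ v_hat
print("CG residual:", np.linalg.norm(H @ v_hat - b), "  hypergradient estimate:", hypergrad)
```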