An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation
- URL: http://arxiv.org/abs/2412.06436v1
- Date: Mon, 09 Dec 2024 12:26:26 GMT
- Title: An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation
- Authors: Lea Bogensperger, Matthias J. Ehrhardt, Thomas Pock, Mohammad Sadegh Salehi, Hok Shing Wong
- Abstract summary: We consider a bilevel learning framework for learning linear operators.
In this framework, the learnable parameters are optimized via a loss function that also depends on the minimizer of a convex optimization problem.
- Score: 8.107030227028815
- License:
- Abstract: We consider a bilevel learning framework for learning linear operators. In this framework, the learnable parameters are optimized via a loss function that also depends on the minimizer of a convex optimization problem (denoted lower-level problem). We utilize an iterative algorithm called 'piggyback' to compute the gradient of the loss and the minimizer of the lower-level problem. Given that the lower-level problem is solved numerically, the loss function and thus its gradient can only be computed inexactly. To estimate the accuracy of the computed hypergradient, we derive an a-posteriori error bound, which provides guidance for setting the tolerances of both the lower-level problem and the piggyback algorithm. To efficiently solve the upper-level optimization problem, we also propose an adaptive method for choosing a suitable step-size. To illustrate the proposed method, we consider a few learned regularizer problems, such as training an input-convex neural network.
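Below is a minimal, self-contained sketch of the adaptively inexact idea on a toy Tikhonov denoising problem; it is not the authors' implementation. The lower-level problem is solved only up to a running tolerance, the hypergradient is assembled from the inexact minimizer via the implicit-function (adjoint) equation rather than the paper's piggyback recursion, and the upper-level step-size is chosen by backtracking. All problem data, tolerances, and constants are illustrative.

```python
# A sketch of an adaptively inexact bilevel loop (illustrative, not the paper's code):
# the lower-level problem is solved to a tolerance `tol`, the hypergradient is
# formed from the inexact minimizer via the adjoint equation, and the upper-level
# step-size is chosen by backtracking while `tol` is tightened over the iterations.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = x_true + 0.3 * rng.standard_normal(n)          # noisy observation
D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]           # forward-difference operator

def lower_level(theta, tol):
    """Approximately solve min_x 0.5||x - y||^2 + 0.5*theta*||Dx||^2 by gradient descent."""
    H = np.eye(n) + theta * D.T @ D                # Hessian of the lower-level objective
    L = 1.0 + theta * np.linalg.norm(D, 2) ** 2    # Lipschitz constant of its gradient
    x = y.copy()
    while np.linalg.norm(H @ x - y) > tol:         # residual-based stopping rule
        x -= (H @ x - y) / L
    return x, H

def hypergradient(theta, x, H):
    """Hypergradient from the inexact minimizer via the adjoint equation H w = x - x_true."""
    w = np.linalg.solve(H, x - x_true)             # a fully inexact method would also solve this only to `tol`
    return -w @ (D.T @ D @ x)

loss = lambda x: 0.5 * np.linalg.norm(x - x_true) ** 2
theta, step, tol = 1.0, 1.0, 1e-1
for k in range(50):
    x, H = lower_level(theta, tol)
    g = hypergradient(theta, x, H)
    accepted = False
    for _ in range(10):                            # backtracking line search on the upper level
        theta_try = max(theta - step * g, 1e-6)
        if loss(lower_level(theta_try, tol)[0]) <= loss(x) - 1e-4 * step * g * g:
            accepted = True
            break
        step *= 0.5
    if accepted:
        theta, step = theta_try, 2.0 * step        # accept and try a larger step next time
    tol = max(0.9 * tol, 1e-6)                     # gradually tighten the lower-level accuracy
print("learned regularization weight:", float(theta))
```

In the paper itself the lower-level problem is a saddle-point problem solved by a primal-dual method, and the hypergradient and its accuracy come from the piggyback iterations together with the a-posteriori bound; the sketch above only mirrors the adaptive-accuracy structure.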
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications of our results to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
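As a toy illustration of the oracle model in the entry above (not the paper's algorithm), the sketch below runs gradient descent on a strongly convex quadratic using only noisy function values, with a batched two-point estimator along random unit directions; dimensions, noise level, and step size are made up.

```python
# A toy zeroth-order setup (illustrative only): gradient descent on a strongly
# convex quadratic using a batched two-point estimator built from noisy function
# values along random unit directions.
import numpy as np

rng = np.random.default_rng(1)
d = 10
A = np.diag(np.linspace(1.0, 5.0, d))              # f(x) = 0.5 x^T A x, strongly convex

def noisy_f(x):
    return 0.5 * x @ A @ x + 0.01 * rng.standard_normal()   # only noisy values are observable

def two_point_grad(x, h=1e-2, batch=20):
    """Average two-point finite-difference estimates along random unit directions."""
    g = np.zeros(d)
    for _ in range(batch):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += (noisy_f(x + h * u) - noisy_f(x - h * u)) / (2 * h) * u
    return d * g / batch                           # the factor d makes the estimator approximately unbiased

x = rng.standard_normal(d)
for t in range(500):
    x -= 0.05 * two_point_grad(x)
print("final objective value:", 0.5 * x @ A @ x)
```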
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Analyzing Inexact Hypergradients for Bilevel Learning [0.09669369645900441]
We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation.
Our numerical results show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
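A minimal sketch of the two hypergradient routes the entry above compares, on a toy quadratic bilevel problem: the implicit-function-theorem formula versus differentiating through the lower-level solver iterations (here, forward-mode unrolling of gradient descent). All problem data are illustrative.

```python
# Two routes to the hypergradient on a toy quadratic bilevel problem
# (illustrative data): (a) the implicit function theorem, (b) forward-mode
# differentiation through the unrolled lower-level gradient-descent iterations.
import numpy as np

rng = np.random.default_rng(2)
n = 5
H = np.diag(np.linspace(1.0, 3.0, n))              # Hessian of the lower-level problem (SPD)
x_target = rng.standard_normal(n)
theta = rng.standard_normal(n)

# Lower level: x*(theta) = argmin_x 0.5 x^T H x - theta^T x, so nabla_x = H x - theta.
# Upper level: loss(theta) = 0.5 ||x*(theta) - x_target||^2.

# (a) Implicit function theorem: dx*/dtheta = H^{-1}, hence the hypergradient is
x_star = np.linalg.solve(H, theta)
hg_ift = np.linalg.solve(H, x_star - x_target)

# (b) Unroll K gradient-descent steps and propagate the Jacobian dx_k/dtheta alongside.
alpha, K = 0.3, 200
x, J = np.zeros(n), np.zeros((n, n))
for _ in range(K):
    J = (np.eye(n) - alpha * H) @ J + alpha * np.eye(n)
    x = x - alpha * (H @ x - theta)
hg_unrolled = J.T @ (x - x_target)

print("gap between the two hypergradients:", np.linalg.norm(hg_ift - hg_unrolled))
```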
arXiv Detail & Related papers (2023-01-11T23:54:27Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
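The variance-reduction idea can be illustrated on a single linear layer (a sketch under simplified assumptions, not the paper's setup): both forward-gradient estimators below are unbiased, but perturbing the low-dimensional activations rather than the full weight matrix gives a noticeably smaller error.

```python
# Forward-gradient estimation on one linear layer (illustrative): a perturbation
# of the weights versus a perturbation of the activations. Both are unbiased;
# the activation-space estimator perturbs far fewer coordinates.
import numpy as np

rng = np.random.default_rng(3)
d_in, d_out = 100, 5
x = rng.standard_normal(d_in)
y = rng.standard_normal(d_out)
W = 0.1 * rng.standard_normal((d_out, d_in))

z = W @ x                                          # activations
g_z = z - y                                        # exact dLoss/dz for loss = 0.5||z - y||^2
g_W = np.outer(g_z, x)                             # exact dLoss/dW

def weight_forward_grad():
    V = rng.standard_normal((d_out, d_in))         # random direction in weight space
    dd = np.sum(g_z * (V @ x))                     # directional derivative of the loss along V
    return dd * V                                  # unbiased forward-gradient estimate of dLoss/dW

def activation_forward_grad():
    u = rng.standard_normal(d_out)                 # random direction in the much smaller activation space
    dd = g_z @ u                                   # directional derivative of the loss along u
    return np.outer(dd * u, x)                     # local chain rule: dLoss/dW = (dLoss/dz) x^T

err_w = np.mean([np.linalg.norm(weight_forward_grad() - g_W) for _ in range(200)])
err_a = np.mean([np.linalg.norm(activation_forward_grad() - g_W) for _ in range(200)])
print("mean error, weight perturbation:    ", err_w)
print("mean error, activation perturbation:", err_a)
```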
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- A framework for bilevel optimization that enables stochastic and global variance reduction algorithms [21.67411847762289]
Bilevel optimization is the problem of minimizing a value function that involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate.
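A deterministic caricature of such a single-loop scheme on a toy quadratic bilevel problem is sketched below: the inner variable, the adjoint linear-system variable, and the outer variable each take one step per iteration. The paper's SABA additionally uses SAGA-style variance-reduced gradient estimates; everything here is illustrative.

```python
# A deterministic single-loop sketch (full gradients; SABA itself uses SAGA-style
# variance reduction): inner variable z, adjoint variable v, and outer variable x
# are each updated once per iteration. Problem data are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n_x, n_z = 6, 8
A = rng.standard_normal((n_z, n_x)) / np.sqrt(n_z)
B = np.diag(np.linspace(1.0, 2.0, n_z))            # SPD Hessian of the inner problem
z_target = rng.standard_normal(n_z)

# inner:  z*(x) = argmin_z 0.5 z^T B z - z^T A x          (so  B z* = A x)
# outer:  F(x)  = 0.5 ||z*(x) - z_target||^2, hypergradient A^T B^{-1} (z*(x) - z_target)

x, z, v = np.zeros(n_x), np.zeros(n_z), np.zeros(n_z)
rho, eta = 0.4, 0.05
for t in range(20000):
    z = z - rho * (B @ z - A @ x)                  # one gradient step on the inner problem
    v = v - rho * (B @ v - (z - z_target))         # one step toward solving  B v = z - z_target
    x = x - eta * (A.T @ v)                        # one outer step using the current adjoint v

M = np.linalg.solve(B, A)                          # z*(x) = M x
x_ref = np.linalg.lstsq(M, z_target, rcond=None)[0]
print("distance to the exact bilevel solution:", np.linalg.norm(x - x_ref))
```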
arXiv Detail & Related papers (2022-01-31T18:17:25Z)
- Bilevel learning of l1-regularizers with closed-form gradients (BLORC) [8.138650738423722]
We present a method for supervised learning of sparsity-promoting regularizers.
The parameters are learned to minimize the mean squared error of reconstruction on a training set of ground truth signal and measurement pairs.
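A minimal sketch of that supervised setup (not the BLORC algorithm itself): a sparsity-promoting regularization weight is chosen to minimize the mean squared reconstruction error over training pairs, with a smoothed l1 penalty and a simple grid search standing in for the paper's closed-form gradients. Signal sizes, noise level, and the grid are illustrative.

```python
# Supervised choice of a sparsity-promoting regularization weight (a sketch of
# the setup only, not BLORC): pick the weight that minimizes the mean squared
# reconstruction error over training pairs, using a smoothed l1 penalty.
import numpy as np

rng = np.random.default_rng(5)
n, n_train = 64, 5

def make_pair():
    s = np.zeros(n)
    s[rng.choice(n, 6, replace=False)] = 3.0 * rng.standard_normal(6)
    return s, s + 0.5 * rng.standard_normal(n)     # (ground-truth signal, noisy measurement)

train = [make_pair() for _ in range(n_train)]

def denoise(y, lam, eps=1e-2, iters=500):
    """Minimize 0.5||x - y||^2 + lam * sum(sqrt(x^2 + eps)) by gradient descent."""
    x = y.copy()
    step = 1.0 / (1.0 + lam / np.sqrt(eps))        # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        x -= step * ((x - y) + lam * x / np.sqrt(x ** 2 + eps))
    return x

def train_mse(lam):
    return np.mean([np.mean((denoise(y, lam) - s) ** 2) for s, y in train])

grid = np.linspace(0.0, 2.0, 21)
best = min(grid, key=train_mse)
print("selected weight:", best, " training MSE:", train_mse(best))
```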
arXiv Detail & Related papers (2021-11-21T17:01:29Z)
- Inexact bilevel stochastic gradient methods for constrained and unconstrained lower-level problems [0.0]
Bilevel (two-level) optimization has become instrumental in a number of machine learning contexts.
New low-rank bilevel stochastic gradient methods are developed that do not require second-order derivatives.
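One standard way to avoid second-order derivatives in this setting is to replace the mixed second derivative acting on a vector by a central finite difference of first-order gradients; the sketch below shows this generic trick (not necessarily the paper's construction) on quadratic toy functions, for which the approximation happens to be exact.

```python
# Avoiding second-order derivatives (illustrative sketch): the mixed second
# derivative times a vector is approximated by a central finite difference of
# first-order lower-level gradients.
import numpy as np

rng = np.random.default_rng(6)
n_x, n_z = 4, 6
A = rng.standard_normal((n_z, n_x))

def grad_x(x, z):                                   # nabla_x g for g = 0.5 z^T B z - z^T A x + 0.5||x||^2
    return -A.T @ z + x

x, z, v = rng.standard_normal(n_x), rng.standard_normal(n_z), rng.standard_normal(n_z)

exact = -A.T @ v                                    # exact mixed second derivative applied to v
r = 1e-3
approx = (grad_x(x, z + r * v) - grad_x(x, z - r * v)) / (2 * r)
print("finite-difference error:", np.linalg.norm(exact - approx))
```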
arXiv Detail & Related papers (2021-10-01T18:20:14Z)
- Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information [37.70729542263343]
We present a novel adaptive optimization algorithm for large-scale machine learning problems.
Our method dynamically adapts the direction and step-size.
Our methodology does not require tedious learning-rate tuning.
arXiv Detail & Related papers (2021-09-11T06:39:50Z)
- BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
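A minimal sketch of such a randomized, zeroth-order version of projected gradient descent on a toy strongly convex problem constrained to a Euclidean ball; the two-point estimator, the schedules, and the problem data are illustrative and not taken from the paper.

```python
# Randomized projected gradient descent with a zeroth-order oracle (illustrative
# sketch): the gradient is replaced by a two-point estimate along one random
# direction per step and the iterate is projected back onto a Euclidean ball.
import numpy as np

rng = np.random.default_rng(7)
d, radius = 8, 1.0
x_opt = np.full(d, 1.0)                            # unconstrained minimizer lies outside the ball
f = lambda x: 0.5 * np.sum((x - x_opt) ** 2)       # strongly convex; only values are queried

def project(x):
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else radius * x / nrm

x = np.zeros(d)
for t in range(1, 20001):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    h, step = 1.0 / t, 1.0 / t                     # shrinking smoothing and step-size schedules
    g = d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u
    x = project(x - step * g)

print("f at the last iterate:       ", f(x))
print("f at the constrained optimum:", f(project(x_opt)))
```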
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave minmax problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
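The optimistic update itself can be illustrated on the classical bilinear toy game (a sketch only; the adaptive algorithms studied in the entry above additionally use Adagrad-style per-coordinate scaling): plain simultaneous gradient descent-ascent drifts away from the equilibrium, while the optimistic variant converges.

```python
# Optimistic vs. plain gradient descent-ascent on the bilinear game min_x max_y x^T M y
# (illustrative sketch; the paper's algorithm also uses adaptive scaling).
import numpy as np

M = np.diag([0.3, 0.4, 0.5, 0.6])                  # a small, well-conditioned bilinear game
eta = 0.2

def run(optimistic, steps=2000):
    x, y = np.ones(4), np.ones(4)
    gx_prev, gy_prev = M @ y, M.T @ x
    for _ in range(steps):
        gx, gy = M @ y, M.T @ x                    # gradients of x^T M y w.r.t. x and y
        if optimistic:
            x = x - eta * (2 * gx - gx_prev)       # optimistic descent step for the min player
            y = y + eta * (2 * gy - gy_prev)       # optimistic ascent step for the max player
        else:
            x, y = x - eta * gx, y + eta * gy      # plain simultaneous gradient descent-ascent
        gx_prev, gy_prev = gx, gy
    return np.linalg.norm(x) + np.linalg.norm(y)   # distance from the unique equilibrium (0, 0)

print("plain GDA distance from equilibrium:     ", run(optimistic=False))
print("optimistic GDA distance from equilibrium:", run(optimistic=True))
```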
arXiv Detail & Related papers (2019-12-26T22:10:10Z)