BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
- URL: http://arxiv.org/abs/2209.08709v1
- Date: Mon, 19 Sep 2022 01:51:12 GMT
- Title: BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
- Authors: Mao Ye, Bo Liu, Stephen Wright, Peter Stone and Qiang Liu
- Abstract summary: Bilevel optimization (BO) is useful for solving a variety of important machine learning problems.
Conventional methods need to differentiate through the low-level optimization process with implicit differentiation.
First-order BO depends only on first-order gradient information and requires no implicit differentiation.
- Score: 46.457298683984924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization (BO) is useful for solving a variety of important
machine learning problems including but not limited to hyperparameter
optimization, meta-learning, continual learning, and reinforcement learning.
Conventional BO methods need to differentiate through the low-level
optimization process with implicit differentiation, which requires expensive
calculations related to the Hessian matrix. There has been a recent quest for
first-order methods for BO, but the methods proposed to date tend to be
complicated and impractical for large-scale deep learning applications. In this
work, we propose a simple first-order BO algorithm that depends only on
first-order gradient information, requires no implicit differentiation, and is
practical and efficient for large-scale non-convex functions in deep learning.
We provide non-asymptotic convergence analysis of the proposed method to
stationary points for non-convex objectives and present empirical results that
show its superior practical performance.
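The implicit-differentiation cost mentioned in the abstract comes from the hypergradient of the upper-level objective: with y*(x) = argmin_y g(x, y), one has d f(x, y*(x))/dx = grad_x f - grad^2_xy g [grad^2_yy g]^{-1} grad_y f, which requires inverse-Hessian (or Hessian-vector) computations on the lower-level objective. Below is a minimal NumPy sketch of the value-function idea that first-order methods of this kind use instead: the lower-level value g*(x) is approximated by a few inner gradient steps, and the outer update combines only first-order gradients of f and of the surrogate q(x, y) = g(x, y) - g*(x). The toy objectives, finite-difference gradients, step sizes, and multiplier rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (illustrative, not the paper's exact method) of a first-order
# bilevel update built on the value-function surrogate q(x, y) = g(x, y) - g*(x),
# with g*(x) approximated by a few inner gradient steps on the lower level.
import numpy as np

def f(x, y):  # upper-level objective (toy quadratic, assumed for this sketch)
    return 0.5 * np.sum((y - 1.0) ** 2) + 0.5 * np.sum(x ** 2)

def g(x, y):  # lower-level objective (toy quadratic, assumed for this sketch)
    return 0.5 * np.sum((y - x) ** 2)

def grad(fun, x, y, wrt):
    """Central finite-difference gradient; keeps the sketch free of autodiff."""
    eps, v = 1e-5, (x if wrt == "x" else y)
    out = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        if wrt == "x":
            out[i] = (fun(x + e, y) - fun(x - e, y)) / (2 * eps)
        else:
            out[i] = (fun(x, y + e) - fun(x, y - e)) / (2 * eps)
    return out

x, y = np.ones(2), np.zeros(2)
alpha, inner_steps, eta = 0.1, 5, 1.0  # illustrative hyperparameters

for step in range(300):
    # Inner loop: a few gradient steps on g approximate the lower-level value g*(x).
    z = y.copy()
    for _ in range(inner_steps):
        z = z - alpha * grad(g, x, z, "y")
    # Value-function surrogate; z is treated as a constant (no differentiation through it).
    q = lambda a, b, z=z: g(a, b) - g(a, z)
    gf_x, gf_y = grad(f, x, y, "x"), grad(f, x, y, "y")
    gq_x, gq_y = grad(q, x, y, "x"), grad(q, x, y, "y")
    # Combine first-order gradients of f and q with a simple nonnegative multiplier.
    sq_norm = gq_x @ gq_x + gq_y @ gq_y + 1e-12
    lam = max(0.0, (eta * q(x, y) - (gf_x @ gq_x + gf_y @ gq_y)) / sq_norm)
    x = x - alpha * (gf_x + lam * gq_x)
    y = y - alpha * (gf_y + lam * gq_y)

print("x:", np.round(x, 3), "y:", np.round(y, 3))
```

For this toy problem the lower-level solution is y*(x) = x, so the exact bilevel solution is x = y = 0.5, which gives a reference point for sanity-checking the sketch; no claim is made that these illustrative hyperparameters reflect the paper's reported behaviour.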
Related papers
- Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework in which the lower-level problem minimizes an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z) - Experience in Engineering Complex Systems: Active Preference Learning
with Multiple Outcomes and Certainty Levels [1.5257326975704795]
Black-box optimization refers to problems whose objective functions and/or constraint sets are either unknown, inaccessible, or non-existent.
An algorithm called Active Preference Learning has been developed to exploit this specific information.
Our approach aims to extend the algorithm so that it can exploit further information effectively.
arXiv Detail & Related papers (2023-02-27T15:55:37Z) - On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep learning.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
arXiv Detail & Related papers (2022-05-18T20:53:58Z) - Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handling streaming data in pairwise learning (a minimal illustrative sketch is given after this list).
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate
in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the updating of the parameters and the learning rate as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.