ES-Based Jacobian Enables Faster Bilevel Optimization
- URL: http://arxiv.org/abs/2110.07004v1
- Date: Wed, 13 Oct 2021 19:36:50 GMT
- Title: ES-Based Jacobian Enables Faster Bilevel Optimization
- Authors: Daouda Sow, Kaiyi Ji, Yingbin Liang
- Abstract summary: Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations.
We propose a novel BO algorithm that adopts an Evolution Strategies (ES)-based method to approximate the response Jacobian matrix in the hypergradient of BO.
- Score: 53.675623215542515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization (BO) has arisen as a powerful tool for solving many
modern machine learning problems. However, due to the nested structure of BO,
existing gradient-based methods require second-order derivative approximations
via Jacobian- and/or Hessian-vector computations, which can be very costly in
practice, especially with large neural network models. In this work, we propose
a novel BO algorithm that adopts an Evolution Strategies (ES)-based method to
approximate the response Jacobian matrix in the hypergradient of BO, and hence
fully eliminates all second-order computations. We call our algorithm ESJ
(which stands for the ES-based Jacobian method) and further extend it to the
stochastic setting as ESJ-S. Theoretically, we characterize the convergence
guarantee and computational complexity for our algorithms. Experimentally, we
demonstrate the superiority of our proposed algorithms compared to state-of-the-art
methods on various bilevel problems. In particular, in our experiment on
the few-shot meta-learning problem, we meta-learn the twelve million
parameters of a ResNet-12 network over the miniImageNet dataset, which
evidently demonstrates the scalability of our ES-based bilevel approach and its
feasibility in the large-scale setting.
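The following is a minimal, self-contained sketch of the core idea in the abstract: the term $(\partial y^*(x)/\partial x)^\top \nabla_y f(x, y^*(x))$ in the hypergradient is estimated purely from Gaussian-perturbed evaluations of the inner solution mapping $y^*(\cdot)$, so no Jacobian- or Hessian-vector products are ever formed. The toy quadratic bilevel problem, the inner solver, and all step sizes and sample counts are illustrative assumptions, not the authors' ESJ/ESJ-S implementation.

```python
# Zeroth-order (ES-style) estimate of the hypergradient of a toy bilevel problem:
#   outer: min_x f(x, y*(x)) with f(x, y) = 0.5*||y - 1||^2 + 0.5*||x||^2
#   inner: y*(x) = argmin_y g(x, y) with g(x, y) = 0.5*||y - A^T x||^2
# All problem details below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # couples the outer variable x to the inner variable y

def inner_solution(x, steps=200, lr=0.1):
    """Approximate y*(x) by gradient descent on g(x, y)."""
    y = np.zeros(A.shape[1])
    for _ in range(steps):
        y -= lr * (y - A.T @ x)          # grad_y g(x, y)
    return y

def es_hypergradient(x, num_samples=64, mu=1e-2):
    """Estimate grad f(x, y*(x)); the ES sum replaces (dy*/dx)^T grad_y f."""
    y = inner_solution(x)
    v = y - np.ones_like(y)              # grad_y f(x, y)
    est = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)              # Gaussian perturbation of x only
        y_pert = inner_solution(x + mu * u)
        est += ((y_pert - y) @ v / mu) * u            # zeroth-order directional estimate
    return x + est / num_samples          # grad_x f(x, y) + ES estimate of the Jacobian term

x = rng.standard_normal(5)
for _ in range(100):                      # plain outer gradient descent on the estimate
    x -= 0.05 * es_hypergradient(x)
print("outer objective:", 0.5 * np.sum((inner_solution(x) - 1.0) ** 2) + 0.5 * np.sum(x ** 2))
```

Each outer step only queries $y^*(\cdot)$ at perturbed outer variables; avoiding all second-order quantities in this way is what the abstract credits for the scalability to models such as ResNet-12.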
Related papers
- Ensemble-based Hybrid Optimization of Bayesian Neural Networks and
Traditional Machine Learning Algorithms [0.0]
This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM).
Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix.
Overall, the ensemble method stands out as a robust, algorithmically optimized approach.
arXiv Detail & Related papers (2023-10-09T06:59:17Z)
- Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity [17.870370505179014]
We propose a novel decentralized stochastic bilevel optimization (DSBO) algorithm that requires only first-order oracles, Hessian-vector products, and Jacobian-vector products.
The advantage of our algorithm is that it does not require estimating the full Hessian and Jacobian matrices, thereby having improved per-iteration complexity (see the Hessian-vector product sketch after this list).
arXiv Detail & Related papers (2022-10-23T20:06:05Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined via the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis [12.680169619392695]
This thesis provides a comprehensive convergence rate analysis for bilevel optimization algorithms.
For the problem-based formulation, we provide a convergence rate analysis for AID- and ITD-based bilevel algorithms.
We then develop accelerated bilevel algorithms, for which we provide sharper convergence analysis under relaxed assumptions.
arXiv Detail & Related papers (2021-07-31T22:05:47Z)
- Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
- SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic bilevel optimizer named stoc-BiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
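As referenced in the Decentralized Stochastic Bilevel Optimization entry above, Hessian-vector (and Jacobian-vector) products can be computed with reverse-mode automatic differentiation without ever materializing the full Hessian or Jacobian. The snippet below illustrates the standard double-backward trick for a Hessian-vector product; PyTorch and the toy loss are assumptions chosen for illustration, not code from the cited paper.

```python
# Hessian-vector product H(y) v without forming the full Hessian:
# differentiate the scalar <grad_y loss, v> a second time (double backward).
import torch

def hessian_vector_product(loss_fn, y, v):
    """Return H v, where H is the Hessian of loss_fn at y."""
    y = y.detach().requires_grad_(True)
    grad_y = torch.autograd.grad(loss_fn(y), y, create_graph=True)[0]
    return torch.autograd.grad(torch.dot(grad_y, v), y)[0]

y = torch.randn(4)
v = torch.randn(4)
loss_fn = lambda t: 0.25 * (t ** 4).sum()      # toy loss with a non-trivial Hessian
print(hessian_vector_product(loss_fn, y, v))   # equals 3 * y**2 * v for this loss
print(3 * y ** 2 * v)                          # closed-form check
```

Each such product costs roughly one extra backward pass, which is why avoiding estimation of the full matrices improves per-iteration complexity.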
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.