ES-Based Jacobian Enables Faster Bilevel Optimization
- URL: http://arxiv.org/abs/2110.07004v1
- Date: Wed, 13 Oct 2021 19:36:50 GMT
- Title: ES-Based Jacobian Enables Faster Bilevel Optimization
- Authors: Daouda Sow, Kaiyi Ji, Yingbin Liang
- Abstract summary: Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations.
We propose a novel BO algorithm that adopts an Evolution Strategies (ES)-based method to approximate the response Jacobian matrix in the hypergradient of BO.
- Score: 53.675623215542515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization (BO) has arisen as a powerful tool for solving many
modern machine learning problems. However, due to the nested structure of BO,
existing gradient-based methods require second-order derivative approximations
via Jacobian- and/or Hessian-vector computations, which can be very costly in
practice, especially with large neural network models. In this work, we propose
a novel BO algorithm that adopts an Evolution Strategies (ES)-based method to
approximate the response Jacobian matrix in the hypergradient of BO, and hence
fully eliminates all second-order computations. We call our algorithm ESJ
(which stands for the ES-based Jacobian method) and further extend it to the
stochastic setting as ESJ-S. Theoretically, we characterize the convergence
guarantee and computational complexity for our algorithms. Experimentally, we
demonstrate the superiority of our proposed algorithms compared to state-of-the-art
methods on various bilevel problems. In particular, in our experiment on
the few-shot meta-learning problem, we meta-learn the twelve million
parameters of a ResNet-12 network over the miniImageNet dataset, which
evidently demonstrates the scalability of our ES-based bilevel approach and its
feasibility in the large-scale setting.
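The following is a minimal, self-contained sketch of the core idea in the abstract: the term $(\partial y^*(x)/\partial x)^\top \nabla_y f(x, y^*(x))$ in the hypergradient is estimated purely from Gaussian-perturbed evaluations of the inner solution mapping $y^*(\cdot)$, so no Jacobian- or Hessian-vector products are ever formed. The toy quadratic bilevel problem, the inner solver, and all step sizes and sample counts are illustrative assumptions, not the authors' ESJ/ESJ-S implementation.

```python
# Zeroth-order (ES-style) estimate of the hypergradient of a toy bilevel problem:
#   outer: min_x f(x, y*(x)) with f(x, y) = 0.5*||y - 1||^2 + 0.5*||x||^2
#   inner: y*(x) = argmin_y g(x, y) with g(x, y) = 0.5*||y - A^T x||^2
# All problem details below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # couples the outer variable x to the inner variable y

def inner_solution(x, steps=200, lr=0.1):
    """Approximate y*(x) by gradient descent on g(x, y)."""
    y = np.zeros(A.shape[1])
    for _ in range(steps):
        y -= lr * (y - A.T @ x)          # grad_y g(x, y)
    return y

def es_hypergradient(x, num_samples=64, mu=1e-2):
    """Estimate grad f(x, y*(x)); the ES sum replaces (dy*/dx)^T grad_y f."""
    y = inner_solution(x)
    v = y - np.ones_like(y)              # grad_y f(x, y)
    est = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)              # Gaussian perturbation of x only
        y_pert = inner_solution(x + mu * u)
        est += ((y_pert - y) @ v / mu) * u            # zeroth-order directional estimate
    return x + est / num_samples          # grad_x f(x, y) + ES estimate of the Jacobian term

x = rng.standard_normal(5)
for _ in range(100):                      # plain outer gradient descent on the estimate
    x -= 0.05 * es_hypergradient(x)
print("outer objective:", 0.5 * np.sum((inner_solution(x) - 1.0) ** 2) + 0.5 * np.sum(x ** 2))
```

Each outer step only queries $y^*(\cdot)$ at perturbed outer variables; avoiding all second-order quantities in this way is what the abstract credits for the scalability to models such as ResNet-12.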
Related papers
- Ensemble-based Hybrid Optimization of Bayesian Neural Networks and
Traditional Machine Learning Algorithms [0.0]
This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM).
Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix.
Overall, the ensemble method stands out as a robust, algorithmically optimized approach.
arXiv Detail & Related papers (2023-10-09T06:59:17Z)
- Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity [17.870370505179014]
We propose a novel decentralized stochastic bilevel optimization (DSBO) algorithm that requires only first-order oracles, Hessian-vector products, and Jacobian-vector products.
The advantage of our algorithm is that it does not require estimating the full Hessian and Jacobian matrices, thereby having improved per-iteration complexity (see the Hessian-vector product sketch after this list).
arXiv Detail & Related papers (2022-10-23T20:06:05Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined via the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis [12.680169619392695]
This thesis provides a comprehensive convergence rate analysis for bilevel optimization algorithms.
For the problem-based formulation, we provide a convergence rate analysis for AID- and ITD-based bilevel algorithms.
We then develop accelerated bilevel algorithms, for which we provide sharper convergence analysis under relaxed assumptions.
arXiv Detail & Related papers (2021-07-31T22:05:47Z)
- Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
- SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic bilevel optimizer named stoc-BiO, which features a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
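As referenced in the Decentralized Stochastic Bilevel Optimization entry above, Hessian-vector (and Jacobian-vector) products can be computed with reverse-mode automatic differentiation without ever materializing the full Hessian or Jacobian. The snippet below illustrates the standard double-backward trick for a Hessian-vector product; PyTorch and the toy loss are assumptions chosen for illustration, not code from the cited paper.

```python
# Hessian-vector product H(y) v without forming the full Hessian:
# differentiate the scalar <grad_y loss, v> a second time (double backward).
import torch

def hessian_vector_product(loss_fn, y, v):
    """Return H v, where H is the Hessian of loss_fn at y."""
    y = y.detach().requires_grad_(True)
    grad_y = torch.autograd.grad(loss_fn(y), y, create_graph=True)[0]
    return torch.autograd.grad(torch.dot(grad_y, v), y)[0]

y = torch.randn(4)
v = torch.randn(4)
loss_fn = lambda t: 0.25 * (t ** 4).sum()      # toy loss with a non-trivial Hessian
print(hessian_vector_product(loss_fn, y, v))   # equals 3 * y**2 * v for this loss
print(3 * y ** 2 * v)                          # closed-form check
```

Each such product costs roughly one extra backward pass, which is why avoiding estimation of the full matrices improves per-iteration complexity.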
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.