Scalable Bayesian Meta-Learning through Generalized Implicit Gradients
- URL: http://arxiv.org/abs/2303.17768v1
- Date: Fri, 31 Mar 2023 02:10:30 GMT
- Title: Scalable Bayesian Meta-Learning through Generalized Implicit Gradients
- Authors: Yilang Zhang, Bingcong Li, Shijian Gao, Georgios B. Giannakis
- Abstract summary: The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
- Score: 64.21628447579772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-learning is uniquely effective and swift in tackling emerging
tasks with limited data. Its broad applicability is revealed by viewing it as a
bi-level optimization problem. The resulting algorithmic viewpoint, however,
faces scalability issues when the inner-level optimization relies on
gradient-based iterations. Implicit differentiation has been considered to
alleviate this challenge, but it is restricted to an isotropic Gaussian prior,
and only favors deterministic meta-learning approaches. This work markedly
mitigates the scalability bottleneck by cross-fertilizing the benefits of
implicit differentiation to probabilistic Bayesian meta-learning. The novel
implicit Bayesian meta-learning (iBaML) method not only broadens the scope of
learnable priors, but also quantifies the associated uncertainty. Furthermore,
the ultimate complexity is well controlled regardless of the inner-level
optimization trajectory. Analytical error bounds are established to demonstrate
the precision and efficiency of the generalized implicit gradient over the
explicit one. Extensive numerical tests are also carried out to empirically
validate the performance of the proposed method.
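The abstract contrasts an explicit meta-gradient, obtained by differentiating through every inner-level gradient step, with an implicit one derived from the optimality condition of the inner problem. The sketch below is NOT iBaML itself. It assumes a deterministic, MAP-style inner problem with a quadratic task loss and a diagonal Gaussian prior with mean `mu` and per-coordinate precisions `prec` (the isotropic prior mentioned in the abstract corresponds to a constant `prec`), so that both gradients with respect to `mu` can be computed exactly with NumPy. All function and variable names are hypothetical.

```python
import numpy as np


def inner_solve(A, b, mu, prec):
    """Exact minimizer of the hypothetical inner objective
    0.5 * phi^T A phi - b^T phi + 0.5 * (phi - mu)^T diag(prec) (phi - mu),
    i.e. a quadratic task loss plus a diagonal Gaussian prior N(mu, diag(1/prec))."""
    P = np.diag(prec)
    # Stationarity: A phi - b + P (phi - mu) = 0  =>  (A + P) phi = b + P mu
    return np.linalg.solve(A + P, b + P @ mu)


def outer_loss(phi, A_val, b_val):
    """Quadratic validation loss evaluated at the adapted parameters."""
    return 0.5 * phi @ A_val @ phi - b_val @ phi


def implicit_meta_grad(A, b, mu, prec, A_val, b_val):
    """Meta-gradient w.r.t. the prior mean via the implicit function theorem."""
    P = np.diag(prec)
    phi = inner_solve(A, b, mu, prec)
    g = A_val @ phi - b_val  # gradient of the outer loss at phi*
    # Differentiating the stationarity condition gives (A + P) dphi = P dmu,
    # so dphi*/dmu = (A + P)^{-1} P and the meta-gradient is P (A + P)^{-1} g
    # (A + P and P are symmetric). Only phi* is needed, not the inner trajectory.
    return P @ np.linalg.solve(A + P, g)


def explicit_meta_grad(A, b, mu, prec, A_val, b_val, steps, lr):
    """Meta-gradient obtained by differentiating through `steps` inner GD updates.
    The Jacobian dphi_k/dmu is propagated in forward mode for clarity."""
    P = np.diag(prec)
    phi = mu.copy()        # inner GD initialized at the prior mean
    J = np.eye(len(mu))    # dphi_0/dmu
    for _ in range(steps):
        grad = A @ phi - b + P @ (phi - mu)
        # Chain rule through phi_{k+1} = phi_k - lr * grad(phi_k, mu)
        J = J - lr * ((A + P) @ J - P)
        phi = phi - lr * grad
    g = A_val @ phi - b_val
    return J.T @ g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    M = rng.standard_normal((d, d))
    A = M @ M.T + np.eye(d)            # positive-definite task-loss Hessian
    N = rng.standard_normal((d, d))
    A_val = N @ N.T + np.eye(d)
    b, b_val, mu = rng.standard_normal((3, d))
    prec = rng.uniform(0.5, 2.0, size=d)
    lr = 1.0 / np.linalg.norm(A + np.diag(prec), 2)  # step size 1/L, below 2/L
    imp = implicit_meta_grad(A, b, mu, prec, A_val, b_val)
    for steps in (5, 50, 500):
        expl = explicit_meta_grad(A, b, mu, prec, A_val, b_val, steps, lr)
        print(steps, np.max(np.abs(expl - imp)))
```

With enough inner steps the explicit gradient approaches the implicit one, but its cost grows with the number of inner steps, whereas the implicit gradient needs only the final inner solution and one linear solve. For non-quadratic losses, `A` would be replaced by the Hessian at the inner solution, typically accessed through Hessian-vector products and an iterative solver such as conjugate gradient.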
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
Date: 2024-10-14T12:10:06Z
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and extended adversarial optimization.
Date: 2024-08-17T02:22:08Z
- A Learning-Based Optimal Uncertainty Quantification Method and Its Application to Ballistic Impact Problems [1.713291434132985]
This paper concerns the optimal (supremum and infimum) uncertainty bounds for systems where the input (or prior) measure is only partially/imperfectly known.
We demonstrate the learning-based framework on the uncertainty optimization problem.
We show that the approach can be used to construct maps for the performance certificate and safety in engineering practice.
Date: 2022-12-28T14:30:53Z
- Self-adaptive algorithms for quasiconvex programming and applications to machine learning [0.0]
We provide a self-adaptive step-size strategy that does not involve convex line-search techniques, as well as a generic approach under mild assumptions.
The proposed method is verified by preliminary results from some computational examples.
To demonstrate the effectiveness of the proposed technique for large-scale problems, we apply it to some experiments on machine learning.
Date: 2022-12-13T05:30:29Z
- MARS: Meta-Learning as Score Matching in the Function Space [79.73213540203389]
We present a novel approach to extracting inductive biases from a set of related datasets.
We use functional Bayesian neural network inference, which views the prior as a process and performs inference in the function space.
Our approach can seamlessly acquire and represent complex prior knowledge by meta-learning the score function of the data-generating process.
Date: 2022-10-24T15:14:26Z
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
Date: 2021-07-21T17:10:14Z
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with a dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
Date: 2021-06-10T17:54:21Z
- From inexact optimization to learning via gradient concentration [22.152317081922437]
In this paper, we investigate the phenomenon in the context of linear models with smooth loss functions.
We propose a proof technique combining ideas from inexact optimization and probability theory, specifically gradient concentration.
Date: 2021-06-09T21:23:29Z
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the joint process parameters.
Date: 2021-04-12T13:13:34Z
- The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
Date: 2020-06-08T03:35:41Z