Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels
- URL: http://arxiv.org/abs/2306.03968v1
- Date: Tue, 6 Jun 2023 19:02:57 GMT
- Title: Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels
- Authors: Alexander Immer, Tycho F. A. van der Ouderaa, Mark van der Wilk,
Gunnar R\"atsch, Bernhard Sch\"olkopf
- Abstract summary: We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
- Score: 78.6096486885658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting hyperparameters in deep learning greatly impacts its effectiveness
but requires manual effort and expertise. Recent works show that Bayesian model
selection with Laplace approximations makes it possible to optimize such
hyperparameters just like standard neural network parameters, using gradients
and only the training data. However, estimating a single hyperparameter gradient
requires a pass through the entire dataset, limiting the scalability of such
algorithms. In this work, we overcome this issue by introducing lower bounds to
the linearized Laplace approximation of the marginal likelihood. In contrast to
previous estimators, these bounds are amenable to stochastic-gradient-based
optimization and allow trading off estimation accuracy against computational
complexity. We derive them using the function-space form of the linearized
Laplace, which can be estimated using the neural tangent kernel.
Experimentally, we show that the estimators can significantly accelerate
gradient-based hyperparameter optimization.
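To make the idea concrete, here is a minimal, illustrative sketch. It is not the paper's estimator or one of its lower bounds: it simply evaluates a GP-form log marginal likelihood on a single mini-batch using the empirical neural tangent kernel of a toy MLP and takes stochastic gradients with respect to two hyperparameters (a prior precision and an observation-noise variance). The architecture, the hyperparameter names, and the plain mini-batch subsampling are assumptions made only for this sketch.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_mlp(key, dims=(1, 32, 1)):
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    h = x
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b).squeeze(-1)                 # one scalar output per input

def batch_ntk(params, x_batch):
    """Empirical NTK Gram matrix J J^T on a mini-batch."""
    flat, unravel = ravel_pytree(params)
    def f_single(flat_p, x):
        return mlp(unravel(flat_p), x[None])[0]
    jac = jax.vmap(lambda x: jax.grad(f_single)(flat, x))(x_batch)   # (B, P)
    return jac @ jac.T                                               # (B, B)

def batch_log_marglik(log_hypers, params, x_batch, y_batch):
    """GP-form log marginal likelihood of the linearized model on one batch.
    Covariance = NTK / prior_precision + noise_var * I; the offset term from
    linearizing around the current weights is omitted for simplicity."""
    prior_prec = jnp.exp(log_hypers[0])
    noise_var = jnp.exp(log_hypers[1])
    B = x_batch.shape[0]
    K = batch_ntk(params, x_batch) / prior_prec + noise_var * jnp.eye(B)
    resid = y_batch - mlp(params, x_batch)
    L = jnp.linalg.cholesky(K)
    alpha = jax.scipy.linalg.cho_solve((L, True), resid)
    return (-0.5 * resid @ alpha
            - jnp.sum(jnp.log(jnp.diag(L)))
            - 0.5 * B * jnp.log(2.0 * jnp.pi))

key = jax.random.PRNGKey(0)
key, k_par, k_x, k_eps = jax.random.split(key, 4)
params = init_mlp(k_par)                            # stands in for trained weights
x = jax.random.normal(k_x, (16, 1))                 # one mini-batch
y = jnp.sin(3.0 * x[:, 0]) + 0.1 * jax.random.normal(k_eps, (16,))

# Stochastic hyperparameter update: ascend the gradient of the batch objective.
log_hypers = jnp.zeros(2)                           # log prior precision, log noise var
grad_fn = jax.jit(jax.grad(batch_log_marglik))
for _ in range(100):
    log_hypers = log_hypers + 1e-2 * grad_fn(log_hypers, params, x, y)
print("prior precision, noise variance:", jnp.exp(log_hypers))
```

The sketch only shows how the NTK Gram matrix turns the hyperparameter update into a cheap, differentiable, per-batch computation; the paper's contribution is the lower bounds that make such mini-batch estimates valid stand-ins for the full-data linearized-Laplace marginal likelihood.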
Related papers
- Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
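As a rough illustration of that claim (not the authors' algorithm; the RBF kernel, the data, the step size, and the iteration count are arbitrary choices for this sketch), the GP posterior mean requires solving (K + sigma^2 I) alpha = y, and plain gradient descent on the corresponding quadratic already yields a usable approximate solution:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(1)
x = jnp.linspace(-3.0, 3.0, 200)[:, None]
y = jnp.sin(x[:, 0]) + 0.1 * jax.random.normal(key, (200,))
sigma = 0.1                                       # observation noise std (assumed)

def rbf(a, b, lengthscale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return jnp.exp(-0.5 * d2 / lengthscale ** 2)

A = rbf(x, x) + sigma ** 2 * jnp.eye(x.shape[0])  # system matrix K + sigma^2 I

# alpha* = argmin 0.5 a^T A a - y^T a is exactly the solution of A alpha = y,
# so gradient descent on this quadratic approximately solves the linear system.
def quad(a):
    return 0.5 * a @ (A @ a) - y @ a

grad_quad = jax.jit(jax.grad(quad))
alpha = jnp.zeros(x.shape[0])
step = 1.0 / jnp.linalg.norm(A, ord=2)            # 1 / largest eigenvalue
for _ in range(1000):
    alpha = alpha - step * grad_quad(alpha)

x_test = jnp.linspace(-3.0, 3.0, 50)[:, None]
mean_pred = rbf(x_test, x) @ alpha                # approximate GP posterior mean
print("max |A alpha - y| residual:", jnp.max(jnp.abs(A @ alpha - y)))
```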
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization [0.0]
We present a novel algorithm which estimates the smoothness and length-scale parameters in the Matern kernel in order to improve robustness of the resulting prediction uncertainties.
We achieve improved uncertainty quantification (UQ) over the leave-one-out likelihood while maintaining a high degree of scalability, as demonstrated in numerical experiments.
arXiv Detail & Related papers (2022-09-22T19:23:37Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
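As a minimal sketch of that construction (my own toy example, not the paper's algorithm: a 1-D Gaussian target with parameter theta, unadjusted Langevin transitions in place of Metropolis-Hastings, and a linear annealing schedule), every transition becomes a smooth reparameterized map, so the AIS estimate of log Z can be differentiated with respect to theta:

```python
import jax
import jax.numpy as jnp

# Unnormalized target gamma(x; theta) = exp(-0.5 (x - theta)^2); its true
# normalizer is sqrt(2 pi) regardless of theta.  Proposal: standard normal.
def log_gamma(x, theta):
    return -0.5 * (x - theta) ** 2

def log_prop(x):
    return -0.5 * x ** 2 - 0.5 * jnp.log(2.0 * jnp.pi)

def log_annealed(x, theta, beta):
    return (1.0 - beta) * log_prop(x) + beta * log_gamma(x, theta)

def ais_log_z(theta, key, n_particles=512, n_steps=64, eps=0.05):
    betas = jnp.linspace(0.0, 1.0, n_steps + 1)
    key, k0 = jax.random.split(key)
    x = jax.random.normal(k0, (n_particles,))      # exact samples from the proposal
    log_w = jnp.zeros(n_particles)
    for t in range(1, n_steps + 1):
        # AIS incremental weight at the new temperature, evaluated at x_{t-1}.
        log_w += log_annealed(x, theta, betas[t]) - log_annealed(x, theta, betas[t - 1])
        # Unadjusted Langevin move targeting the new annealed density.  Dropping
        # the Metropolis-Hastings correction biases the estimator slightly but
        # keeps the whole chain differentiable via the reparameterized noise.
        key, kt = jax.random.split(key)
        score = jax.grad(lambda z: jnp.sum(log_annealed(z, theta, betas[t])))(x)
        x = x + eps * score + jnp.sqrt(2.0 * eps) * jax.random.normal(kt, x.shape)
    return jax.scipy.special.logsumexp(log_w) - jnp.log(float(n_particles))

key = jax.random.PRNGKey(0)
theta = 1.5
print("estimated log Z:", ais_log_z(theta, key),
      "true log Z:", 0.5 * jnp.log(2.0 * jnp.pi))
# The estimate is a smooth function of theta, so it can serve as an objective:
print("d logZ_hat / d theta:", jax.grad(ais_log_z)(theta, key))  # close to zero here
```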
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
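To give a generic picture of such a method (a sketch under my own assumptions, not the proposed framework: the ill-conditioned quadratic test function, the curvature-based sampling probabilities, and all constants are made up for illustration), a zeroth-order step can estimate the gradient from finite differences along a few coordinates drawn with non-uniform probabilities and reweighted to stay unbiased:

```python
import jax
import jax.numpy as jnp

# Black-box objective: only function evaluations are available to the optimizer.
def f(x):
    scales = jnp.arange(1.0, x.shape[0] + 1.0)     # ill-conditioned quadratic
    return jnp.sum(scales * x ** 2)

def zo_gradient(f, x, probs, key, n_dirs=8, mu=1e-3):
    """Coordinate-wise finite-difference gradient estimate.  Coordinates are
    drawn from `probs` (importance sampling) and each central difference is
    reweighted by 1 / (n_dirs * probs[i]), which keeps the estimator unbiased
    up to the finite-difference error."""
    d = x.shape[0]
    idx = jax.random.choice(key, d, shape=(n_dirs,), p=probs)
    def one_coord(i):
        e_i = jnp.zeros(d).at[i].set(1.0)
        fd = (f(x + mu * e_i) - f(x - mu * e_i)) / (2.0 * mu)
        return (fd / (n_dirs * probs[i])) * e_i
    return jax.vmap(one_coord)(idx).sum(0)

d = 20
key = jax.random.PRNGKey(0)
x = jnp.ones(d)
# Importance weights: favor coordinates believed to have larger gradients
# (here simply the known curvature scales, purely an assumption).
probs = jnp.arange(1.0, d + 1.0)
probs = probs / probs.sum()

for _ in range(200):
    key, sub = jax.random.split(key)
    x = x - 0.01 * zo_gradient(f, x, probs, sub)
print("final objective value:", f(x))
```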
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
arXiv Detail & Related papers (2020-11-13T20:50:36Z)
- Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications [5.660384137948734]
We show that the proposed algorithm asymptotically converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
- Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.