Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes
- URL: http://arxiv.org/abs/2405.18457v3
- Date: Thu, 31 Oct 2024 15:44:22 GMT
- Title: Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes
- Authors: Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, Javier Antorán, José Miguel Hernández-Lobato
- Abstract summary: This paper focuses on iterative methods, which use linear system solvers to construct an estimate of the marginal likelihood gradient.
We discuss three key improvements which are applicable across solvers.
These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
- Score: 31.305425519412758
- License:
- Abstract: Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community. This paper focuses on iterative methods, which use linear system solvers, like conjugate gradients, alternating projections or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient. We discuss three key improvements which are applicable across solvers: (i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions, (ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias, (iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps. These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
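Ideas (ii) and (iii) lend themselves to a short code sketch. The following is a minimal, illustrative Python example rather than the authors' implementation: a plain conjugate gradient routine that accepts an initial guess (warm start) and a fixed iteration budget (early stopping) is reused across outer marginal likelihood steps, so that solver progress accumulates; the RBF kernel, the toy data, and the fixed nudge to the noise hyperparameter are stand-ins for the real model and the gradient-based update.

```python
import numpy as np

def conjugate_gradient(matvec, b, x0=None, max_iters=10, tol=1e-6):
    """Plain CG with a warm start (x0) and an iteration budget (max_iters)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def rbf_kernel(X, lengthscale, amplitude):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return amplitude * np.exp(-0.5 * d2 / lengthscale ** 2)

# Illustrative data and hyperparameters (not from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)
log_ell, log_amp, log_noise = 0.0, 0.0, -1.0
v = None  # solver state carried across outer marginal likelihood steps

for _ in range(20):
    ell, amp, noise = np.exp([log_ell, log_amp, log_noise])
    K_noisy = rbf_kernel(X, ell, amp) + noise * np.eye(len(X))
    # (ii) warm start from the previous solution, (iii) stop after a small budget;
    # progress accumulates because consecutive systems differ only slightly.
    v = conjugate_gradient(lambda u: K_noisy @ u, y, x0=v, max_iters=10)
    # v would feed an estimate of the marginal likelihood gradient (e.g. via the
    # pathwise estimator (i)); a fixed nudge stands in for that update here.
    log_noise -= 0.02
```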
Related papers
- Warm Start Marginal Likelihood Optimisation for Iterative Gaussian Processes [30.475300015723256]
We introduce a three-level hierarchy of marginal likelihood optimisation for iterative Gaussian processes.
We then propose to amortise computations by reusing solutions of linear system solvers as initialisations in the next step.
arXiv Detail & Related papers (2024-05-28T16:22:18Z)
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
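As a rough illustration of the stochastic dual descent idea (not the paper's exact algorithm), the sketch below minimises the standard dual objective 0.5 * a^T (K + noise * I) a - y^T a of kernel regression by updating a randomly sampled coordinate block per step, with heavy-ball momentum and iterate averaging; the toy kernel, step size, batch size, and momentum value are illustrative assumptions.

```python
import numpy as np

def stochastic_dual_descent(K, y, noise, lr, batch=64, iters=2000, momentum=0.9, seed=0):
    """Minimise 0.5*a^T (K + noise*I) a - y^T a by random block-coordinate steps
    with momentum, returning the running average of the iterates."""
    n = len(y)
    a, vel, avg = np.zeros(n), np.zeros(n), np.zeros(n)
    rng = np.random.default_rng(seed)
    for t in range(1, iters + 1):
        idx = rng.choice(n, size=batch, replace=False)
        grad = K[idx] @ a + noise * a[idx] - y[idx]  # sampled block of the dual gradient
        vel[idx] = momentum * vel[idx] - lr * grad
        a[idx] += vel[idx]
        avg += (a - avg) / t                         # arithmetic iterate averaging
    return avg                                       # representer weights alpha

# Toy usage: RBF kernel on random 1-D inputs (illustrative, not the paper's data).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(500)
K = np.exp(-0.5 * (X - X.T) ** 2)
alpha = stochastic_dual_descent(K, y, noise=0.01, lr=1e-3)
print(np.linalg.norm(K @ alpha + 0.01 * alpha - y))  # dual residual, shrinks with more iterations
```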
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- The Curse of Unrolling: Rate of Differentiating Through Optimization [35.327233435055305]
Unrolled differentiation approximates the solution using an iterative solver and differentiates it through the computational path.
We show that we can either 1) choose a large learning rate leading to a fast convergence but accept that the algorithm may have an arbitrarily long burn-in phase or 2) choose a smaller learning rate leading to an immediate but slower convergence.
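A small numerical sketch of unrolled differentiation, written out by hand in forward mode under toy assumptions: gradient descent solves an inner least-squares problem whose target depends on a hyperparameter theta, and the Jacobian d x_t / d theta is propagated through exactly the same computational path. The quadratic problem and the learning rate gamma are illustrative; the printout shows that the derivative estimate, like the iterate itself, needs enough unrolled steps to become accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 20))
b0 = rng.standard_normal(80)
theta = 1.5                                 # hyperparameter of the inner problem
gamma = 1.0 / np.linalg.norm(A, 2) ** 2     # learning rate of the inner solver

# Inner problem: min_x 0.5 * ||A x - theta * b0||^2, solved by unrolled GD.
# Forward-mode unrolled differentiation: propagate J_t = d x_t / d theta
# alongside x_t through the same update rule.
x = np.zeros(20)
J = np.zeros(20)
true_dx = np.linalg.solve(A.T @ A, A.T @ b0)    # derivative of the exact solution
for t in range(1, 501):
    x = x - gamma * (A.T @ (A @ x - theta * b0))
    J = J - gamma * (A.T @ (A @ J) - A.T @ b0)
    if t in (10, 100, 500):
        print(t, np.linalg.norm(J - true_dx))   # hypergradient error vs. unroll length
```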
arXiv Detail & Related papers (2022-09-27T09:27:29Z)
- AI-enhanced iterative solvers for accelerating the solution of large scale parametrized linear systems of equations [0.0]
This paper exploits up-to-date ML tools and delivers customized iterative solvers of linear equation systems.
The results indicate its superiority over conventional iterative solution schemes.
arXiv Detail & Related papers (2022-07-06T09:47:14Z)
- A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
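A toy sketch of the two-time-scale pattern, under illustrative assumptions rather than the paper's setting: a fast iterate w tracks an auxiliary quantity (here, the mean of noisy samples) with a larger step size, while a slow iterate theta takes gradient steps that rely on the current fast estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0                                  # unknown mean the fast time scale tracks
theta, w = 0.0, 0.0
for k in range(1, 10_001):
    sample = mu + rng.standard_normal()   # noisy observation
    beta = 1.0 / k ** 0.6                 # fast step size
    alpha = 1.0 / k                       # slow step size (alpha / beta -> 0)
    w += beta * (sample - w)              # fast: track E[sample]
    theta -= alpha * (theta - w)          # slow: gradient step on 0.5*(theta - E[sample])^2
print(theta, w)                           # both approach mu = 3.0
```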
arXiv Detail & Related papers (2021-09-29T23:15:23Z)
- Improving quantum linear system solvers via a gradient descent perspective [3.0969191504482247]
We revisit quantum linear system solvers from the perspective of convex optimization.
This leads to a considerable constant-factor improvement in the runtime.
We show how the optimal quantum linear system solver of Childs, Kothari, and Somma is related to the gradient descent algorithm.
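The gradient descent connection can be checked classically: gradient descent on the quadratic f(x) = 0.5 * x^T A x - b^T x produces iterates that are fixed polynomials in A applied to b, i.e. polynomial approximations of A^{-1} b, which is also the object a quantum linear system solver prepares. The numpy sketch below, with an arbitrary well-conditioned matrix and step size, merely verifies this identity and is in no way a quantum algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T / 50 + np.eye(50)           # well-conditioned SPD system (illustrative)
b = rng.standard_normal(50)
eta = 1.0 / np.linalg.norm(A, 2)        # step size below 2 / lambda_max

x = np.zeros(50)                        # gradient descent iterate
P = np.zeros((50, 50))                  # the matching polynomial p_k(A), built up alongside
for _ in range(200):
    x = x - eta * (A @ x - b)
    P = P - eta * (A @ P - np.eye(50))
print(np.linalg.norm(x - P @ b))                  # x_k = p_k(A) b, up to rounding
print(np.linalg.norm(x - np.linalg.solve(A, b)))  # and p_k(A) b approaches A^{-1} b
```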
arXiv Detail & Related papers (2021-09-09T13:16:28Z)
- Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization [79.42778415729475]
We explore an alternative solution based on explicit memorization using linear autoencoders for sequences.
We show how such pretraining can better support solving hard classification tasks with long sequences.
We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
arXiv Detail & Related papers (2020-11-05T14:57:16Z)
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex, nonsmooth problems.
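As a compact convex illustration of the momentum-plus-restart idea (the paper's algorithm targets nonconvex problems and allows more flexible restart schemes), the sketch below runs proximal gradient steps on a Lasso objective with Nesterov momentum that is reset whenever the objective increases; the problem sizes and the function-value restart test are illustrative choices.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad_momentum_restart(X, y, lam, iters=500):
    """Accelerated proximal gradient for 0.5/n*||X b - y||^2 + lam*||b||_1,
    with the momentum reset (restarted) whenever the objective goes up."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of the smooth part
    obj = lambda b: 0.5 / n * np.sum((X @ b - y) ** 2) + lam * np.sum(np.abs(b))
    beta, z, tk = np.zeros(d), np.zeros(d), 1.0
    prev = obj(beta)
    for _ in range(iters):
        grad = X.T @ (X @ z - y) / n
        beta_new = soft_threshold(z - grad / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * tk ** 2))
        z = beta_new + (tk - 1.0) / t_new * (beta_new - beta)
        cur = obj(beta_new)
        if cur > prev:                           # function-value restart
            z, t_new = beta_new.copy(), 1.0
        beta, tk, prev = beta_new, t_new, cur
    return beta

# Illustrative usage on a synthetic sparse regression problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 400))
w_true = np.zeros(400); w_true[:10] = 1.0
y = X @ w_true + 0.05 * rng.standard_normal(200)
print(np.count_nonzero(prox_grad_momentum_restart(X, y, lam=0.1)))
```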
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
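A small sketch of the implicit-differentiation idea for the Lasso with objective (1/(2n)) * ||y - X b||^2 + lam * ||b||_1: on the support S of the solution, the stationarity condition gives d b_S / d lam = -(X_S^T X_S / n)^{-1} sign(b_S), from which the hypergradient of a validation loss follows by the chain rule. The toy version below uses scikit-learn's Lasso and a dense solve on the support, unlike the paper's sparsity-exploiting, inversion-free implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 100, 50
X, Xval = rng.standard_normal((n, d)), rng.standard_normal((n, d))
b_true = np.zeros(d); b_true[:5] = 1.0
y = X @ b_true + 0.1 * rng.standard_normal(n)
yval = Xval @ b_true + 0.1 * rng.standard_normal(n)

lam = 0.1
b = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
S = np.flatnonzero(b)                            # support of the Lasso solution

# Implicit differentiation of the stationarity condition on the support:
# (X_S^T X_S / n) * d b_S / d lam + sign(b_S) = 0
db_S = -np.linalg.solve(X[:, S].T @ X[:, S] / n, np.sign(b[S]))

# Chain rule: hypergradient of the validation loss 0.5/n * ||Xval b - yval||^2
grad_val = Xval[:, S].T @ (Xval[:, S] @ b[S] - yval) / n
print(db_S @ grad_val)                           # d (validation loss) / d lam
```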
arXiv Detail & Related papers (2020-02-20T18:43:42Z)