Warm Start Marginal Likelihood Optimisation for Iterative Gaussian Processes
- URL: http://arxiv.org/abs/2405.18328v1
- Date: Tue, 28 May 2024 16:22:18 GMT
- Title: Warm Start Marginal Likelihood Optimisation for Iterative Gaussian Processes
- Authors: Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, José Miguel Hernández-Lobato
- Abstract summary: We introduce a three-level hierarchy of marginal likelihood optimisation for iterative Gaussian processes.
We then propose to amortise computations by reusing solutions of linear system solvers as initialisations in the next step.
- Score: 30.475300015723256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes are a versatile probabilistic machine learning model whose effectiveness often depends on good hyperparameters, which are typically learned by maximising the marginal likelihood. In this work, we consider iterative methods, which use iterative linear system solvers to approximate marginal likelihood gradients up to a specified numerical precision, allowing a trade-off between compute time and accuracy of a solution. We introduce a three-level hierarchy of marginal likelihood optimisation for iterative Gaussian processes, and identify that the computational costs are dominated by solving sequential batches of large positive-definite systems of linear equations. We then propose to amortise computations by reusing solutions of linear system solvers as initialisations in the next step, providing a $\textit{warm start}$. Finally, we discuss the necessary conditions and quantify the consequences of warm starts and demonstrate their effectiveness on regression tasks, where warm starts achieve the same results as the conventional procedure while providing up to a $16 \times$ average speed-up among datasets.
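To make the warm-start idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: each outer hyperparameter step solves $(K + \sigma^2 I)v = y$ with conjugate gradients, and each solve is initialised at the previous step's solution instead of at zero. The kernel, toy data, and all function names are illustrative assumptions.

```python
# A minimal sketch, not the paper's implementation: warm-started conjugate
# gradients across successive hyperparameter steps. All names (rbf_kernel,
# conjugate_gradients) and the toy data are illustrative assumptions.
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential kernel matrix for inputs X of shape (n, d).
    sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def conjugate_gradients(A, b, x0, tol=1e-6, max_iter=1000):
    # Standard CG for a symmetric positive-definite A, started from x0.
    x, r = x0.copy(), b - A @ x0
    p, rs = r.copy(), r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
noise = 0.1

v = np.zeros_like(y)                      # cold start only for the very first solve
for lengthscale in [1.0, 0.9, 0.85]:      # stand-in for outer hyperparameter updates
    A = rbf_kernel(X, lengthscale, variance=1.0) + noise * np.eye(len(y))
    v = conjugate_gradients(A, y, x0=v)   # warm start: reuse the previous solution
    # v approximates (K + noise * I)^{-1} y, the quantity entering the marginal
    # likelihood and its gradient; consecutive hyperparameters are close, so the
    # previous solution is near the new one and far fewer CG iterations are needed.
```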
Related papers
- Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes [31.305425519412758]
This paper focuses on iterative methods, which use linear system solvers to construct an estimate of the marginal likelihood gradient.
We discuss three key improvements which are applicable across solvers.
These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
arXiv Detail & Related papers (2024-05-28T16:58:37Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when $\textit{done right}$ -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a $\textit{stochastic dual descent}$ algorithm, explain its design in an intuitive manner and illustrate the design choices (a minimal sketch of descent on the dual objective appears after this list).
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances [42.16343861129583]
We show that a bandit online learning algorithm can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases.
Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing.
arXiv Detail & Related papers (2023-10-03T17:51:42Z) - Neural incomplete factorization: learning preconditioners for the conjugate gradient method [2.899792823251184]
We develop a data-driven approach to accelerate the generation of effective preconditioners.
We replace the typically hand-engineered preconditioners by the output of graph neural networks.
Our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF).
arXiv Detail & Related papers (2023-05-25T11:45:46Z) - OKRidge: Scalable Optimal k-Sparse Ridge Regression [21.17964202317435]
We propose a fast algorithm, OKRidge, for sparse ridge regression.
We also propose a method to warm-start our solver, which leverages a beam search.
arXiv Detail & Related papers (2023-04-13T17:34:44Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Safe Real-Time Optimization using Multi-Fidelity Gaussian Processes [0.0]
This paper proposes a new class of real-time optimization schemes to overcome system mismatch of uncertain processes.
The proposed scheme uses two Gaussian processes for the system: one emulating the known process model, and the other learning the true system from measurements.
arXiv Detail & Related papers (2021-11-10T09:31:10Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper sub-problems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
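Returning to the stochastic dual descent entry above: the sketch below is an assumed illustration, not that paper's algorithm (which involves further design choices not reproduced here). It runs randomised block-coordinate descent on the dual GP regression objective $\frac{1}{2}\alpha^\top(K+\sigma^2 I)\alpha - y^\top\alpha$, whose minimiser $\alpha^* = (K+\sigma^2 I)^{-1}y$ gives the GP representer weights.

```python
# An assumed sketch: randomised block-coordinate descent on the dual GP
# regression objective; the data, kernel, and step-size rule are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-0.5 * sqdist) + 0.1 * np.eye(len(y))   # K + noise * I, unit-lengthscale RBF kernel

alpha = np.zeros(len(y))
step = 0.5 / np.linalg.eigvalsh(A)[-1]             # conservative step from the largest eigenvalue
for _ in range(5000):
    idx = rng.choice(len(y), size=64, replace=False)
    grad_idx = A[idx] @ alpha - y[idx]             # sampled coordinates of the dual gradient
    alpha[idx] -= step * grad_idx                  # update only those coordinates
# alpha now approximates (K + noise * I)^{-1} y
```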
This list is automatically generated from the titles and abstracts of the papers on this site.