Warm Start Marginal Likelihood Optimisation for Iterative Gaussian Processes
- URL: http://arxiv.org/abs/2405.18328v1
- Date: Tue, 28 May 2024 16:22:18 GMT
- Title: Warm Start Marginal Likelihood Optimisation for Iterative Gaussian Processes
- Authors: Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, José Miguel Hernández-Lobato
- Abstract summary: We introduce a three-level hierarchy of marginal likelihood optimisation for iterative Gaussian processes.
We then propose to amortise computations by reusing solutions of linear system solvers as initialisations in the next step.
- Score: 30.475300015723256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes are a versatile probabilistic machine learning model whose effectiveness often depends on good hyperparameters, which are typically learned by maximising the marginal likelihood. In this work, we consider iterative methods, which use iterative linear system solvers to approximate marginal likelihood gradients up to a specified numerical precision, allowing a trade-off between compute time and accuracy of a solution. We introduce a three-level hierarchy of marginal likelihood optimisation for iterative Gaussian processes, and identify that the computational costs are dominated by solving sequential batches of large positive-definite systems of linear equations. We then propose to amortise computations by reusing solutions of linear system solvers as initialisations in the next step, providing a $\textit{warm start}$. Finally, we discuss the necessary conditions and quantify the consequences of warm starts and demonstrate their effectiveness on regression tasks, where warm starts achieve the same results as the conventional procedure while providing up to a $16 \times$ average speed-up among datasets.
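To make the warm-start idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: each outer hyperparameter step solves $(K + \sigma^2 I)v = y$ with conjugate gradients, and each solve is initialised at the previous step's solution instead of at zero. The kernel, toy data, and all function names are illustrative assumptions.

```python
# A minimal sketch, not the paper's implementation: warm-started conjugate
# gradients across successive hyperparameter steps. All names (rbf_kernel,
# conjugate_gradients) and the toy data are illustrative assumptions.
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential kernel matrix for inputs X of shape (n, d).
    sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def conjugate_gradients(A, b, x0, tol=1e-6, max_iter=1000):
    # Standard CG for a symmetric positive-definite A, started from x0.
    x, r = x0.copy(), b - A @ x0
    p, rs = r.copy(), r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
noise = 0.1

v = np.zeros_like(y)                      # cold start only for the very first solve
for lengthscale in [1.0, 0.9, 0.85]:      # stand-in for outer hyperparameter updates
    A = rbf_kernel(X, lengthscale, variance=1.0) + noise * np.eye(len(y))
    v = conjugate_gradients(A, y, x0=v)   # warm start: reuse the previous solution
    # v approximates (K + noise * I)^{-1} y, the quantity entering the marginal
    # likelihood and its gradient; consecutive hyperparameters are close, so the
    # previous solution is near the new one and far fewer CG iterations are needed.
```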
Related papers
- Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes [31.305425519412758]
This paper focuses on iterative methods, which use linear system solvers to construct an estimate of the marginal likelihood gradient.
We discuss three key improvements which are applicable across solvers.
These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
arXiv Detail & Related papers (2024-05-28T16:58:37Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when $\textit{done right}$ -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a $\textit{stochastic dual descent}$ algorithm, explain its design in an intuitive manner and illustrate the design choices (a minimal sketch of descent on the dual objective appears after this list).
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances [42.16343861129583]
We show that a bandit online learning algorithm can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases.
Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing.
arXiv Detail & Related papers (2023-10-03T17:51:42Z) - Neural incomplete factorization: learning preconditioners for the conjugate gradient method [2.899792823251184]
We develop a data-driven approach to accelerate the generation of effective preconditioners.
We replace the typically hand-engineered preconditioners by the output of graph neural networks.
Our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF).
arXiv Detail & Related papers (2023-05-25T11:45:46Z) - OKRidge: Scalable Optimal k-Sparse Ridge Regression [21.17964202317435]
We propose a fast algorithm, OKRidge, for sparse ridge regression.
We also propose a method to warm-start our solver, which leverages a beam search.
arXiv Detail & Related papers (2023-04-13T17:34:44Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Safe Real-Time Optimization using Multi-Fidelity Gaussian Processes [0.0]
This paper proposes a new class of real-time optimization schemes to overcome system mismatch of uncertain processes.
The proposed scheme uses two Gaussian processes for the system: one emulating the known process model, and the other learning the true system from measurements.
arXiv Detail & Related papers (2021-11-10T09:31:10Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper sub-problems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
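Returning to the stochastic dual descent entry above: the sketch below is an assumed illustration, not that paper's algorithm (which involves further design choices not reproduced here). It runs randomised block-coordinate descent on the dual GP regression objective $\frac{1}{2}\alpha^\top(K+\sigma^2 I)\alpha - y^\top\alpha$, whose minimiser $\alpha^* = (K+\sigma^2 I)^{-1}y$ gives the GP representer weights.

```python
# An assumed sketch: randomised block-coordinate descent on the dual GP
# regression objective; the data, kernel, and step-size rule are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-0.5 * sqdist) + 0.1 * np.eye(len(y))   # K + noise * I, unit-lengthscale RBF kernel

alpha = np.zeros(len(y))
step = 0.5 / np.linalg.eigvalsh(A)[-1]             # conservative step from the largest eigenvalue
for _ in range(5000):
    idx = rng.choice(len(y), size=64, replace=False)
    grad_idx = A[idx] @ alpha - y[idx]             # sampled coordinates of the dual gradient
    alpha[idx] -= step * grad_idx                  # update only those coordinates
# alpha now approximates (K + noise * I)^{-1} y
```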
This list is automatically generated from the titles and abstracts of the papers on this site.