Scaling Gaussian Processes with Derivative Information Using Variational
Inference
- URL: http://arxiv.org/abs/2107.04061v1
- Date: Thu, 8 Jul 2021 18:23:59 GMT
- Title: Scaling Gaussian Processes with Derivative Information Using Variational
Inference
- Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob R. Gardner, David Bindel
- Abstract summary: We introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference.
We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed.
- Score: 17.746842802181256
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Gaussian processes with derivative information are useful in many settings
where derivative information is available, including numerous Bayesian
optimization and regression tasks that arise in the natural sciences.
Incorporating derivative observations, however, comes with a dominating
$O(N^3D^3)$ computational cost when training on $N$ points in $D$ input
dimensions. This is intractable for even moderately sized problems. While
recent work has addressed this intractability in the low-$D$ setting, the
high-$N$, high-$D$ setting is still unexplored and of great value, particularly
as machine learning problems increasingly become high dimensional. In this
paper, we introduce methods to achieve fully scalable Gaussian process
regression with derivatives using variational inference. Analogous to the use
of inducing values to sparsify the labels of a training set, we introduce the
concept of inducing directional derivatives to sparsify the partial derivative
information of a training set. This enables us to construct a variational
posterior that incorporates derivative information but whose size depends
neither on the full dataset size $N$ nor the full dimensionality $D$. We
demonstrate the full scalability of our approach on a variety of tasks, ranging
from a high dimensional stellarator fusion regression task to training graph
convolutional neural networks on Pubmed using Bayesian optimization.
Surprisingly, we find that our approach can improve regression performance even
in settings where only label data is available.
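To make the idea of inducing directional derivatives concrete, here is a minimal numpy sketch (my own illustration with an RBF kernel, not the authors' implementation). A full gradient observation adds $D$ rows and columns to the Gram matrix per point, whereas a single directional derivative $D_u f(z) = u^\top \nabla f(z)$ adds only one, so $M$ inducing directional derivatives give an $(N+M)$-sized joint covariance instead of one of size $N(1+D)$:

```python
import numpy as np

def rbf(X, Z, ell=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 ell^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def cov_f_dir(X, Z, U, ell=1.0):
    """Cov(f(x_i), D_{u_j} f(z_j)) = u_j . d k(x_i, z)/dz evaluated at z = z_j."""
    diff = X[:, None, :] - Z[None, :, :]                    # (N, M, D)
    return rbf(X, Z, ell) * np.einsum('nmd,md->nm', diff, U) / ell**2

def cov_dir_dir(Z, U, ell=1.0):
    """Cov(D_{u_i} f(z_i), D_{u_j} f(z_j)) for the RBF kernel."""
    diff = Z[:, None, :] - Z[None, :, :]                    # (M, M, D)
    d_uj = np.einsum('ijd,jd->ij', diff, U)                 # (z_i - z_j) . u_j
    d_ui = np.einsum('ijd,id->ij', diff, U)                 # (z_i - z_j) . u_i
    return rbf(Z, Z, ell) * (U @ U.T / ell**2 - d_ui * d_uj / ell**4)

N, M, D = 100, 10, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))                 # training inputs (labels only)
Z = rng.standard_normal((M, D))                 # inducing points
U = rng.standard_normal((M, D))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # one unit direction per inducing point

# Joint covariance over [f(X), D_U f(Z)]: its size is N + M, independent of D.
K_cross = cov_f_dir(X, Z, U)
K_joint = np.block([[rbf(X, X), K_cross],
                    [K_cross.T, cov_dir_dir(Z, U)]])
print(K_joint.shape)   # (110, 110) rather than (N*(1+D), N*(1+D)) = (5100, 5100)
```

In the paper's variational scheme the inducing points and directions are optimized, so the size of the posterior is controlled by the number of inducing quantities rather than by $N$ or $D$; the sketch above only illustrates why a directional derivative is the cheaper unit of derivative information.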
Related papers
- A Quasilinear Algorithm for Computing Higher-Order Derivatives of Deep Feed-Forward Neural Networks [0.0]
$n$-TangentProp computes the exact derivative $d^n/dx^n f(x)$ in quasilinear, rather than exponential, time.
We demonstrate that our method is particularly beneficial in the context of physics-informed neural networks.
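For context, here is a hedged sketch of the baseline such methods improve on (my own illustration, not the paper's $n$-TangentProp algorithm): nesting reverse-mode autodiff $n$ times yields the exact $n$-th derivative, but the differentiation graph grows with each nesting, which is the cost that quasilinear approaches aim to avoid.

```python
import torch

def nth_derivative(f, x, n):
    """d^n f / dx^n at scalar x, obtained by differentiating the graph n times."""
    y = f(x)
    for _ in range(n):
        (y,) = torch.autograd.grad(y, x, create_graph=True)
    return y

x = torch.tensor(0.5, requires_grad=True)
print(nth_derivative(torch.sin, x, 4))   # d^4 sin/dx^4 = sin, so ~0.4794
```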
arXiv Detail & Related papers (2024-12-12T22:57:28Z)
- Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators [29.063441432499776]
We show how to efficiently perform arbitrary contraction of the derivative tensor of arbitrary order for multivariate functions.
When applied to Physics-Informed Neural Networks (PINNs), our method provides a $>1000\times$ speed-up and a $30\times$ memory reduction over randomization with first-order AD.
arXiv Detail & Related papers (2024-11-27T09:37:33Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for approximating matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
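As background, classical column-row (CR) sampling is the kind of unbiased matrix-multiplication estimator that WTA-CRS refines for lower variance; the sketch below shows that standard baseline (my own illustration, not the paper's estimator).

```python
import numpy as np

def cr_sample_matmul(A, B, k, seed=0):
    """Unbiased estimate of A @ B from k sampled column/row pairs.

    Pair i is drawn with probability p_i proportional to ||A[:, i]|| * ||B[i, :]||,
    and each sampled outer product is rescaled by 1 / (k * p_i) so that the
    estimator's expectation equals A @ B.
    """
    rng = np.random.default_rng(seed)
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=k, p=p)
    return sum(np.outer(A[:, i], B[i, :]) / (k * p[i]) for i in idx)

A, B = np.random.randn(64, 256), np.random.randn(256, 32)
est = cr_sample_matmul(A, B, k=128)
print(np.linalg.norm(est - A @ B) / np.linalg.norm(A @ B))  # relative error
```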
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Derivative-Informed Neural Operator: An Efficient Framework for
High-Dimensional Parametric Derivative Learning [3.7051887945349518]
We propose derivative-informed neural operators (DINOs).
DINOs approximate operators as infinite-dimensional mappings from input function spaces to output function spaces or quantities of interest.
We show that the proposed DINO achieves significantly higher accuracy than neural operators trained without derivative information.
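One simple way to picture derivative-informed training is to supervise a network's Jacobian alongside its outputs. The sketch below is my own illustration under that assumption (it is not the DINO architecture or its dimension-reduction machinery) and requires PyTorch >= 2.0 for torch.func.

```python
import torch
from torch import nn
from torch.func import jacrev, vmap

net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def fit_step(x, y, dy, w=1.0):
    """One step of value + Jacobian matching; x: (B,8), y: (B,3), dy: (B,3,8)."""
    pred = net(x)
    jac = vmap(jacrev(net))(x)                     # per-sample input Jacobians, (B,3,8)
    loss = nn.functional.mse_loss(pred, y) + w * nn.functional.mse_loss(jac, dy)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x = torch.randn(32, 8)
y, dy = torch.randn(32, 3), torch.randn(32, 3, 8)  # stand-in targets
print(fit_step(x, y, dy))
```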
arXiv Detail & Related papers (2022-06-21T21:40:01Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace
Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias against its variance.
We formalize the personalized collaborative learning problem as optimization of a user's objective.
We explore conditions under which we can optimally trade off this bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z)
- Brain Image Synthesis with Unsupervised Multivariate Canonical
CSC$\ell_4$Net [122.8907826672382]
We propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net.
arXiv Detail & Related papers (2021-03-22T05:19:40Z)
- High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
- Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z)
- $\pi$VAE: a stochastic process prior for Bayesian deep learning with
MCMC [2.4792948967354236]
We propose a novel variational autoencoder called the prior encoding variational autoencoder ($\pi$VAE).
We show that our framework can accurately learn expressive function classes such as Gaussian processes, as well as properties of functions that enable statistical inference.
Perhaps most usefully, we demonstrate that the low dimensional distributed latent space representation learnt provides an elegant and scalable means of performing inference for processes within programming languages such as Stan.
arXiv Detail & Related papers (2020-02-17T10:23:18Z)