Scaling Gaussian Processes with Derivative Information Using Variational
Inference
- URL: http://arxiv.org/abs/2107.04061v1
- Date: Thu, 8 Jul 2021 18:23:59 GMT
- Title: Scaling Gaussian Processes with Derivative Information Using Variational
Inference
- Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob R. Gardner, David Bindel
- Abstract summary: We introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference.
We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed.
- Score: 17.746842802181256
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Gaussian processes with derivative information are useful in many settings
where derivative information is available, including numerous Bayesian
optimization and regression tasks that arise in the natural sciences.
Incorporating derivative observations, however, comes with a dominating
$O(N^3D^3)$ computational cost when training on $N$ points in $D$ input
dimensions. This is intractable for even moderately sized problems. While
recent work has addressed this intractability in the low-$D$ setting, the
high-$N$, high-$D$ setting is still unexplored and of great value, particularly
as machine learning problems increasingly become high dimensional. In this
paper, we introduce methods to achieve fully scalable Gaussian process
regression with derivatives using variational inference. Analogous to the use
of inducing values to sparsify the labels of a training set, we introduce the
concept of inducing directional derivatives to sparsify the partial derivative
information of a training set. This enables us to construct a variational
posterior that incorporates derivative information but whose size depends
neither on the full dataset size $N$ nor the full dimensionality $D$. We
demonstrate the full scalability of our approach on a variety of tasks, ranging
from a high dimensional stellarator fusion regression task to training graph
convolutional neural networks on Pubmed using Bayesian optimization.
Surprisingly, we find that our approach can improve regression performance even
in settings where only label data is available.
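To make the idea of inducing directional derivatives concrete, here is a minimal numpy sketch (my own illustration with an RBF kernel, not the authors' implementation). A full gradient observation adds $D$ rows and columns to the Gram matrix per point, whereas a single directional derivative $D_u f(z) = u^\top \nabla f(z)$ adds only one, so $M$ inducing directional derivatives give an $(N+M)$-sized joint covariance instead of one of size $N(1+D)$:

```python
import numpy as np

def rbf(X, Z, ell=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 ell^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def cov_f_dir(X, Z, U, ell=1.0):
    """Cov(f(x_i), D_{u_j} f(z_j)) = u_j . d k(x_i, z)/dz evaluated at z = z_j."""
    diff = X[:, None, :] - Z[None, :, :]                    # (N, M, D)
    return rbf(X, Z, ell) * np.einsum('nmd,md->nm', diff, U) / ell**2

def cov_dir_dir(Z, U, ell=1.0):
    """Cov(D_{u_i} f(z_i), D_{u_j} f(z_j)) for the RBF kernel."""
    diff = Z[:, None, :] - Z[None, :, :]                    # (M, M, D)
    d_uj = np.einsum('ijd,jd->ij', diff, U)                 # (z_i - z_j) . u_j
    d_ui = np.einsum('ijd,id->ij', diff, U)                 # (z_i - z_j) . u_i
    return rbf(Z, Z, ell) * (U @ U.T / ell**2 - d_ui * d_uj / ell**4)

N, M, D = 100, 10, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))                 # training inputs (labels only)
Z = rng.standard_normal((M, D))                 # inducing points
U = rng.standard_normal((M, D))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # one unit direction per inducing point

# Joint covariance over [f(X), D_U f(Z)]: its size is N + M, independent of D.
K_cross = cov_f_dir(X, Z, U)
K_joint = np.block([[rbf(X, X), K_cross],
                    [K_cross.T, cov_dir_dir(Z, U)]])
print(K_joint.shape)   # (110, 110) rather than (N*(1+D), N*(1+D)) = (5100, 5100)
```

In the paper's variational scheme the inducing points and directions are optimized, so the size of the posterior is controlled by the number of inducing quantities rather than by $N$ or $D$; the sketch above only illustrates why a directional derivative is the cheaper unit of derivative information.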
Related papers
- A Quasilinear Algorithm for Computing Higher-Order Derivatives of Deep Feed-Forward Neural Networks [0.0]
$n$-TangentProp computes the exact derivative $d^n/dx^n f(x)$ in quasilinear, rather than exponential, time.
We demonstrate that our method is particularly beneficial in the context of physics-informed neural networks.
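For context, here is a hedged sketch of the baseline such methods improve on (my own illustration, not the paper's $n$-TangentProp algorithm): nesting reverse-mode autodiff $n$ times yields the exact $n$-th derivative, but the differentiation graph grows with each nesting, which is the cost that quasilinear approaches aim to avoid.

```python
import torch

def nth_derivative(f, x, n):
    """d^n f / dx^n at scalar x, obtained by differentiating the graph n times."""
    y = f(x)
    for _ in range(n):
        (y,) = torch.autograd.grad(y, x, create_graph=True)
    return y

x = torch.tensor(0.5, requires_grad=True)
print(nth_derivative(torch.sin, x, 4))   # d^4 sin/dx^4 = sin, so ~0.4794
```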
arXiv Detail & Related papers (2024-12-12T22:57:28Z)
- Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators [29.063441432499776]
We show how to efficiently perform arbitrary contraction of the derivative tensor of arbitrary order for multivariate functions.
When applied to Physics-Informed Neural Networks (PINNs), our method provides a $>1000\times$ speed-up and a $30\times$ memory reduction over randomization with first-order AD.
arXiv Detail & Related papers (2024-11-27T09:37:33Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for approximating matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
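As background, classical column-row (CR) sampling is the kind of unbiased matrix-multiplication estimator that WTA-CRS refines for lower variance; the sketch below shows that standard baseline (my own illustration, not the paper's estimator).

```python
import numpy as np

def cr_sample_matmul(A, B, k, seed=0):
    """Unbiased estimate of A @ B from k sampled column/row pairs.

    Pair i is drawn with probability p_i proportional to ||A[:, i]|| * ||B[i, :]||,
    and each sampled outer product is rescaled by 1 / (k * p_i) so that the
    estimator's expectation equals A @ B.
    """
    rng = np.random.default_rng(seed)
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=k, p=p)
    return sum(np.outer(A[:, i], B[i, :]) / (k * p[i]) for i in idx)

A, B = np.random.randn(64, 256), np.random.randn(256, 32)
est = cr_sample_matmul(A, B, k=128)
print(np.linalg.norm(est - A @ B) / np.linalg.norm(A @ B))  # relative error
```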
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Derivative-Informed Neural Operator: An Efficient Framework for
High-Dimensional Parametric Derivative Learning [3.7051887945349518]
We propose derivative-informed neural operators (DINOs).
DINOs approximate operators as infinite-dimensional mappings from input function spaces to output function spaces or quantities of interest.
We show that the proposed DINO achieves significantly higher accuracy than neural operators trained without derivative information.
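One simple way to picture derivative-informed training is to supervise a network's Jacobian alongside its outputs. The sketch below is my own illustration under that assumption (it is not the DINO architecture or its dimension-reduction machinery) and requires PyTorch >= 2.0 for torch.func.

```python
import torch
from torch import nn
from torch.func import jacrev, vmap

net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def fit_step(x, y, dy, w=1.0):
    """One step of value + Jacobian matching; x: (B,8), y: (B,3), dy: (B,3,8)."""
    pred = net(x)
    jac = vmap(jacrev(net))(x)                     # per-sample input Jacobians, (B,3,8)
    loss = nn.functional.mse_loss(pred, y) + w * nn.functional.mse_loss(jac, dy)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x = torch.randn(32, 8)
y, dy = torch.randn(32, 3), torch.randn(32, 3, 8)  # stand-in targets
print(fit_step(x, y, dy))
```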
arXiv Detail & Related papers (2022-06-21T21:40:01Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace
Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z)
- Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias against its variance.
We formalize the personalized collaborative learning problem as optimization of a user's objective.
We explore conditions under which we can optimally trade off this bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z)
- Brain Image Synthesis with Unsupervised Multivariate Canonical
CSC$\ell_4$Net [122.8907826672382]
We propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net.
arXiv Detail & Related papers (2021-03-22T05:19:40Z)
- High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
- Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z)
- $\pi$VAE: a stochastic process prior for Bayesian deep learning with
MCMC [2.4792948967354236]
We propose a novel variational autoencoder called the prior encoding variational autoencoder ($\pi$VAE).
We show that our framework can accurately learn expressive function classes such as Gaussian processes, as well as properties of functions that enable statistical inference.
Perhaps most usefully, we demonstrate that the low dimensional distributed latent space representation learnt provides an elegant and scalable means of performing inference for processes within programming languages such as Stan.
arXiv Detail & Related papers (2020-02-17T10:23:18Z)