Fast Approximate Multi-output Gaussian Processes
- URL: http://arxiv.org/abs/2008.09848v1
- Date: Sat, 22 Aug 2020 14:34:45 GMT
- Title: Fast Approximate Multi-output Gaussian Processes
- Authors: Vladimir Joukov and Dana Kulić
- Abstract summary: Training with the proposed approach requires computing only an $N \times n$ eigenfunction matrix and an $n \times n$ inverse, where $n$ is a selected number of eigenvalues.
The proposed method can regress over multiple outputs, estimate the derivative of the regressor of any order, and learn the correlations between them.
- Score: 6.6174748514131165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian process regression models are an appealing machine learning method
as they learn expressive non-linear models from exemplar data with minimal
parameter tuning and estimate both the mean and covariance of unseen points.
However, exponential computational complexity growth with the number of
training samples has been a long-standing challenge. During training, one has
to compute and invert an $N \times N$ kernel matrix at every iteration.
Regression requires computation of an $m \times N$ kernel where $N$ and $m$ are
the number of training and test points respectively. In this work we show how
approximating the covariance kernel using eigenvalues and functions leads to an
approximate Gaussian process with significant reduction in training and
regression complexity. Training with the proposed approach requires computing
only an $N \times n$ eigenfunction matrix and an $n \times n$ inverse where $n$
is a selected number of eigenvalues. Furthermore, regression now only requires
an $m \times n$ matrix. Finally, in a special case the hyperparameter
optimization is completely independent of the number of training samples. The
proposed method can regress over multiple outputs, estimate the derivative of
the regressor of any order, and learn the correlations between them. The
computational complexity reduction, regression capabilities, and multi-output
correlation learning are demonstrated in simulation examples.
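The eigenfunction expansion described in the abstract lends itself to a compact implementation. Below is a minimal NumPy sketch of a single-output reduced-rank GP in this spirit: it assumes a Laplacian sine basis on an interval and the squared-exponential spectral density, which are illustrative choices rather than the authors' exact construction; all function names and parameters here are hypothetical.

```python
import numpy as np

# Minimal reduced-rank GP sketch: training needs only an N x n basis matrix
# and an n x n solve; prediction needs only an m x n test basis matrix.
# Basis and spectral density below are illustrative assumptions.

def eigenfunctions(x, n, L):
    """N x n matrix of Laplacian sine eigenfunctions on [-L, L]."""
    j = np.arange(1, n + 1)
    return np.sin(np.pi * j * (x[:, None] + L) / (2 * L)) / np.sqrt(L)

def sqrt_eigenvalues(n, L):
    """Square roots of the Laplacian eigenvalues (basis frequencies)."""
    return np.pi * np.arange(1, n + 1) / (2 * L)

def rbf_spectral_density(w, lengthscale, variance):
    """Spectral density of the squared-exponential kernel at frequency w."""
    return variance * np.sqrt(2 * np.pi) * lengthscale * np.exp(-0.5 * (lengthscale * w) ** 2)

def fit_predict(x_train, y_train, x_test, n=32, L=5.0,
                lengthscale=1.0, variance=1.0, noise=0.1):
    """Reduced-rank GP regression: n x n solve to train, m x n matrix to predict."""
    Phi = eigenfunctions(x_train, n, L)                                      # N x n
    S = rbf_spectral_density(sqrt_eigenvalues(n, L), lengthscale, variance)  # n
    A = Phi.T @ Phi + noise ** 2 * np.diag(1.0 / S)                          # n x n
    w_mean = np.linalg.solve(A, Phi.T @ y_train)                             # posterior weight mean
    Phi_s = eigenfunctions(x_test, n, L)                                     # m x n
    mean = Phi_s @ w_mean
    var = noise ** 2 * np.sum(Phi_s * np.linalg.solve(A, Phi_s.T).T, axis=1)
    return mean, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-3.0, 3.0, 200)
    y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
    xs = np.linspace(-3.0, 3.0, 5)
    mu, var = fit_predict(x, y, xs)
    print(np.c_[xs, mu, var])
```

Note that the only linear solve is an $n \times n$ system and prediction touches only the $m \times n$ test basis matrix, matching the complexity claims in the abstract.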
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Scale invariant process regression [0.0]
We propose a novel regression method that does not require specification of a kernel, length scale, variance, or prior mean.
Experiments show that it is possible to derive a working machine learning method by assuming nothing but regularity and scale- and translation invariance.
arXiv Detail & Related papers (2022-08-22T17:32:33Z) - Improved Convergence Rates for Sparse Approximation Methods in
Kernel-Based Learning [48.08663378234329]
Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications.
Existing sparse approximation methods can yield a significant reduction in the computational cost.
We provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method.
arXiv Detail & Related papers (2022-02-08T17:22:09Z) - Efficient and robust high-dimensional sparse logistic regression via
nonlinear primal-dual hybrid gradient algorithms [0.0]
We propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty.
This result improves on the known complexity bound of $O(\min(m^2 n, m n^2)\log(1/\epsilon))$ for first-order optimization methods.
arXiv Detail & Related papers (2021-11-30T14:16:48Z) - Mixability made efficient: Fast online multiclass logistic regression [68.8204255655161]
We show that mixability can be a powerful tool to obtain algorithms with optimal regret.
However, the resulting methods often suffer from high computational complexity, which has reduced their practical applicability.
arXiv Detail & Related papers (2021-10-08T08:22:05Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Coordinate Methods for Matrix Games [41.821881312775496]
We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$.
Our methods push existing sublinear methods towards their limits in terms of per-iteration complexity and sample complexity.
arXiv Detail & Related papers (2020-09-17T17:55:03Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Quadruply Stochastic Gaussian Processes [10.152838128195466]
We introduce a variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number of basis functions used in the kernel approximation, $m$.
We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 10^7$ basis functions.
arXiv Detail & Related papers (2020-06-04T17:06:25Z)