Fast variable selection makes scalable Gaussian process BSS-ANOVA a
speedy and accurate choice for tabular and time series regression
- URL: http://arxiv.org/abs/2205.13676v1
- Date: Thu, 26 May 2022 23:41:43 GMT
- Authors: David S. Mebane, Kyle Hayes and Ali Baheri
- Abstract summary: Gaussian processes (GPs) are non-parametric regression engines with a long history.
One of a number of scalable GP approaches is the Karhunen-Loève (KL) decomposed kernel BSS-ANOVA, developed in 2009.
A new method of forward variable selection quickly and effectively limits the number of terms, yielding a method with competitive accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes (GPs) are non-parametric regression engines with a long
history. They are often overlooked in modern machine learning contexts because
of scalability issues: regression with traditional GP kernels is
$\mathcal{O}(N^3)$ where $N$ is the size of the dataset. One of a number of
scalable GP approaches is the Karhunen-Lo\`eve (KL) decomposed kernel
BSS-ANOVA, developed in 2009. It is $\mathcal{O}(NP)$ in training and
$\mathcal{O}(P)$ per point in prediction, where $P$ is the number of terms in
the ANOVA / KL expansion. A new method of forward variable selection quickly
and effectively limits the number of terms, yielding a method with competitive
accuracy, training times, and inference times for large tabular datasets. The new
algorithm balances model fidelity with model complexity using Bayesian and
Akaike information criteria (BIC/AIC). The inference speed and accuracy make
the method especially useful for modeling dynamic systems in a model-free
manner, by modeling the derivative in a dynamic system as a static problem,
then integrating the learned dynamics using a high-order scheme. The methods
are demonstrated on a `Susceptible, Infected, Recovered' (SIR) toy problem,
with the transmissibility used as forcing function, along with the `Cascaded
Tanks' benchmark dataset. Comparisons on the static prediction of derivatives
are made with a Random Forest and Residual Neural Network, while for the
time-series prediction comparisons are made with LSTM and GRU recurrent neural
networks. The GP outperforms the other methods in all modeling tasks on
accuracy, while (in the case of the neural networks) performing many orders of
magnitude fewer calculations. For the SIR test, which involved prediction for a
set of forcing functions qualitatively different from those appearing in the
training set, the GP captured the correct dynamics while the neural networks
failed to do so.
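The model-free dynamics recipe above (fit the state derivative as a static regression problem, then integrate the learned map with a high-order scheme) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: a scikit-learn `RandomForestRegressor` stands in for the BSS-ANOVA GP, and the SIR equations with transmissibility $\beta(t)$ as forcing are the standard textbook forms.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.ensemble import RandomForestRegressor

# Standard SIR dynamics with time-varying transmissibility beta(t) as forcing.
def sir_rhs(t, y, beta, gamma=0.1):
    S, I, R = y
    return np.array([-beta(t) * S * I,
                     beta(t) * S * I - gamma * I,
                     gamma * I])

# Training trajectory: record states, forcing, and the exact derivatives.
beta_train = lambda t: 0.3 + 0.1 * np.sin(t / 5.0)
t_train = np.linspace(0.0, 60.0, 600)
sol = solve_ivp(sir_rhs, (0.0, 60.0), [0.99, 0.01, 0.0],
                t_eval=t_train, args=(beta_train,), rtol=1e-8)
X = np.column_stack([sol.y.T, beta_train(t_train)])   # features: S, I, R, beta
Y = np.array([sir_rhs(t, y, beta_train) for t, y in zip(t_train, sol.y.T)])

# Fit each derivative component as a static regression problem.
models = [RandomForestRegressor(n_estimators=30, random_state=0).fit(X, Y[:, k])
          for k in range(3)]

def learned_rhs(t, y, beta):
    feats = np.append(y, beta(t)).reshape(1, -1)
    return np.array([m.predict(feats)[0] for m in models])

# Integrate the learned dynamics with a fixed-step 4th-order Runge-Kutta scheme.
def rk4(f, y0, t_grid):
    ys = [np.asarray(y0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h, y = t1 - t0, ys[-1]
        k1 = f(t0, y)
        k2 = f(t0 + h / 2, y + h / 2 * k1)
        k3 = f(t0 + h / 2, y + h / 2 * k2)
        k4 = f(t1, y + h * k3)
        ys.append(y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(ys)

beta_test = lambda t: 0.35   # a forcing not seen during training
pred = rk4(lambda t, y: learned_rhs(t, y, beta_test),
           [0.99, 0.01, 0.0], np.linspace(0.0, 30.0, 151))
```

Swapping the random forest for the BSS-ANOVA GP recovers the paper's pipeline; the same fit-then-integrate structure applies to the Cascaded Tanks benchmark as well.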
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$, where $M$ is the model size and $N$ the sample size.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and number of trainable parameters that grows linearly with feature dimensionality.
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
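A minimal sketch of the random Fourier feature (RFF) construction underlying GP-NAM: a stationary (RBF) kernel is approximated by an explicit cosine feature map, so GP regression collapses to convex ridge regression on those features. The feature count, lengthscale, and toy data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, lengthscale = 200, 0.5

# RFF map for the RBF kernel: z(x) = sqrt(2/D) * cos(w x + b),
# with frequencies w ~ N(0, 1/lengthscale^2) and phases b ~ U[0, 2*pi).
w = rng.normal(0.0, 1.0 / lengthscale, size=(1, D))
b = rng.uniform(0.0, 2 * np.pi, size=D)
def z(x):
    return np.sqrt(2.0 / D) * np.cos(x @ w + b)

# Toy 1-D regression: the GP posterior mean becomes ridge regression on z(x).
x = np.linspace(-3.0, 3.0, 100).reshape(-1, 1)
y = np.sin(2 * x).ravel() + 0.1 * rng.normal(size=100)
Z = z(x)
alpha = 1e-2                                   # noise / ridge regularization
theta = np.linalg.solve(Z.T @ Z + alpha * np.eye(D), Z.T @ y)

y_hat = Z @ theta
rmse = np.sqrt(np.mean((y_hat - y) ** 2))
```

Because the feature map is fixed, the number of trainable parameters is just `D` per input feature, which is the linear growth the summary refers to.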
arXiv Detail & Related papers (2024-02-19T20:29:34Z) - Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO
Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^\star$ when the number of samples scales logarithmically.
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z) - Granger Causality using Neural Networks [8.835231777363399]
We present several new classes of models that can handle underlying non-linearity.
We show one can directly decouple lags and individual time series importance via decoupled penalties.
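The decoupled-penalty idea can be illustrated on the weight tensor of a linear vector autoregression: one group norm per input series (does series j Granger-cause anything?) and one per lag (how far back does influence reach?). The tensor shape and penalty weights here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p_series, n_lags = 4, 5
# W[i, j, k]: effect of series j at lag k on the next value of series i.
W = rng.normal(size=(p_series, p_series, n_lags))

# Decoupled group penalties: a group-lasso norm over each input series and,
# separately, over each lag. Zeroing a series group removes a Granger edge;
# zeroing a lag group truncates the model's memory.
series_pen = np.sqrt((W ** 2).sum(axis=(0, 2)))   # shape (p_series,)
lag_pen = np.sqrt((W ** 2).sum(axis=(0, 1)))      # shape (n_lags,)
lam_series, lam_lag = 0.1, 0.05
penalty = lam_series * series_pen.sum() + lam_lag * lag_pen.sum()
```

Adding this penalty to the regression loss and minimizing with a proximal method yields sparsity patterns that can be read off per series and per lag independently.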
arXiv Detail & Related papers (2022-08-07T12:02:48Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Inverting brain grey matter models with likelihood-free inference: a
tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform scalable online prediction and model updates, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Neural Jump Ordinary Differential Equations: Consistent Continuous-Time
Prediction and Filtering [6.445605125467574]
We introduce the Neural Jump ODE (NJ-ODE), which provides a data-driven approach to learning continuously in time.
We show that our model converges to the $L^2$-optimal online prediction.
We experimentally show that our model outperforms the baselines in more complex learning tasks.
arXiv Detail & Related papers (2020-06-08T16:34:51Z) - Deep Latent-Variable Kernel Learning [25.356503463916816]
We present a complete deep latent-variable kernel learning (DLVKL) model wherein the latent variables perform encoding for regularized representation.
Experiments imply that the DLVKL-NSDE performs similarly to the well calibrated GP on small datasets, and outperforms existing deep GPs on large datasets.
arXiv Detail & Related papers (2020-05-18T05:55:08Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.