Fast variable selection makes scalable Gaussian process BSS-ANOVA a
speedy and accurate choice for tabular and time series regression
- URL: http://arxiv.org/abs/2205.13676v1
- Date: Thu, 26 May 2022 23:41:43 GMT
- Title: Fast variable selection makes scalable Gaussian process BSS-ANOVA a
speedy and accurate choice for tabular and time series regression
- Authors: David S. Mebane, Kyle Hayes and Ali Baheri
- Abstract summary: Gaussian processes (GPs) are non-parametric regression engines with a long history.
One of a number of scalable GP approaches is the Karhunen-Loève (KL) decomposed kernel BSS-ANOVA, developed in 2009.
A new method of forward variable selection quickly and effectively limits the number of terms, yielding a method with competitive accuracies.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes (GPs) are non-parametric regression engines with a long
history. They are often overlooked in modern machine learning contexts because
of scalability issues: regression for traditional GP kernels is
$\mathcal{O}(N^3)$ where $N$ is the size of the dataset. One of a number of
scalable GP approaches is the Karhunen-Lo\'eve (KL) decomposed kernel
BSS-ANOVA, developed in 2009. It is $\mathcal{O}(NP)$ in training and
$\mathcal{O}(P)$ per point in prediction, where $P$ is the number of terms in
the ANOVA / KL expansion. A new method of forward variable selection quickly
and effectively limits the number of terms, yielding a method with competitive
accuracies, training and inference times for large tabular datasets. The new
algorithm balances model fidelity with model complexity using Bayesian and
Akaike information criteria (BIC/AIC). The inference speed and accuracy make
the method especially useful for modeling dynamic systems in a model-free
manner, by modeling the derivative in a dynamic system as a static problem,
then integrating the learned dynamics using a high-order scheme. The methods
are demonstrated on a `Susceptible, Infected, Recovered' (SIR) toy problem,
with the transmissibility used as forcing function, along with the `Cascaded
Tanks' benchmark dataset. Comparisons on the static prediction of derivatives
are made with a Random Forest and Residual Neural Network, while for the
timeseries prediction comparisons are made with LSTM and GRU recurrent neural
networks. The GP outperforms the other methods in all modeling tasks on
accuracy, while (in the case of the neural networks) performing many orders of
magnitude fewer calculations. For the SIR test, which involved prediction for a
set of forcing functions qualitatively different from those appearing in the
training set, the GP captured the correct dynamics while the neural networks
failed to do so.
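The forward variable selection idea in the abstract (add terms one at a time, stopping when an information criterion no longer improves) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the candidate terms are already available as columns of a matrix, fits them by ordinary least squares rather than Bayesian inference, and uses a Gaussian-likelihood BIC defined up to an additive constant. The names `bic` and `forward_select` are hypothetical.

```python
import numpy as np


def bic(y, y_hat, n_params):
    """Gaussian-likelihood BIC, up to an additive constant."""
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + n_params * np.log(n)


def forward_select(terms, y):
    """Greedily add columns of `terms` (N x P) while BIC keeps improving.

    Assumption: plain least squares stands in for the GP inference step.
    """
    n, p = terms.shape
    selected = []
    remaining = list(range(p))
    # Baseline: intercept-only model (one parameter).
    best_bic = bic(y, np.full(n, y.mean()), 1)
    while remaining:
        scores = []
        for j in remaining:
            # Design matrix: intercept + already selected terms + candidate j.
            X = np.column_stack([np.ones(n), terms[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            scores.append((bic(y, X @ coef, X.shape[1]), j))
        cand_bic, cand = min(scores)
        if cand_bic >= best_bic:
            break  # no candidate lowers BIC: stop adding terms
        best_bic = cand_bic
        selected.append(cand)
        remaining.remove(cand)
    return selected, best_bic


# Synthetic usage example: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
terms = rng.uniform(-1.0, 1.0, size=(500, 6))
y = 2.0 * terms[:, 0] - 3.0 * terms[:, 3] + 0.1 * rng.standard_normal(500)
selected, score = forward_select(terms, y)
print("selected term indices:", selected)
```

Each pass of the loop costs one least-squares fit per remaining candidate, so the penalty term `n_params * log(n)` is what keeps the model small; swapping it for `2 * n_params` would give an AIC-based variant.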
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators [78.64101336150419]
Predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling.
An alternative approach to such a fully resolved simulation is using a coarse grid and then correcting its errors through a closure model.
We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation.
arXiv Detail & Related papers (2024-08-09T17:05:45Z)
- Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with feature dimensionality.
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
arXiv Detail & Related papers (2024-02-19T20:29:34Z)
- Distribution learning via neural differential equations: a nonparametric
statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z)
- Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO
Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^\star$ when the number of samples scales logarithmically.
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Inverting brain grey matter models with likelihood-free inference: a
tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z)
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model update with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Neural Jump Ordinary Differential Equations: Consistent Continuous-Time
Prediction and Filtering [6.445605125467574]
We introduce the Neural Jump ODE (NJ-ODE) that provides a data-driven approach to learn, continuously in time.
We show that our model converges to the $L^2$-optimal online prediction.
We experimentally show that our model outperforms the baselines in more complex learning tasks.
arXiv Detail & Related papers (2020-06-08T16:34:51Z)
- Deep Latent-Variable Kernel Learning [25.356503463916816]
We present a complete deep latent-variable kernel learning (DLVKL) model wherein the latent variables perform encoding for regularized representation.
Experiments imply that the DLVKL-NSDE performs similarly to the well calibrated GP on small datasets, and outperforms existing deep GPs on large datasets.
arXiv Detail & Related papers (2020-05-18T05:55:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.