Random forests for binary geospatial data
- URL: http://arxiv.org/abs/2302.13828v1
- Date: Mon, 27 Feb 2023 14:34:33 GMT
- Title: Random forests for binary geospatial data
- Authors: Arkajyoti Saha and Abhirup Datta
- Abstract summary: We propose RF-GP, using Random Forests for estimating the non-linear covariate effect and Gaussian Processes for modeling the spatial random effects.
RF-GP outperforms existing RF methods for estimation and prediction in both simulated and real-world data.
We establish consistency of RF-GP for a general class of $\beta$-mixing binary processes that includes common choices like spatial Mat\'ern GP and autoregressive processes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary geospatial data is commonly analyzed with generalized linear mixed
models, specified with a linear fixed covariate effect and a Gaussian Process
(GP)-distributed spatial random effect, relating to the response via a link
function. The assumption of linear covariate effects is severely restrictive.
Random Forests (RF) are increasingly being used for non-linear modeling of
spatial data, but current extensions of RF for binary spatial data depart from the
mixed model setup, relinquishing inference on the fixed effects and other
advantages of using GP. We propose RF-GP, using Random Forests for estimating
the non-linear covariate effect and Gaussian Processes for modeling the spatial
random effects directly within the generalized mixed model framework. We
observe and exploit equivalence of Gini impurity measure and least squares loss
to propose an extension of RF for binary data that accounts for the spatial
dependence. We then propose a novel link inversion algorithm that leverages the
properties of GP to estimate the covariate effects and offer spatial
predictions. RF-GP outperforms existing RF methods for estimation and
prediction in both simulated and real-world data. We establish consistency of
RF-GP for a general class of $\beta$-mixing binary processes that includes
common choices like spatial Mat\'ern GP and autoregressive processes.
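The equivalence between the Gini impurity measure and least squares loss mentioned in the abstract can be checked numerically. For binary labels with node mean p, the Gini impurity is 2p(1-p) and the mean squared error around the node mean is p(1-p), so the two split criteria differ only by a factor of 2 and select the same split. The sketch below illustrates only this textbook identity on simulated, independent data; it is not the authors' RF-GP algorithm, it ignores spatial dependence, and all function names and the simulation setup are hypothetical.

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity of a node holding binary labels y in {0, 1}."""
    p = np.mean(y)
    return 1.0 - p**2 - (1.0 - p)**2

def mse_around_mean(y):
    """Least squares loss of a node: mean squared deviation from the node mean."""
    return np.mean((y - np.mean(y))**2)

def split_cost(y, left_mask, node_loss):
    """Size-weighted average of node_loss over the two children of a split."""
    left, right = y[left_mask], y[~left_mask]
    return (len(left) * node_loss(left) + len(right) * node_loss(right)) / len(y)

# Simulated data (an assumption for illustration): one covariate x and a
# binary response whose success probability jumps at x = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = (rng.uniform(size=200) < 0.2 + 0.6 * (x > 0.5)).astype(float)

# Candidate thresholds at interior quantiles, so both children are non-empty.
thresholds = np.quantile(x, np.linspace(0.05, 0.95, 19))
gini_costs = np.array([split_cost(y, x <= t, gini_impurity) for t in thresholds])
mse_costs = np.array([split_cost(y, x <= t, mse_around_mean) for t in thresholds])

# The costs agree up to the constant factor 2, so the argmin is shared.
print("max |gini - 2*mse|:", np.max(np.abs(gini_costs - 2.0 * mse_costs)))
print("best threshold (Gini):", thresholds[np.argmin(gini_costs)])
print("best threshold (MSE): ", thresholds[np.argmin(mse_costs)])
```

Because the two criteria coincide for binary responses, a least squares formulation of the split can stand in for the Gini criterion, which is the property the abstract says the method exploits to account for spatial dependence.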
Related papers
- Robust Gaussian Processes via Relevance Pursuit [17.39376866275623]
We propose and study a GP model that achieves robustness against sparse outliers by inferring data-point-specific noise levels.
We show, surprisingly, that the model can be parameterized such that the associated log marginal likelihood is strongly concave in the data-point-specific noise variances.
arXiv Detail & Related papers (2024-10-31T17:59:56Z) - Sparse Variational Contaminated Noise Gaussian Process Regression with Applications in Geomagnetic Perturbations Forecasting [4.675221539472143]
We propose a scalable inference algorithm for fitting sparse Gaussian process regression models with contaminated normal noise on large datasets.
We show that our approach yields shorter prediction intervals for similar coverage and accuracy when compared to an artificial dense neural network baseline.
arXiv Detail & Related papers (2024-02-27T15:08:57Z) - Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes [8.4159776055506]
We propose a novel approach for aggregating the Gaussian experts' predictions using a Gaussian graphical model (GGM).
We first estimate the joint distribution of latent and observed variables using the Expectation-Maximization (EM) algorithm.
Our new method outperforms other state-of-the-art DGP approaches.
arXiv Detail & Related papers (2022-02-07T15:22:56Z) - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z) - Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train models that perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z) - Random Forests for dependent data [1.5469452301122173]
We propose RF-GLS, a novel extension of RF for dependent error processes.
The key to this extension is the equivalent representation of the local decision-making in a regression tree as a global OLS optimization.
We empirically demonstrate the improvement achieved by RF-GLS over RF for both estimation and prediction under dependence.
arXiv Detail & Related papers (2020-07-30T12:36:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.