Treeging
- URL: http://arxiv.org/abs/2110.01053v1
- Date: Sun, 3 Oct 2021 17:48:18 GMT
- Title: Treeging
- Authors: Gregory L. Watson, Michael Jerrett, Colleen E. Reid, Donatello Telesca
- Abstract summary: Treeging combines the flexible mean structure of regression trees with the covariance-based prediction strategy of kriging into the base learner of an ensemble prediction algorithm.
We investigate the predictive accuracy of treeging across a thorough and widely varied battery of spatial and space-time simulation scenarios.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Treeging combines the flexible mean structure of regression trees with the
covariance-based prediction strategy of kriging into the base learner of an
ensemble prediction algorithm. In so doing, it combines the strengths of the
two primary types of spatial and space-time prediction models: (1) models with
flexible mean structures (often machine learning algorithms) that assume
independently distributed data, and (2) kriging or Gaussian Process (GP)
prediction models with rich covariance structures but simple mean structures.
We investigate the predictive accuracy of treeging across a thorough and widely
varied battery of spatial and space-time simulation scenarios, comparing it to
ordinary kriging, random forest and ensembles of ordinary kriging base
learners. Treeging performs well across the board, whereas kriging suffers when
dependence is weak or in the presence of spurious covariates, and random forest
suffers when the covariates are less informative. Treeging also outperforms
these competitors in predicting atmospheric pollutants (ozone and PM$_{2.5}$)
in several case studies. We examine sensitivity to tuning parameters (number of
base learners and training data sampling proportion), finding they follow the
familiar intuition of their random forest counterparts. We include a discussion
of scalability, noting that any covariance approximation techniques that
expedite kriging (GP) may be similarly applied to expedite treeging.
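The abstract describes the algorithm only at a high level, so below is a minimal Python sketch of how a treeging-style base learner and bagged ensemble could be assembled. It is an illustration under stated assumptions, not the authors' implementation: the class names are hypothetical, scikit-learn's GP regressor with an RBF-plus-nugget kernel stands in for the kriging step, and subsampling without replacement is assumed for the ensemble.

```python
# Illustrative sketch of a treeging-style base learner: a regression tree
# supplies the flexible mean structure, and a GP ("kriging") predicts the
# spatial structure left in the tree's residuals. Hypothetical, not the
# paper's reference implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

class TreegingBaseLearner:
    def fit(self, X, coords, y):
        # X: (n, d) covariates; coords: (n, 2) spatial locations; y: (n,) response.
        # 1) Flexible mean structure: regression tree on the covariates.
        self.tree = DecisionTreeRegressor(min_samples_leaf=5).fit(X, y)
        resid = y - self.tree.predict(X)
        # 2) Covariance-based prediction: krige the tree residuals over space
        #    (an RBF + nugget kernel stands in for a spatial covariance model).
        self.gp = GaussianProcessRegressor(
            kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
        ).fit(coords, resid)
        return self

    def predict(self, X, coords):
        # Prediction = tree mean + kriged residual.
        return self.tree.predict(X) + self.gp.predict(coords)

class TreegingEnsemble:
    # The two tuning parameters the paper examines: number of base learners
    # and the training-data sampling proportion.
    def __init__(self, n_learners=50, sample_prop=0.6, seed=0):
        self.B = n_learners
        self.p = sample_prop
        self.rng = np.random.default_rng(seed)

    def fit(self, X, coords, y):
        n = len(y)
        self.learners = []
        for _ in range(self.B):
            # Subsample a proportion p of the training data for each learner.
            idx = self.rng.choice(n, size=int(self.p * n), replace=False)
            self.learners.append(
                TreegingBaseLearner().fit(X[idx], coords[idx], y[idx])
            )
        return self

    def predict(self, X, coords):
        # Average the base-learner predictions, as in random forests.
        return np.mean([m.predict(X, coords) for m in self.learners], axis=0)
```

Averaging many tree-plus-kriging learners mirrors the bagging logic of random forests, which is consistent with the paper's finding that the two tuning parameters follow the familiar intuition of their random forest counterparts.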
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or those with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the tessellation induced by the hypotheses and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Local Risk Bounds for Statistical Aggregation [5.940699390639279]
We prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator.
These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecué and Rigollet for random design regression.
arXiv Detail & Related papers (2023-06-29T17:51:42Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization [13.52611859628841]
We revisit various ensembles of randomized trees to investigate their behavior from the perspective of prediction uncertainty estimation.
We propose a new way of constructing an ensemble of randomized trees, referred to as BwO forest, where bagging with oversampling is employed to construct bootstrapped samples.
Experimental results demonstrate the validity and good performance of BwO forest over existing tree-based models in various circumstances.
arXiv Detail & Related papers (2022-02-22T04:50:37Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z) - Achieving Reliable Causal Inference with Data-Mined Variables: A Random
Forest Approach to the Measurement Error Problem [1.5749416770494704]
A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest from available data.
Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables are likely to suffer from bias due to measurement error.
We propose a novel approach to mitigate these biases, leveraging the ensemble learning technique known as the random forest.
arXiv Detail & Related papers (2020-12-19T21:48:23Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success remains unclear.
We show that multiplicative noise commonly arises in the parameters due to variance in local rates of convergence, producing heavy-tailed stationary behaviour.
A detailed analysis is conducted in which key factors, including step size, batch size and data variability, are shown to exhibit behaviour consistent with recent empirical results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.