Boost-R: Gradient Boosted Trees for Recurrence Data
- URL: http://arxiv.org/abs/2107.08784v1
- Date: Sat, 3 Jul 2021 02:44:09 GMT
- Title: Boost-R: Gradient Boosted Trees for Recurrence Data
- Authors: Xiao Liu, Rong Pan
- Abstract summary: This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features.
Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process.
- Score: 13.40931458200203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrence data arise from multi-disciplinary domains spanning reliability,
cyber security, healthcare, online retailing, etc. This paper investigates an
additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data),
for recurrent event data with both static and dynamic features. Boost-R
constructs an ensemble of gradient boosted additive trees to estimate the
cumulative intensity function of the recurrent event process, where a new tree
is added to the ensemble by minimizing the regularized L2 distance between the
observed and predicted cumulative intensity. Unlike conventional regression
trees, a time-dependent function is constructed by Boost-R on each tree leaf.
The sum of these functions, from multiple trees, yields the ensemble estimator
of the cumulative intensity. The divide-and-conquer nature of tree-based
methods is appealing when hidden sub-populations exist within a heterogeneous
population. The non-parametric nature of regression trees helps to avoid
parametric assumptions on the complex interactions between event processes and
features. Critical insights and advantages of Boost-R are investigated through
comprehensive numerical examples. Datasets and computer code of Boost-R are
made available on GitHub. To the best of our knowledge, Boost-R is the first gradient
boosted additive-tree-based approach for modeling large-scale recurrent event
data with both static and dynamic feature information.
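The boosting recipe described above can be made concrete with a short sketch. The following is a deliberately simplified illustration, not the authors' implementation (their code is on GitHub): it flattens the time-dependent leaf functions into ordinary regression trees that split on time directly, and all names (`fit_boost_r`, `predict_cif`) are hypothetical.
```python
# Hedged sketch of the Boost-R idea: boost regression trees against the L2 gap
# between observed and predicted cumulative event counts. The real method
# places a time-dependent function on each leaf; here time is just an extra
# split variable, which is a simplifying assumption for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boost_r(X, t, N_obs, n_trees=50, lr=0.1, max_depth=3):
    """X: (n, p) static features; t: (n,) observation times;
    N_obs: (n,) observed cumulative event counts N_i(t_i)."""
    Z = np.column_stack([X, t])    # let trees split on time as well as features
    pred = np.zeros(len(N_obs))
    trees = []
    for _ in range(n_trees):
        residual = N_obs - pred    # negative gradient of the squared L2 distance
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(Z, residual)
        pred += lr * tree.predict(Z)
        trees.append(tree)
    return trees

def predict_cif(trees, X, t, lr=0.1):
    """Ensemble estimate of the cumulative intensity at times t."""
    Z = np.column_stack([X, t])
    return lr * sum(tree.predict(Z) for tree in trees)
```
Each round fits a tree to the residual between observed and predicted cumulative counts, mirroring the regularized L2 objective described in the abstract (the regularization term itself is omitted here).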
Related papers
- Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
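As a rough illustration of that two-stage dataflow (not the paper's actual architecture), the sketch below uses leaf memberships from a fitted forest as the tree-generated representation and a single random linear map as a stand-in for the shallow network; the forest, the placeholder target, and the two target-model parameters are all assumptions.
```python
# Hypothetical sketch of the Hyper-Tree dataflow: trees turn features into a
# representation, and a shallow map sends it to target-model parameters.
# End-to-end training is omitted; every name here is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # exogenous input features
y = X @ rng.normal(size=5)                 # placeholder target for growing trees

forest = RandomForestRegressor(n_estimators=10, max_depth=3).fit(X, y)
leaves = forest.apply(X)                   # (n, n_trees) leaf indices = representation

H = OneHotEncoder(sparse_output=False).fit_transform(leaves)
W = rng.normal(scale=0.01, size=(H.shape[1], 2))  # maps to 2 target-model params
params = H @ W                             # e.g. per-sample (phi, sigma) of an AR(1)
```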
arXiv Detail & Related papers (2024-05-13T15:22:15Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Individualized and Global Feature Attributions for Gradient Boosted
Trees in the Presence of $\ell_2$ Regularization [0.0]
We propose Prediction Decomposition (PreDecomp), a novel individualized feature attribution for boosted trees when they are trained with $\ell_2$ regularization.
We also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and the labels on out-of-sample data for each tree.
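The inner-product recipe in that description can be sketched in a few lines. This is illustrative pseudocode for the general idea only; the paper's debiasing details and the exact TreeInner estimator are not reproduced, and the function name is hypothetical.
```python
# Hedged sketch: score each feature by the inner product between its
# individualized attributions on held-out data and the held-out labels,
# aggregated over trees.
import numpy as np

def global_attribution(per_tree_attrs, y_oos):
    """per_tree_attrs: list of (n, p) individualized attributions, one per tree,
    evaluated on out-of-sample data; y_oos: (n,) out-of-sample labels."""
    return sum(A.T @ y_oos for A in per_tree_attrs)  # (p,) global scores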
arXiv Detail & Related papers (2022-11-08T17:56:22Z) - Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves a faster than $n^{-1/4}$ L2 convergence rate in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
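For readers unfamiliar with the procedure the theory concerns, here is a generic early-stopping boosting loop: training halts once held-out loss stops improving. This is purely illustrative; the paper's Lasso-based variant and its theoretical stopping rule are not reproduced here.
```python
# Generic L2 boosting with patience-based early stopping on a validation set.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_with_early_stopping(X, y, X_val, y_val, max_trees=500, lr=0.1, patience=10):
    pred, pred_val, trees = np.zeros(len(y)), np.zeros(len(y_val)), []
    best, since_best = np.inf, 0
    for _ in range(max_trees):
        tree = DecisionTreeRegressor(max_depth=2).fit(X, y - pred)
        pred += lr * tree.predict(X)
        pred_val += lr * tree.predict(X_val)
        trees.append(tree)
        val_loss = np.mean((y_val - pred_val) ** 2)
        if val_loss < best:
            best, since_best = val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:   # stop once validation loss stalls
                break
    return trees
```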
arXiv Detail & Related papers (2022-05-22T00:34:41Z) - To Boost or not to Boost: On the Limits of Boosted Neural Networks [67.67776094785363]
Boosting is a method for learning an ensemble of classifiers.
While boosting has been shown to be very effective for decision trees, its impact on neural networks has not been extensively studied.
We find that a single neural network usually generalizes better than a boosted ensemble of smaller neural networks with the same total number of parameters.
arXiv Detail & Related papers (2021-07-28T19:10:03Z) - Relational Boosted Regression Trees [1.14179290793997]
Many tasks use data housed in databases to train boosted regression tree models.
We give an adaptation of the greedy approximation algorithm for training boosted regression trees.
arXiv Detail & Related papers (2021-07-25T20:29:28Z) - Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
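To give a feel for boosting histogram-based weak learners, here is a hedged sketch in which each weak learner is piecewise-constant over a uniform binary grid (each feature's range split into 2**depth equal bins). The actual GBBHE construction differs; every class and function name here is an assumption.
```python
# Illustrative binary-histogram weak learner plus an L2 boosting loop.
import numpy as np

class BinaryHistogramRegressor:
    """Weak learner: piecewise-constant over a uniform binary histogram grid."""
    def __init__(self, depth=3):
        self.depth = depth

    def fit(self, X, y):
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        self.default = float(y.mean())           # fallback for empty cells
        cells = self._cell_ids(X)
        self.values = {c: y[cells == c].mean() for c in np.unique(cells)}
        return self

    def predict(self, X):
        cells = self._cell_ids(X)
        return np.array([self.values.get(c, self.default) for c in cells])

    def _cell_ids(self, X):
        k = 2 ** self.depth                      # bins per feature
        span = np.where(self.hi > self.lo, self.hi - self.lo, 1.0)
        idx = np.clip(((X - self.lo) / span * k).astype(int), 0, k - 1)
        return np.ravel_multi_index(idx.T, (k,) * X.shape[1])

def fit_gbbhe(X, y, n_estimators=100, lr=0.1, depth=3):
    pred, learners = np.zeros(len(y)), []
    for _ in range(n_estimators):
        h = BinaryHistogramRegressor(depth).fit(X, y - pred)
        pred += lr * h.predict(X)
        learners.append(h)
    return learners
```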
arXiv Detail & Related papers (2021-06-03T17:05:40Z) - Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
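The Fiedler-vector partitioning step mentioned above is easy to sketch: split the observed nodes by the sign of the eigenvector for the second-smallest eigenvalue of a graph Laplacian. The specific similarity matrix STDR builds from the data is not reproduced here, so treat this as a generic illustration.
```python
# Spectral bipartition via the Fiedler vector of an unnormalized Laplacian.
import numpy as np

def fiedler_partition(W):
    """W: (n, n) symmetric nonnegative similarity matrix over observed nodes.
    Returns a boolean mask giving the two sides of the partition."""
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]            # eigenvector of second-smallest eigenvalue
    return fiedler >= 0
```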
arXiv Detail & Related papers (2021-02-26T02:47:42Z) - SGA: A Robust Algorithm for Partial Recovery of Tree-Structured
Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z) - agtboost: Adaptive and Automatic Gradient Tree Boosting Computations [0.0]
agtboost implements fast gradient tree boosting computations.
A useful model validation function performs the Kolmogorov-Smirnov test on the learned distribution.
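agtboost itself is an R package; as a language-agnostic illustration of the kind of check its validation function performs, the snippet below runs a Kolmogorov-Smirnov test on probability integral transform values, which should be roughly uniform if the learned distribution fits. The data here are synthetic stand-ins.
```python
# KS test of calibration: PIT values from a well-specified model ~ Uniform(0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pit = rng.uniform(size=500)            # stand-in for probability integral transforms
stat, pvalue = stats.kstest(pit, "uniform")
print(f"KS statistic={stat:.3f}, p-value={pvalue:.3f}")
```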
arXiv Detail & Related papers (2020-08-28T12:42:19Z) - BoostTree and BoostForest for Ensemble Learning [27.911350375268576]
BoostForest is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression.
It generally outperformed four classical ensemble learning approaches (Random Forest, Extra-Trees, XGBoost and LightGBM) on 35 classification and regression datasets.
arXiv Detail & Related papers (2020-03-21T19:52:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.