Trees-Based Models for Correlated Data
- URL: http://arxiv.org/abs/2102.08114v1
- Date: Tue, 16 Feb 2021 12:30:48 GMT
- Title: Trees-Based Models for Correlated Data
- Authors: Assaf Rabinowicz and Saharon Rosset
- Abstract summary: We show the problems that arise when implementing standard trees-based regression models, which ignore the correlation structure.
Our new approach explicitly takes the correlation structure into account in the splitting criterion.
The superiority of our new approach over trees-based models that do not account for the correlation is supported by simulation experiments and real data analyses.
- Score: 8.629912408966147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a new approach for trees-based regression, such as simple
regression tree, random forest and gradient boosting, in settings involving
correlated data. We show the problems that arise when implementing standard
trees-based regression models, which ignore the correlation structure. Our new
approach explicitly takes the correlation structure into account in the
splitting criterion, stopping rules and fitted values in the leaves, which
induces some major modifications of standard methodology. The superiority of
our new approach over trees-based models that do not account for the
correlation is supported by simulation experiments and real data analyses.
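The idea of a correlation-aware splitting criterion and leaf values can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the covariance matrix `V` of the observations is known, fits leaf values by generalized least squares (GLS), and scores a candidate split by the GLS-weighted residual sum of squares. The function names `gls_leaf_values`, `gls_loss` and `best_split` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def _indicator(leaf_ids):
    """Return the leaf labels and the n x L leaf-membership indicator matrix."""
    leaves = np.unique(leaf_ids)
    Z = (leaf_ids[:, None] == leaves[None, :]).astype(float)
    return leaves, Z


def gls_leaf_values(y, leaf_ids, V):
    """Leaf values by GLS: mu = (Z' V^-1 Z)^-1 Z' V^-1 y (V assumed known)."""
    leaves, Z = _indicator(leaf_ids)
    Vinv_Z = np.linalg.solve(V, Z)
    Vinv_y = np.linalg.solve(V, y)
    mu = np.linalg.solve(Z.T @ Vinv_Z, Z.T @ Vinv_y)
    return leaves, mu


def gls_loss(y, leaf_ids, V):
    """GLS-weighted residual sum of squares r' V^-1 r for a given partition."""
    _, Z = _indicator(leaf_ids)
    _, mu = gls_leaf_values(y, leaf_ids, V)
    r = y - Z @ mu
    return float(r @ np.linalg.solve(V, r))


def best_split(x, y, V):
    """Scan thresholds on one feature; return (threshold, loss) minimizing the GLS loss."""
    best_t, best_loss = None, np.inf
    for t in np.unique(x)[:-1]:  # the largest value would leave one side empty
        leaf_ids = (x > t).astype(int)
        loss = gls_loss(y, leaf_ids, V)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss


if __name__ == "__main__":
    x = np.arange(6.0)
    y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
    # Illustrative covariance: two clusters of three observations,
    # unit variance and within-cluster correlation 0.5.
    V = np.kron(np.eye(2), np.full((3, 3), 0.5)) + 0.5 * np.eye(6)
    print(best_split(x, y, V))
```

With `V` set to the identity matrix, the leaf values reduce to ordinary leaf means and the loss to the usual sum of squared errors, which is the standard criterion that ignores correlation.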
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Ensembles of Probabilistic Regression Trees [46.53457774230618]
Tree-based ensemble methods have been successfully used for regression problems in many applications and research studies.
We study ensemble versions of probabilistic regression trees that provide smooth approximations of the objective function by assigning each observation to each region with respect to a probability distribution.
arXiv Detail & Related papers (2024-06-20T06:51:51Z)
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery [52.95935278819512]
We conduct the first study on spurious correlations for open-domain response generation models based on a corpus CGDIALOG curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation model.
arXiv Detail & Related papers (2023-03-02T06:33:48Z)
- SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees.
We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves.
In our evaluation, the proposed tree and forest models are able to achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
arXiv Detail & Related papers (2022-11-16T04:30:42Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Inference for Network Regression Models with Community Structure [1.7188280334580197]
We present a novel regression modeling framework that models the errors as resulting from a community-based dependence structure.
We exploit the subsequent exchangeability properties of the error distribution to obtain parsimonious standard errors for regression parameters.
arXiv Detail & Related papers (2021-06-08T12:04:31Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Sparse learning with CART [18.351254916713305]
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology.
This paper aims to study the statistical properties of regression trees constructed with CART methodology.
arXiv Detail & Related papers (2020-06-07T20:55:52Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.