What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?
- URL: http://arxiv.org/abs/2206.10323v2
- Date: Wed, 20 Dec 2023 14:20:51 GMT
- Title: What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?
- Authors: Susanne Dandl and Torsten Hothorn and Heidi Seibold and Erik Sverdrup
and Stefan Wager and Achim Zeileis
- Abstract summary: We show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss.
In the randomized setting, causal forests and model-based forests performed similarly in a benchmark study that also included the new blended versions.
- Score: 1.1050303097572156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimation of heterogeneous treatment effects (HTE) is of prime importance in
many disciplines, ranging from personalized medicine to economics among many
others. Random forests have been shown to be a flexible and powerful approach
to HTE estimation in both randomized trials and observational studies. In
particular "causal forests", introduced by Athey, Tibshirani and Wager (2019),
along with the R implementation in package grf were rapidly adopted. A related
approach, called "model-based forests", that is geared towards randomized
trials and simultaneously captures effects of both prognostic and predictive
variables, was introduced by Seibold, Zeileis and Hothorn (2018) along with a
modular implementation in the R package model4you.
Here, we present a unifying view that goes beyond the theoretical motivations
and investigates which computational elements make causal forests so successful
and how these can be blended with the strengths of model-based forests. To do
so, we show that both methods can be understood in terms of the same parameters
and model assumptions for an additive model under L2 loss. This theoretical
insight allows us to implement several flavors of "model-based causal forests"
and dissect their different elements in silico.
The original causal forests and model-based forests are compared with the new
blended versions in a benchmark study exploring both randomized trials and
observational settings. In the randomized setting, both approaches performed
similarly. If confounding was present in the data generating process, we found local
centering of the treatment indicator with the corresponding propensities to be
the main driver for good performance. Local centering of the outcome was less
important, and might be replaced or enhanced by simultaneous split selection
with respect to both prognostic and predictive effects.
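
The role of centering under confounding can be illustrated with a small simulation. The sketch below is not the paper's forest-based method: it swaps in a global residual-on-residual regression with polynomial nuisance fits, and it assumes a hypothetical data generating process Y = mu(x) + tau * W + noise with a constant effect tau, a linear prognostic effect mu(x), and a logistic propensity. All function names and parameter values are illustrative assumptions, not taken from the paper or from grf or model4you.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Draw confounded data from an assumed additive model (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    e = 1.0 / (1.0 + np.exp(-2.0 * x))   # assumed propensity P(W = 1 | X = x)
    w = rng.binomial(1, e)               # treatment depends on x (confounding)
    tau = 1.0                            # assumed constant treatment effect
    mu = 3.0 * x                         # assumed prognostic effect
    y = mu + tau * w + rng.normal(size=n)
    return x, w, y, e, tau


def naive_estimate(w, y):
    """Difference in mean outcomes between treated and control units."""
    return y[w == 1].mean() - y[w == 0].mean()


def fit_poly(x, target, degree=3):
    """In-sample fitted values from a polynomial least-squares fit on x."""
    coefs = np.polyfit(x, target, degree)
    return np.polyval(coefs, x)


def centered_estimate(x, w, y):
    """Regress the centered outcome on the centered treatment indicator."""
    w_res = w - fit_poly(x, w)   # W minus an estimate of the propensity
    y_res = y - fit_poly(x, y)   # Y minus an estimate of E[Y | X = x]
    return np.sum(w_res * y_res) / np.sum(w_res ** 2)


if __name__ == "__main__":
    x, w, y, e, tau = simulate()
    print("true effect:      ", tau)
    print("naive estimate:   ", naive_estimate(w, y))
    print("centered estimate:", centered_estimate(x, w, y))
```

In this setup the naive difference in means absorbs part of the prognostic effect because treated units tend to have larger x, while the centered estimate should land close to the assumed effect. The forests discussed in the abstract apply such centering locally and allow the effect to vary with x, which this global sketch does not attempt.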
Related papers
- Exogenous Randomness Empowering Random Forests [4.396860522241306]
We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests.
Our findings unveil that feature subsampling reduces both the bias and variance of random forests compared to individual trees.
Our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests.
arXiv Detail & Related papers (2024-11-12T05:06:10Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Identifiable Latent Polynomial Causal Models Through the Lens of Change [82.14087963690561]
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data.
One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability.
arXiv Detail & Related papers (2023-10-24T07:46:10Z)
- Heterogeneous Treatment Effect Estimation for Observational Data using Model-based Forests [0.0]
We propose modifications to model-based forests to address the confounding issue in observational data.
We found that this strategy reduces confounding effects in a simulated study with various outcome distributions.
We demonstrate the practical aspects of HTE estimation for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect of Riluzole on the progress of Amyotrophic Lateral Sclerosis.
arXiv Detail & Related papers (2022-10-06T11:49:39Z)
- FACT: High-Dimensional Random Forests Inference [4.941630596191806]
Quantifying the usefulness of individual features in random forests learning can greatly enhance its interpretability.
Existing studies have shown that some popularly used feature importance measures for random forests suffer from the bias issue.
We propose a framework of the self-normalized feature-residual correlation test (FACT) for evaluating the significance of a given feature.
arXiv Detail & Related papers (2022-07-04T19:05:08Z)
- Flexible Amortized Variational Inference in qBOLD MRI [56.4324135502282]
Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data.
Existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV.
This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV.
arXiv Detail & Related papers (2022-03-11T10:47:16Z)
- On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization [13.52611859628841]
We revisit various ensembles of randomized trees to investigate their behavior in the perspective of prediction uncertainty estimation.
We propose a new way of constructing an ensemble of randomized trees, referred to as BwO forest, where bagging with oversampling is employed to construct bootstrapped samples.
Experimental results demonstrate the validity and good performance of BwO forest over existing tree-based models in various circumstances.
arXiv Detail & Related papers (2022-02-22T04:50:37Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show our proposed method is competitive with the state-of-the-art in simulation setting and on real data from large scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z)
- Sparse Bayesian Causal Forests for Heterogeneous Treatment Effects Estimation [0.0]
This paper develops a sparsity-inducing version of Bayesian Causal Forests.
It is designed to estimate heterogeneous treatment effects using observational data.
arXiv Detail & Related papers (2021-02-12T15:24:50Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.