Doing Great at Estimating CATE? On the Neglected Assumptions in
Benchmark Comparisons of Treatment Effect Estimators
- URL: http://arxiv.org/abs/2107.13346v1
- Date: Wed, 28 Jul 2021 13:21:27 GMT
- Title: Doing Great at Estimating CATE? On the Neglected Assumptions in
Benchmark Comparisons of Treatment Effect Estimators
- Authors: Alicia Curth and Mihaela van der Schaar
- Abstract summary: We show that even in arguably the simplest setting -- estimation under ignorability assumptions -- the results of empirical evaluations can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
- Score: 91.3755431537592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The machine learning toolbox for estimation of heterogeneous treatment
effects from observational data is expanding rapidly, yet many of its
algorithms have been evaluated only on a very limited set of semi-synthetic
benchmark datasets. In this paper, we show that even in arguably the simplest
setting -- estimation under ignorability assumptions -- the results of such
empirical evaluations can be misleading if (i) the assumptions underlying the
data-generating mechanisms in benchmark datasets and (ii) their interplay with
baseline algorithms are inadequately discussed. We consider two popular machine
learning benchmark datasets for evaluation of heterogeneous treatment effect
estimators -- the IHDP and ACIC2016 datasets -- in detail. We identify problems
with their current use and highlight that the inherent characteristics of the
benchmark datasets favor some algorithms over others -- a fact that is rarely
acknowledged but of immense relevance for interpretation of empirical results.
We close by discussing implications and possible next steps.
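To make the paper's concern concrete, here is a minimal sketch, not the authors' code, of the kind of semi-synthetic evaluation loop being critiqued: simulate potential outcomes under a hypothetical response surface (IHDP and ACIC2016 use different, more elaborate mechanisms), fit two standard CATE baselines, and score them by PEHE. Which learner "wins" depends directly on choices made in the data-generating process.

```python
# A hypothetical, deliberately simple DGP -- NOT the IHDP/ACIC2016 mechanism.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))
e = 1 / (1 + np.exp(-X[:, 0]))              # propensity; ignorability holds
W = rng.binomial(1, e)
mu0 = X[:, 1]                               # control response surface
mu1 = X[:, 1] + np.maximum(X[:, 2], 0.0)    # treated surface -> heterogeneous effect
Y = np.where(W == 1, mu1, mu0) + rng.normal(scale=0.5, size=n)
tau_true = mu1 - mu0

# T-learner: one regression per treatment arm
t1 = RandomForestRegressor(random_state=0).fit(X[W == 1], Y[W == 1])
t0 = RandomForestRegressor(random_state=0).fit(X[W == 0], Y[W == 0])
tau_t = t1.predict(X) - t0.predict(X)

# S-learner: a single regression with treatment appended as a feature
s = RandomForestRegressor(random_state=0).fit(np.column_stack([X, W]), Y)
tau_s = (s.predict(np.column_stack([X, np.ones(n)]))
         - s.predict(np.column_stack([X, np.zeros(n)])))

for name, tau in [("T-learner", tau_t), ("S-learner", tau_s)]:
    pehe = np.sqrt(np.mean((tau - tau_true) ** 2))  # precision in estimating HTE
    print(name, "PEHE:", round(float(pehe), 3))
```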
Related papers
- Targeted Machine Learning for Average Causal Effect Estimation Using the
Front-Door Functional [3.0232957374216953]
Evaluating the average causal effect (ACE) of a treatment on an outcome often involves overcoming the challenges posed by confounding factors in observational studies.
Here, we introduce novel estimation strategies for the front-door criterion based on the targeted minimum loss-based estimation theory.
We demonstrate the applicability of these estimators to analyze the effect of early stage academic performance on future yearly income.
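A minimal plug-in sketch of the front-door functional for binary treatment, mediator, and outcome, with illustrative variable names; the paper's contribution is a targeted (TMLE-based) estimator of this functional, which is not reproduced here:

```python
import numpy as np

def frontdoor_ace(A, M, Y):
    """Plug-in front-door estimate:
    ACE = sum_m [P(m|A=1) - P(m|A=0)] * sum_a P(a) * E[Y | m, a]."""
    ace = 0.0
    for m in (0, 1):
        p_m_a1 = np.mean(M[A == 1] == m)
        p_m_a0 = np.mean(M[A == 0] == m)
        inner = sum(np.mean(A == a) * Y[(M == m) & (A == a)].mean()
                    for a in (0, 1))
        ace += (p_m_a1 - p_m_a0) * inner
    return ace

# Toy check with an unobserved confounder U that front-door adjustment handles:
rng = np.random.default_rng(1)
n = 200_000
U = rng.binomial(1, 0.5, n)                   # hidden confounder
A = rng.binomial(1, 0.2 + 0.6 * U)            # treatment
M = rng.binomial(1, 0.1 + 0.7 * A)            # mediator on the front-door path
Y = rng.binomial(1, 0.1 + 0.5 * M + 0.3 * U)  # outcome
print(frontdoor_ace(A, M, Y))                 # ~0.35 = 0.7 * 0.5, the true ACE
```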
arXiv Detail & Related papers (2023-12-15T22:04:53Z)
- Counterfactual Data Augmentation with Contrastive Learning [27.28511396131235]
We introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals.
We use contrastive learning to learn a representation space and a similarity measure such that, in the learned representation space, individuals identified as close by the learned similarity measure have similar potential outcomes.
This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group.
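A minimal sketch of this imputation step, with plain standardized features standing in for the learned contrastive representation (an assumption; the paper learns both the space and the similarity measure, which is omitted here):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def impute_counterfactuals(X, W, Y, max_dist=0.5):
    """Impute counterfactual outcomes for units whose nearest
    opposite-arm neighbour is close enough in the embedding."""
    Z = StandardScaler().fit_transform(X)       # stand-in for the learned space
    y_cf = np.full(len(Y), np.nan)
    for w in (0, 1):
        donors = NearestNeighbors(n_neighbors=1).fit(Z[W == 1 - w])
        dist, idx = donors.kneighbors(Z[W == w])
        close = dist.ravel() <= max_dist        # impute only reliable matches
        y_cf[np.where(W == w)[0][close]] = Y[W == 1 - w][idx.ravel()[close]]
    return y_cf                                 # NaN where no close neighbour
```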
arXiv Detail & Related papers (2023-11-07T00:36:51Z)
- RCT Rejection Sampling for Causal Estimation Evaluation [25.845034753006367]
Confounding is a significant obstacle to unbiased estimation of causal effects from observational data.
We build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data.
We show our algorithm indeed results in low bias when oracle estimators are evaluated on confounded samples.
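A minimal sketch of the core idea, not the paper's exact algorithm (which adds conditions so that oracle estimands remain consistent under subsampling): start from RCT rows, then reject rows so that treatment becomes correlated with a covariate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
X = rng.normal(size=n)
W = rng.binomial(1, 0.5, n)                  # randomized treatment
Y = X + 2.0 * W + rng.normal(size=n)         # true ATE = 2, known from the RCT

p = 1 / (1 + np.exp(-2 * X))                 # target "propensity" in the subsample
accept = rng.uniform(size=n) < np.where(W == 1, p, 1 - p)
Xc, Wc, Yc = X[accept], W[accept], Y[accept]

naive = Yc[Wc == 1].mean() - Yc[Wc == 0].mean()
print("naive difference in confounded sample:", round(float(naive), 2))  # biased away from 2
```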
arXiv Detail & Related papers (2023-07-27T20:11:07Z)
- B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding [51.74479522965712]
We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on hidden confounding.
We prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods.
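A heavily simplified sketch, not the B-Learner itself: sharp single-stratum bounds on the unidentified arm under the marginal sensitivity model, where the odds of treatment may be mis-specified by at most a factor lam. The B-Learner learns such bounds as functions of x, with cross-fitting and the quasi-oracle construction, all omitted here.

```python
import numpy as np

def msm_bounds_unidentified_arm(y_treated, lam):
    """Sharp bounds on E[Y(1) | W=0] in a single stratum, given treated-arm
    outcomes and sensitivity parameter lam >= 1 (lam = 1 -> no hidden bias)."""
    def bound(upper):
        # worst-case likelihood ratio jumps between 1/lam and lam at the
        # quantile that keeps the ratio averaging to one
        tau = lam / (1 + lam) if upper else 1 / (1 + lam)
        q = np.quantile(y_treated, tau)
        hi, lo = (lam, 1 / lam) if upper else (1 / lam, lam)
        return float(np.mean(np.where(y_treated > q, hi, lo) * y_treated))
    return bound(False), bound(True)

# e.g. with treated-arm outcomes and lam = 1.5 (lam = 1 recovers the point estimate):
rng = np.random.default_rng(6)
print(msm_bounds_unidentified_arm(rng.normal(size=10_000), lam=1.5))
```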
arXiv Detail & Related papers (2023-04-20T18:07:19Z)
- Data-Driven Estimation of Heterogeneous Treatment Effects [15.140272661540655]
Estimating how a treatment affects different individuals, known as heterogeneous treatment effect estimation, is an important problem in empirical sciences.
We provide a survey of state-of-the-art data-driven methods for heterogeneous treatment effect estimation using machine learning.
arXiv Detail & Related papers (2023-01-16T21:36:49Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Improving Data-driven Heterogeneous Treatment Effect Estimation Under Structure Uncertainty [13.452510519858995]
Estimating how a treatment affects units individually, known as heterogeneous treatment effect (HTE) estimation, is an essential part of decision-making and policy implementation.
We develop a feature selection method that considers each feature's value for HTE estimation and learns the relevant parts of the causal structure from data.
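Not the authors' method, but one generic way to score features for HTE estimation: regress doubly robust (AIPW) pseudo-outcomes on the covariates and read off importances, so that features irrelevant to the effect heterogeneity receive low scores. All names below are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def cate_feature_scores(X, W, Y):
    """Score each feature's value for CATE estimation (generic sketch)."""
    e = RandomForestClassifier().fit(X, W).predict_proba(X)[:, 1].clip(0.05, 0.95)
    m1 = RandomForestRegressor().fit(X[W == 1], Y[W == 1]).predict(X)
    m0 = RandomForestRegressor().fit(X[W == 0], Y[W == 0]).predict(X)
    # AIPW pseudo-outcome: unbiased for the CATE given the nuisances
    phi = m1 - m0 + W * (Y - m1) / e - (1 - W) * (Y - m0) / (1 - e)
    return RandomForestRegressor().fit(X, phi).feature_importances_
```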
arXiv Detail & Related papers (2022-06-25T16:26:35Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can be used to also update a given observational Markov equivalence class into the interventional Markov equivalence class.
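A toy illustration only, and notably weaker than the paper's setting: assuming a known causal order (the paper does not require this), an intervention target reveals itself as the node whose mechanism changes across environments when regressed on its predecessors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

def sample(intervene=False):
    x0 = rng.normal(size=n)
    x1 = 0.8 * x0 + rng.normal(size=n)
    # do(X2): the edge from X1 is cut in the interventional regime
    x2 = rng.normal(size=n) if intervene else 1.5 * x1 + rng.normal(size=n)
    return np.column_stack([x0, x1, x2])

obs, intv = sample(False), sample(True)

for j in (1, 2):   # regress each node on its predecessors in both regimes
    coef = lambda D: np.linalg.lstsq(D[:, :j], D[:, j], rcond=None)[0]
    print(f"X{j}: obs {coef(obs).round(2)} vs intv {coef(intv).round(2)}")
# only X2's coefficients change across regimes -> X2 is the intervention target
```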
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies outperform classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
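A minimal sketch of the first strategy ($k$-NN imputation followed by covariance estimation) on a synthetic multichannel signal; the paper's Riemannian (SPD-manifold) classification step is omitted here.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(4)
n_channels, n_times = 8, 500
signal = rng.normal(size=(n_times, n_channels))  # stand-in for one EEG epoch
mask = rng.uniform(size=signal.shape) < 0.2      # 20% of samples go missing
signal[mask] = np.nan

imputed = KNNImputer(n_neighbors=5).fit_transform(signal)
cov = np.cov(imputed, rowvar=False)              # channel covariance matrix
print(cov.shape)   # (8, 8): the SPD input a Riemannian classifier would use
```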
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
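A small simulation of the paper's point, with made-up numbers: once high-risk predictions trigger an intervention that averts events, a retrospective metric such as AUC no longer reflects the model's value.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 20_000
risk = rng.uniform(size=n)                     # a perfectly calibrated model
y_untreated = rng.binomial(1, risk)            # outcomes with no intervention
auc_true = roc_auc_score(y_untreated, risk)

treated = risk > 0.7                           # model triggers an intervention
y_observed = np.where(treated & (rng.uniform(size=n) < 0.8),
                      0, y_untreated)          # intervention averts 80% of events
auc_retro = roc_auc_score(y_observed, risk)    # computed on post-intervention data
print(round(auc_true, 3), round(auc_retro, 3)) # retrospective AUC is distorted
```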
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.