A Large Scale Benchmark for Individual Treatment Effect Prediction and
Uplift Modeling
- URL: http://arxiv.org/abs/2111.10106v1
- Date: Fri, 19 Nov 2021 09:07:14 GMT
- Authors: Eustache Diemert, Artem Betlei, Christophe Renaudin, Massih-Reza Amini, Théophane Gregoir, Thibaud Rahier
- Abstract summary: Individual Treatment Effect (ITE) prediction aims at explaining and estimating the causal impact of an action at the granular level.
To foster research on this topic we release a publicly available collection of 13.9 million samples collected from several randomized control trials.
- Score: 7.1736440498963105
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Individual Treatment Effect (ITE) prediction is an important area of research
in machine learning which aims at explaining and estimating the causal impact
of an action at the granular level. It represents a problem of growing interest
in multiple sectors of application such as healthcare, online advertising or
socioeconomics. To foster research on this topic we release a publicly
available collection of 13.9 million samples collected from several randomized
control trials, scaling up previously available datasets by a healthy 210x
factor. We provide details on the data collection and perform sanity checks to
validate the use of this data for causal inference tasks. First, we formalize
the task of uplift modeling (UM) that can be performed with this data, along
with the relevant evaluation metrics. Then, we propose synthetic response
surfaces and heterogeneous treatment assignment providing a general set-up for
ITE prediction. Finally, we report experiments to validate key characteristics
of the dataset leveraging its size to evaluate and compare - with high
statistical significance - a selection of baseline UM and ITE prediction
methods.
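As an illustration of the uplift modeling task mentioned in the abstract, here is a minimal sketch, assuming a randomized experiment with a binary treatment and a binary outcome. It estimates uplift per segment as the difference in mean outcome between treated and control units, which is a valid estimate only because assignment is randomized. The function name `uplift_by_segment`, the toy data, and the chosen rates are hypothetical; this is not the paper's method, dataset, or evaluation metric.

```python
import numpy as np


def uplift_by_segment(segment, treatment, outcome):
    """Per-segment uplift: mean outcome of treated minus mean outcome of control.

    Hypothetical helper for illustration; assumes treatment was randomized.
    """
    segment = np.asarray(segment)
    treatment = np.asarray(treatment).astype(bool)
    outcome = np.asarray(outcome, dtype=float)
    result = {}
    for s in np.unique(segment):
        in_seg = segment == s
        treated = outcome[in_seg & treatment]
        control = outcome[in_seg & ~treatment]
        if len(treated) == 0 or len(control) == 0:
            continue  # uplift is undefined when one arm is empty
        result[s.item()] = treated.mean() - control.mean()
    return result


# Toy randomized trial (all numbers are made up for the example):
# segment 1 responds to treatment, segment 0 does not.
rng = np.random.default_rng(0)
n = 200_000
segment = rng.integers(0, 2, size=n)
treatment = rng.integers(0, 2, size=n)  # coin-flip assignment
base_rate = 0.05
true_uplift = np.where(segment == 1, 0.03, 0.0)
outcome = rng.random(n) < base_rate + true_uplift * treatment

estimates = uplift_by_segment(segment, treatment, outcome)
print(estimates)
```

With enough samples per arm, the estimate for segment 1 should land near the simulated 0.03 and the estimate for segment 0 near zero. Learned uplift models generalize this idea from a handful of segments to arbitrary feature vectors.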
Related papers
- Using representation balancing to learn conditional-average dose responses from clustered data [5.633848204699653]
Estimating a unit's responses to interventions with an associated dose is relevant in a variety of domains.
We show the impacts of clustered data on model performance and propose an estimator, CBRNet.
arXiv Detail & Related papers (2023-09-07T14:17:44Z)
- Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
- DESCN: Deep Entire Space Cross Networks for Individual Treatment Effect Estimation [7.060064266376701]
Causal Inference has wide applications in various areas such as E-commerce and precision medicine.
This paper proposes Deep Entire Space Cross Networks (DESCN) to model treatment effects from an end-to-end perspective.
arXiv Detail & Related papers (2022-07-19T01:25:31Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Causal Markov Boundaries [0.0]
We show how we can use observational data to improve feature selection and effect estimation.
Our paper extends the notion of Markov boundary to treatment-outcome pairs.
arXiv Detail & Related papers (2021-03-12T22:49:10Z)
- Double machine learning for sample selection models [0.12891210250935145]
This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition.
We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias.
arXiv Detail & Related papers (2020-11-30T19:40:21Z)
- Statistical Analytics and Regional Representation Learning for COVID-19 Pandemic Understanding [4.731074162093199]
The rapid spread of the novel coronavirus (COVID-19) has severely impacted almost all countries around the world.
This paper combines and processes an extensive collection of publicly available datasets to provide a unified information source.
A specific RNN-based inference pipeline called DoubleWindowLSTM-CP is proposed in this work for predictive event modeling.
arXiv Detail & Related papers (2020-08-08T03:35:16Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)