Statistical Performance Guarantee for Subgroup Identification with
Generic Machine Learning
- URL: http://arxiv.org/abs/2310.07973v2
- Date: Wed, 20 Dec 2023 13:16:35 GMT
- Title: Statistical Performance Guarantee for Subgroup Identification with
Generic Machine Learning
- Authors: Michael Lingzhi Li, Kosuke Imai
- Abstract summary: We develop uniform confidence bands for estimation of the group average treatment effect sorted by generic ML algorithm (GATES).
We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Across a wide array of disciplines, many researchers use machine learning
(ML) algorithms to identify a subgroup of individuals who are likely to benefit
from a treatment the most ("exceptional responders") or those who are harmed
by it. A common approach to this subgroup identification problem consists of
two steps. First, researchers estimate the conditional average treatment effect
(CATE) using an ML algorithm. Next, they use the estimated CATE to select those
individuals who are predicted to be most affected by the treatment, either
positively or negatively. Unfortunately, CATE estimates are often biased and
noisy. In addition, utilizing the same data to both identify a subgroup and
estimate its group average treatment effect results in a multiple testing
problem. To address these challenges, we develop uniform confidence bands for
estimation of the group average treatment effect sorted by generic ML algorithm
(GATES). Using these uniform confidence bands, researchers can identify, with a
statistical guarantee, a subgroup whose GATES exceeds a certain effect size,
regardless of how this effect size is chosen. The validity of the proposed
methodology depends solely on randomization of treatment and random sampling of
units. Importantly, our method does not require modeling assumptions and avoids
a computationally intensive resampling procedure. A simulation study shows that
the proposed uniform confidence bands are reasonably informative and have an
appropriate empirical coverage even when the sample size is as small as 100. We
analyze a clinical trial of late-stage prostate cancer and find a relatively
large proportion of exceptional responders.
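The two-step approach described in the abstract (estimate the CATE, then sort individuals by the estimate and compute a group average treatment effect) can be illustrated with a minimal sketch. This is not the authors' method: it produces point estimates only, with no uniform confidence bands, and it sidesteps the reuse-of-data problem by simple sample splitting. The simulated data, the linear T-learner, and all function names below are assumptions made for illustration.

```python
import numpy as np

def simulate_trial(n, rng):
    """Simulate a randomized experiment with a heterogeneous effect (hypothetical data)."""
    x = rng.normal(size=(n, 2))
    t = rng.integers(0, 2, size=n)      # random assignment with probability 0.5
    tau = 1.0 + 2.0 * x[:, 0]           # true CATE, assumed for this illustration
    y = x[:, 1] + t * tau + rng.normal(size=n)
    return x, t, y

def fit_t_learner(x, t, y):
    """Step 1: fit one linear model per arm; return a CATE estimator."""
    design = np.column_stack([np.ones(len(x)), x])
    beta1, *_ = np.linalg.lstsq(design[t == 1], y[t == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(design[t == 0], y[t == 0], rcond=None)

    def cate(x_new):
        d = np.column_stack([np.ones(len(x_new)), x_new])
        return d @ (beta1 - beta0)

    return cate

def sorted_group_effects(score, t, y, n_groups=5):
    """Step 2: difference in means within groups sorted by estimated CATE."""
    order = np.argsort(score)           # lowest to highest predicted effect
    effects = []
    for idx in np.array_split(order, n_groups):
        treated = y[idx][t[idx] == 1]
        control = y[idx][t[idx] == 0]
        effects.append(treated.mean() - control.mean())
    return np.array(effects)

rng = np.random.default_rng(0)
x_train, t_train, y_train = simulate_trial(2000, rng)
x_eval, t_eval, y_eval = simulate_trial(2000, rng)

# Train the CATE model on one sample, evaluate sorted groups on another.
cate_hat = fit_t_learner(x_train, t_train, y_train)
gates = sorted_group_effects(cate_hat(x_eval), t_eval, y_eval)
print(np.round(gates, 2))
```

The last entry of `gates` estimates the effect among the individuals predicted to benefit most. Reporting it without accounting for how the group was selected is exactly the situation the paper's uniform confidence bands are designed to handle.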
Related papers
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Multi-CATE: Multi-Accurate Conditional Average Treatment Effect Estimation Robust to Unknown Covariate Shifts [12.289361708127876]
We use methodology for learning multi-accurate predictors to post-process CATE T-learners.
We show how this approach can combine (large) confounded observational and (smaller) randomized datasets.
arXiv Detail & Related papers (2024-05-28T14:12:25Z)
- Causal K-Means Clustering [5.087519744951637]
Causal k-Means Clustering harnesses the widely-used k-means clustering algorithm to uncover the unknown subgroup structure.
We present a plug-in estimator which is simple and readily implementable using off-the-shelf algorithms.
Our proposed methods are especially useful for modern outcome-wide studies with multiple treatment levels.
arXiv Detail & Related papers (2024-05-05T23:59:51Z)
- Sample Constrained Treatment Effect Estimation [28.156207324508706]
We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals.
In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on.
arXiv Detail & Related papers (2022-10-12T21:13:47Z)
- Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning individuals tests.
arXiv Detail & Related papers (2022-06-15T16:38:50Z)
- Robust and Agnostic Learning of Conditional Distributional Treatment Effects [62.44901952244514]
The conditional average treatment effect (CATE) is the best point prediction of individual causal effects.
In aggregate analyses, this is usually addressed by measuring the distributional treatment effect (DTE).
We provide a new robust and model-agnostic methodology for learning the conditional DTE (CDTE) for a wide class of problems.
arXiv Detail & Related papers (2022-05-23T17:40:31Z)
- Statistical Inference for Heterogeneous Treatment Effects Discovered by Generic Machine Learning in Randomized Experiments [0.9208007322096533]
We develop a general approach to statistical inference for heterogeneous treatment effects discovered by a generic ML algorithm.
We show how to estimate the average treatment effect within each of these groups, and construct a valid confidence interval.
arXiv Detail & Related papers (2022-03-28T05:43:46Z)
- Treatment Effect Risk: Bounds and Inference [58.442274475425144]
Since the average treatment effect measures the change in social welfare, even if positive, there is a risk of negative effect on, say, some 10% of the population.
In this paper we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE distribution.
Some bounds can also be interpreted as summarizing a complex CATE function into a single metric and are of interest independently of being a bound.
arXiv Detail & Related papers (2022-01-15T17:21:26Z)
- Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification [84.53697297858146]
Subgroup analysis of treatment effects plays an important role in applications from medicine to public policy to recommender systems.
Most of the current methods of subgroup analysis begin with a particular algorithm for estimating individualized treatment effects (ITE).
This paper develops a new method for subgroup analysis, R2P, that addresses all these weaknesses.
arXiv Detail & Related papers (2020-06-14T14:50:02Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.