In Search of Insights, Not Magic Bullets: Towards Demystification of the
Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
- URL: http://arxiv.org/abs/2302.02923v2
- Date: Tue, 6 Jun 2023 09:17:12 GMT
- Title: In Search of Insights, Not Magic Bullets: Towards Demystification of the
Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
- Authors: Alicia Curth, Mihaela van der Schaar
- Abstract summary: This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
- Score: 92.51773744318119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized treatment effect estimates are often of interest in high-stakes
applications -- thus, before deploying a model estimating such effects in
practice, one needs to be sure that the best candidate from the ever-growing
machine learning toolbox for this task was chosen. Unfortunately, due to the
absence of counterfactual information in practice, it is usually not possible
to rely on standard validation metrics for doing so, leading to a well-known
model selection dilemma in the treatment effect estimation literature. While
some solutions have recently been investigated, systematic understanding of the
strengths and weaknesses of different model selection criteria is still
lacking. In this paper, instead of attempting to declare a global 'winner', we
therefore empirically investigate success and failure modes of different
selection criteria. We highlight that there is a complex interplay between
selection strategies, candidate estimators and the data used for comparing
them, and provide interesting insights into the relative (dis)advantages of
different criteria alongside desiderata for the design of further illuminating
empirical studies in this context.
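To make the dilemma concrete, below is a minimal, self-contained sketch (our own illustration, not the authors' code or experimental setup). It simulates confounded data with a known heterogeneous effect, fits two hypothetical T-learner candidates, and contrasts two feasible selection criteria, naive factual MSE and a plug-in surrogate, against the oracle PEHE that only simulation can reveal. The data-generating process, model choices, and helper names are assumptions made for exposition.
```python
# Minimal sketch of the CATE model selection dilemma (illustrative only).
# Because counterfactuals are simulated, we can report the oracle PEHE
# that is unavailable in practice and compare it with feasible criteria.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def simulate(n=4000, d=5):
    """Confounded observational data with a known heterogeneous effect."""
    X = rng.normal(size=(n, d))
    e = 1.0 / (1.0 + np.exp(-X[:, 0]))      # propensity depends on X
    W = rng.binomial(1, e)                   # observed treatment
    tau = np.sin(X[:, 1]) + 0.5 * X[:, 2]    # true CATE (unknown in practice)
    mu0 = X[:, 0] + X[:, 3] ** 2             # control outcome surface
    Y = mu0 + W * tau + rng.normal(size=n)   # only factual outcome observed
    return X, W, Y, tau

class TLearner:
    """Two-model learner: fit each arm separately, CATE = mu1 - mu0."""
    def __init__(self, make_base):
        self.m0, self.m1 = make_base(), make_base()
    def fit(self, X, W, Y):
        self.m0.fit(X[W == 0], Y[W == 0])
        self.m1.fit(X[W == 1], Y[W == 1])
        return self
    def cate(self, X):
        return self.m1.predict(X) - self.m0.predict(X)
    def factual(self, X, W):
        return np.where(W == 1, self.m1.predict(X), self.m0.predict(X))

X, W, Y, tau = simulate()
idx = rng.permutation(len(Y))
tr, va = idx[: len(Y) // 2], idx[len(Y) // 2 :]

candidates = {
    "linear T-learner": TLearner(LinearRegression).fit(X[tr], W[tr], Y[tr]),
    "forest T-learner": TLearner(
        lambda: RandomForestRegressor(n_estimators=200, random_state=0)
    ).fit(X[tr], W[tr], Y[tr]),
}

# Plug-in surrogate: fit an auxiliary learner on the validation fold and
# treat its CATE estimate as a pseudo-truth. Note the circularity risk:
# candidates that resemble the auxiliary model automatically look good.
aux = TLearner(
    lambda: RandomForestRegressor(n_estimators=200, random_state=1)
).fit(X[va], W[va], Y[va])
tau_tilde = aux.cate(X[va])

for name, model in candidates.items():
    factual_mse = np.mean((Y[va] - model.factual(X[va], W[va])) ** 2)
    plug_in = np.mean((model.cate(X[va]) - tau_tilde) ** 2)
    oracle_pehe = np.mean((model.cate(X[va]) - tau[va]) ** 2)  # infeasible
    print(f"{name}: factual MSE={factual_mse:.3f}  "
          f"plug-in risk={plug_in:.3f}  oracle PEHE={oracle_pehe:.3f}")
```
On a given draw, the naive factual criterion can prefer whichever candidate fits observed outcomes best even when its effect estimate is worse, while the plug-in criterion inherits the biases of the auxiliary model it trusts; this is one instance of the interplay between selection strategy, candidate estimators, and data described above.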
Related papers
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are at odds with the assumptions underlying standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the resulting maximum likelihood estimator.
arXiv Detail & Related papers (2024-02-13T17:09:29Z)
- Deep Neural Network Benchmarks for Selective Classification [27.098996474946446]
Multiple selective classification frameworks exist, most of which rely on deep neural network architectures.
We evaluate these approaches using several criteria, including selective error rate, empirical coverage, distribution of rejected instances' classes, and performance on out-of-distribution instances.
arXiv Detail & Related papers (2024-01-23T12:15:47Z)
- Generalization within in silico screening [19.58677466616286]
In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation.
By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization.
We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch.
arXiv Detail & Related papers (2023-07-18T16:01:01Z)
- Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise [62.997667081978825]
In high-risk environments, deep learning models need to be able to judge their uncertainty and reject inputs when there is a significant chance of misclassification.
We conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images.
We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise.
arXiv Detail & Related papers (2023-01-03T11:34:36Z)
- Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation [24.65301562548798]
We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation.
We conduct an empirical analysis to benchmark the surrogate model selection metrics introduced in the literature, as well as the novel ones introduced in this work.
arXiv Detail & Related papers (2022-11-03T16:26:06Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Learning Overlapping Representations for the Estimation of Individualized Treatment Effects [97.42686600929211]
Estimating the likely outcome of alternatives from observational data is a challenging problem.
We show that algorithms that learn domain-invariant representations of inputs are often inappropriate.
We develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmark datasets.
arXiv Detail & Related papers (2020-01-14T12:56:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.