In Search of Insights, Not Magic Bullets: Towards Demystification of the
Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
- URL: http://arxiv.org/abs/2302.02923v2
- Date: Tue, 6 Jun 2023 09:17:12 GMT
- Title: In Search of Insights, Not Magic Bullets: Towards Demystification of the
Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
- Authors: Alicia Curth, Mihaela van der Schaar
- Abstract summary: This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
- Score: 92.51773744318119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized treatment effect estimates are often of interest in high-stakes
applications -- thus, before deploying a model estimating such effects in
practice, one needs to be sure that the best candidate from the ever-growing
machine learning toolbox for this task was chosen. Unfortunately, due to the
absence of counterfactual information in practice, it is usually not possible
to rely on standard validation metrics for doing so, leading to a well-known
model selection dilemma in the treatment effect estimation literature. While
some solutions have recently been investigated, systematic understanding of the
strengths and weaknesses of different model selection criteria is still
lacking. In this paper, instead of attempting to declare a global 'winner', we
therefore empirically investigate success and failure modes of different
selection criteria. We highlight that there is a complex interplay between
selection strategies, candidate estimators and the data used for comparing
them, and provide interesting insights into the relative (dis)advantages of
different criteria alongside desiderata for the design of further illuminating
empirical studies in this context.
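To make the dilemma concrete, below is a minimal, self-contained sketch (our own illustration, not the authors' code or experimental setup). It simulates confounded data with a known heterogeneous effect, fits two hypothetical T-learner candidates, and contrasts two feasible selection criteria, naive factual MSE and a plug-in surrogate, against the oracle PEHE that only simulation can reveal. The data-generating process, model choices, and helper names are assumptions made for exposition.
```python
# Minimal sketch of the CATE model selection dilemma (illustrative only).
# Because counterfactuals are simulated, we can report the oracle PEHE
# that is unavailable in practice and compare it with feasible criteria.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def simulate(n=4000, d=5):
    """Confounded observational data with a known heterogeneous effect."""
    X = rng.normal(size=(n, d))
    e = 1.0 / (1.0 + np.exp(-X[:, 0]))      # propensity depends on X
    W = rng.binomial(1, e)                   # observed treatment
    tau = np.sin(X[:, 1]) + 0.5 * X[:, 2]    # true CATE (unknown in practice)
    mu0 = X[:, 0] + X[:, 3] ** 2             # control outcome surface
    Y = mu0 + W * tau + rng.normal(size=n)   # only factual outcome observed
    return X, W, Y, tau

class TLearner:
    """Two-model learner: fit each arm separately, CATE = mu1 - mu0."""
    def __init__(self, make_base):
        self.m0, self.m1 = make_base(), make_base()
    def fit(self, X, W, Y):
        self.m0.fit(X[W == 0], Y[W == 0])
        self.m1.fit(X[W == 1], Y[W == 1])
        return self
    def cate(self, X):
        return self.m1.predict(X) - self.m0.predict(X)
    def factual(self, X, W):
        return np.where(W == 1, self.m1.predict(X), self.m0.predict(X))

X, W, Y, tau = simulate()
idx = rng.permutation(len(Y))
tr, va = idx[: len(Y) // 2], idx[len(Y) // 2 :]

candidates = {
    "linear T-learner": TLearner(LinearRegression).fit(X[tr], W[tr], Y[tr]),
    "forest T-learner": TLearner(
        lambda: RandomForestRegressor(n_estimators=200, random_state=0)
    ).fit(X[tr], W[tr], Y[tr]),
}

# Plug-in surrogate: fit an auxiliary learner on the validation fold and
# treat its CATE estimate as a pseudo-truth. Note the circularity risk:
# candidates that resemble the auxiliary model automatically look good.
aux = TLearner(
    lambda: RandomForestRegressor(n_estimators=200, random_state=1)
).fit(X[va], W[va], Y[va])
tau_tilde = aux.cate(X[va])

for name, model in candidates.items():
    factual_mse = np.mean((Y[va] - model.factual(X[va], W[va])) ** 2)
    plug_in = np.mean((model.cate(X[va]) - tau_tilde) ** 2)
    oracle_pehe = np.mean((model.cate(X[va]) - tau[va]) ** 2)  # infeasible
    print(f"{name}: factual MSE={factual_mse:.3f}  "
          f"plug-in risk={plug_in:.3f}  oracle PEHE={oracle_pehe:.3f}")
```
On a given draw, the naive factual criterion can prefer whichever candidate fits observed outcomes best even when its effect estimate is worse, while the plug-in criterion inherits the biases of the auxiliary model it trusts; this is one instance of the interplay between selection strategy, candidate estimators, and data described above.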
Related papers
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are at odds with the assumptions underlying standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation [1.1530723302736279]
We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
The goal is to design experiment selection rules for more accurate model estimation.
We propose a class of greedy experiment selection methods and provide statistical analysis for the resulting maximum likelihood estimator.
arXiv Detail & Related papers (2024-02-13T17:09:29Z)
- Deep Neural Network Benchmarks for Selective Classification [27.098996474946446]
Multiple selective classification frameworks exist, most of which rely on deep neural network architectures.
We evaluate these approaches using several criteria, including selective error rate, empirical coverage, distribution of rejected instances' classes, and performance on out-of-distribution instances.
arXiv Detail & Related papers (2024-01-23T12:15:47Z)
- Generalization within in silico screening [19.58677466616286]
In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation.
By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization.
We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch.
arXiv Detail & Related papers (2023-07-18T16:01:01Z)
- Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise [62.997667081978825]
In high-risk environments, deep learning models need to be able to judge their uncertainty and reject inputs when there is a significant chance of misclassification.
We conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images.
We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise.
arXiv Detail & Related papers (2023-01-03T11:34:36Z)
- Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation [24.65301562548798]
We study the problem of model selection in causal inference, specifically for conditional average treatment effect (CATE) estimation.
We conduct an empirical analysis to benchmark the surrogate model selection metrics introduced in the literature, as well as the novel ones introduced in this work.
arXiv Detail & Related papers (2022-11-03T16:26:06Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Learning Overlapping Representations for the Estimation of Individualized Treatment Effects [97.42686600929211]
Estimating the likely outcome of alternatives from observational data is a challenging problem.
We show that algorithms that learn domain-invariant representations of inputs are often inappropriate.
We develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmark datasets.
arXiv Detail & Related papers (2020-01-14T12:56:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.