Related papers: The Challenges of Hyperparameter Tuning for Accurate Causal Effect Estimation

The Challenges of Hyperparameter Tuning for Accurate Causal Effect Estimation

URL: http://arxiv.org/abs/2303.01412v2
Date: Fri, 03 Oct 2025 17:33:14 GMT
Title: The Challenges of Hyperparameter Tuning for Accurate Causal Effect Estimation
Authors: Damian Machlanski, Spyridon Samothrakis, Paul Clarke,
Abstract summary: Many ML methods (causal estimators') have been proposed for causal inference.<n>For non-causal predictive tasks, there is a consensus on the choice of tuning metrics, making it simple to compare models.<n>For causal inference tasks, such a consensus is yet to be reached, making any comparison of causal models difficult.
Score: 2.43420394129881
License: http://creativecommons.org/licenses/by/4.0/
Abstract: ML is playing an increasingly crucial role in estimating causal effects of treatments on outcomes from observational data. Many ML methods (`causal estimators') have been proposed for this task. All of these methods, as with any ML approach, require extensive hyperparameter tuning. For non-causal predictive tasks, there is a consensus on the choice of tuning metrics (e.g. mean squared error), making it simple to compare models. However, for causal inference tasks, such a consensus is yet to be reached, making any comparison of causal models difficult. On top of that, there is no ideal metric on which to tune causal estimators, so one must rely on proxies. Furthermore, the fact that model selection in causal inference involves multiple components (causal estimator, ML regressor, hyperparameters, metric), complicates the issue even further. In order to evaluate the importance of each component, we perform an extensive empirical study on their combination. Our experimental setup involves many commonly used causal estimators, regressors (`base learners' henceforth) and metrics applied to four well-known causal inference benchmark datasets. Our results show that hyperparameter tuning increased the probability of reaching state-of-the-art performance in average ($65\% {\rightarrow} 81\%$) and individualised ($50\% {\rightarrow} 57\%$) effect estimation with only commonly used estimators. We also show that the performance of standard metrics can be inconsistent across different scenarios. Our findings highlight the need for further research to establish whether metrics uniformly capable of state-of-the-art performance in causal model evaluation can be found.

Related papers

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination [67.67725938962798]
Pre-training on massive web-scale corpora leaves Qwen2.5 susceptible to data contamination in widely used benchmarks.<n>We introduce a generator that creates fully clean arithmetic problems of arbitrary length and difficulty, dubbed RandomCalculation.<n>We show that only accurate reward signals yield steady improvements that surpass the base model's performance boundary.
arXiv Detail & Related papers (2025-07-14T17:55:15Z)
A Causal Inference Framework for Data Rich Environments [17.588417435132538]
We show how classic models for potential outcomes and treatment assignments fit within our framework.<n>For any estimator that has a fast enough estimation error rate for a certain nuisance parameter, we establish it is consistent for these various causal parameters.
arXiv Detail & Related papers (2025-04-02T13:04:26Z)
Black Box Causal Inference: Effect Estimation via Meta Prediction [56.277798874118425]
We frame causal inference as a dataset-level prediction problem, offloading algorithm design to the learning process. We introduce, called black box causal inference (BBCI), builds estimators in a black-box manner by learning to predict causal effects from sampled dataset-effect pairs. We demonstrate accurate estimation of average treatment effects (ATEs) and conditional average treatment effects (CATEs) with BBCI across several causal inference problems.
arXiv Detail & Related papers (2025-03-07T23:43:19Z)
Re-Visiting Explainable AI Evaluation Metrics to Identify The Most Informative Features [0.0]
Functionality or proxy-based approach is one of the used approaches to evaluate the quality of artificial intelligence methods. Among them, Selectivity or RemOve And Retrain (ROAR), and Permutation Importance (PI) are the most commonly used metrics. We propose expected accuracy interval (EAI) to predict the upper and lower bounds of the the accuracy of the model when ROAR or IP is implemented.
arXiv Detail & Related papers (2025-01-31T17:18:43Z)
Precise Model Benchmarking with Only a Few Observations [6.092112060364272]
We propose an empirical Bayes (EB) estimator that balances direct and regression estimates for each subgroup separately. EB consistently provides more precise estimates of the LLM performance compared to the direct and regression approaches.
arXiv Detail & Related papers (2024-10-07T17:26:31Z)
Estimating Causal Effects with Double Machine Learning -- A Method Evaluation [5.904095466127043]
We review one of the most prominent methods - "double/debiased machine learning" (DML) Our findings indicate that the application of a suitably flexible machine learning algorithm within DML improves the adjustment for various nonlinear confounding relationships. When estimating the effects of air pollution on housing prices, we find that DML estimates are consistently larger than estimates of less flexible methods.
arXiv Detail & Related papers (2024-03-21T13:21:33Z)
End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives. It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study [4.526082390949313]
We empirically assess the relationship between the predictive performance of machine learning methods and the resulting causal estimation. We conduct an extensive simulation study using data from the 2019 Atlantic Causal Inference Conference Data Challenge.
arXiv Detail & Related papers (2024-02-07T09:01:51Z)
Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals. Model-to-Match uses variable importance measurements to construct a distance metric. We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle. While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples. This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
An evaluation framework for comparing causal inference models [3.1372269816123994]
We use the proposed evaluation methodology to compare several state-of-the-art causal effect estimation models. The main motivation behind this approach is the elimination of the influence of a small number of instances or simulation on the benchmarking process.
arXiv Detail & Related papers (2022-08-31T21:04:20Z)
Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models. In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints. A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
Expected Validation Performance and Estimation of a Random Variable's Maximum [48.83713377993604]
We analyze three statistical estimators for expected validation performance. We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias. We find that the two biased estimators lead to the fewest incorrect conclusions.
arXiv Detail & Related papers (2021-10-01T18:48:47Z)
Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data [15.27393561231633]
We propose a doubly robust two-stage semiparametric difference-in-difference estimator for estimating heterogeneous treatment effects. The first stage allows a general set of machine learning methods to be used to estimate the propensity score. In the second stage, we derive the rates of convergence for both the parametric parameter and the unknown function.
arXiv Detail & Related papers (2020-09-07T15:14:29Z)
Nonparametric inverse probability weighted estimators based on the highly adaptive lasso [0.966840768820136]
Inparametric inverse probability weighted estimators are known to be inefficient and suffer from the curse of dimensionality. We propose a class of nonparametric inverse probability weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso. Our developments have broad implications for the construction of efficient inverse probability weighted estimators in large statistical models and a variety of problem settings.
arXiv Detail & Related papers (2020-05-22T17:49:46Z)
Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties. We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE) When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference. Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances. We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.