Smoke and Mirrors in Causal Downstream Tasks
- URL: http://arxiv.org/abs/2405.17151v1
- Date: Mon, 27 May 2024 13:26:34 GMT
- Title: Smoke and Mirrors in Causal Downstream Tasks
- Authors: Riccardo Cadei, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, Francesco Locatello
- Abstract summary: This paper looks at the causal inference task of treatment effect estimation.
We assume binary effects that are recorded as high-dimensional images in a Randomized Controlled Trial.
We compare 6 480 models fine-tuned from state-of-the-art visual backbones.
We find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
- Score: 59.90654397037007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning and AI have the potential to transform data-driven scientific discovery, enabling accurate predictions for several scientific phenomena. As many scientific questions are inherently causal, this paper looks at the causal inference task of treatment effect estimation, where we assume binary effects that are recorded as high-dimensional images in a Randomized Controlled Trial (RCT). Despite being the simplest possible setting and a perfect fit for deep learning, we theoretically find that many common choices in the literature may lead to biased estimates. To test the practical impact of these considerations, we recorded the first real-world benchmark for causal inference downstream tasks on high-dimensional observations as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6 480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual data set controlling the causal model. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones. Further, we highlight guidelines for representation learning methods to help answer causal questions in the sciences. All code and data will be released.
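The paper's estimand is the average treatment effect (ATE) in an RCT, where the standard difference-in-means estimator is unbiased under randomization. A minimal NumPy sketch on simulated data (all numbers illustrative; the paper's setting differs in that the binary outcome must first be predicted from images, which is where the sampling and modeling biases enter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RCT: treatment T assigned uniformly at random,
# binary outcome Y with a true average treatment effect of 0.2.
n = 100_000
t = rng.integers(0, 2, size=n)        # randomized assignment
p_outcome = 0.3 + 0.2 * t             # P(Y=1): 0.3 control, 0.5 treated
y = rng.binomial(1, p_outcome)

# Difference-in-means estimator, unbiased for the ATE under randomization.
ate_hat = y[t == 1].mean() - y[t == 0].mean()
print(round(float(ate_hat), 3))
```

When Y is not observed directly but predicted by a classifier from high-dimensional observations, plugging predictions into this estimator can bias it even at high classification accuracy, which is the paper's central point.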
Related papers
- GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z) - Testing Causality in Scientific Modelling Software [0.26388783516590225]
The Causal Testing Framework uses causal inference techniques to establish causal effects from existing data.
We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes.
arXiv Detail & Related papers (2022-09-01T10:57:54Z) - Active Bayesian Causal Inference [72.70593653185078]
We propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning.
ABCI jointly infers a posterior over causal models and queries of interest.
We show that our approach is more data-efficient than several baselines that only focus on learning the full causal graph.
arXiv Detail & Related papers (2022-06-04T22:38:57Z) - Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z) - Causal discovery for observational sciences using supervised machine learning [1.6631602844999722]
Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models.
Several asymptotically correct methods already exist, but they generally struggle on smaller samples.
Most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms.
We propose a new causal discovery method that addresses these three shortcomings: Supervised learning discovery (SLdisco).
arXiv Detail & Related papers (2022-02-25T16:44:00Z) - Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design [0.8594140167290099]
We identify three trends within the fields of chemical reaction prediction and synthesis design that require a change in direction.
First, the manner in which reaction datasets are split into reactants and reagents encourages testing models in an unrealistically generous manner.
Second, we highlight the prevalence of mislabelled data, and suggest that the focus should be on outlier removal rather than on data fitting alone.
arXiv Detail & Related papers (2021-05-06T13:11:56Z) - Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder, without changing the observed and interventional distributions.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z) - Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z) - Discovering Reliable Causal Rules [27.221938979891384]
We study the problem of deriving policies, or rules, that, when enacted on a complex system, cause a desired outcome.
Observational effects are often unrepresentative of the underlying causal effect because they are skewed by the presence of confounding factors.
We propose a conservative and consistent estimator of the causal effect, and derive an efficient and exact algorithm that maximises the estimator.
arXiv Detail & Related papers (2020-09-06T13:08:20Z) - Detect and Correct Bias in Multi-Site Neuroimaging Datasets [2.750124853532831]
We combine 35,320 magnetic resonance images of the brain from 17 studies to examine bias in neuroimaging.
We take a closer look at confounding bias, which is often viewed as the main shortcoming in observational studies.
We propose an extension of the recently introduced ComBat algorithm to control for global variation across image features.
arXiv Detail & Related papers (2020-02-12T15:32:24Z)