Causal discovery for observational sciences using supervised machine
learning
- URL: http://arxiv.org/abs/2202.12813v1
- Date: Fri, 25 Feb 2022 16:44:00 GMT
- Title: Causal discovery for observational sciences using supervised machine
learning
- Authors: Anne Helby Petersen, Joseph Ramsey, Claus Thorn Ekstrøm and Peter
Spirtes
- Abstract summary: Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models.
Several asymptotically correct methods already exist, but they generally struggle on smaller samples.
Most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms.
We propose a new causal discovery method that addresses these shortcomings: Supervised learning discovery (SLdisco).
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Causal inference can estimate causal effects, but unless data are collected
experimentally, statistical analyses must rely on pre-specified causal models.
Causal discovery algorithms are empirical methods for constructing such causal
models from data.
Several asymptotically correct methods already exist, but they generally
struggle on smaller samples. Moreover, most methods focus on very sparse causal
models, which may not always be a realistic representation of real-life data
generating mechanisms. Finally, while causal relationships suggested by the
methods often hold true, their claims about causal non-relatedness have high
error rates. This non-conservative error tradeoff is not ideal for
observational sciences, where the resulting model is directly used to inform
causal inference: A causal model with many missing causal relations entails
overly strong assumptions and may lead to biased effect estimates.
We propose a new causal discovery method that addresses these three
shortcomings: Supervised learning discovery (SLdisco). SLdisco uses supervised
machine learning to obtain a mapping from observational data to equivalence
classes of causal models.
We evaluate SLdisco in a large simulation study based on Gaussian data and we
consider several choices of model size and sample size. We find that SLdisco is
more conservative, only moderately less informative and less sensitive towards
sample size than existing procedures.
We furthermore provide a real epidemiological data application. We use random
subsampling to investigate real data performance on small samples and again
find that SLdisco is less sensitive towards sample size and hence seems to
better utilize the information available in small datasets.
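As a rough illustration of the SLdisco idea sketched in the abstract (supervised learning of a map from observational data to causal structure), the snippet below trains one per-edge classifier on many simulated linear-Gaussian datasets, using correlation-matrix entries as features. Everything here is an illustrative assumption: the variable count, the logistic-regression choice, and the per-edge output format are stand-ins, not the paper's actual architecture (which targets equivalence classes of causal models).

```python
# Illustrative sketch, NOT the authors' implementation: learn a mapping from
# Gaussian observational data (via its correlation matrix) to causal edges.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
P = 4           # number of variables (hypothetical, kept small for speed)
N = 200         # sample size per simulated dataset
DATASETS = 300  # number of training datasets

def simulate(rng):
    """Draw a random upper-triangular DAG and sample linear-Gaussian data."""
    adj = np.triu(rng.random((P, P)) < 0.4, k=1).astype(float)
    weights = adj * rng.uniform(0.5, 1.5, (P, P))
    noise = rng.standard_normal((N, P))
    data = np.zeros((N, P))
    for j in range(P):  # columns are already in topological order 0..P-1
        data[:, j] = data @ weights[:, j] + noise[:, j]
    return data, adj

def features(data):
    """Vectorize the upper triangle of the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    return corr[np.triu_indices(P, k=1)]

X, Y = [], []
for _ in range(DATASETS):
    data, adj = simulate(rng)
    X.append(features(data))
    Y.append(adj[np.triu_indices(P, k=1)])  # per-edge present/absent labels
X, Y = np.array(X), np.array(Y)

# One classifier per potential edge: correlation features -> edge indicator.
models = [LogisticRegression(max_iter=1000).fit(X, Y[:, k])
          for k in range(Y.shape[1])]

test_data, _ = simulate(rng)
pred = np.array([m.predict(features(test_data)[None, :])[0] for m in models])
print(pred)  # 0/1 guesses for each of the P*(P-1)/2 potential edges
```

The point of the simulation-based training loop is the one the abstract makes: because the model is fit to finite-sample data, it can internalize small-sample behavior that asymptotically correct tests cannot.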
Related papers
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment.
First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population.
We propose Deconfounded Empirical Risk Minimization (DERM), a new simple learning procedure minimizing the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series [4.008958683836471]
CAnDOIT is a causal discovery method to reconstruct causal models using both observational and interventional data.
The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics.
A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub.
arXiv Detail & Related papers (2024-10-03T13:57:08Z)
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Sample, estimate, aggregate: A recipe for causal discovery foundation models [28.116832159265964]
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
arXiv Detail & Related papers (2024-02-02T21:57:58Z)
- Discovering Mixtures of Structural Causal Models from Time Series Data [23.18511951330646]
We propose a general variational inference-based framework called MCD to infer the underlying causal models.
Our approach employs an end-to-end training process that maximizes an evidence-lower bound for the data likelihood.
We demonstrate that our method surpasses state-of-the-art benchmarks in causal discovery tasks.
arXiv Detail & Related papers (2023-10-10T05:13:10Z)
- Active Bayesian Causal Inference [72.70593653185078]
We propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning.
ABCI jointly infers a posterior over causal models and queries of interest.
We show that our approach is more data-efficient than several baselines that only focus on learning the full causal graph.
arXiv Detail & Related papers (2022-06-04T22:38:57Z)
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
arXiv Detail & Related papers (2022-02-25T18:59:54Z)
- Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper [7.570246812206772]
Causal discovery from observational data is an important tool in many branches of science.
In the large sample limit, sound and complete causal discovery algorithms have been previously introduced.
However, only finite training data is available, which limits the power of statistical tests used by these algorithms.
arXiv Detail & Related papers (2021-07-11T09:24:49Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
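The "Sample, estimate, aggregate" entry above describes running classical discovery on variable subsets and letting a supervised model predict the larger graph. The snippet below is a deliberately simplified caricature of that recipe: in place of a trained aggregator it uses majority voting, and in place of a classical algorithm it uses a thresholded-correlation stand-in. The chain data, the 0.5 threshold, and the voting rule are all hypothetical choices for illustration, not the paper's method.

```python
# Caricature of the sample-estimate-aggregate recipe (illustrative only):
# run a stand-in "classical" estimator on every 3-variable subset, then pool
# its edge calls into one full-graph estimate by majority vote.
import itertools
import numpy as np

rng = np.random.default_rng(1)
P, N = 5, 500
# Ground truth: strong dependencies 0 -> 1 and 2 -> 3; variable 4 is isolated.
data = rng.standard_normal((N, P))
data[:, 1] = data[:, 0] + 0.3 * rng.standard_normal(N)
data[:, 3] = data[:, 2] + 0.3 * rng.standard_normal(N)

def subset_estimate(cols):
    """Stand-in estimator: call an (undirected) edge when |corr| > 0.5."""
    corr = np.abs(np.corrcoef(data[:, cols], rowvar=False))
    return {(cols[a], cols[b])
            for a, b in itertools.combinations(range(len(cols)), 2)
            if corr[a, b] > 0.5}

votes = {}
for cols in itertools.combinations(range(P), 3):  # every 3-variable subset
    for edge in subset_estimate(cols):
        votes[edge] = votes.get(edge, 0) + 1

# Each pair of variables occurs in exactly 3 subsets; keep edges that a
# majority of those subsets agreed on.
graph = {e for e, v in votes.items() if v >= 2}
print(sorted(graph))  # -> [(0, 1), (2, 3)]
```

The paper's observation that classical methods make comparable errors across datasets is what motivates replacing the naive vote here with a learned aggregator.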
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.