On the Sample Complexity of Causal Discovery and the Value of Domain Expertise
- URL: http://arxiv.org/abs/2102.03274v1
- Date: Fri, 5 Feb 2021 16:26:17 GMT
- Title: On the Sample Complexity of Causal Discovery and the Value of Domain Expertise
- Authors: Samir Wadhwa, Roy Dong
- Abstract summary: Causal discovery methods seek to identify causal relations between random variables from purely observational data.
In this paper, we analyze the sample complexity of causal discovery algorithms without a conditional independence (CI) oracle.
Our methods allow us to quantify the value of domain expertise in terms of data samples.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal discovery methods seek to identify causal relations between random
variables from purely observational data, as opposed to actively collected
experimental data where an experimenter intervenes on a subset of correlates.
One of the seminal works in this area is the Inferred Causation algorithm,
which guarantees successful causal discovery under the assumption of a
conditional independence (CI) oracle: an oracle that can state whether two
random variables are conditionally independent given another set of random
variables. Practical implementations of this algorithm incorporate statistical
tests for conditional independence, in place of a CI oracle. In this paper, we
analyze the sample complexity of causal discovery algorithms without a CI
oracle: given a certain level of confidence, how many data points are needed
for a causal discovery algorithm to identify a causal structure? Furthermore,
our methods allow us to quantify the value of domain expertise in terms of data
samples. Finally, we demonstrate the accuracy of these sample rates with
numerical examples, and quantify the benefits of sparsity priors and known
causal directions.
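The abstract's central question (how many samples a causal discovery algorithm needs, at a given confidence, once the CI oracle is replaced by statistical tests) can be illustrated with a minimal Python sketch. This is not the bound derived in the paper. It assumes Gaussian data, a Fisher-z partial-correlation test standing in for the CI oracle, a hypothetical minimum effect size `rho_min` for every true dependence, an exhaustive skeleton search that tests each pair of variables against every conditioning set of size at most `max_cond` (playing the role of a sparsity prior), and a union bound that splits the total failure probability `delta` evenly over all tests. Domain expertise is modeled, again as an assumption, by `known_pairs`: the number of variable pairs whose adjacency is already known and therefore need not be tested. All function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

# Illustrative sketch only; NOT the sample-complexity bound derived in the paper.


def fisher_z_ci_test(data, i, j, cond, alpha=0.05):
    """Return True if column i is judged independent of column j given `cond`.

    Assumes roughly Gaussian data. The partial correlation is read off the
    inverse of the sample correlation matrix of [i, j] + cond, then passed
    through the Fisher z-transform (atanh).
    """
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    n = data.shape[0]
    stat = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    return stat < NormalDist().inv_cdf(1 - alpha / 2)


def samples_per_test(rho_min, err, cond_size):
    """Standard Fisher-z sample-size formula: type I and type II error <= err
    when the true partial correlation is 0 or has magnitude >= rho_min."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - err / 2) + nd.inv_cdf(1 - err)
    return math.ceil((z / math.atanh(rho_min)) ** 2 + cond_size + 3)


def num_ci_tests(p, max_cond, known_pairs=0):
    """Number of CI tests in an exhaustive skeleton search over p variables:
    every untested pair, conditioned on every subset of the other p - 2
    variables with size <= max_cond."""
    pairs = math.comb(p, 2) - known_pairs
    subsets = sum(math.comb(p - 2, k) for k in range(max_cond + 1))
    return pairs * subsets


def samples_needed(p, rho_min, delta, max_cond, known_pairs=0):
    """Union bound: every test gets error budget delta / m, and the largest
    conditioning set (max_cond) is the worst case for the sample size."""
    m = num_ci_tests(p, max_cond, known_pairs)
    if m <= 0:
        return 0  # nothing left to test
    return samples_per_test(rho_min, delta / m, max_cond)


# Usage: compare a dense search with a sparsity prior and with known pairs.
full = samples_needed(p=6, rho_min=0.2, delta=0.05, max_cond=4)
sparse = samples_needed(p=6, rho_min=0.2, delta=0.05, max_cond=2)
expert = samples_needed(p=6, rho_min=0.2, delta=0.05, max_cond=4, known_pairs=5)
print(full, sparse, expert)
```

Under these assumptions, both a sparsity prior and prior knowledge shrink the number of tests `m`, which loosens the per-test error budget `delta / m` and so lowers the required sample size. The difference `full - expert` is one way, in the spirit of the abstract, to express the value of expertise in data points.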
Related papers
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel federated causal discovery (FCD) method that attempts to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
Date: 2024-02-20T18:53:53Z
- A Versatile Causal Discovery Framework to Allow Causally-Related Hidden Variables [28.51579090194802]
We introduce a novel framework for causal discovery that accommodates the presence of causally-related hidden variables almost everywhere in the causal network.
We develop a Rank-based Latent Causal Discovery algorithm, RLCD, that can efficiently locate hidden variables, determine their cardinalities, and discover the entire causal structure over both measured and hidden ones.
Experimental results on both synthetic and real-world personality data sets demonstrate the efficacy of the proposed approach in finite-sample cases.
Date: 2023-12-18T07:57:39Z
- A Survey on Causal Discovery Methods for I.I.D. and Time Series Data [4.57769506869942]
Causal Discovery (CD) algorithms can identify the cause-effect relationships among the variables of a system from related observational data.
We present an extensive discussion on the methods designed to perform causal discovery from both independent and identically distributed (I.I.D.) data and time series data.
Date: 2023-03-27T09:21:41Z
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
Date: 2022-12-06T12:42:11Z
- Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test [4.67306371596399]
We introduce a novel statistical independence test on data collected from time-invariant systems in which rare but consequential events occur.
We provide non-asymptotic sample bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets.
Date: 2022-11-29T21:15:51Z
- Valid Inference After Causal Discovery [73.87055989355737]
We develop tools for valid post-causal-discovery inference.
We show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates.
Date: 2022-08-11T17:40:45Z
- Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
Date: 2022-06-30T06:00:13Z
- BaCaDI: Bayesian Causal Discovery with Unknown Interventions [118.93754590721173]
BaCaDI operates in the continuous space of latent probabilistic representations of both causal structures and interventions.
In experiments on synthetic causal discovery tasks and simulated gene-expression data, BaCaDI outperforms related methods in identifying causal structures and intervention targets.
Date: 2022-06-03T16:25:48Z
- The interventional Bayesian Gaussian equivalent score for Bayesian causal inference with unknown soft interventions [0.0]
In certain settings, such as genomics, we may have data from heterogeneous study conditions, with soft (partial) interventions only pertaining to a subset of the study variables.
We define the interventional BGe score for a mixture of observational and interventional data, where the targets and effects of intervention may be unknown.
Date: 2022-05-05T12:32:08Z
- Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects [82.20189909620899]
Estimating heterogeneous treatment effects is an important problem across many domains.
Currently, most existing works rely exclusively on observational data.
We propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data.
Date: 2022-02-25T18:59:54Z
This list is automatically generated from the titles and abstracts of the papers on this site.