Empirical Bayesian Approaches for Robust Constraint-based Causal
Discovery under Insufficient Data
- URL: http://arxiv.org/abs/2206.08448v1
- Date: Thu, 16 Jun 2022 21:08:49 GMT
- Title: Empirical Bayesian Approaches for Robust Constraint-based Causal
Discovery under Insufficient Data
- Authors: Zijun Cui, Naiyu Yin, Yuru Wang, and Qiang Ji
- Abstract summary: Causal discovery methods assume data sufficiency, which may not be the case in many real-world datasets.
We propose Bayesian-augmented frequentist independence tests to improve the performance of constraint-based causal discovery methods under insufficient data.
Experiments show significant performance improvement in terms of both accuracy and efficiency over SOTA methods.
- Score: 38.883810061897094
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Causal discovery aims to learn cause-effect relationships among variables from observational data and is important for many applications. Existing causal discovery methods assume data sufficiency, which may not hold in many real-world datasets; as a result, many existing methods can fail under limited data. In this work, we propose Bayesian-augmented frequentist independence tests to improve the performance of constraint-based causal discovery methods under insufficient data: 1) First, we introduce a Bayesian method to estimate mutual information (MI), based on which we propose a robust MI-based independence test; 2) Second, we consider the Bayesian estimation of the hypothesis likelihood and incorporate it into a well-defined statistical test, resulting in a robust statistical-testing-based independence test. We apply the proposed independence tests to constraint-based causal discovery methods and evaluate their performance on benchmark datasets with insufficient samples. Experiments show significant improvements in both accuracy and efficiency over state-of-the-art (SOTA) methods.
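The first ingredient, a Bayesian MI estimate, can be illustrated in a minimal form: smooth the empirical joint distribution of two discrete variables with a symmetric Dirichlet prior, plug the posterior-mean distribution into the MI formula, and calibrate the statistic with a permutation null. The function names, the permutation-based calibration, and the prior strength `alpha` below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def bayesian_mi(x, y, alpha=0.5):
    """MI of the posterior-mean joint distribution under a symmetric
    Dirichlet(alpha) prior (adds alpha pseudo-counts to every cell)."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    counts = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(counts, (xi, yi), 1.0)
    p = (counts + alpha) / (counts.sum() + alpha * counts.size)
    px = p.sum(axis=1, keepdims=True)   # marginal of x
    py = p.sum(axis=0, keepdims=True)   # marginal of y
    return float(np.sum(p * np.log(p / (px * py))))

def mi_independence_test(x, y, alpha=0.5, n_perm=199, seed=0):
    """Calibrate the smoothed MI with a permutation null: the p-value is
    the fraction of shuffled datasets whose MI reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = bayesian_mi(x, y, alpha)
    exceed = sum(bayesian_mi(rng.permutation(x), y, alpha) >= observed
                 for _ in range(n_perm))
    return observed, (1 + exceed) / (1 + n_perm)
```

Because the Dirichlet smoothing keeps every cell probability strictly positive, the estimate stays finite and nonnegative even when many cells are empty, which is exactly the small-sample regime the paper targets.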
Related papers
- Uncertainty for Active Learning on Graphs [70.44714133412592]
Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models.
We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies.
We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries.
arXiv Detail & Related papers (2024-05-02T16:50:47Z)
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method that accommodates arbitrary causal models and heterogeneous data.
Our approach constructs summary statistics as a proxy for the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
arXiv Detail & Related papers (2024-02-20T18:53:53Z)
- Differentially private Bayesian tests [1.3127313002783776]
We present a novel differentially private Bayesian hypothesis testing framework that arises naturally under a principled data generative mechanism.
By focusing on differentially private Bayes factors based on widely used test statistics, we circumvent the need to model the complete data generative mechanism.
arXiv Detail & Related papers (2024-01-27T21:07:11Z)
- Assumption violations in causal discovery and the robustness of score matching [38.60630271550033]
This paper extensively benchmarks the empirical performance of recent causal discovery methods on observational i.i.d. data.
We show that score matching-based methods achieve surprisingly strong performance in terms of the false positive and false negative rates of the inferred graph.
We hope this paper will set a new standard for the evaluation of causal discovery methods.
arXiv Detail & Related papers (2023-10-20T09:56:07Z)
- Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation [137.3520153445413]
A notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference.
We evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets.
The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes.
arXiv Detail & Related papers (2023-07-11T02:58:10Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
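The kernelized dependence measures that such tests build on can be sketched with the batch (non-sequential) Hilbert-Schmidt Independence Criterion; the Gaussian kernels and fixed bandwidth `sigma` below are simplifying assumptions, and this is not the sequential test the paper develops.

```python
import numpy as np

def rbf_gram(z, sigma=1.0):
    """Gaussian (RBF) Gram matrix for samples stacked row-wise."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2; it approaches zero
    for independent x and y under characteristic kernels."""
    n = len(x)
    K = rbf_gram(np.asarray(x, float).reshape(n, -1), sigma)
    L = rbf_gram(np.asarray(y, float).reshape(n, -1), sigma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2
```

In practice the bandwidth is often set by the median heuristic (median pairwise distance of the sample) rather than fixed at 1.0.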
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Null Hypothesis Test for Anomaly Detection [0.0]
We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis.
By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or extrapolations of background estimates between regions.
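A textbook instance of testing independence between two regions is Pearson's chi-square on a 2x2 contingency table; the event counts below are invented for illustration, and 3.841 is the standard 5% critical value at one degree of freedom (this is a generic sketch, not the paper's exact test).

```python
import numpy as np

def chi2_independence(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

# Hypothetical counts of (signal-like, background-like) events per region:
table = [[90, 10], [60, 40]]
stat = chi2_independence(table)
# 1 degree of freedom for a 2x2 table; 3.841 is the 5% critical value
reject_background_only = stat > 3.841
```

If the two regions were populated only by background, the anomaly-score label and the region label would be independent and the statistic would stay below the critical value.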
arXiv Detail & Related papers (2022-10-05T13:03:55Z)
- Valid Inference After Causal Discovery [73.87055989355737]
We develop tools for valid post-causal-discovery inference.
We show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates.
arXiv Detail & Related papers (2022-08-11T17:40:45Z)
- Evaluating Causal Inference Methods [0.4588028371034407]
We introduce a deep generative model-based framework, Credence, to validate causal inference methods.
arXiv Detail & Related papers (2022-02-09T00:21:22Z)
- On the Sample Complexity of Causal Discovery and the Value of Domain Expertise [0.0]
Causal discovery methods seek to identify causal relations between random variables from purely observational data.
In this paper, we analyze the sample complexity of causal discovery algorithms without a CI oracle.
Our methods allow us to quantify the value of domain expertise in terms of data samples.
arXiv Detail & Related papers (2021-02-05T16:26:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.