Recursive Causal Discovery
- URL: http://arxiv.org/abs/2403.09300v1
- Date: Thu, 14 Mar 2024 11:46:25 GMT
- Title: Recursive Causal Discovery
- Authors: Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari, Negar Kiyavash,
- Abstract summary: Causal discovery is often the first step toward the identification and estimation of causal effects.
This paper builds upon and extends four of our prior publications.
We present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation.
- Score: 22.56309301911757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal discovery, i.e., learning the causal graph from data, is often the first step toward the identification and estimation of causal effects, a key requirement in numerous scientific domains. Causal discovery is hampered by two main challenges: limited data results in errors in statistical testing and the computational complexity of the learning task is daunting. This paper builds upon and extends four of our prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced the concept of removable variables, which are the only variables that can be removed recursively for the purpose of causal discovery. Presence and identification of removable variables allow recursive approaches for causal discovery, a promising solution that helps to address the aforementioned challenges by reducing the problem size successively. This reduction not only minimizes conditioning sets in each conditional independence (CI) test, leading to fewer errors but also significantly decreases the number of required CI tests. The worst-case performances of these methods nearly match the lower bound. In this paper, we present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation. A comprehensive literature review is also included, comparing the computational complexity of our methods with existing approaches, showcasing their state-of-the-art efficiency. Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com.
Related papers
- AcceleratedLiNGAM: Learning Causal DAGs at the speed of GPUs [57.12929098407975]
We show that by efficiently parallelizing existing causal discovery methods, we can scale them to thousands of dimensions.
Specifically, we focus on the causal ordering subprocedure in DirectLiNGAM and implement GPU kernels to accelerate it.
This allows us to apply DirectLiNGAM to causal inference on large-scale gene expression data with genetic interventions yielding competitive results.
arXiv Detail & Related papers (2024-03-06T15:06:11Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z) - GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints [3.2374399328078285]
Graphical structures estimated by causal learning algorithms from time series data can provide misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data.
Existing algorithms provide limited resources to respond to this challenge, and so researchers must either use models that they know are likely misleading, or else forego causal learning entirely.
Existing methods face up-to-four distinct shortfalls, as they might 1) require that the difference between causal and measurement is known; 2) only handle very small number of random variables when the timescale difference is unknown; 3) only apply to pairs of variables; or 4) be unable to
arXiv Detail & Related papers (2022-05-18T22:38:57Z) - An Application of a Multivariate Estimation of Distribution Algorithm to
Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We show that the more sophisticated algorithm would yield better performance on a complex problem like this.
We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z) - Recursive Causal Structure Learning in the Presence of Latent Variables
and Selection Bias [27.06618125828978]
We consider the problem of learning the causal MAG of a system from observational data in the presence of latent variables and selection bias.
We propose a novel computationally efficient constraint-based method that is sound and complete.
We provide experimental results to compare the proposed approach with the state of the art on both synthetic and real-world structures.
arXiv Detail & Related papers (2021-10-22T19:49:59Z) - Improving Efficiency and Accuracy of Causal Discovery Using a
Hierarchical Wrapper [7.570246812206772]
Causal discovery from observational data is an important tool in many branches of science.
In the large sample limit, sound and complete causal discovery algorithms have been previously introduced.
However, only finite training data is available, which limits the power of statistical tests used by these algorithms.
arXiv Detail & Related papers (2021-07-11T09:24:49Z) - DPER: Efficient Parameter Estimation for Randomly Missing Data [0.24466725954625884]
We propose novel algorithms to find the maximum likelihood estimates (MLEs) for a one-class/multiple-class randomly missing data set.
Our algorithms do not require multiple iterations through the data, thus promising to be less time-consuming than other methods.
arXiv Detail & Related papers (2021-06-06T16:37:48Z) - A Recursive Markov Boundary-Based Approach to Causal Structure Learning [22.38302412440357]
We propose a novel recursion-based method for causal structure learning.
It significantly reduces the required number of conditional independence tests.
Our experimental results show that the proposed algorithm outperforms state-of-the-art both on synthetic and real-world structures.
arXiv Detail & Related papers (2020-10-10T13:26:22Z) - Boosting Data Reduction for the Maximum Weight Independent Set Problem
Using Increasing Transformations [59.84561168501493]
We introduce new generalized data reduction and transformation rules for the maximum weight independent set problem.
Surprisingly, these so-called increasing transformations can simplify the problem and also open up the reduction space to yield even smaller irreducible graphs later in the algorithm.
Our algorithm computes significantly smaller irreducible graphs on all except one instance, solves more instances to optimality than previously possible, is up to two orders of magnitude faster than the best state-of-the-art solver, and finds higher-quality solutions than solvers DynWVC and HILS.
arXiv Detail & Related papers (2020-08-12T08:52:50Z) - Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degrees for the task of detecting the presence of hidden structures.
For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree.
As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.