Related papers: Efficient Ensemble Conditional Independence Test Framework for Causal Discovery

URL: http://arxiv.org/abs/2509.21021v1
Date: Thu, 25 Sep 2025 11:31:16 GMT
Title: Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
Authors: Zhengkang Guan, Kun Kuang
Abstract summary: We introduce the Ensemble Conditional Independence Test (E-CIT), a general and plug-and-play framework. E-CIT partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values. Results demonstrate that E-CIT not only significantly reduces the computational burden of CITs and causal discovery but also achieves competitive performance.
Score: 46.328102756312724
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Constraint-based causal discovery relies on numerous conditional independence tests (CITs), but its practical applicability is severely constrained by the prohibitive computational cost, especially as CITs themselves have high time complexity with respect to the sample size. To address this key bottleneck, we introduce the Ensemble Conditional Independence Test (E-CIT), a general and plug-and-play framework. E-CIT operates on an intuitive divide-and-aggregate strategy: it partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values using a novel method grounded in the properties of stable distributions. This framework reduces the computational complexity of a base CIT to linear in the sample size when the subset size is fixed. Moreover, our tailored p-value combination method offers theoretical consistency guarantees under mild conditions on the subtests. Experimental results demonstrate that E-CIT not only significantly reduces the computational burden of CITs and causal discovery but also achieves competitive performance. Notably, it exhibits an improvement in complex testing scenarios, particularly on real-world datasets.
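
A minimal sketch of the divide-and-aggregate idea described in the abstract, under stated assumptions. The abstract does not spell out the paper's p-value combination rule, so this sketch swaps in the Cauchy combination test as a stand-in (the Cauchy distribution is one member of the stable family). The Fisher z partial-correlation base test and every function name here (`fisher_z_pvalue`, `cauchy_combine`, `ensemble_cit`) are illustrative choices, not the authors' implementation.

```python
import math
import numpy as np


def fisher_z_pvalue(x, y, z):
    """Illustrative base CIT: partial correlation of x and y given z, Fisher z-test."""
    n = x.shape[0]
    # Regress x and y on z (with an intercept) and keep the residuals.
    design = np.column_stack([np.ones(n), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999))
    k = z.shape[1]  # number of conditioning variables
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    # Two-sided p-value under a standard normal null.
    return math.erfc(stat / math.sqrt(2.0))


def cauchy_combine(pvalues):
    """Assumed stand-in aggregator: equal-weight Cauchy combination of p-values."""
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-15, 1 - 1e-15)
    t = np.mean(np.tan((0.5 - p) * np.pi))
    return float(0.5 - np.arctan(t) / np.pi)


def ensemble_cit(x, y, z, base_test, subset_size, rng):
    """Partition the samples, run the base test per subset, combine the p-values."""
    n = x.shape[0]
    perm = rng.permutation(n)
    n_subsets = max(1, n // subset_size)
    blocks = np.array_split(perm, n_subsets)
    pvals = [base_test(x[idx], y[idx], z[idx]) for idx in blocks]
    return cauchy_combine(pvals)


# Synthetic check: Z drives X and Y; y_dep additionally depends on X directly.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))
x = z[:, 0] + rng.normal(size=n)
y_ci = z[:, 0] + rng.normal(size=n)        # X independent of Y given Z
y_dep = x + z[:, 0] + rng.normal(size=n)   # X dependent on Y given Z

p_indep = ensemble_cit(x, y_ci, z, fisher_z_pvalue, 200, rng)
p_dep = ensemble_cit(x, y_dep, z, fisher_z_pvalue, 200, rng)
print(f"independent case p-value: {p_indep:.4f}")
print(f"dependent case p-value:   {p_dep:.3e}")
```

With the subset size held fixed, the number of base-test calls grows in proportion to the sample size, which is the source of the linear-in-sample-size cost the abstract mentions. The payoff is largest when the base test scales super-linearly in the sample size; the cheap Fisher z-test is used here only to keep the sketch short.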

Related papers

Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding [0.0]
fedCI is a conditional independence test that handles heterogeneous datasets. fedCI-IOD enables causal discovery under latent confounding across distributed and heterogeneous datasets. Our tools are publicly available as the fedCI Python package, a privacy-preserving R implementation of IOD, and a web application for the fedCI-IOD pipeline.
arXiv Detail & Related papers (2026-03-05T13:17:31Z)
A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying conditional independence (CI) tests directly to discretized data can lead to incorrect conclusions. Recent advancements have sought to infer the correct CI relationship between the latent variables through binarizing observed data. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z)
A Fast Kernel-based Conditional Independence test with Application to Causal Discovery [9.416064439922001]
FastKCI is a scalable and parallelizable kernel-based conditional independence test. Experiments on synthetic datasets and benchmarks on real-world production data validate that FastKCI maintains the statistical power of the original KCI test.
arXiv Detail & Related papers (2025-05-16T10:14:57Z)
Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative-based algorithm that jointly updates the decision and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible variable variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z)
Amortized Conditional Independence Testing [6.954510776782872]
ACID is a transformer-based neural network architecture that learns to test for conditional independence. It consistently achieves state-of-the-art performance against existing baselines under multiple metrics. It is able to generalize robustly to unseen sample sizes, dimensionalities, as well as non-linearities with a remarkably low inference time.
arXiv Detail & Related papers (2025-02-28T10:29:56Z)
Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques. In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets. Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
On sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery [21.12645737093305]
Conditional independence testing is an essential step in constraint-based causal discovery algorithms. We design a test for conditional independence based on our estimator, called VM-CI, which achieves optimal parametric rates. We empirically show that VM-CI outperforms other popular CI tests in terms of either time or sample complexity.
arXiv Detail & Related papers (2023-10-20T14:52:25Z)
Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value RL (LSVI) algorithm that combines optimism with pessimism for active exploration. We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
arXiv Detail & Related papers (2022-12-19T14:46:57Z)
Conditional Independence Testing via Latent Representation Learning [2.566492438263125]
LCIT (Latent representation based Conditional Independence Test) is a novel non-parametric method for conditional independence testing based on representation learning. Our main contribution involves proposing a generative framework in which to test for the independence between X and Y given Z.
arXiv Detail & Related papers (2022-09-04T07:16:03Z)
Private Quantiles Estimation in the Presence of Atoms [7.5072219939358105]
We address the differentially private estimation of multiple quantiles of a dataset. Non-smoothed JointExp suffers from an important lack of performance in the case of peaked distributions. We propose a simple and numerically efficient method called Heuristically Smoothed JointExp.
arXiv Detail & Related papers (2022-02-15T09:44:14Z)
Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism. We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA. The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering. In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.