Causal Discovery for Cross-Sectional Data Based on Super-Structure and Divide-and-Conquer
- URL: http://arxiv.org/abs/2602.03914v1
- Date: Tue, 03 Feb 2026 16:18:17 GMT
- Title: Causal Discovery for Cross-Sectional Data Based on Super-Structure and Divide-and-Conquer
- Authors: Wenyu Wang, Yaping Wan
- Abstract summary: We propose a novel framework that relaxes the strict requirements on Super-Structure construction while preserving the algorithmic benefits of divide-and-conquer. We instantiate the framework in a concrete causal discovery algorithm and rigorously evaluate its components on synthetic data. Our results establish that accurate, scalable causal discovery is achievable even under minimal assumptions about the initial Super-Structure.
- Score: 9.740161937852067
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper tackles a critical bottleneck in Super-Structure-based divide-and-conquer causal discovery: the high computational cost of constructing accurate Super-Structures--particularly when conditional independence (CI) tests are expensive and domain knowledge is unavailable. We propose a novel, lightweight framework that relaxes the strict requirements on Super-Structure construction while preserving the algorithmic benefits of divide-and-conquer. By integrating weakly constrained Super-Structures with efficient graph partitioning and merging strategies, our approach substantially lowers CI test overhead without sacrificing accuracy. We instantiate the framework in a concrete causal discovery algorithm and rigorously evaluate its components on synthetic data. Comprehensive experiments on Gaussian Bayesian networks, including magic-NIAB, ECOLI70, and magic-IRRI, demonstrate that our method matches or closely approximates the structural accuracy of PC and FCI while drastically reducing the number of CI tests. Further validation on the real-world China Health and Retirement Longitudinal Study (CHARLS) dataset confirms its practical applicability. Our results establish that accurate, scalable causal discovery is achievable even under minimal assumptions about the initial Super-Structure, opening new avenues for applying divide-and-conquer methods to large-scale, knowledge-scarce domains such as biomedical and social science research.
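The general idea in the abstract (build a cheap, loosely constrained Super-Structure, partition it, run CI tests only inside each part, then merge) can be illustrated with a minimal sketch. This is not the authors' algorithm: the marginal-correlation threshold used to build the Super-Structure, the use of connected components as the partition, the Fisher-z test, and every function name and parameter below (`discover_skeleton`, `threshold`, `max_cond`, `make_demo_data`, and so on) are assumptions chosen for illustration. It recovers only an undirected skeleton and counts the CI tests performed.

```python
import itertools
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(R, i, j, cond):
    """Partial correlation of i and j given cond, via the standard recursion."""
    if not cond:
        return R[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(R, n, i, j, cond, alpha):
    """Fisher-z CI test for Gaussian data; True means independence is not rejected."""
    r = max(min(partial_corr(R, i, j, cond), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value > alpha


def connected_components(adj):
    """Partition step (an assumption): connected components of the super-structure."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, part = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            part.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        parts.append(sorted(part))
    return parts


def discover_skeleton(cols, threshold=0.1, max_cond=2, alpha=0.01):
    """Return (undirected edges, number of CI tests) for column-wise data."""
    n, d = len(cols[0]), len(cols)
    R = [[pearson(cols[i], cols[j]) for j in range(d)] for i in range(d)]
    # Loose super-structure: keep every pair with non-negligible marginal correlation.
    adj = {i: {j for j in range(d) if j != i and abs(R[i][j]) > threshold}
           for i in range(d)}
    edges, tests = set(), 0
    for part in connected_components(adj):
        local = {v: set(adj[v]) for v in part}
        # PC-style edge removal, restricted to super-structure edges in this part.
        for size in range(max_cond + 1):
            for i, j in [(i, j) for i in part for j in local[i] if i < j]:
                pool = sorted((local[i] | local[j]) - {i, j})
                for cond in itertools.combinations(pool, size):
                    tests += 1
                    if is_independent(R, n, i, j, cond, alpha):
                        local[i].discard(j)
                        local[j].discard(i)
                        break
        # Merge step: union of the per-part skeletons.
        edges |= {(i, j) for i in part for j in local[i] if i < j}
    return edges, tests


def make_demo_data(n=2000, seed=0):
    """Two independent linear-Gaussian chains: X0 -> X1 -> X2 and X3 -> X4."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [0.8 * a + rng.gauss(0, 1) for a in x0]
    x2 = [0.8 * a + rng.gauss(0, 1) for a in x1]
    x3 = [rng.gauss(0, 1) for _ in range(n)]
    x4 = [0.8 * a + rng.gauss(0, 1) for a in x3]
    return [x0, x1, x2, x3, x4]
```

Calling `discover_skeleton(make_demo_data())` returns the surviving edges and the CI test count. In this toy setting no CI test is ever run on a pair that the Super-Structure leaves out, which is where the savings come from; the indirect pair (0, 2) starts inside the Super-Structure and is typically removed once the test conditions on variable 1.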
Related papers
- Fast Flow Matching based Conditional Independence Tests for Causal Discovery [19.33167245211968]
Constraint-based causal discovery methods require a large number of conditional independence (CI) tests. We propose the Flow Matching-based Conditional Independence Test (FMCIT). The proposed test leverages the high computational efficiency of flow matching and requires the model to be trained only once throughout the entire causal discovery procedure.
arXiv Detail & Related papers (2026-02-09T06:43:23Z)
- Opportunities in AI/ML for the Rubin LSST Dark Energy Science Collaboration [63.61423859450929]
This white paper surveys the current landscape of AI/ML across DESC's primary cosmological probes and cross-cutting analyses. We identify key methodological research priorities, including Bayesian inference at scale, physics-informed methods, validation frameworks, and active learning for discovery.
arXiv Detail & Related papers (2026-01-20T18:46:42Z)
- Efficient Differentiable Causal Discovery via Reliable Super-Structure Learning [51.20606796019663]
We propose ALVGL, a novel and general enhancement to the differentiable causal discovery pipeline. ALVGL employs a sparse and low-rank decomposition to learn the precision matrix of the data. We show that ALVGL not only achieves state-of-the-art accuracy but also significantly improves optimization efficiency.
arXiv Detail & Related papers (2026-01-09T02:18:59Z)
- Scalable Bayesian Network Structure Learning Using Tsetlin Machine to Constrain the Search Space [10.753354249346073]
The PC algorithm is a widely used method in causal inference for learning the structure of Bayesian networks. Despite its popularity, the PC algorithm suffers from significant time complexity, particularly as the size of the dataset increases. We propose a novel approach that utilises the Tsetlin Machine (TM) to construct Bayesian structures more efficiently.
arXiv Detail & Related papers (2025-11-24T16:23:19Z)
- Efficient Ensemble Conditional Independence Test Framework for Causal Discovery [46.328102756312724]
We introduce the Ensemble Conditional Independence Test (E-CIT), a general and plug-and-play framework. E-CIT partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values. Results demonstrate that E-CIT not only significantly reduces the computational burden of CITs and causal discovery but also achieves competitive performance.
arXiv Detail & Related papers (2025-09-25T11:31:16Z)
- TopoFR: A Closer Look at Topology Alignment on Face Recognition [58.45515807380505]
We propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of the FR model. Experimental results on popular face benchmarks demonstrate the superiority of TopoFR over state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T14:58:30Z)
- Coordinated Multi-Neighborhood Learning on a Directed Acyclic Graph [6.727984016678534]
Learning the structure of causal directed acyclic graphs (DAGs) is useful in many areas of machine learning and artificial intelligence.
It is challenging to obtain good empirical and theoretical results without strong and often restrictive assumptions.
This paper develops a new constraint-based method for estimating the local structure around multiple user-specified target nodes.
arXiv Detail & Related papers (2024-05-24T08:49:43Z)
- Discovering and Reasoning of Causality in the Hidden World with Large Language Models [109.62442253177376]
We develop a new framework termed Causal representatiOn AssistanT (COAT) to propose useful measured variables for causal discovery. Instead of directly inferring causality with large language models (LLMs), COAT constructs feedback from intermediate causal discovery results to LLMs to refine the proposed variables.
arXiv Detail & Related papers (2024-02-06T12:18:54Z)
- HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline [45.91245228386502]
Existing SAT generation approaches can hardly simultaneously capture the global structural properties and maintain plausible computational hardness.
We propose HardSATGEN, which introduces a fine-grained control mechanism to the neural split-merge paradigm for SAT formula generation.
Compared to the best previous methods, the average performance gains reach 38.5% in structural statistics, 88.4% in computational metrics, and over 140.7% in the effectiveness of guiding solver tuning.
arXiv Detail & Related papers (2023-02-04T05:58:17Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- Accelerating Recursive Partition-Based Causal Structure Learning [4.357523892518871]
Recursive causal discovery algorithms provide good results by using Conditional Independence (CI) tests in smaller sub-problems.
This paper proposes a generic causal structure refinement strategy that can locate the undesired relations with a small number of CI-tests.
We then empirically evaluate its performance against the state-of-the-art algorithms in terms of solution quality and completion time in synthetic and real datasets.
arXiv Detail & Related papers (2021-02-23T08:28:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.