Related papers: QWO: Speeding Up Permutation-Based Causal Discovery in LiGAMs

QWO: Speeding Up Permutation-Based Causal Discovery in LiGAMs

URL: http://arxiv.org/abs/2410.23155v1
Date: Wed, 30 Oct 2024 16:10:46 GMT
Title: QWO: Speeding Up Permutation-Based Causal Discovery in LiGAMs
Authors: Mohammad Shahverdikondori, Ehsan Mokhtarian, Negar Kiyavash,
Abstract summary: We introduce QWO, a novel approach that significantly enhances the efficiency of computing $mathcalGpi$ for a given permutation $pi$. QWO has a speed-up of $O(n2)$ ($n$ is the number of variables) compared to the state-of-the-art BIC-based method, making it highly scalable.
Score: 20.661343069864888
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Causal discovery is essential for understanding relationships among variables of interest in many scientific domains. In this paper, we focus on permutation-based methods for learning causal graphs in Linear Gaussian Acyclic Models (LiGAMs), where the permutation encodes a causal ordering of the variables. Existing methods in this setting are not scalable due to their high computational complexity. These methods are comprised of two main components: (i) constructing a specific DAG, $\mathcal{G}^\pi$, for a given permutation $\pi$, which represents the best structure that can be learned from the available data while adhering to $\pi$, and (ii) searching over the space of permutations (i.e., causal orders) to minimize the number of edges in $\mathcal{G}^\pi$. We introduce QWO, a novel approach that significantly enhances the efficiency of computing $\mathcal{G}^\pi$ for a given permutation $\pi$. QWO has a speed-up of $O(n^2)$ ($n$ is the number of variables) compared to the state-of-the-art BIC-based method, making it highly scalable. We show that our method is theoretically sound and can be integrated into existing search strategies such as GRASP and hill-climbing-based methods to improve their performance.

Related papers

Learning Compositional Functions with Transformers from Easy-to-Hard Data [63.96562216704653]
We study the learnability of the $k$-fold composition task, which requires computing an interleaved composition of $k$ input permutations and $k$ hidden permutations.<n>We show that this function class can be efficiently learned, with runtime and sample in $k$, by gradient descent on an $O(log k)$-depth transformer.
arXiv Detail & Related papers (2025-05-29T17:22:00Z)
Ehrenfeucht-Haussler Rank and Chain of Thought [51.33559894954108]
We show that the rank of a function $f$ corresponds to the minimum number of Chain of Thought steps required by a single-layer transformer decoder. We also analyze the problem of identifying the position of the $k$-th occurrence of 1 in a Boolean sequence, proving that it requires $k$ CoT steps.
arXiv Detail & Related papers (2025-01-22T16:30:58Z)
A $Δ$-evaluation function for column permutation problems [0.0]
A new $Delta$-evaluation method is introduced for solving a column permutation problem defined on a sparse binary matrix with the consecutive ones property. This problem models various $mathcalNP$-hard problems in graph theory and industrial manufacturing contexts. The proposed evaluation method is generally competitive and particularly useful for large and dense instances.
arXiv Detail & Related papers (2024-09-07T22:50:25Z)
Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions. In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers correctly to question $j$ is bi-isotonic up to a permutation of it rows and columns. We construct an efficient-time algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
A Novel and Optimal Spectral Method for Permutation Synchronization [8.517406772939292]
Permutation synchronization is an important problem in computer science that constitutes the key step of many computer vision tasks. In this paper, we propose a novel and statistically optimal spectral algorithm.
arXiv Detail & Related papers (2023-03-21T17:43:26Z)
Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems. Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on strongity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $Ntimes N$ matrix of classical data to precision $epsilon$. We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM) Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity. We proposed a method to solve problems master to agents' communication and local calls. The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs [45.87981728307819]
The ability to compare and align related datasets living in heterogeneous spaces plays an increasingly important role in machine learning. The Gromov-Wasserstein (GW) formalism can help tackle this problem.
arXiv Detail & Related papers (2021-06-02T12:50:56Z)
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $epsilon$-cover for $S$ of cardinality $M = (k/epsilon)O_d(k1/d)$. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
Efficient Permutation Discovery in Causal DAGs [9.22466799504763]
We introduce an efficient algorithm for finding sparse permutations of a directed acyclic graph. We show that our method with depth $w$ runs in $O(pw+3)$ time. We also compare our algorithm to provably consistent causal structure learning algorithms, such as the PC algorithm, GES, and GSP, and show that our method achieves comparable performance with a shorter runtime.
arXiv Detail & Related papers (2020-11-06T21:56:41Z)
LazyIter: A Fast Algorithm for Counting Markov Equivalent DAGs and Designing Experiments [30.592069659778716]
The size of an MEC is a measure of complexity for recovering the true causal graph by performing interventions. We propose a method for efficient iteration over possible MECs given intervention results. Our proposed algorithms for both computing the size of MEC and experiment design outperform the state of the art.
arXiv Detail & Related papers (2020-06-17T05:51:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.