On the Unlikelihood of D-Separation
- URL: http://arxiv.org/abs/2303.05628v2
- Date: Tue, 3 Oct 2023 14:32:11 GMT
- Title: On the Unlikelihood of D-Separation
- Authors: Itai Feigenbaum, Huan Wang, Shelby Heinecke, Juan Carlos Niebles,
Weiran Yao, Caiming Xiong, Devansh Arpit
- Abstract summary: We provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist.
For the PC Algorithm, while it is known that its worst-case guarantees fail on non-sparse graphs, we show that the same is true for the average case.
For UniformSGS, while it is known that the running time is exponential for existing edges, we show that in the average case, that is the expected running time for most non-existing edges as well.
- Score: 69.62839677485087
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Causal discovery aims to recover a causal graph from data generated by it;
constraint based methods do so by searching for a d-separating conditioning set
of nodes in the graph via an oracle. In this paper, we provide analytic
evidence that on large graphs, d-separation is a rare phenomenon, even when
guaranteed to exist, unless the graph is extremely sparse. We then provide an
analytic average case analysis of the PC Algorithm for causal discovery, as
well as a variant of the SGS Algorithm we call UniformSGS. We consider a set
$V=\{v_1,\ldots,v_n\}$ of nodes, and generate a random DAG $G=(V,E)$ where
$(v_a, v_b) \in E$ with i.i.d. probability $p_1$ if $a<b$ and $0$ if $a > b$.
We provide upper bounds on the probability that a subset of $V-\{x,y\}$
d-separates $x$ and $y$, conditional on $x$ and $y$ being d-separable; our
upper bounds decay exponentially fast to $0$ as $|V| \rightarrow \infty$. For
the PC Algorithm, while it is known that its worst-case guarantees fail on
non-sparse graphs, we show that the same is true for the average case, and that
the sparsity requirement is quite demanding: for good performance, the density
must go to $0$ as $|V| \rightarrow \infty$ even in the average case. For
UniformSGS, while it is known that the running time is exponential for existing
edges, we show that in the average case, that is the expected running time for
most non-existing edges as well.
Related papers
- Improved convergence rate of kNN graph Laplacians [11.93971616098517]
General class of $k$NN graph where the graph affinity is $W_ij = epsilon-d/2 .
We prove the point-wise convergence of the $k$NN graph Laplacian to the limiting manifold operator.
arXiv Detail & Related papers (2024-10-30T17:01:00Z) - Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel emphstatistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z) - Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem.
We consider detecting the presence of a planted $Gr(ngamma, n-alpha)$ subhypergraph in a $Gr(n, n-beta) hypergraph.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known.
arXiv Detail & Related papers (2023-04-17T10:38:08Z) - Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z) - Repeated Averages on Graphs [2.363388546004777]
We prove that $frac(1-epsilon)2log2nlog n-O(n)$ is a general lower bound for all connected graphs on $n$ nodes.
We also get sharp magnitude of $t_epsilon,1$ for several important families of graphs, including star, expander, dumbbell, and cycle.
arXiv Detail & Related papers (2022-05-09T20:18:31Z) - Inferring Hidden Structures in Random Graphs [13.031167737538881]
We study the two inference problems of detecting and recovering an isolated community of emphgeneral structure planted in a random graph.
We derive lower bounds for detecting/recovering the structure $Gamma_k$ in terms of the parameters $(n,k,q)$, as well as certain properties of $Gamma_k$, and exhibit computationally optimal algorithms that achieve these lower bounds.
arXiv Detail & Related papers (2021-10-05T09:39:51Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Faster DBSCAN via subsampled similarity queries [42.93847281082316]
DBSCAN is a popular density-based clustering algorithm.
We propose SNG-DBSCAN, which clusters based on a subsampled $epsilon$-neighborhood graph.
arXiv Detail & Related papers (2020-06-11T18:57:54Z) - Learning and Sampling of Atomic Interventions from Observations [11.522442415989818]
We study the problem of efficiently estimating the effect of an intervention on a single variable (atomic interventions) using observational samples in a causal Bayesian network.
Our goal is to give algorithms that are efficient in both time and sample complexity in a non-parametric setting.
arXiv Detail & Related papers (2020-02-11T07:15:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.