Optimal community detection in dense bipartite graphs
- URL: http://arxiv.org/abs/2505.18372v1
- Date: Fri, 23 May 2025 20:58:55 GMT
- Title: Optimal community detection in dense bipartite graphs
- Authors: Julien Chhor, Parker Knight,
- Abstract summary: We consider the problem of detecting a community of densely connected vertices in a high-dimensional bipartite graph of size $n_1 times n$.<n>We provide non-asymptotic upper and lower bounds on the smallest signal strength $delta*$ that is both necessary and sufficient to ensure the existence of a test with small enough type one and type two errors.<n>Our proposed tests involve a combination of hard-thresholded nonlinear statistics of the adjacency matrix, the analysis of which may be of independent interest.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of detecting a community of densely connected vertices in a high-dimensional bipartite graph of size $n_1 \times n_2$. Under the null hypothesis, the observed graph is drawn from a bipartite Erd\H{o}s-Renyi distribution with connection probability $p_0$. Under the alternative hypothesis, there exists an unknown bipartite subgraph of size $k_1 \times k_2$ in which edges appear with probability $p_1 = p_0 + \delta$ for some $\delta > 0$, while all other edges outside the subgraph appear with probability $p_0$. Specifically, we provide non-asymptotic upper and lower bounds on the smallest signal strength $\delta^*$ that is both necessary and sufficient to ensure the existence of a test with small enough type one and type two errors. We also derive novel minimax-optimal tests achieving these fundamental limits when the underlying graph is sufficiently dense. Our proposed tests involve a combination of hard-thresholded nonlinear statistics of the adjacency matrix, the analysis of which may be of independent interest. In contrast with previous work, our non-asymptotic upper and lower bounds match for any configuration of $n_1,n_2, k_1,k_2$.
Related papers
- Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel emphstatistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z) - A computational transition for detecting correlated stochastic block models by low-degree polynomials [13.396246336911842]
We consider a pair of correlated block models $mathcalS(n,tfraclambdan;k,epsilon;s)$ that are subsampled from a common parent block model $mathcal S(n,tfraclambdan;k,epsilon)$.<n>For the detection problem of distinguishing this model from a pair of independent ErdHos'enyi graphs, we focus on tests based on emphlow-degrees of the entries of the adjacency
arXiv Detail & Related papers (2024-09-02T06:14:05Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem.
We consider detecting the presence of a planted $Gr(ngamma, n-alpha)$ subhypergraph in a $Gr(n, n-beta) hypergraph.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known.
arXiv Detail & Related papers (2023-04-17T10:38:08Z) - Planted Bipartite Graph Detection [13.95780443241133]
We consider the task of detecting a hidden bipartite subgraph in a given random graph.
Under the null hypothesis, the graph is a realization of an ErdHosR'enyi random graph over $n$ with edge density $q$.
Under the alternative, there exists a planted $k_mathsfR times k_mathsfL$ bipartite subgraph with edge density $p>q$.
arXiv Detail & Related papers (2023-02-07T18:18:17Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = Omega(, d2/mathrmpolylog(d),)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - Random Subgraph Detection Using Queries [29.192695995340653]
The planted densest subgraph detection problem refers to the task of testing whether in a given (random) graph there is a subgraph that is unusually dense.
In this paper, we consider a natural variant of the above problem, where one can only observe a relatively small part of the graph using adaptive edge queries.
For this model, we determine the number of queries necessary and sufficient (accompanied with a quasi-polynomial optimal algorithm) for detecting the presence of the planted subgraph.
arXiv Detail & Related papers (2021-10-02T07:41:17Z) - Correlation detection in trees for partial graph alignment [3.5880535198436156]
We consider alignment of graphs, which consists in finding a mapping between the nodes of two graphs which preserves most of the edges.
Our approach is to compare local structures in the two graphs, matching two nodes if their neighborhoods are 'close enough' for correlated graphs.
This problem can be rephrased in terms of testing whether a pair of branching trees is drawn from either a product distribution, or a correlated distribution.
arXiv Detail & Related papers (2021-07-15T22:02:27Z) - Testing correlation of unlabeled random graphs [18.08210501570919]
We study the problem of detecting the edge correlation between two random graphs with $n$ unlabeled nodes.
This is formalized as a hypothesis testing problem, where under the null hypothesis, the two graphs are independently generated.
Under the alternative, the two graphs are edge-correlated under some latent node correspondence, but have the same marginal distributions as the null.
arXiv Detail & Related papers (2020-08-23T19:19:45Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.