Is it easier to count communities than find them?
- URL: http://arxiv.org/abs/2212.10872v1
- Date: Wed, 21 Dec 2022 09:35:19 GMT
- Title: Is it easier to count communities than find them?
- Authors: Cynthia Rush, Fiona Skerman, Alexander S. Wein and Dana Yang
- Abstract summary: We consider certain hypothesis testing problems between models with different community structures.
We show that testing between two options is as hard as finding the communities.
- Score: 82.90505487525533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random graph models with community structure have been studied extensively in
the literature. For both the problems of detecting and recovering community
structure, an interesting landscape of statistical and computational phase
transitions has emerged. A natural unanswered question is: might it be possible
to infer properties of the community structure (for instance, the number and
sizes of communities) even in situations where actually finding those
communities is believed to be computationally hard? We show the answer is no.
In particular, we consider certain hypothesis testing problems between models
with different community structures, and we show (in the low-degree polynomial
framework) that testing between two options is as hard as finding the
communities.
In addition, our methods give the first computational lower bounds for
testing between two different `planted' distributions, whereas previous results
have considered testing between a planted distribution and an i.i.d. `null'
distribution.
Related papers
- Identifying General Mechanism Shifts in Linear Causal Representations [58.6238439611389]
We consider the linear causal representation learning setting where we observe a linear mixing of $d$ unknown latent factors.
Recent work has shown that it is possible to recover the latent factors as well as the underlying structural causal model over them.
We provide a surprising identifiability result that it is indeed possible, under some very mild standard assumptions, to identify the set of shifted nodes.
arXiv Detail & Related papers (2024-10-31T15:56:50Z) - On an application of graph neural networks in population based SHM [0.0]
The aim of this paper is to predict the first natural frequency of trusses of different sizes, across different environmental temperatures and having different bar member types.
The accuracy of the regression is satisfactory even in structures with higher number of nodes and members than those used to train it.
arXiv Detail & Related papers (2022-03-03T11:11:57Z) - Sequential Community Mode Estimation [7.693649834261879]
We study the problem of identifying the largest community within the population via sequential, random sampling of individuals.
We propose and analyse novel algorithms for this problem, and also establish information theoretic lower bounds on the probability of error under any algorithm.
arXiv Detail & Related papers (2021-11-16T15:05:40Z) - Average-Case Communication Complexity of Statistical Problems [48.03761288397421]
Communication complexity is the main tool for proving lower bounds in streaming, sketching, and query-based models.
We provide a general reduction method that preserves the input distribution for problems involving a random graph or matrix with planted structure.
We derive two-party and multi-party communication lower bounds for detecting or finding planted cliques.
arXiv Detail & Related papers (2021-07-03T03:31:37Z) - Group Testing with a Graph Infection Spread Model [61.48558770435175]
Infection spreads via connections between individuals and this results in a probabilistic cluster formation structure as well as a non-i.i.d. infection status for individuals.
We propose a class of two-step sampled group testing algorithms where we exploit the known probabilistic infection spread model.
Our results imply that, by exploiting information on the connections of individuals, group testing can be used to reduce the number of required tests significantly even when infection rate is high.
arXiv Detail & Related papers (2021-01-14T18:51:32Z) - Adjusted chi-square test for degree-corrected block models [13.122543280692641]
We propose a goodness-of-fit test for degree-corrected block models (DCSBM)
We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $d_i$ grows to infinity.
Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution.
arXiv Detail & Related papers (2020-12-30T05:20:59Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Revealing consensus and dissensus between network partitions [0.0]
Community detection methods attempt to divide a network into groups of nodes that share similar properties, thus revealing its large-scale structure.
Many methods exist to establish a consensus among them in the form of a single partition "point estimate" that summarizes the whole distribution.
Here we show that it is in general not possible to obtain a consistent answer from such point estimates when the underlying distribution is too heterogeneous.
We provide a comprehensive set of methods designed to characterize and summarize complex populations of partitions in a manner that captures not only the existing consensus, but also the dissensus between elements of the population.
arXiv Detail & Related papers (2020-05-28T13:29:42Z) - Provable Overlapping Community Detection in Weighted Graphs [3.04585143845864]
We provide a provable method to detect overlapping communities in weighted graphs without explicitly making the pure nodes assumption.
Our approach is based on convex optimization, for which many useful theoretical properties are already known.
We demonstrate the success of our algorithm on artificial and real-world datasets.
arXiv Detail & Related papers (2020-04-15T15:25:46Z) - Certified Robustness of Community Detection against Adversarial
Structural Perturbation via Randomized Smoothing [81.71105567425275]
We develop the first certified robustness guarantee of community detection against adversarial structural perturbation.
We theoretically show that the smoothed community detection method provably groups a given arbitrary set of nodes into the same community.
We also empirically evaluate our method on multiple real-world graphs with ground truth communities.
arXiv Detail & Related papers (2020-02-09T18:39:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.