A practical test for a planted community in heterogeneous networks
- URL: http://arxiv.org/abs/2101.05928v1
- Date: Fri, 15 Jan 2021 01:34:14 GMT
- Title: A practical test for a planted community in heterogeneous networks
- Authors: Mingao Yuan and Qian Wen
- Abstract summary: For a real network data, the existence of a dense subgraph is generally unknown.
The power of the test is theoretically investigated and we evaluate the performance of the test by simulation and real data example.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the fundamental task in graph data mining is to find a planted
community(dense subgraph), which has wide application in biology, finance, spam
detection and so on. For a real network data, the existence of a dense subgraph
is generally unknown. Statistical tests have been devised to testing the
existence of dense subgraph in a homogeneous random graph. However, many
networks present extreme heterogeneity, that is, the degrees of nodes or
vertexes don't concentrate on a typical value. The existing tests designed for
homogeneous random graph are not straightforwardly applicable to the
heterogeneous case. Recently, scan test was proposed for detecting a dense
subgraph in heterogeneous(inhomogeneous) graph(\cite{BCHV19}). However, the
computational complexity of the scan test is generally not polynomial in the
graph size, which makes the test impractical for large or moderate networks. In
this paper, we propose a polynomial-time test that has the standard normal
distribution as the null limiting distribution. The power of the test is
theoretically investigated and we evaluate the performance of the test by
simulation and real data example.
Related papers
- Testing Dependency of Weighted Random Graphs [4.0554893636822]
We study the task of detecting the edge dependency between two random graphs.
For general edge-weight distributions, we establish thresholds at which optimal testing becomes information-theoretically possible or impossible.
arXiv Detail & Related papers (2024-09-23T10:07:41Z) - Network two-sample test for block models [16.597465729143813]
We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same model.
We adopt the block model (SBM) for network distributions, due to their interpretability and the potential to approximate more general models.
We introduce an efficient algorithm to match estimated network parameters, allowing us to properly combine and contrast information within and across samples, leading to a powerful test.
arXiv Detail & Related papers (2024-06-10T04:28:37Z) - Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - Graph Out-of-Distribution Generalization with Controllable Data
Augmentation [51.17476258673232]
Graph Neural Network (GNN) has demonstrated extraordinary performance in classifying graph properties.
Due to the selection bias of training and testing data, distribution deviation is widespread.
We propose OOD calibration to measure the distribution deviation of virtual samples.
arXiv Detail & Related papers (2023-08-16T13:10:27Z) - Seq-HGNN: Learning Sequential Node Representation on Heterogeneous Graph [57.2953563124339]
We propose a novel heterogeneous graph neural network with sequential node representation, namely Seq-HGNN.
We conduct extensive experiments on four widely used datasets from Heterogeneous Graph Benchmark (HGB) and Open Graph Benchmark (OGB)
arXiv Detail & Related papers (2023-05-18T07:27:18Z) - Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels [2.8406702588667807]
Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology.
Many of these testing methodologies operate under the implicit assumption that the correspondence across networks is a priori known.
This power loss due to shuffling is theoretically explored in the context of random dot product and block model networks.
arXiv Detail & Related papers (2022-08-18T05:27:18Z) - AZ-whiteness test: a test for uncorrelated noise on spatio-temporal
graphs [19.407150082045636]
We present the first whiteness test for graphs, i.e., a serial whiteness test for a serial time series associated with the nodes of a graph.
We show how the test can be employed to assess the quality-temporal forecasting models by analyzing the prediction residuals to the graphs stream.
arXiv Detail & Related papers (2022-04-23T19:43:19Z) - Adjusted chi-square test for degree-corrected block models [13.122543280692641]
We propose a goodness-of-fit test for degree-corrected block models (DCSBM)
We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $d_i$ grows to infinity.
Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution.
arXiv Detail & Related papers (2020-12-30T05:20:59Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.