Statistical Limits for Testing Correlation of Hypergraphs
- URL: http://arxiv.org/abs/2202.05888v1
- Date: Fri, 11 Feb 2022 20:11:21 GMT
- Title: Statistical Limits for Testing Correlation of Hypergraphs
- Authors: Mingao Yuan, Zuofeng Shang
- Abstract summary: We consider the hypothesis testing of correlation between two $m$-uniform hypergraphs on $n$ unlabelled nodes.
Under the null hypothesis, the hypergraphs are independent, while under the alternative hypothesis, the hyperedges have the same marginal distributions as under the null but are correlated after some unknown node permutation.
- Score: 4.898744396854313
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we consider the hypothesis testing of correlation between two
$m$-uniform hypergraphs on $n$ unlabelled nodes. Under the null hypothesis, the
hypergraphs are independent, while under the alternative hypothesis, the
hyperedges have the same marginal distributions as in the null hypothesis but
are correlated after some unknown node permutation. We focus on two scenarios:
the hypergraphs are generated from the Gaussian-Wigner model and the dense
Erd\H{o}s-R\'{e}nyi model. We derive the sharp information-theoretic testing
threshold. Above the threshold, there exists a powerful test to distinguish the
alternative hypothesis from the null hypothesis. Below the threshold, the
alternative hypothesis and the null hypothesis are not distinguishable. The
threshold involves $m$ and decreases as $m$ gets larger. This indicates that
testing correlation of hypergraphs ($m\geq3$) is easier than testing
correlation of graphs ($m=2$).
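As an illustrative sketch of the Gaussian-Wigner setup (not the paper's information-theoretic test), the two hypergraphs can be simulated by assigning each of the $\binom{n}{m}$ hyperedges a standard normal weight, with corresponding weights correlated at level $\rho$ after a hidden node permutation. The function names and the oracle statistic below (which assumes the hidden permutation is known) are assumptions for illustration only:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def correlated_hypergraphs(n, m, rho):
    """Sample two correlated m-uniform Gaussian-Wigner hypergraphs.

    Each hyperedge weight is N(0, 1); matched hyperedges (after a
    hidden node permutation) have correlation rho.  rho = 0 gives
    the null (independent) model.
    """
    edges = list(itertools.combinations(range(n), m))
    z1 = rng.standard_normal(len(edges))
    # Jointly Gaussian weights with correlation rho.
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(len(edges))
    perm = rng.permutation(n)  # hidden node relabelling
    A = dict(zip(edges, z1))
    B = {tuple(sorted(perm[list(e)])): w for e, w in zip(edges, z2)}
    return A, B, perm

def oracle_statistic(A, B, perm):
    """Average product of matched weights, given the true permutation.

    Concentrates around rho under the alternative and around 0 under
    the null; the hard part of the actual testing problem is that the
    permutation is unknown.
    """
    s = sum(w * B[tuple(sorted(perm[list(e)]))] for e, w in A.items())
    return s / len(A)
```

The oracle statistic only illustrates why correlation is detectable in principle; the paper's threshold concerns tests that must search over all $n!$ permutations implicitly.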
Related papers
- Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
In graph anomaly detection, homophily distribution differences between classes are significantly greater than those in ordinary homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe).
arXiv Detail & Related papers (2024-03-15T14:26:53Z)
- Collaborative non-parametric two-sample testing [55.98760097296213]
The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected.
We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure.
Our methodology integrates elements from f-divergence estimation, kernel methods, and multitask learning.
arXiv Detail & Related papers (2024-02-08T14:43:56Z)
- Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model [4.540236408836132]
We study the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers.
We derive the rates of convergence of maximum likelihood (ML) estimates and establish their minimax rate optimality.
We also consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model.
arXiv Detail & Related papers (2023-07-06T07:23:06Z)
- Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem.
We consider detecting the presence of a planted $G^r(n^{\gamma}, n^{-\alpha})$ subhypergraph in a $G^r(n, n^{-\beta})$ hypergraph.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known.
arXiv Detail & Related papers (2023-04-17T10:38:08Z)
- How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm? [73.80001705134147]
We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm.
The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset.
arXiv Detail & Related papers (2022-10-15T04:11:56Z)
- Detection threshold for correlated Erd\H{o}s-R\'enyi graphs via densest subgraphs [9.12788494573002]
We solve the problem of detecting edge correlation between two Erd\H{o}s-R\'enyi random graphs on $n$ unlabeled nodes.
A key novelty in our work is an interesting connection between the detection problem and the densest subgraph of an Erd\H{o}s-R\'enyi graph.
arXiv Detail & Related papers (2022-03-28T08:32:43Z)
- Hypothesis Testing for Equality of Latent Positions in Random Graphs [0.2741266294612775]
We consider the hypothesis testing problem of whether two vertices $i$ and $j$ have the same latent positions, possibly up to scaling.
We propose several test statistics based on the empirical Mahalanobis distances between the $i$th and $j$th rows of either the adjacency or the normalized Laplacian spectral embedding of the graph.
Using these test statistics, we address the model selection problem of choosing between the standard block model and its degree-corrected variant.
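A rough plug-in sketch of the Mahalanobis-distance idea: embed the graph via scaled top eigenvectors of the adjacency matrix and compare two rows of the embedding. The pooled row covariance used here is a crude stand-in for the estimated limiting covariances in the paper; all function names are illustrative assumptions:

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Top-d eigenvectors of the adjacency matrix, scaled by
    sqrt of the absolute eigenvalues (adjacency spectral embedding)."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def mahalanobis_row_distance(A, i, j, d):
    """Empirical Mahalanobis distance between rows i and j of the
    d-dimensional embedding, using the pooled covariance of all rows
    as a simple plug-in estimate."""
    X = adjacency_spectral_embedding(A, d)
    S = np.cov(X, rowvar=False)
    diff = X[i] - X[j]
    return float(diff @ np.linalg.solve(S, diff))
```

In a two-block stochastic block model, this distance is small for two vertices in the same block and large for vertices in different blocks, which is what drives both the equality test and the block-model-versus-degree-corrected model selection.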
arXiv Detail & Related papers (2021-05-23T01:27:23Z)
- Heterogeneous Dense Subhypergraph Detection [6.903929927172917]
We study the problem of testing the existence of a heterogeneous dense subhypergraph.
The null hypothesis corresponds to a heterogeneous Erd\H{o}s-R\'enyi uniform random hypergraph.
The alternative hypothesis corresponds to a heterogeneous uniform random hypergraph that contains a dense subhypergraph.
arXiv Detail & Related papers (2021-04-08T20:44:22Z)
- Sharp detection boundaries on testing dense subhypergraph [6.903929927172917]
We study the problem of testing the existence of a dense subhypergraph.
The null hypothesis is an Erd\H{o}s-R\'enyi uniform random hypergraph.
The alternative hypothesis is a uniform random hypergraph that contains a dense subhypergraph.
arXiv Detail & Related papers (2021-01-12T16:31:47Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- L2R2: Leveraging Ranking for Abductive Reasoning [65.40375542988416]
The abductive natural language inference task ($\alpha$NLI) is proposed to evaluate the abductive reasoning ability of a learning system.
A novel L2R2 approach is proposed under the learning-to-rank framework.
Experiments on the ART dataset achieve state-of-the-art results on the public leaderboard.
arXiv Detail & Related papers (2020-05-22T15:01:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.