Calibrated Nonparametric Scan Statistics for Anomalous Pattern Detection
in Graphs
- URL: http://arxiv.org/abs/2206.12786v1
- Date: Sun, 26 Jun 2022 04:59:13 GMT
- Title: Calibrated Nonparametric Scan Statistics for Anomalous Pattern Detection
in Graphs
- Authors: Chunpai Wang, Daniel B. Neill, Feng Chen
- Abstract summary: Nonparametric scan statistics (NPSSs) identify connected subgraphs with a higher than expected proportion of significant nodes.
NPSSs fail to account for the multiplicity of the recently calibrated statistic over the anomalous subgraphs.
We develop a new statistical approach to recalibrate NPSSs, correctly adjusting for multiple hypothesis testing and taking the underlying graph structure into account.
- Score: 4.756490355031122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new approach, the calibrated nonparametric scan statistic
(CNSS), for more accurate detection of anomalous patterns in large-scale,
real-world graphs. Scan statistics identify connected subgraphs that are
interesting or unexpected through maximization of a likelihood ratio statistic;
in particular, nonparametric scan statistics (NPSSs) identify subgraphs with a
higher than expected proportion of individually significant nodes. However, we
show that recently proposed NPSS methods are miscalibrated, failing to account
for the maximization of the statistic over the multiplicity of subgraphs. This
results in both reduced detection power for subtle signals, and low precision
of the detected subgraph even for stronger signals. Thus we develop a new
statistical approach to recalibrate NPSSs, correctly adjusting for multiple
hypothesis testing and taking the underlying graph structure into account.
While the recalibration, based on randomization testing, is computationally
expensive, we propose both an efficient (approximate) algorithm and new,
closed-form lower bounds (on the expected maximum proportion of significant
nodes for subgraphs of a given size, under the null hypothesis of no anomalous
patterns). These advances, along with the integration of recent core-tree
decomposition methods, enable CNSS to scale to large real-world graphs, with
substantial improvement in the accuracy of detected subgraphs. Extensive
experiments on both semi-synthetic and real-world datasets are demonstrated to
validate the effectiveness of our proposed methods, in comparison with
state-of-the-art counterparts.
Related papers
- Graph Out-of-Distribution Generalization with Controllable Data
Augmentation [51.17476258673232]
Graph Neural Network (GNN) has demonstrated extraordinary performance in classifying graph properties.
Due to the selection bias of training and testing data, distribution deviation is widespread.
We propose OOD calibration to measure the distribution deviation of virtual samples.
arXiv Detail & Related papers (2023-08-16T13:10:27Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs)
It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - High Dimensional Statistical Estimation under One-bit Quantization [27.718986773043643]
One-bit (binary) data are preferable in many applications because of the efficiency in signal storage, processing, transmission, and enhancement of privacy.
In this paper, we study three fundamental statistical estimation problems.
Under both sub-Gaussian and heavy-tailed regimes, new estimators that handle high-dimensional scaling are proposed.
arXiv Detail & Related papers (2022-02-26T15:13:04Z) - Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Transfer Learning in Large-scale Gaussian Graphical Models with False
Discovery Rate Control [6.230751621285322]
An estimation algorithm, Trans-CLIME, is proposed and shown to attain a faster convergence rate than the minimax rate in the single study setting.
A significant decrease in prediction errors and a significant increase in power for link detection are observed.
arXiv Detail & Related papers (2020-10-21T14:39:14Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - An Optimal Statistical and Computational Framework for Generalized
Tensor Estimation [10.899518267165666]
This paper describes a flexible framework for low-rank tensor estimation problems.
It includes many important instances from applications in computational imaging, genomics, and network analysis.
arXiv Detail & Related papers (2020-02-26T01:54:35Z) - Statistical Outlier Identification in Multi-robot Visual SLAM using
Expectation Maximization [18.259478519717426]
This paper introduces a novel and distributed method for detecting inter-map loop closure outliers in simultaneous localization and mapping (SLAM)
The proposed algorithm does not rely on a good initialization and can handle more than two maps at a time.
arXiv Detail & Related papers (2020-02-07T06:34:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.