Related papers: Spectral goodness-of-fit tests for complete and partial network data

URL: http://arxiv.org/abs/2106.09702v1
Date: Thu, 17 Jun 2021 17:56:30 GMT
Title: Spectral goodness-of-fit tests for complete and partial network data
Authors: Shane Lubold and Bolun Liu and Tyler H. McCormick
Abstract summary: We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data. We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters. Our method leads to improved community detection algorithms.
Score: 1.7188280334580197
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Networks describe the often complex relationships between individual actors. In this work, we address the question of how to determine whether a parametric model, such as a stochastic block model or latent space model, fits a dataset well and will extrapolate to similar data. We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data. We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters in a number of commonly used network models. For example, we show how to select the dimension of the latent space in latent space models. Unlike other network goodness-of-fit methods, our general approach does not require simulating from a candidate parametric model, which can be cumbersome with large graphs, and eliminates the need to choose a particular set of statistics on the graph for comparison. It also allows us to perform goodness-of-fit tests on partial network data, such as Aggregated Relational Data. We show with simulations that our method performs well in many situations of interest. We analyze several empirically relevant networks and show that our method leads to improved community detection algorithms. R code to implement our method is available on GitHub.
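The abstract does not spell out the test statistic, so the following is only a rough sketch of one well-known spectral construction of this kind, not the authors' implementation. It assumes the candidate model has been reduced to a matrix P of edge probabilities, standardizes the residual A - P entrywise, and rescales the largest eigenvalue so that, under a correct model, it is expected to follow an approximately Tracy-Widom distribution. The function names (spectral_gof_statistic, sample_graph, demo) and the simulated two-block example are hypothetical and chosen for illustration.

```python
import numpy as np


def spectral_gof_statistic(A, P, eps=1e-8):
    """Scaled largest eigenvalue of the standardized residual matrix.

    A: symmetric 0/1 adjacency matrix with zero diagonal.
    P: candidate edge-probability matrix of the same shape (assumed input).
    """
    A = np.asarray(A, dtype=float)
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0 - eps)
    n = A.shape[0]
    # Standardize each entry so a correct model gives roughly Wigner-like noise.
    R = (A - P) / np.sqrt((n - 1) * P * (1.0 - P))
    np.fill_diagonal(R, 0.0)
    # eigvalsh returns eigenvalues in ascending order for symmetric input.
    lam_max = np.linalg.eigvalsh(R)[-1]
    # Center at the bulk edge (2) and rescale by n^(2/3).
    return n ** (2.0 / 3.0) * (lam_max - 2.0)


def sample_graph(P, rng):
    """Draw an undirected graph with independent edges, P[i, j] = edge probability."""
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
    return upper + upper.T


def demo(seed=0, n=200):
    """Compare a correct candidate model with a misspecified one."""
    rng = np.random.default_rng(seed)
    # Ground truth: two equal-sized blocks, dense within and sparse between.
    labels = np.repeat([0, 1], n // 2)
    B = np.array([[0.5, 0.1], [0.1, 0.5]])
    P_true = B[labels][:, labels]
    A = sample_graph(P_true, rng)
    # Candidate 1: the true block structure.
    stat_good = spectral_gof_statistic(A, P_true)
    # Candidate 2: a single edge probability, ignoring the blocks.
    p_hat = A.sum() / (n * (n - 1))
    stat_bad = spectral_gof_statistic(A, np.full((n, n), p_hat))
    return stat_good, stat_bad


if __name__ == "__main__":
    good, bad = demo()
    print(f"correct model: {good:.2f}   misspecified model: {bad:.2f}")
```

In this sketch no graphs are simulated from the candidate model to calibrate the test; the statistic is computed once from the observed graph and would be compared with a fixed reference quantile. A large positive value signals leftover structure in the residuals, which is how such a statistic could be used to choose, for example, the number of blocks.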

Related papers

SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search [2.082364067210557]
We propose an algorithm for robustly estimating SBM parameters by exploring the space of subgraphs in search of one that closely aligns with the model's assumptions. Our approach also functions as an outlier detection method, properly identifying nodes responsible for the graph's deviation from the model and going beyond simple techniques like pruning high-degree nodes.
arXiv Detail & Related papers (2025-06-04T07:47:25Z)
PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus [26.366299016589256]
We present a real-time method for robust estimation of multiple instances of geometric models from noisy data. A neural network segments the input data into clusters representing potential model instances. We demonstrate state-of-the-art performance on these as well as multiple established datasets, with inference times as small as five milliseconds per image.
arXiv Detail & Related papers (2024-01-26T14:54:56Z)
Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation [25.79728190384834]
Given a poorly documented neural network model, we take the perspective of a forensic investigator who wants to find out the model's data domain. We propose solving this problem by leveraging a comprehensive corpus such as ImageNet to select a meaningful distribution. Our goal is to select a set of samples from the corpus for the given model.
arXiv Detail & Related papers (2023-05-10T03:25:23Z)
VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data. We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data. We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z)
Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Network Estimation by Mixing: Adaptivity and More [2.3478438171452014]
We propose a mixing strategy that leverages available arbitrary models to improve their individual performances. The proposed method is computationally efficient and almost tuning-free. We show that the proposed method performs as well as the oracle estimate when the true model is included among the individual candidates.
arXiv Detail & Related papers (2021-06-05T05:17:04Z)
Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. We present a number of applications where the use of multiple geometric models improves accuracy. These include pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
Amortized Bayesian model comparison with evidential deep learning [0.12314765641075436]
We propose a novel method for performing Bayesian model comparison using specialized deep learning architectures. Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset. We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work.
arXiv Detail & Related papers (2020-04-22T15:15:46Z)
A Multi-Channel Neural Graphical Event Model with Negative Evidence [76.51278722190607]
Event datasets are sequences of events of various types occurring irregularly over the time-line. We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions.
arXiv Detail & Related papers (2020-02-21T23:10:50Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.