SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search
- URL: http://arxiv.org/abs/2506.03657v1
- Date: Wed, 04 Jun 2025 07:47:25 GMT
- Title: SubSearch: Robust Estimation and Outlier Detection for Stochastic Block Models via Subgraph Search
- Authors: Leonardo Martins Bianco, Christine Keribin, Zacharie Naulet,
- Abstract summary: We propose an algorithm for robustly estimating SBM parameters by exploring the space of subgraphs in search of one that closely aligns with the model's assumptions.<n>Our approach also functions as an outlier detection method, properly identifying nodes responsible for the graph's deviation from the model and going beyond simple techniques like pruning high-degree nodes.
- Score: 2.082364067210557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Community detection is a fundamental task in graph analysis, with methods often relying on fitting models like the Stochastic Block Model (SBM) to observed networks. While many algorithms can accurately estimate SBM parameters when the input graph is a perfect sample from the model, real-world graphs rarely conform to such idealized assumptions. Therefore, robust algorithms are crucial-ones that can recover model parameters even when the data deviates from the assumed distribution. In this work, we propose SubSearch, an algorithm for robustly estimating SBM parameters by exploring the space of subgraphs in search of one that closely aligns with the model's assumptions. Our approach also functions as an outlier detection method, properly identifying nodes responsible for the graph's deviation from the model and going beyond simple techniques like pruning high-degree nodes. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method.
Related papers
- When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values [10.051332392614368]
We prove that a Pattern-by-Pattern strategy (PbP) accurately approximates Bayes probabilities in various missing data scenarios.<n>Our analysis provides a comprehensive view on the logistic regression with missing values.
arXiv Detail & Related papers (2025-07-17T11:52:27Z) - Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching.<n>We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.<n>Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this emphmodel completion learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z) - Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z) - Graph-Based Model-Agnostic Data Subsampling for Recommendation Systems [29.713557081485995]
Data subsampling is widely used to speed up the training of recommendation systems.
Most subsampling methods are model-based and often require a pre-trained pilot model to measure data importance.
We propose model-agnostic data subsampling methods by only exploring input data structure represented by graphs.
arXiv Detail & Related papers (2023-05-25T18:00:15Z) - Learning to Bound: A Generative Cram\'er-Rao Bound [25.739449801033846]
We introduce a novel approach to approximate the Cram'er-Rao bound (CRB) using data-driven methods.
We model the distribution of the measurements and obtain an approximation of the CRB, which we call Generative Cram'er-Rao Bound (GCRB)
arXiv Detail & Related papers (2022-03-07T20:31:53Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective of graph contrastive learning methods showing random augmentations leads to encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - Regularization of Mixture Models for Robust Principal Graph Learning [0.0]
A regularized version of Mixture Models is proposed to learn a principal graph from a distribution of $D$-dimensional data points.
Parameters of the model are iteratively estimated through an Expectation-Maximization procedure.
arXiv Detail & Related papers (2021-06-16T18:00:02Z) - Community Detection in the Stochastic Block Model by Mixed Integer
Programming [3.8073142980733]
Degree-Corrected Block Model (DCSBM) is a popular model to generate random graphs with community structure given an expected degree sequence.
Standard approach of community detection based on the DCSBM is to search for the model parameters that are the most likely to have produced the observed network data through maximum likelihood estimation (MLE)
We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph.
arXiv Detail & Related papers (2021-01-26T22:04:40Z) - CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data.
For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.