Spectral clustering in the Gaussian mixture block model
- URL: http://arxiv.org/abs/2305.00979v3
- Date: Wed, 10 Apr 2024 23:06:38 GMT
- Title: Spectral clustering in the Gaussian mixture block model
- Authors: Shuangping Li, Tselil Schramm,
- Abstract summary: We study the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models.
We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs.
- Score: 2.668787455520979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i,u_j \rangle \ge \tau$ for a pre-specified threshold $\tau$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d\to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
Related papers
- Multi-Community Spectral Clustering for Geometric Graphs [0.0699049312989311]
We introduce a spectral clustering algorithm for community recovery on graphs generated by this model.<n>We prove weak consistency and show that a simple local refinement step ensures strong consistency.<n>A key ingredient is an application of a non-standard version of Davis-Kahan theorem to control eigenspaces when eigenvalues are not simple.
arXiv Detail & Related papers (2025-07-27T14:09:00Z) - AGMixup: Adaptive Graph Mixup for Semi-supervised Node Classification [38.72918509842547]
Mixup is a technique that enhances model generalization by interpolating between data points using a mixing ratio $lambda$ in the image domain.
This paper proposes an Adaptive Graph Mixup (AGMixup) framework for semi-supervised node classification.
arXiv Detail & Related papers (2024-12-11T07:04:35Z) - An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph [0.0]
This paper discusses how to generate general graph node embeddings from knowledge graph representations.
The embedded space is composed of a number of sub-features to mimic both local affinity and remote structural relevance.
arXiv Detail & Related papers (2024-07-22T14:43:10Z) - GeoMix: Towards Geometry-Aware Data Augmentation [76.09914619612812]
Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification.
We propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach leveraging in-place graph editing.
arXiv Detail & Related papers (2024-07-15T12:58:04Z) - Symmetry-driven embedding of networks in hyperbolic space [0.4779196219827508]
We present a Markov chain Monte Carlo algorithm that samples the posterior distribution of a hyperbolic random graph model.
We show that the samples are consistent with current algorithms while providing added credible intervals for the coordinates and all network properties.
arXiv Detail & Related papers (2024-06-15T18:44:02Z) - Information-Theoretic Thresholds for Planted Dense Cycles [52.076657911275525]
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences.
For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $tau$, and an edge-wise signal-to-noise ratio $lambda$.
arXiv Detail & Related papers (2024-02-01T03:39:01Z) - Spectral Clustering of Attributed Multi-relational Graphs [11.486261673963392]
Graph clustering aims at discovering a natural grouping of the nodes such as similar nodes are assigned to a common cluster.
We propose SpectralMix, a joint dimensionality reduction technique for multi-relational graphs with categorical node attributes.
arXiv Detail & Related papers (2023-11-03T11:05:29Z) - On the Equivalence of Graph Convolution and Mixup [70.0121263465133]
This paper investigates the relationship between graph convolution and Mixup techniques.
Under two mild conditions, graph convolution can be viewed as a specialized form of Mixup.
We establish this equivalence mathematically by demonstrating that graph convolution networks (GCN) and simplified graph convolution (SGC) can be expressed as a form of Mixup.
arXiv Detail & Related papers (2023-09-29T23:09:54Z) - GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first it models the distribution of features associated with the nodes of the given graph, in the second it complements the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z) - flow-based clustering and spectral clustering: a comparison [0.688204255655161]
We study a novel graph clustering method for data with an intrinsic network structure.
We exploit an intrinsic network structure of data to construct Euclidean feature vectors.
Our results indicate that our clustering methods can cope with certain graph structures.
arXiv Detail & Related papers (2022-06-20T21:49:52Z) - Gaussian mixture model on nodes of Bayesian network given maximal
parental cliques [0.0]
We will explain why and how we use Gaussian mixture models in network.
We propose a new method, called double iteration algorithm, to optimize the mixture model.
arXiv Detail & Related papers (2022-04-20T15:14:01Z) - Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
arXiv Detail & Related papers (2021-10-28T14:59:41Z) - Scaling Graph Clustering with Distributed Sketches [1.1011268090482575]
We present a method inspired by spectral clustering where we instead use matrix sketches derived from random dimension-reducing projections.
We show that our method produces embeddings that yield performant clustering results given a fully-dynamic block model stream.
We also discuss the effects of block model parameters upon the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
arXiv Detail & Related papers (2020-07-24T17:38:04Z) - Graph Neural Networks with Composite Kernels [60.81504431653264]
We re-interpret node aggregation from the perspective of kernel weighting.
We present a framework to consider feature similarity in an aggregation scheme.
We propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.
arXiv Detail & Related papers (2020-05-16T04:44:29Z) - Heterogeneous Graph Neural Networks for Malicious Account Detection [64.0046412312209]
We present GEM, the first heterogeneous graph neural network approach for detecting malicious accounts.
We learn discriminative embeddings from heterogeneous account-device graphs based on two fundamental weaknesses of attackers, i.e. device aggregation and activity aggregation.
Experiments show that our approaches consistently perform promising results compared with competitive methods over time.
arXiv Detail & Related papers (2020-02-27T18:26:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.