Scaling Graph Clustering with Distributed Sketches
- URL: http://arxiv.org/abs/2007.12669v1
- Date: Fri, 24 Jul 2020 17:38:04 GMT
- Title: Scaling Graph Clustering with Distributed Sketches
- Authors: Benjamin W. Priest, Alec Dunton, Geoffrey Sanders
- Abstract summary: We present a method inspired by spectral clustering where we instead use matrix sketches derived from random dimension-reducing projections.
We show that our method produces embeddings that yield performant clustering results given a fully-dynamic block model stream.
We also discuss the effects of block model parameters upon the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
- Score: 1.1011268090482575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The unsupervised learning of community structure, in particular the
partitioning vertices into clusters or communities, is a canonical and
well-studied problem in exploratory graph analysis. However, like most graph
analyses the introduction of immense scale presents challenges to traditional
methods. Spectral clustering in distributed memory, for example, requires
hundreds of expensive bulk-synchronous communication rounds to compute an
embedding of vertices to a few eigenvectors of a graph associated matrix.
Furthermore, the whole computation may need to be repeated if the underlying
graph changes some low percentage of edge updates. We present a method inspired
by spectral clustering where we instead use matrix sketches derived from random
dimension-reducing projections. We show that our method produces embeddings
that yield performant clustering results given a fully-dynamic stochastic block
model stream using both the fast Johnson-Lindenstrauss and CountSketch
transforms. We also discuss the effects of stochastic block model parameters
upon the required dimensionality of the subsequent embeddings, and show how
random projections could significantly improve the performance of graph
clustering in distributed memory.
Related papers
- Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization [64.00146569922028]
Multi-view clustering methods based on anchor graph factorization lack adequate cluster interpretability for the decomposed matrix.
We address this limitation by using non-negative tensor factorization to decompose an anchor graph tensor that combines anchor graphs from multiple views.
arXiv Detail & Related papers (2024-04-01T03:23:55Z) - MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity
and Degree Descent Criterion [0.6906005491572401]
spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z) - Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z) - GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first it models the distribution of features associated with the nodes of the given graph, in the second it complements the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z) - Nonbacktracking spectral clustering of nonuniform hypergraphs [2.408714894793063]
We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator.
We propose an alternating algorithm for inference in a hypergraph blockmodel via linearized belief-propagation.
arXiv Detail & Related papers (2022-04-27T01:14:06Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - T-LoHo: A Bayesian Regularization Model for Structured Sparsity and
Smoothness on Graphs [0.0]
In graph-structured data, structured sparsity and smoothness tend to cluster together.
We propose a new prior for high dimensional parameters with graphical relations.
We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z) - Spectral-Spatial Global Graph Reasoning for Hyperspectral Image
Classification [50.899576891296235]
Convolutional neural networks have been widely applied to hyperspectral image classification.
Recent methods attempt to address this issue by performing graph convolutions on spatial topologies.
arXiv Detail & Related papers (2021-06-26T06:24:51Z) - Regularization of Mixture Models for Robust Principal Graph Learning [0.0]
A regularized version of Mixture Models is proposed to learn a principal graph from a distribution of $D$-dimensional data points.
Parameters of the model are iteratively estimated through an Expectation-Maximization procedure.
arXiv Detail & Related papers (2021-06-16T18:00:02Z) - Multilayer Clustered Graph Learning [66.94201299553336]
We use contrastive loss as a data fidelity term, in order to properly aggregate the observed layers into a representative graph.
Experiments show that our method leads to a clustered clusters w.r.t.
We learn a clustering algorithm for solving clustering problems.
arXiv Detail & Related papers (2020-10-29T09:58:02Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.