Scalable Deep Graph Clustering with Random-walk based Self-supervised
Learning
- URL: http://arxiv.org/abs/2112.15530v1
- Date: Fri, 31 Dec 2021 16:12:23 GMT
- Title: Scalable Deep Graph Clustering with Random-walk based Self-supervised
Learning
- Authors: Xiang Li (1), Dong Li (2), Ruoming Jin (2), Gagan Agrawal (3), Rajiv
Ramnath (4) ((1) Ohio State University, (2) Kent State University, (3)
Augusta University)
- Abstract summary: We show that our scalable deep clustering algorithm, RwSL, can continue to scale beyond graphs with more than one million nodes.
We also demonstrate how RwSL could perform node clustering on a graph with 1.8 billion edges using only a single GPU.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Web-based interactions can be frequently represented by an attributed graph,
and node clustering in such graphs has received much attention lately. Multiple
efforts have successfully applied Graph Convolutional Networks (GCN), though
with some limits on accuracy as GCNs have been shown to suffer from
over-smoothing issues. Though other methods (particularly those based on
Laplacian Smoothing) have reported better accuracy, a fundamental limitation of
all the work is a lack of scalability. This paper addresses this open problem
by relating the Laplacian smoothing to the Generalized PageRank and applying a
random-walk based algorithm as a scalable graph filter. This forms the basis
for our scalable deep clustering algorithm, RwSL, where through a
self-supervised mini-batch training mechanism, we simultaneously optimize a
deep neural network for sample-cluster assignment distribution and an
autoencoder for a clustering-oriented embedding. Using 6 real-world datasets
and 6 clustering metrics, we show that RwSL achieved improved results over
several recent baselines. Most notably, we show that RwSL, unlike all other
deep clustering frameworks, can continue to scale beyond graphs with more than
one million nodes, i.e., handle web-scale. We also demonstrate how RwSL could
perform node clustering on a graph with 1.8 billion edges using only a single
GPU.
Related papers
- FedGT: Federated Node Classification with Scalable Graph Transformer [27.50698154862779]
We propose a scalable textbfFederated textbfGraph textbfTransformer (textbfFedGT) in the paper.
FedGT computes clients' similarity based on the aligned global nodes with optimal transport.
arXiv Detail & Related papers (2024-01-26T21:02:36Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating nodes, whether being corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net 9.62% achieves NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z) - EGRC-Net: Embedding-induced Graph Refinement Clustering Network [66.44293190793294]
We propose a novel graph clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net)
EGRC-Net effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance.
Our proposed methods consistently outperform several state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-19T09:08:43Z) - DOTIN: Dropping Task-Irrelevant Nodes for GNNs [119.17997089267124]
Recent graph learning approaches have introduced the pooling strategy to reduce the size of graphs for learning.
We design a new approach called DOTIN (underlineDrunderlineopping underlineTask-underlineIrrelevant underlineNodes) to reduce the size of graphs.
Our method speeds up GAT by about 50% on graph-level tasks including graph classification and graph edit distance.
arXiv Detail & Related papers (2022-04-28T12:00:39Z) - SCGC : Self-Supervised Contrastive Graph Clustering [1.1470070927586016]
Graph clustering discovers groups or communities within networks.
Deep learning methods such as autoencoders cannot incorporate rich structural information.
We propose Self-Supervised Contrastive Graph Clustering (SCGC)
arXiv Detail & Related papers (2022-04-27T01:38:46Z) - ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for
Efficient Feature Matching [15.620335576962475]
ClusterGNN is an attentional GNN architecture which operates on clusters for learning the feature matching task.
Our approach yields a 59.7% reduction in runtime and 58.4% reduction in memory consumption for dense detection.
arXiv Detail & Related papers (2022-04-25T14:43:15Z) - Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities.
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.