Effective and Scalable Clustering on Massive Attributed Graphs
- URL: http://arxiv.org/abs/2102.03826v1
- Date: Sun, 7 Feb 2021 15:50:28 GMT
- Title: Effective and Scalable Clustering on Massive Attributed Graphs
- Authors: Renchi Yang, Jieming Shi, Yin Yang, Keke Huang, Shiqi Zhang and
Xiaokui Xiao
- Abstract summary: We propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear in the size of the input graph G.
ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster.
In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.
- Score: 25.161619807810215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a graph G where each node is associated with a set of attributes, and a
parameter k specifying the number of output clusters, k-attributed graph
clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes
within the same cluster share similar topological and attribute
characteristics, while those in different clusters are dissimilar. This problem
is challenging on massive graphs, e.g., with millions of nodes and billions of
edges. For such graphs, existing solutions either incur prohibitively high
costs, or produce clustering results with compromised quality.
In this paper, we propose ACMin, an effective approach to k-AGC that yields
high-quality clusters with cost linear in the size of the input graph G. The
main contributions of ACMin are twofold: (i) a novel formulation of the k-AGC
problem based on an attributed multi-hop conductance quality measure
custom-made for this problem setting, which effectively captures cluster
coherence in terms of both topological proximities and attribute similarities,
and (ii) a linear-time optimization solver that obtains high-quality clusters
iteratively, based on efficient matrix operations such as orthogonal
iterations, an alternative optimization approach, as well as an initialization
technique that significantly speeds up the convergence of ACMin in practice.
Extensive experiments, comparing 11 competitors on 6 real datasets,
demonstrate that ACMin consistently outperforms all competitors in terms of
result quality measured against ground-truth labels, while being up to orders
of magnitude faster. In particular, on the Microsoft Academic Knowledge Graph
dataset with 265.2 million edges and 1.1 billion attribute values, ACMin
outputs high-quality results for 5-AGC within 1.68 hours using a single CPU
core, while none of the 11 competitors finish within 3 days.
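As context for the solver sketched in the abstract, the following is a minimal, hedged illustration of how orthogonal (subspace) iteration can drive attributed clustering: repeatedly apply a transition operator that blends topological and attribute similarity, re-orthonormalize, and round the resulting k-dimensional embedding with k-means. The blending weight alpha, the cosine-style attribute term, and the k-means rounding step are assumptions made for illustration; they are not ACMin's actual formulation or its attributed multi-hop conductance objective.

```python
# Illustrative sketch (NOT the ACMin algorithm): cluster an attributed graph by
# running orthogonal (subspace) iteration on a transition operator that mixes
# topological and attribute-based similarity, then rounding with k-means.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

def cluster_attributed_graph(A, X, k, alpha=0.5, iters=50, seed=0):
    """A: sparse adjacency (n x n), X: attribute matrix (n x d), k: #clusters."""
    rng = np.random.default_rng(seed)
    P_topo = normalize(A, norm="l1", axis=1)    # random-walk transition matrix
    X_hat = normalize(X, norm="l2", axis=1)     # rows scaled for cosine-style similarity
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):                      # orthogonal iteration
        # Apply the mixed operator without materialising any dense n x n matrix.
        Z = alpha * (P_topo @ Q) + (1 - alpha) * (X_hat @ (X_hat.T @ Q))
        Q, _ = np.linalg.qr(Z)                  # re-orthonormalise the k-dim basis
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Q)

# Toy usage: two triangles whose attributes agree with the topology.
A = csr_matrix(np.kron(np.eye(2), np.ones((3, 3)) - np.eye(3)))
X = np.vstack([np.tile([1.0, 0.0], (3, 1)), np.tile([0.0, 1.0], (3, 1))])
print(cluster_attributed_graph(A, X, k=2))      # e.g. [0 0 0 1 1 1] (up to relabelling)
```

Each iteration multiplies an n x k matrix by the sparse topology and by the thin attribute matrix, so the per-iteration cost stays proportional to the number of edges and attribute values, which is the kind of arithmetic that keeps such solvers linear in the input size.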
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
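As background for the terminology only (this is not the paper's framework), "one-step" methods fold the classic two-step pipeline of manifold embedding followed by K-means into a single objective. A minimal scikit-learn sketch of that generic two-step baseline:

```python
# Generic two-step baseline that "one-step" methods aim to unify: a manifold
# embedding (here spectral embedding) followed by K-means.  Purely illustrative;
# the data and hyperparameters are made up and unrelated to the cited paper.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 20)), rng.normal(3.0, 0.3, (50, 20))])

Z = SpectralEmbedding(n_components=2, affinity="rbf", random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(labels[:5], labels[-5:])   # the two blobs land in different clusters
```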
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation [14.327262299413789]
We develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and attributed hypergraph clustering (AHC).
We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
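The unifying device in the title is K-nearest-neighbor augmentation. As a rough illustration of the generic idea only (not ANCKA's actual construction), one can build a KNN graph over node attributes and blend it with the original topology before clustering; the neighborhood size k_nn, the blend weight beta, and the spectral rounding below are illustrative assumptions:

```python
# Generic KNN augmentation sketch for attributed clustering (not ANCKA itself):
# build a KNN graph from node attributes and blend it with the given topology.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def knn_augmented_clustering(A, X, n_clusters, k_nn=10, beta=0.5, seed=0):
    """A: sparse symmetric adjacency (n x n), X: node attributes (n x d)."""
    K = kneighbors_graph(X, n_neighbors=k_nn, mode="connectivity", include_self=False)
    K = 0.5 * (K + K.T)                          # symmetrise the attribute KNN graph
    W = beta * A + (1.0 - beta) * K              # blended affinity matrix
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            assign_labels="kmeans", random_state=seed)
    return sc.fit_predict(W)
```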
arXiv Detail & Related papers (2024-08-10T06:59:51Z)
- Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [56.26113670151363]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC).
CGC achieves state-of-the-art performance with a more efficient condensation process.
arXiv Detail & Related papers (2024-05-22T14:57:09Z)
- Effective Clustering on Large Attributed Bipartite Graphs [10.701751248623863]
Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes.
Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains.
However, most existing solutions to k-ABGC either overlook attribute information or fail to accurately capture bipartite graph structures.
We propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets.
arXiv Detail & Related papers (2024-05-20T09:58:27Z)
- Efficient High-Quality Clustering for Large Bipartite Graphs [7.533043289759316]
k-Bipartite Graph Clustering (k-BGC) partitions the target vertex set of a bipartite graph into k disjoint clusters.
The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics.
This paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs.
arXiv Detail & Related papers (2023-12-28T09:50:56Z)
- Scalable and Effective Conductance-based Graph Clustering [9.938406925123722]
We develop a graph clustering framework, PCon.
We show that many existing solutions can be reduced to our framework.
Based on our framework, we propose two novel algorithms, PCon_core and PCon_de, with linear time and space complexity.
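As background for the conductance-based framing (this is not PCon's code), the conductance of a node set S is the standard measure cut(S, V \ S) / min(vol(S), vol(V \ S)), i.e., the weight of edges leaving S divided by the smaller of the two volumes. A small sketch of computing it:

```python
# Background sketch: conductance of a node set S in an undirected graph,
# phi(S) = cut(S, V \ S) / min(vol(S), vol(V \ S)).  Illustrates the measure
# that conductance-based frameworks optimise; it is not PCon's implementation.
import numpy as np
from scipy.sparse import csr_matrix

def conductance(A, S):
    """A: symmetric sparse adjacency, S: boolean mask selecting the cluster."""
    S = np.asarray(S, dtype=bool)
    degrees = np.asarray(A.sum(axis=1)).ravel()
    cut = A[S][:, ~S].sum()                      # total weight of edges leaving S
    return cut / min(degrees[S].sum(), degrees[~S].sum())

# Two triangles joined by one bridge edge: the 3-node cluster has one crossing
# edge and volume 7, so its conductance is 1/7.
A = np.kron(np.eye(2), np.ones((3, 3)) - np.eye(3))
A[2, 3] = A[3, 2] = 1.0
print(conductance(csr_matrix(A), [True, True, True, False, False, False]))  # ~0.1429
```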
arXiv Detail & Related papers (2022-11-22T12:45:27Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss is designed for node representation learning by leveraging inaccurate clustering labels.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Attention-driven Graph Clustering Network [49.040136530379094]
We propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN).
AGCN exploits a heterogeneous-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion.
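The fusion module is only named above; as a generic illustration of attention-style fusion (not AGCN's actual module), per-node softmax weights can gate a convex combination of an attribute-based representation and a topology-based one. The scoring vector q and the temperature are illustrative assumptions:

```python
# Generic attention-style fusion of two per-node representations (attribute-based
# Z_a and topology-based Z_t); illustrative only, not AGCN's fusion module.
import numpy as np

def attention_fuse(Z_a, Z_t, q, temperature=1.0):
    """Z_a, Z_t: (n, d) features; q: (d,) scoring vector; returns fused (n, d)."""
    scores = np.stack([Z_a @ q, Z_t @ q], axis=1) / temperature   # (n, 2) raw scores
    scores -= scores.max(axis=1, keepdims=True)                   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)                             # per-node softmax weights
    return w[:, :1] * Z_a + w[:, 1:] * Z_t                        # convex combination

rng = np.random.default_rng(0)
Z_a, Z_t = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(attention_fuse(Z_a, Z_t, q=np.ones(4)).shape)               # (5, 4)
```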
arXiv Detail & Related papers (2021-08-12T02:30:38Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC), which explicitly performs instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
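The quality figures quoted for Contrastive Clustering are NMI scores. As a reminder of the metric only (unrelated to CC's training procedure), NMI between predicted and ground-truth labels can be computed with scikit-learn; the toy labels below are made up:

```python
# Normalized Mutual Information (NMI), the metric quoted in the Contrastive
# Clustering entry above, computed with scikit-learn on made-up toy labels.
from sklearn.metrics import normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]        # identical partition, permuted label names
print(normalized_mutual_info_score(y_true, y_pred))        # 1.0: NMI ignores relabelling

y_noisy = [1, 0, 0, 0, 2, 2]       # one node assigned to the wrong cluster
print(normalized_mutual_info_score(y_true, y_noisy))       # < 1.0
```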