Effective and Scalable Clustering on Massive Attributed Graphs
- URL: http://arxiv.org/abs/2102.03826v1
- Date: Sun, 7 Feb 2021 15:50:28 GMT
- Title: Effective and Scalable Clustering on Massive Attributed Graphs
- Authors: Renchi Yang, Jieming Shi, Yin Yang, Keke Huang, Shiqi Zhang and
Xiaokui Xiao
- Abstract summary: We propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear in the size of the input graph G.
ACMin consistently outperforms all competitors in terms of result quality measured against ground-truth labels, while being up to orders of magnitude faster.
In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, ACMin outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.
- Score: 25.161619807810215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a graph G where each node is associated with a set of attributes, and a
parameter k specifying the number of output clusters, k-attributed graph
clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes
within the same cluster share similar topological and attribute
characteristics, while those in different clusters are dissimilar. This problem
is challenging on massive graphs, e.g., with millions of nodes and billions of
edges. For such graphs, existing solutions either incur prohibitively high
costs, or produce clustering results with compromised quality.
In this paper, we propose ACMin, an effective approach to k-AGC that yields
high-quality clusters with cost linear in the size of the input graph G. The
main contributions of ACMin are twofold: (i) a novel formulation of the k-AGC
problem based on an attributed multi-hop conductance quality measure
custom-made for this problem setting, which effectively captures cluster
coherence in terms of both topological proximities and attribute similarities,
and (ii) a linear-time optimization solver that obtains high-quality clusters
iteratively, based on efficient matrix operations such as orthogonal
iterations, an alternative optimization approach, as well as an initialization
technique that significantly speeds up the convergence of ACMin in practice.
Extensive experiments, comparing 11 competitors on 6 real datasets,
demonstrate that ACMin consistently outperforms all competitors in terms of
result quality measured against ground-truth labels, while being up to orders
of magnitude faster. In particular, on the Microsoft Academic Knowledge Graph
dataset with 265.2 million edges and 1.1 billion attribute values, ACMin
outputs high-quality results for 5-AGC within 1.68 hours using a single CPU
core, while none of the 11 competitors finish within 3 days.
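As context for the solver sketched in the abstract, the following is a minimal, hedged illustration of how orthogonal (subspace) iteration can drive attributed clustering: repeatedly apply a transition operator that blends topological and attribute similarity, re-orthonormalize, and round the resulting k-dimensional embedding with k-means. The blending weight alpha, the cosine-style attribute term, and the k-means rounding step are assumptions made for illustration; they are not ACMin's actual formulation or its attributed multi-hop conductance objective.

```python
# Illustrative sketch (NOT the ACMin algorithm): cluster an attributed graph by
# running orthogonal (subspace) iteration on a transition operator that mixes
# topological and attribute-based similarity, then rounding with k-means.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

def cluster_attributed_graph(A, X, k, alpha=0.5, iters=50, seed=0):
    """A: sparse adjacency (n x n), X: attribute matrix (n x d), k: #clusters."""
    rng = np.random.default_rng(seed)
    P_topo = normalize(A, norm="l1", axis=1)    # random-walk transition matrix
    X_hat = normalize(X, norm="l2", axis=1)     # rows scaled for cosine-style similarity
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):                      # orthogonal iteration
        # Apply the mixed operator without materialising any dense n x n matrix.
        Z = alpha * (P_topo @ Q) + (1 - alpha) * (X_hat @ (X_hat.T @ Q))
        Q, _ = np.linalg.qr(Z)                  # re-orthonormalise the k-dim basis
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Q)

# Toy usage: two triangles whose attributes agree with the topology.
A = csr_matrix(np.kron(np.eye(2), np.ones((3, 3)) - np.eye(3)))
X = np.vstack([np.tile([1.0, 0.0], (3, 1)), np.tile([0.0, 1.0], (3, 1))])
print(cluster_attributed_graph(A, X, k=2))      # e.g. [0 0 0 1 1 1] (up to relabelling)
```

Each iteration multiplies an n x k matrix by the sparse topology and by the thin attribute matrix, so the per-iteration cost stays proportional to the number of edges and attribute values, which is the kind of arithmetic that keeps such solvers linear in the input size.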
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
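As background for the terminology only (this is not the paper's framework), "one-step" methods fold the classic two-step pipeline of manifold embedding followed by K-means into a single objective. A minimal scikit-learn sketch of that generic two-step baseline:

```python
# Generic two-step baseline that "one-step" methods aim to unify: a manifold
# embedding (here spectral embedding) followed by K-means.  Purely illustrative;
# the data and hyperparameters are made up and unrelated to the cited paper.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 20)), rng.normal(3.0, 0.3, (50, 20))])

Z = SpectralEmbedding(n_components=2, affinity="rbf", random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(labels[:5], labels[-5:])   # the two blobs land in different clusters
```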
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation [14.327262299413789]
We develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and attributed hypergraph clustering (AHC).
We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
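The unifying device in the title is K-nearest-neighbor augmentation. As a rough illustration of the generic idea only (not ANCKA's actual construction), one can build a KNN graph over node attributes and blend it with the original topology before clustering; the neighborhood size k_nn, the blend weight beta, and the spectral rounding below are illustrative assumptions:

```python
# Generic KNN augmentation sketch for attributed clustering (not ANCKA itself):
# build a KNN graph from node attributes and blend it with the given topology.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def knn_augmented_clustering(A, X, n_clusters, k_nn=10, beta=0.5, seed=0):
    """A: sparse symmetric adjacency (n x n), X: node attributes (n x d)."""
    K = kneighbors_graph(X, n_neighbors=k_nn, mode="connectivity", include_self=False)
    K = 0.5 * (K + K.T)                          # symmetrise the attribute KNN graph
    W = beta * A + (1.0 - beta) * K              # blended affinity matrix
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            assign_labels="kmeans", random_state=seed)
    return sc.fit_predict(W)
```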
arXiv Detail & Related papers (2024-08-10T06:59:51Z)
- Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [56.26113670151363]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC).
CGC achieves state-of-the-art performance with a more efficient condensation process.
arXiv Detail & Related papers (2024-05-22T14:57:09Z)
- Effective Clustering on Large Attributed Bipartite Graphs [10.701751248623863]
Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes.
Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains.
However, most existing solutions to k-ABGC either overlook attribute information or fail to accurately capture bipartite graph structures.
We propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets.
arXiv Detail & Related papers (2024-05-20T09:58:27Z)
- Efficient High-Quality Clustering for Large Bipartite Graphs [7.533043289759316]
k-Bipartite Graph Clustering (k-BGC) partitions the target vertex set of a bipartite graph into k disjoint clusters.
The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics.
This paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs.
arXiv Detail & Related papers (2023-12-28T09:50:56Z)
- Scalable and Effective Conductance-based Graph Clustering [9.938406925123722]
We develop a graph clustering framework, PCon.
We show that many existing solutions can be reduced to our framework.
Based on our framework, we propose two novel algorithms, PCon_core and PCon_de, with linear time and space complexity.
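As background for the conductance-based framing (this is not PCon's code), the conductance of a node set S is the standard measure cut(S, V \ S) / min(vol(S), vol(V \ S)), i.e., the weight of edges leaving S divided by the smaller of the two volumes. A small sketch of computing it:

```python
# Background sketch: conductance of a node set S in an undirected graph,
# phi(S) = cut(S, V \ S) / min(vol(S), vol(V \ S)).  Illustrates the measure
# that conductance-based frameworks optimise; it is not PCon's implementation.
import numpy as np
from scipy.sparse import csr_matrix

def conductance(A, S):
    """A: symmetric sparse adjacency, S: boolean mask selecting the cluster."""
    S = np.asarray(S, dtype=bool)
    degrees = np.asarray(A.sum(axis=1)).ravel()
    cut = A[S][:, ~S].sum()                      # total weight of edges leaving S
    return cut / min(degrees[S].sum(), degrees[~S].sum())

# Two triangles joined by one bridge edge: the 3-node cluster has one crossing
# edge and volume 7, so its conductance is 1/7.
A = np.kron(np.eye(2), np.ones((3, 3)) - np.eye(3))
A[2, 3] = A[3, 2] = 1.0
print(conductance(csr_matrix(A), [True, True, True, False, False, False]))  # ~0.1429
```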
arXiv Detail & Related papers (2022-11-22T12:45:27Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss is designed for node representation learning by leveraging inaccurate clustering labels.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Attention-driven Graph Clustering Network [49.040136530379094]
We propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN).
AGCN exploits a heterogeneous-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion.
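The fusion module is only named above; as a generic illustration of attention-style fusion (not AGCN's actual module), per-node softmax weights can gate a convex combination of an attribute-based representation and a topology-based one. The scoring vector q and the temperature are illustrative assumptions:

```python
# Generic attention-style fusion of two per-node representations (attribute-based
# Z_a and topology-based Z_t); illustrative only, not AGCN's fusion module.
import numpy as np

def attention_fuse(Z_a, Z_t, q, temperature=1.0):
    """Z_a, Z_t: (n, d) features; q: (d,) scoring vector; returns fused (n, d)."""
    scores = np.stack([Z_a @ q, Z_t @ q], axis=1) / temperature   # (n, 2) raw scores
    scores -= scores.max(axis=1, keepdims=True)                   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)                             # per-node softmax weights
    return w[:, :1] * Z_a + w[:, 1:] * Z_t                        # convex combination

rng = np.random.default_rng(0)
Z_a, Z_t = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(attention_fuse(Z_a, Z_t, q=np.ones(4)).shape)               # (5, 4)
```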
arXiv Detail & Related papers (2021-08-12T02:30:38Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC), which explicitly performs instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z)
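The quality figures quoted for Contrastive Clustering are NMI scores. As a reminder of the metric only (unrelated to CC's training procedure), NMI between predicted and ground-truth labels can be computed with scikit-learn; the toy labels below are made up:

```python
# Normalized Mutual Information (NMI), the metric quoted in the Contrastive
# Clustering entry above, computed with scikit-learn on made-up toy labels.
from sklearn.metrics import normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]        # identical partition, permuted label names
print(normalized_mutual_info_score(y_true, y_pred))        # 1.0: NMI ignores relabelling

y_noisy = [1, 0, 0, 0, 2, 2]       # one node assigned to the wrong cluster
print(normalized_mutual_info_score(y_true, y_noisy))       # < 1.0
```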