Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering
- URL: http://arxiv.org/abs/2602.08519v1
- Date: Mon, 09 Feb 2026 11:07:24 GMT
- Title: Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering
- Authors: Yunhui Liu, Pengyu Qiu, Yu Xing, Yongchao Liu, Peng Du, Chuntao Hong, Jiajun Zheng, Tao Zheng, Tieke He,
- Abstract summary: Attributed Graph Clustering (AGC) is a fundamental unsupervised task that integrates structural topology and node attributes to uncover latent patterns in graph-structured data.<n>Despite its significance in industrial applications such as fraud detection and user segmentation, a significant chasm persists between academic research and real-world deployment.<n>We present PyAGC, a production-ready benchmark and library designed to stress-test AGC methods across diverse scales and structural properties.
- Score: 19.247242477915382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attributed Graph Clustering (AGC) is a fundamental unsupervised task that integrates structural topology and node attributes to uncover latent patterns in graph-structured data. Despite its significance in industrial applications such as fraud detection and user segmentation, a significant chasm persists between academic research and real-world deployment. Current evaluation protocols suffer from the small-scale, high-homophily citation datasets, non-scalable full-batch training paradigms, and a reliance on supervised metrics that fail to reflect performance in label-scarce environments. To bridge these gaps, we present PyAGC, a comprehensive, production-ready benchmark and library designed to stress-test AGC methods across diverse scales and structural properties. We unify existing methodologies into a modular Encode-Cluster-Optimize framework and, for the first time, provide memory-efficient, mini-batch implementations for a wide array of state-of-the-art AGC algorithms. Our benchmark curates 12 diverse datasets, ranging from 2.7K to 111M nodes, specifically incorporating industrial graphs with complex tabular features and low homophily. Furthermore, we advocate for a holistic evaluation protocol that mandates unsupervised structural metrics and efficiency profiling alongside traditional supervised metrics. Battle-tested in high-stakes industrial workflows at Ant Group, this benchmark offers the community a robust, reproducible, and scalable platform to advance AGC research towards realistic deployment. The code and resources are publicly available via GitHub (https://github.com/Cloudy1225/PyAGC), PyPI (https://pypi.org/project/pyagc), and Documentation (https://pyagc.readthedocs.io).
Related papers
- Core-based Hierarchies for Efficient GraphRAG [0.0]
GraphRAG organizes documents into a knowledge graph with hierarchical communities that can be summarized.<n>Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions.<n>To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time.
arXiv Detail & Related papers (2026-03-05T14:17:30Z) - Deep Global Clustering for Hyperspectral Image Segmentation: Concepts, Applications, and Open Challenges [1.9116784879310027]
Hyperspectral imaging (HSI) analysis faces computational bottlenecks due to massive data volumes that exceed available memory.<n>This report presents Deep Global Clustering (DGC), a conceptual framework for memory-efficient HSI segmentation.<n>DGC operates on small patches with overlapping regions to enforce consistency, enabling training in under 30 minutes on consumer hardware.
arXiv Detail & Related papers (2025-12-30T12:10:43Z) - Knowledge Graphs as Structured Memory for Embedding Spaces: From Training Clusters to Explainable Inference [3.2945446636945963]
Graph Memory (GM) is a structured non-parametric framework that augments embedding-based inference with a compact, relational memory over region-level prototypes.<n>By explicitly modeling reliability and relational structure, GM provides a principled bridge between local evidence and global consistency in non-parametric learning.
arXiv Detail & Related papers (2025-11-18T23:02:59Z) - Graph Structure Refinement with Energy-based Contrastive Learning [56.957793274727514]
We introduce an unsupervised method based on a joint of generative training and discriminative training to learn graph structure and representation.<n>We propose an Energy-based Contrastive Learning (ECL) guided Graph Structure Refinement (GSR) framework, denoted as ECL-GSR.<n>ECL-GSR achieves faster training with fewer samples and memories against the leading baseline, highlighting its simplicity and efficiency in downstream tasks.
arXiv Detail & Related papers (2024-12-20T04:05:09Z) - Benchmarking Federated Learning for Semantic Datasets: Federated Scene Graph Generation [3.499870393443268]
Federated learning (FL) enables decentralized training while preserving data privacy, yet existing FL benchmarks address relatively simple classification tasks.<n>We propose a benchmark process to establish an FL benchmark with controllable semantic heterogeneity across clients.<n>As a proof of concept, we construct a federated PSG benchmark, demonstrating the efficacy of the existing PSG methods in an FL setting.
arXiv Detail & Related papers (2024-12-11T08:10:46Z) - Generative and Contrastive Paradigms Are Complementary for Graph
Self-Supervised Learning [56.45977379288308]
Masked autoencoder (MAE) learns to reconstruct masked graph edges or node features.
Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph.
We propose graph contrastive masked autoencoder (GCMAE) framework to unify MAE and CL.
arXiv Detail & Related papers (2023-10-24T05:06:06Z) - Efficient Multi-View Graph Clustering with Local and Global Structure
Preservation [59.49018175496533]
We propose a novel anchor-based multi-view graph clustering framework termed Efficient Multi-View Graph Clustering with Local and Global Structure Preservation (EMVGC-LG)
Specifically, EMVGC-LG jointly optimize anchor construction and graph learning to enhance the clustering quality.
In addition, EMVGC-LG inherits the linear complexity of existing AMVGC methods respecting the sample number.
arXiv Detail & Related papers (2023-08-31T12:12:30Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs)
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - CGC: Contrastive Graph Clustering for Community Detection and Tracking [33.48636823444052]
We develop CGC, a novel end-to-end framework for graph clustering.
CGC learns node embeddings and cluster assignments in a contrastive graph learning framework.
We extend CGC for time-evolving data, where temporal graph clustering is performed in an incremental learning fashion.
arXiv Detail & Related papers (2022-04-05T17:34:47Z) - Self-supervised Graph-level Representation Learning with Local and
Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning.
Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters.
An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z) - Structure-Enhanced Meta-Learning For Few-Shot Graph Classification [53.54066611743269]
This work explores the potential of metric-based meta-learning for solving few-shot graph classification.
An implementation upon GIN, named SMFGIN, is tested on two datasets, Chembl and TRIANGLES.
arXiv Detail & Related papers (2021-03-05T09:03:03Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.