Simple yet Effective Graph Distillation via Clustering
- URL: http://arxiv.org/abs/2505.20807v1
- Date: Tue, 27 May 2025 07:13:10 GMT
- Title: Simple yet Effective Graph Distillation via Clustering
- Authors: Yurui Lai, Taiyan Zhang, Renchi Yang
- Abstract summary: Graph data distillation (GDD) seeks to distill large graphs into compact and informative ones. ClustGDD synthesizes the condensed graph and node attributes through fast and theoretically grounded clustering. ClustGDD consistently achieves performance superior or comparable to state-of-the-art GDD methods on node classification.
- Score: 2.6217304977339473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite plentiful successes achieved by graph representation learning in various domains, the training of graph neural networks (GNNs) remains challenging due to the tremendous computational overhead needed for sizable graphs in practice. Recently, graph data distillation (GDD), which seeks to distill large graphs into compact and informative ones, has emerged as a promising technique to enable efficient GNN training. However, most existing GDD works rely on heuristics that align model gradients or representation distributions on condensed and original graphs, leading to compromised result quality, expensive training for distilling large graphs, or both. Motivated by these limitations, this paper presents an efficient and effective GDD approach, ClustGDD. Under the hood, ClustGDD synthesizes the condensed graph and node attributes through fast and theoretically grounded clustering that minimizes the within-cluster sum of squares and maximizes the homophily on the original graph. The fundamental idea is inspired by our empirical and theoretical findings unveiling the connection between clustering and empirical condensation quality using Fréchet Inception Distance, a well-known quality metric for synthetic images. Furthermore, to mitigate the adverse effects caused by the homophily-based clustering, ClustGDD refines the nodal attributes of the condensed graph with a small augmentation learned via class-aware graph sampling and a consistency loss. Our extensive experiments show that GNNs trained over condensed graphs output by ClustGDD consistently achieve superior or comparable performance to state-of-the-art GDD methods in terms of node classification on five benchmark datasets, while being orders of magnitude faster.
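The following is a minimal, illustrative sketch (not the authors' implementation) of the clustering-based condensation idea described in the abstract: node attributes are smoothed over the graph, partitioned with k-means (which minimizes the within-cluster sum of squares), and each cluster is collapsed into a single condensed node; a Fréchet-distance-style score then compares the attribute statistics of the original and condensed graphs. The SGC-style propagation, the dense NumPy adjacency, and the function names (smooth_features, condense_by_clustering, frechet_distance) are assumptions made for this sketch only.

```python
# Hypothetical clustering-based graph condensation sketch in the spirit of
# ClustGDD; propagation scheme and data layout are assumptions, not the
# authors' code.
import numpy as np
from scipy.linalg import sqrtm
from sklearn.cluster import KMeans


def smooth_features(A: np.ndarray, X: np.ndarray, hops: int = 2) -> np.ndarray:
    """Symmetrically normalized feature propagation (SGC-style smoothing)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    P = D_inv_sqrt @ A_hat @ D_inv_sqrt            # D^{-1/2} (A+I) D^{-1/2}
    H = X.copy()
    for _ in range(hops):
        H = P @ H
    return H


def condense_by_clustering(A: np.ndarray, X: np.ndarray, k: int, seed: int = 0):
    """Return condensed attributes X_c (k x d) and condensed adjacency A_c (k x k)."""
    H = smooth_features(A, X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(H)
    C = np.eye(k)[labels]                          # n x k hard assignment matrix
    X_c = (C.T @ X) / np.maximum(C.sum(axis=0), 1)[:, None]   # per-cluster means
    A_c = C.T @ A @ C                              # inter-cluster edge counts
    np.fill_diagonal(A_c, 0)                       # drop self-loops (a simplifying choice)
    return X_c, A_c, labels


def frechet_distance(X_real: np.ndarray, X_syn: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of real and synthetic attributes."""
    mu_r, mu_s = X_real.mean(axis=0), X_syn.mean(axis=0)
    cov_r = np.cov(X_real, rowvar=False)
    cov_s = np.cov(X_syn, rowvar=False)
    covmean = sqrtm(cov_r @ cov_s)
    if np.iscomplexobj(covmean):                   # numerical noise on a toy sample
        covmean = covmean.real
    return float(((mu_r - mu_s) ** 2).sum() + np.trace(cov_r + cov_s - 2 * covmean))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 200, 16, 10                          # toy sizes, purely illustrative
    X = rng.normal(size=(n, d))
    A = (rng.random((n, n)) < 0.05).astype(float)
    A = np.triu(A, 1); A = A + A.T                 # symmetric adjacency, no self-loops
    X_c, A_c, _ = condense_by_clustering(A, X, k)
    print("condensed graph:", X_c.shape, A_c.shape,
          "FID-like score:", round(frechet_distance(X, X_c), 3))
```

Because k-means and the cluster-wise aggregations are the only heavy operations here, such a pipeline avoids the repeated GNN training that gradient- or distribution-matching condensation requires, which is consistent with the efficiency argument in the abstract.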
Related papers
- Heterogeneous Graph Prompt Learning via Adaptive Weight Pruning [37.735384483052044]
Graph Neural Networks (GNNs) have achieved remarkable success in various graph-based tasks (e.g., node classification and link prediction). Despite these successes, GNNs still face challenges such as long training and inference times, difficulty in capturing complex relationships, and insufficient feature extraction. We propose a novel framework combining graph prompts with weight pruning, called GPAWP, which aims to enhance the performance and efficiency of graph prompts by using fewer of them.
arXiv Detail & Related papers (2025-07-12T04:12:24Z) - Diffusion on Graph: Augmentation of Graph Structure for Node Classification [7.9233221247736205]
We propose Diffusion on Graph (DoG), which generates synthetic graph structures to boost the performance of graph neural networks (GNNs). The synthetic graph structures generated by DoG are combined with the original graph to form an augmented graph for training node-level learning tasks. To mitigate the adverse effect of the noise introduced by the synthetic graph structures, a low-rank regularization method is proposed.
arXiv Detail & Related papers (2025-03-16T16:39:25Z) - RobGC: Towards Robust Graph Condensation [61.259453496191696]
Graph neural networks (GNNs) have attracted widespread attention for their impressive capability in graph representation learning. However, the increasing prevalence of large-scale graphs presents a significant challenge for GNN training due to their computational demands. We propose graph condensation (GC) to generate an informative compact graph that enables efficient training of GNNs while retaining performance.
arXiv Detail & Related papers (2024-06-19T04:14:57Z) - Simple Graph Condensation [30.85754566420301]
Graph condensation involves tuning Graph Neural Networks (GNNs) on a small condensed graph for use on a large-scale original graph.
We introduce the Simple Graph Condensation (SimGC) framework, which aligns the condensed graph with the original graph from the input layer to the prediction layer.
SimGC achieves a significant speedup of up to 10 times compared to existing graph condensation methods.
arXiv Detail & Related papers (2024-03-22T05:04:48Z) - Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching [50.30124426442228]
Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have raised growing concerns.
We propose a novel graph condensation method named CrafTing RationaL (CTRL), which offers an optimized starting point closer to the original dataset's feature distribution.
arXiv Detail & Related papers (2024-02-07T14:49:10Z) - Breaking the Entanglement of Homophily and Heterophily in
Semi-supervised Node Classification [25.831508778029097]
We introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective.
We also propose ADPA as a new directed graph learning paradigm for AMUD.
arXiv Detail & Related papers (2023-12-07T07:54:11Z) - Graph Condensation for Inductive Node Representation Learning [59.76374128436873]
We propose mapping-aware graph condensation (MCond).
MCond integrates new nodes into the synthetic graph for inductive representation learning.
On the Reddit dataset, MCond achieves up to 121.5x inference speedup and 55.9x reduction in storage requirements.
arXiv Detail & Related papers (2023-07-29T12:11:14Z) - Structure-free Graph Condensation: From Large-scale Graphs to Condensed
Graph-free Data [91.27527985415007]
Existing graph condensation methods rely on the joint optimization of nodes and structures in the condensed graph.
We advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set.
arXiv Detail & Related papers (2023-06-05T07:53:52Z) - GCNH: A Simple Method For Representation Learning On Heterophilous
Graphs [4.051099980410583]
Graph Neural Networks (GNNs) are well-suited for learning on homophilous graphs.
Recent works have proposed extensions to standard GNN architectures to improve performance on heterophilous graphs.
We propose GCN for Heterophily (GCNH), a simple yet effective GNN architecture applicable to both heterophilous and homophilous scenarios.
arXiv Detail & Related papers (2023-04-21T11:26:24Z) - Localized Contrastive Learning on Graphs [110.54606263711385]
We introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL).
In spite of its simplicity, Local-GCL achieves quite competitive performance in self-supervised node representation learning tasks on graphs with various scales and properties.
arXiv Detail & Related papers (2022-12-08T23:36:00Z) - Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural
Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.