Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition
- URL: http://arxiv.org/abs/2405.13707v1
- Date: Wed, 22 May 2024 14:57:09 GMT
- Title: Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition
- Authors: Xinyi Gao, Tong Chen, Wentao Zhang, Junliang Yu, Guanhua Ye, Quoc Viet Hung Nguyen, Hongzhi Yin,
- Abstract summary: Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC)
CGC achieves state-of-the-art performance with a more efficient condensation process.
- Score: 56.26113670151363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing prevalence of large-scale graphs poses a significant challenge for graph neural network training, attributed to their substantial computational requirements. In response, graph condensation (GC) emerges as a promising data-centric solution aiming to substitute the large graph with a small yet informative condensed graph to facilitate data-efficient GNN training. However, existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources. In this paper, we revisit existing GC optimization strategies and identify two pervasive issues: 1. various GC optimization strategies converge to class-level node feature matching between the original and condensed graphs, making the optimization target coarse-grained despite the complex computations; 2. to bridge the original and condensed graphs, existing GC methods rely on a Siamese graph network architecture that requires time-consuming bi-level optimization with iterative gradient computations. To overcome these issues, we propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC), which refines the node feature matching from the class-to-class paradigm into a novel class-to-node paradigm. Remarkably, this refinement also simplifies the GC optimization as a class partition problem, which can be efficiently solved by any clustering methods. Moreover, CGC incorporates a pre-defined graph structure to enable a closed-form solution for condensed node features, eliminating the back-and-forth gradient descent in existing GC approaches without sacrificing accuracy. Extensive experiments demonstrate that CGC achieves state-of-the-art performance with a more efficient condensation process. For instance, compared with the seminal GC method (i.e., GCond), CGC condenses the largest Reddit graph within 10 seconds, achieving a 2,680X speedup and a 1.4% accuracy increase.
Related papers
- GC-Bench: An Open and Unified Benchmark for Graph Condensation [54.70801435138878]
We develop a comprehensive Graph Condensation Benchmark (GC-Bench) to analyze the performance of graph condensation.
GC-Bench systematically investigates the characteristics of graph condensation in terms of the following dimensions: effectiveness, transferability, and complexity.
We have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research.
arXiv Detail & Related papers (2024-06-30T07:47:34Z) - RobGC: Towards Robust Graph Condensation [61.259453496191696]
Graph neural networks (GNNs) have attracted widespread attention for their impressive capability of graph representation learning.
However, the increasing prevalence of large-scale graphs presents a significant challenge for GNN training due to their computational demands.
We propose graph condensation (GC) to generate an informative compact graph that enables efficient training of GNNs while retaining performance.
arXiv Detail & Related papers (2024-06-19T04:14:57Z) - GCondenser: Benchmarking Graph Condensation [26.458605619132385]
This paper proposes the first large-scale graph condensation benchmark, GCondenser, to holistically evaluate and compare mainstream GC methods.
GCondenser includes a standardised GC paradigm, consisting of condensation, validation, and evaluation procedures, as well as enabling extensions to new GC methods and datasets.
arXiv Detail & Related papers (2024-05-23T07:25:31Z) - Simple Graph Condensation [30.85754566420301]
Graph condensation involves tuning Graph Neural Networks (GNNs) on a small condensed graph for use on a large-scale original graph.
We introduce the Simple Graph Condensation (SimGC) framework, which aligns the condensed graph with the original graph from the input layer to the prediction layer.
SimGC achieves a significant speedup of up to 10 times compared to existing graph condensation methods.
arXiv Detail & Related papers (2024-03-22T05:04:48Z) - Graph Condensation: A Survey [49.41718583061147]
The rapid growth of graph data poses significant challenges in storage, transmission, and particularly the training of graph neural networks (GNNs)
To address these challenges, graph condensation (GC) has emerged as an innovative solution.
GC focuses on a compact yet highly representative graph, enabling GNNs trained on it to achieve performance comparable to those trained on the original large graph.
arXiv Detail & Related papers (2024-01-22T06:47:00Z) - Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural
Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z) - Gradient Coding with Dynamic Clustering for Straggler Mitigation [57.9123881133818]
GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration.
We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
arXiv Detail & Related papers (2020-11-03T18:52:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.