Related papers: DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs

DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs

URL: http://arxiv.org/abs/2508.16769v1
Date: Fri, 22 Aug 2025 20:05:38 GMT
Title: DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs
Authors: Yuebo Luo, Shiyang Li, Junran Tao, Kiran Thorat, Xi Xie, Hongwu Peng, Nuo Xu, Caiwen Ding, Shaoyi Huang,
Abstract summary: Heterogeneous Graph Neural Networks (HGNNs) can better interpret EDA circuit graphs as they capture both topological relationships and geometric features.<n>We propose DR-CircuitGNN, a fast GPU kernel design by leveraging row-wise sparsity-aware Dynamic-ReLU and optimizing SpMM kernels during heterogeneous message-passing to accelerate HGNNs training on EDA-related circuit graph datasets.
Score: 24.65955578784123
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The increasing scale and complexity of integrated circuit design have led to increased challenges in Electronic Design Automation (EDA). Graph Neural Networks (GNNs) have emerged as a promising approach to assist EDA design as circuits can be naturally represented as graphs. While GNNs offer a foundation for circuit analysis, they often fail to capture the full complexity of EDA designs. Heterogeneous Graph Neural Networks (HGNNs) can better interpret EDA circuit graphs as they capture both topological relationships and geometric features. However, the improved representation capability comes at the cost of even higher computational complexity and processing cost due to their serial module-wise message-passing scheme, creating a significant performance bottleneck. In this paper, we propose DR-CircuitGNN, a fast GPU kernel design by leveraging row-wise sparsity-aware Dynamic-ReLU and optimizing SpMM kernels during heterogeneous message-passing to accelerate HGNNs training on EDA-related circuit graph datasets. To further enhance performance, we propose a parallel optimization strategy that maximizes CPU-GPU concurrency by concurrently processing independent subgraphs using multi-threaded CPU initialization and GPU kernel execution via multiple cudaStreams. Our experiments show that on three representative CircuitNet designs (small, medium, large), the proposed method can achieve up to 3.51x and 4.09x speedup compared to the SOTA for forward and backward propagation, respectively. On full-size CircuitNet and sampled Mini-CircuitNet, our parallel design enables up to 2.71x speed up over the official DGL implementation cuSPARSE with negligible impact on correlation scores and error rates.

Related papers

Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks [0.0]
Graph Convolutional Networks (GCNs) are state-of-the-art deep learning models for representation learning on graphs. We propose a message-passing architecture that leverages NUMA-based memory access properties. We also re-engineered the backpropagation algorithm specific to GCNs within our proposed accelerator.
arXiv Detail & Related papers (2024-11-06T12:00:51Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining. Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
CktGNN: Circuit Graph Neural Network for Electronic Design Automation [67.29634073660239]
This paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing. We introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers. Our work paves the way toward a learning-based open-sourced design automation for analog circuits.
arXiv Detail & Related papers (2023-08-31T02:20:25Z)
Exploiting On-chip Heterogeneity of Versal Architecture for GNN Inference Acceleration [0.5249805590164902]
Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML) applications, such as social network analysis, bioinformatics, etc. We leverage the heterogeneous computing capabilities of AMD Versal ACAP architecture to accelerate GNN inference. For Graph Convolutional Network (GCN) inference, our approach leads to a speedup of 3.9-96.7x compared to designs using PL only on the same ACAP device.
arXiv Detail & Related papers (2023-08-04T23:57:55Z)
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs) We present a new ensembling training manner, named EnGCN, to address the existing issues. Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for Graph Similarity Computation [7.54579279348595]
We propose a flexible architecture called SPA-GCN for accelerating Graph Convolutional Networks (GCN) on graphs. We show that SPA-GCN can deliver a high speedup compared to a multi-core CPU implementation and a GPU implementation.
arXiv Detail & Related papers (2021-11-10T20:47:57Z)
GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. It is challenging to accelerate training of GCNs due to substantial and irregular data communication. We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
arXiv Detail & Related papers (2019-12-31T21:19:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.