AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design
- URL: http://arxiv.org/abs/2507.02006v1
- Date: Wed, 02 Jul 2025 00:35:43 GMT
- Title: AIRES: Accelerating Out-of-Core GCNs via Algorithm-System Co-Design
- Authors: Shakya Jayakody, Youpeng Zhao, Jun Wang,
- Abstract summary: Graph convolutional networks (GCNs) are fundamental in various scientific applications, ranging from biomedical protein-protein interactions (PPI) to large-scale recommendation systems.<n>An essential component for modeling graph structures in GCNs is sparse general matrix-matrix multiplication (SpGEMM)<n>SpGEMMs are often conducted in an out-of-core fashion due to limited GPU memory space in resource-constrained systems.<n>We propose AIRES, a novel algorithm-system co-design solution to accelerate out-of-core SpGEMM computation for GCNs.
- Score: 6.554916179445241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph convolutional networks (GCNs) are fundamental in various scientific applications, ranging from biomedical protein-protein interactions (PPI) to large-scale recommendation systems. An essential component for modeling graph structures in GCNs is sparse general matrix-matrix multiplication (SpGEMM). As the size of graph data continues to scale up, SpGEMMs are often conducted in an out-of-core fashion due to limited GPU memory space in resource-constrained systems. Albeit recent efforts that aim to alleviate the memory constraints of out-of-core SpGEMM through either GPU feature caching, hybrid CPU-GPU memory layout, or performing the computation in sparse format, current systems suffer from both high I/O latency and GPU under-utilization issues. In this paper, we first identify the problems of existing systems, where sparse format data alignment and memory allocation are the main performance bottlenecks, and propose AIRES, a novel algorithm-system co-design solution to accelerate out-of-core SpGEMM computation for GCNs. Specifically, from the algorithm angle, AIRES proposes to alleviate the data alignment issues on the block level for matrices in sparse formats and develops a tiling algorithm to facilitate row block-wise alignment. On the system level, AIRES employs a three-phase dynamic scheduling that features a dual-way data transfer strategy utilizing a tiered memory system: integrating GPU memory, GPU Direct Storage (GDS), and host memory to reduce I/O latency and improve throughput. Evaluations show that AIRES significantly outperforms the state-of-the-art methods, achieving up to 1.8x lower latency in real-world graph processing benchmarks.
Related papers
- Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks [0.0]
Graph Convolutional Networks (GCNs) are state-of-the-art deep learning models for representation learning on graphs.
We propose a message-passing architecture that leverages NUMA-based memory access properties.
We also re-engineered the backpropagation algorithm specific to GCNs within our proposed accelerator.
arXiv Detail & Related papers (2024-11-06T12:00:51Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z) - Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution
Networks [12.181052673940465]
Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains.
We present Accel-GCN, a GPU accelerator architecture for GCNs.
Evaluation of Accel-GCN across 18 benchmark graphs reveals that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17 times, 1.86 times, and 2.94 times respectively.
arXiv Detail & Related papers (2023-08-22T23:12:17Z) - Communication-Efficient Graph Neural Networks with Probabilistic
Neighborhood Expansion Analysis and Caching [59.8522166385372]
Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs.
This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings.
We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data.
arXiv Detail & Related papers (2023-05-04T21:04:01Z) - GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z) - GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for
Memory-Efficient Graph Convolutional Neural Networks [4.669338722185048]
A unique property of Graph convolutional neural networks (GCNs) is that its two primary execution stages, aggregation and combination, exhibit drastically different dataflows.
We present GROW, a GCN accelerator based on Gustavson's algorithm to architect a row-wise product based sparse-dense GEMM accelerator.
arXiv Detail & Related papers (2022-03-01T00:26:31Z) - AutoGMap: Learning to Map Large-scale Sparse Graphs on Memristive
Crossbars [21.835545525155453]
This work proposes the dynamic sparsity-aware mapping scheme generating method that models the problem as a sequential decision-making problem.
Our generating model (LSTM) generates remarkable mapping performance on a small-scale typical graph/matrix data.
Our coding framework of the scheme is intuitive and has promising adaptability with the deployment or compilation system.
arXiv Detail & Related papers (2021-11-15T11:37:47Z) - GCNear: A Hybrid Architecture for Efficient GCN Training with
Near-Memory Processing [8.130391367247793]
Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-euclidean graph data.
It is challenging to realize efficient GCN training, especially on large graphs.
This paper presents GCNear, a hybrid architecture to tackle these challenges.
arXiv Detail & Related papers (2021-11-01T03:47:07Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.