Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction
- URL: http://arxiv.org/abs/2507.03840v1
- Date: Fri, 04 Jul 2025 23:53:47 GMT
- Title: Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction
- Authors: Manasa Kaniselvan, Alexander Maeder, Chen Hao Xia, Alexandros Nikolaos Ziogas, Mathieu Luisier
- Abstract summary: Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales. However, the graph representations required for this task tend to be densely connected. We present a distributed eGNN implementation that leverages direct GPU communication and introduce a partitioning strategy for the input graph.
- Score: 76.62155593340763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales, enabling investigation of the electronic properties of materials with extended defects, interfaces, or disordered phases. However, as interactions between atomic orbitals typically extend over 10+ angstroms, the graph representations required for this task tend to be densely connected, and the memory requirements for training and inference on these large structures can exceed the limits of modern GPUs. Here we present a distributed eGNN implementation that leverages direct GPU communication and introduce a partitioning strategy for the input graph to reduce the number of embedding exchanges between GPUs. Our implementation shows strong scaling up to 128 GPUs, and weak scaling up to 512 GPUs with 87% parallel efficiency, for structures with 3,000 to 190,000 atoms on the Alps supercomputer.
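To make the partitioning idea concrete, here is a minimal sketch, assuming a simple spatial-slab heuristic and toy sizes (the function names and the slab rule are illustrative, not the authors' implementation): atoms are assigned to contiguous slabs, one per GPU, and every directed edge crossing a slab boundary implies one embedding exchange per message-passing layer.

```python
import numpy as np

def slab_partition(positions, n_parts, axis=2):
    """Assign each atom to one of n_parts equal-count spatial slabs along
    one axis -- a deliberately simple stand-in for a real graph partitioner."""
    order = np.argsort(positions[:, axis])
    part = np.empty(len(order), dtype=np.int64)
    for p, chunk in enumerate(np.array_split(order, n_parts)):
        part[chunk] = p
    return part

def count_halo_exchanges(src, dst, part):
    """Directed edges whose endpoints sit on different partitions: each one
    forces a source-node embedding to be shipped to another GPU."""
    return int(np.sum(part[src] != part[dst]))

# Toy structure: 10,000 atoms in a 50 A box, edges within a 10 A cutoff
# (sampled from random pairs for brevity rather than a full radius search).
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 50.0, size=(10_000, 3))
i, j = rng.integers(0, 10_000, size=(2, 200_000))
mask = (i != j) & (np.linalg.norm(pos[i] - pos[j], axis=1) < 10.0)

part = slab_partition(pos, n_parts=8)
print("cross-GPU edges:", count_halo_exchanges(i[mask], j[mask], part))
```

Minimizing cross-partition edges is exactly what such a partitioning strategy targets: with interaction cutoffs beyond 10 angstroms the halo region around each partition is large, so fewer boundary edges translates directly into fewer embedding exchanges.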
Related papers
- Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training [1.6954729278440728]
Graph neural networks (GNNs) can leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the memory capacity of a GPU due to their sheer size, and using GNNs on them requires techniques such as mini-batch sampling to scale. We propose a three-dimensional (3D) parallel approach for full-graph training that tackles these issues and scales to billion-edge graphs.
arXiv Detail & Related papers (2025-05-07T02:49:52Z)
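As a rough sketch of what 3D parallelism means for one propagation step Y = A @ H (the grid shape, tiling, and the dense stand-in for the sparse adjacency are assumptions for illustration, not Plexus's actual scheme): the adjacency is tiled over a pr x pc process grid, the features are additionally split into pf column slices, and each of the pr*pc*pf ranks computes one tile product, with a reduction over the shared dimension.

```python
import numpy as np

# Toy 3D decomposition of one propagation step, Y = A @ H.  Every "rank"
# here is just an (i, j, k) index triple into in-memory tiles; a real
# system adds sparse storage, communication, and compute/comm overlap.
n, f = 12, 8                  # nodes and feature width (tiny toy sizes)
pr, pc, pf = 2, 3, 2          # process grid: rows x columns x feature slices
rng = np.random.default_rng(1)
A = (rng.random((n, n)) < 0.3).astype(float)   # dense stand-in for sparse A
H = rng.random((n, f))

A_tiles = [np.array_split(rb, pc, axis=1) for rb in np.array_split(A, pr)]
H_tiles = [np.array_split(cb, pf, axis=1) for cb in np.array_split(H, pc)]

Y = np.zeros((n, f))
row_ids = np.array_split(np.arange(n), pr)
col_ids = np.array_split(np.arange(f), pf)
for i in range(pr):
    for j in range(pc):       # ranks sharing (i, k) reduce their j-partials
        for k in range(pf):
            Y[np.ix_(row_ids[i], col_ids[k])] += A_tiles[i][j] @ H_tiles[j][k]

assert np.allclose(Y, A @ H)  # the tiled reduction reproduces the full product
```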
- Scalable Training of Trustworthy and Energy-Efficient Predictive Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN [5.386946356430465]
We develop and train scalable, trustworthy, and energy-efficient predictive graph foundation models (GFMs) using HydraGNN.
HydraGNN expands the boundaries of graph neural network (GNN) computations in both training scale and data diversity.
Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures.
arXiv Detail & Related papers (2024-06-12T21:21:42Z)
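The multi-task pattern described above can be sketched with a toy PyTorch module (the layer sizes and single mean-aggregation step are illustrative assumptions, not HydraGNN's architecture): a shared trunk produces node embeddings, one head predicts node-level properties, and a pooled head predicts graph-level ones.

```python
import torch
import torch.nn as nn

class MultiTaskGNN(nn.Module):
    """Shared message-passing trunk with node- and graph-level heads --
    an illustrative MTL layout, not HydraGNN's actual model."""
    def __init__(self, in_dim, hidden, node_out, graph_out):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.mix = nn.Linear(hidden, hidden)
        self.node_head = nn.Linear(hidden, node_out)    # e.g. atomic forces
        self.graph_head = nn.Linear(hidden, graph_out)  # e.g. total energy

    def forward(self, x, adj):
        h = torch.relu(self.embed(x))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.mix((adj @ h) / deg))       # mean aggregation
        return self.node_head(h), self.graph_head(h.mean(dim=0))

model = MultiTaskGNN(in_dim=4, hidden=32, node_out=3, graph_out=1)
x, adj = torch.randn(10, 4), (torch.rand(10, 10) < 0.3).float()
node_pred, graph_pred = model(x, adj)
# MTL: one backward pass through a weighted sum of the per-task losses.
loss = node_pred.pow(2).mean() + 0.1 * graph_pred.pow(2).mean()
loss.backward()
```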
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, and we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
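The sampling-plus-local-attention recipe can be sketched as follows (the sampler and token layout are simplified assumptions, not LargeGT's implementation): attention runs only over a seed node and up to k sampled neighbors, so the cost is O(k^2) per seed instead of O(n^2) over the whole graph.

```python
import torch
import torch.nn as nn

def sample_neighborhood(neighbors, seed, k):
    """Seed plus up to k randomly chosen neighbors -- a crude stand-in
    for a fast neighborhood sampler."""
    nbrs = neighbors[seed]
    if len(nbrs) > k:
        keep = torch.randperm(len(nbrs))[:k]
        nbrs = [nbrs[int(t)] for t in keep]
    return [seed] + list(nbrs)

# Toy graph as an adjacency list, with random node features.
neighbors = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
feats = torch.randn(5, 16)

tokens = sample_neighborhood(neighbors, seed=0, k=3)
x = feats[tokens].unsqueeze(0)          # (batch=1, tokens, dim)

# Local attention: full self-attention, but only over the sampled tokens.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
out, _ = attn(x, x, x)
seed_repr = out[0, 0]                   # updated embedding of the seed node
print(seed_repr.shape)                  # torch.Size([16])
```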
- Architectural Implications of Embedding Dimension during GCN on CPU and GPU [6.650945912906685]
Graph Convolutional Networks (GCNs) are a widely used type of GNN for transductive graph learning problems.
GCN is a challenging algorithm from an architecture perspective due to inherent sparsity, low data reuse, and massive memory capacity requirements.
arXiv Detail & Related papers (2022-12-01T19:23:12Z)
- Nimble GNN Embedding with Tensor-Train Decomposition [10.726368002799765]
This paper describes a new method for representing embedding tables of graph neural networks (GNNs) more compactly via tensor-train (TT) decomposition.
In some cases, our model without explicit node features on input can even match the accuracy of models that use node features.
arXiv Detail & Related papers (2022-06-21T17:57:35Z)
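The TT idea is easy to see in a two-core sketch (the factor shapes, rank, and random cores are illustrative assumptions): the N x D embedding table is never materialized, and each row is reconstructed on demand from two small cores.

```python
import numpy as np

n1, n2 = 100, 80       # vocabulary factored as N = n1 * n2 = 8,000 rows
d1, d2 = 8, 16         # embedding dim factored as D = d1 * d2 = 128
r = 4                  # TT rank: controls the accuracy/compression trade-off

rng = np.random.default_rng(0)
core1 = rng.standard_normal((n1, d1, r))   # first TT core
core2 = rng.standard_normal((r, n2, d2))   # second TT core

def tt_embedding_row(i):
    """Materialize row i of the implicit (n1*n2) x (d1*d2) table."""
    i1, i2 = divmod(i, n2)
    # (d1, r) @ (r, d2) -> (d1, d2), flattened into the D-dim embedding.
    return (core1[i1] @ core2[:, i2]).reshape(-1)

vec = tt_embedding_row(1234)
full = n1 * n2 * d1 * d2
print(vec.shape, f"compression: {full / (core1.size + core2.size):.0f}x")
```

In a trained model the cores are learned parameters, so each lookup trades one small matrix product for a table that is roughly two orders of magnitude smaller.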
- All-optical graph representation learning using integrated diffractive photonic computing units [51.15389025760809]
Photonic neural networks perform brain-inspired computations using photons instead of electrons.
We propose an all-optical graph representation learning architecture, termed diffractive graph neural network (DGNN).
We demonstrate the use of DGNN extracted features for node and graph-level classification tasks with benchmark databases and achieve superior performance.
arXiv Detail & Related papers (2022-04-23T02:29:48Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
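As one concrete kernel of this kind, here is a minimal Weisfeiler-Lehman subtree kernel (a classic example chosen for illustration; the paper's framework accepts other graph kernels as well): the explicit feature map counts refined node labels, and the kernel is the inner product of those counts.

```python
import numpy as np

def wl_features(adj, labels, iterations=2):
    """Explicit feature map: histogram of Weisfeiler-Lehman-refined labels."""
    labels, counts = list(labels), {}
    for _ in range(iterations + 1):
        for lab in labels:
            counts[lab] = counts.get(lab, 0) + 1
        # Refine: a node's new label hashes its own label plus the sorted
        # multiset of its neighbors' labels.
        labels = [hash((labels[v],
                        tuple(sorted(labels[u] for u in np.flatnonzero(adj[v])))))
                  for v in range(len(labels))]
    return counts

def graph_kernel(adj1, lab1, adj2, lab2):
    """k(G1, G2) = <phi(G1), phi(G2)> over the shared WL label counts."""
    f1, f2 = wl_features(adj1, lab1), wl_features(adj2, lab2)
    return sum(c * f2.get(lab, 0) for lab, c in f1.items())

# Toy comparison: a triangle versus a 3-node path, all nodes labeled 0.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(graph_kernel(tri, [0, 0, 0], path, [0, 0, 0]))
```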
- MG-GCN: Scalable Multi-GPU GCN Training Framework [1.7188280334580197]
Full-batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs.
MG-GCN employs multiple high-performance computing optimizations, including efficient re-use of memory buffers.
MG-GCN achieves super-linear speedup with respect to DGL on the Reddit graph, on both DGX-1 (V100) and DGX-A100 systems.
arXiv Detail & Related papers (2021-10-17T00:41:43Z)
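Buffer re-use of the kind mentioned above can be illustrated generically (a standard HPC trick, sketched in NumPy rather than MG-GCN's actual GPU code): the per-layer output array is allocated once and overwritten in place every epoch, avoiding repeated large allocations.

```python
import numpy as np

n, f_in, f_out = 100_000, 16, 128
X = np.random.rand(n, f_in).astype(np.float32)   # aggregated node features
W = np.random.rand(f_in, f_out).astype(np.float32)
buf = np.empty((n, f_out), dtype=np.float32)     # allocated once, reused

for epoch in range(3):
    np.matmul(X, W, out=buf)        # writes into buf; no fresh (n, f_out) array
    np.maximum(buf, 0.0, out=buf)   # in-place ReLU on the same buffer
```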
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712x speedup with 1301.25x energy reduction over CPU, and a 35.4x speedup with 17.66x energy reduction over GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.