Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN
- URL: http://arxiv.org/abs/2406.12909v2
- Date: Fri, 28 Jun 2024 17:58:27 GMT
- Title: Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN
- Authors: Massimiliano Lupo Pasini, Jong Youl Choi, Kshitij Mehta, Pei Zhang, David Rogers, Jonghyun Bae, Khaled Z. Ibrahim, Ashwin M. Aji, Karl W. Schulz, Jorda Polo, Prasanna Balaprakash,
- Abstract summary: We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN.
HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity.
Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures.
- Score: 5.386946356430465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFM training to tens of thousands of GPUs on datasets that consist of hundreds of millions of graphs. Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as the total energy and atomic forces. Using over 150 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge National Laboratory. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier. Hyperparameter optimization (HPO) was performed on over 64,000 GPUs on Frontier to select GFM architectures with high accuracy. Early stopping was applied on each GFM architecture for energy awareness in performing such an extreme-scale task. The training of an ensemble of highest-ranked GFM architectures continued until convergence to establish uncertainty quantification (UQ) capabilities with ensemble learning. Our contribution opens the door for rapidly developing, training, and deploying GFMs using large-scale computational resources to enable AI-accelerated materials discovery and design.
Related papers
- On the Scalability of GNNs for Molecular Graphs [7.402389334892391]
Graph Neural Networks (GNNs) are yet to show the benefits of scale due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures.
We analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs.
For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets.
arXiv Detail & Related papers (2024-04-17T17:11:31Z) - Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z) - Enabling Accelerators for Graph Computing [0.0]
Graph Neural Networks (GNNs) offer a novel paradigm for learning on graph-structured data.
GNNs present new computational challenges compared to conventional neural networks.
This thesis aims to develop a better understanding of how GNNs interact with the underlying hardware.
arXiv Detail & Related papers (2023-12-16T23:31:20Z) - Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit
Diversity Modeling [60.0185734837814]
Graph neural networks (GNNs) have found extensive applications in learning from graph data.
To bolster the generalization capacity of GNNs, it has become customary to augment training graph structures with techniques like graph augmentations.
This study introduces the concept of Mixture-of-Experts (MoE) to GNNs, with the aim of augmenting their capacity to adapt to a diverse range of training graph structures.
arXiv Detail & Related papers (2023-04-06T01:09:36Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs)
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - Gradient Gating for Deep Multi-Rate Learning on Graphs [62.25886489571097]
We present Gradient Gating (G$2$), a novel framework for improving the performance of Graph Neural Networks (GNNs)
Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph.
arXiv Detail & Related papers (2022-10-02T13:19:48Z) - Scalable training of graph convolutional neural networks for fast and
accurate predictions of HOMO-LUMO gap in molecules [1.8947048356389908]
This work focuses on building GCNN models on HPC systems to predict material properties of millions of molecules.
We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch.
We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap.
arXiv Detail & Related papers (2022-07-22T20:54:22Z) - Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors
to Sequences [55.329402218608365]
We propose the Neighbor2Seq to transform the hierarchical neighborhood of each node into a sequence.
We evaluate our method on a massive graph with more than 111 million nodes and 1.6 billion edges.
Results show that our proposed method is scalable to massive graphs and achieves superior performance across massive and medium-scale graphs.
arXiv Detail & Related papers (2022-02-07T16:38:36Z) - A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z) - GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs.
It is challenging to accelerate training of GCNs due to substantial and irregular data communication.
We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
arXiv Detail & Related papers (2019-12-31T21:19:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.