Scalable training of graph convolutional neural networks for fast and
accurate predictions of HOMO-LUMO gap in molecules
- URL: http://arxiv.org/abs/2207.11333v1
- Date: Fri, 22 Jul 2022 20:54:22 GMT
- Authors: Jong Youl Choi, Pei Zhang, Kshitij Mehta, Andrew Blanchard,
Massimiliano Lupo Pasini
- Abstract summary: This work focuses on building GCNN models on HPC systems to predict material properties of millions of molecules.
We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch.
We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap.
- Score: 1.8947048356389908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolutional Neural Networks (GCNNs) are a popular class of deep learning (DL) models in materials science for predicting material properties from the graph representation of molecular structures. Training an accurate and comprehensive GCNN surrogate for molecular design requires large-scale graph datasets and is usually a time-consuming process. Recent advances in GPUs and distributed computing open a path to reducing the computational cost of GCNN training effectively. However, efficient utilization of high-performance computing (HPC) resources for training requires simultaneously optimizing large-scale data management and scalable stochastic batched optimization techniques. In this work, we focus on building GCNN models on HPC systems to predict material properties of millions of molecules. We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch. We use ADIOS, a high-performance data management framework, for efficient storage and reading of large molecular graph data. We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap. We measure the scalability, accuracy, and convergence of our approach on two DOE supercomputers: the Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the Perlmutter system at the National Energy Research Scientific Computing Center (NERSC). We present our experimental results with HydraGNN, showing (i) a reduction in data-loading time of up to 4.2x compared with a conventional method and (ii) linear scaling of training performance up to 1,024 GPUs on both Summit and Perlmutter.
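The abstract refers to distributed data parallelism in PyTorch for GCNN training. The sketch below is a minimal illustration of that general pattern, not HydraGNN's actual API: it assumes a PyTorch Geometric molecular dataset whose graphs carry a scalar HOMO-LUMO gap target in data.y, and the layer sizes, batch size, and learning rate are placeholder values.

```python
# Minimal sketch of data-parallel GCNN training for HOMO-LUMO gap regression.
# Generic PyTorch DDP + PyTorch Geometric pattern; NOT HydraGNN's API.
# Dataset, layer sizes, and hyperparameters are placeholders.
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data.distributed import DistributedSampler
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool


class GapGCNN(torch.nn.Module):
    """Two GCN layers, mean pooling, and a linear head for a scalar target."""

    def __init__(self, num_node_features: int, hidden: int = 128):
        super().__init__()
        self.conv1 = GCNConv(num_node_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)  # scalar HOMO-LUMO gap

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)          # graph-level embedding
        return self.head(x).squeeze(-1)


def train(dataset, epochs: int = 10):
    # One process per GPU, launched e.g. via `torchrun --nproc_per_node=<gpus>`.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = GapGCNN(dataset.num_node_features).cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Each rank sees a disjoint shard of the molecules.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=128, sampler=sampler)

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle shards every epoch
        for data in loader:
            data = data.to(local_rank)
            pred = model(data.x, data.edge_index, data.batch)
            loss = F.mse_loss(pred, data.y.view(-1))  # regression on the gap
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()  # DDP all-reduces gradients across ranks

    dist.destroy_process_group()
```

In the paper's setting, each rank's shard of molecular graphs would be read through ADIOS rather than held in memory, and the processes would be launched across nodes by the batch scheduler on Summit or Perlmutter; both pieces are omitted from this sketch.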
Related papers
- Scalable Training of Trustworthy and Energy-Efficient Predictive Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN [5.386946356430465]
We develop and train scalable, trustworthy, and energy-efficient predictive graph foundation models (GFMs) using HydraGNN.
HydraGNN expands the boundaries of graph neural network (GNN) computations in both training scale and data diversity.
Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures.
arXiv Detail & Related papers (2024-06-12T21:21:42Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Transfer learning for atomistic simulations using GNNs and kernel mean embeddings [24.560340485988128]
We propose a transfer learning algorithm that leverages the ability of graph neural networks (GNNs) to represent chemical environments together with kernel mean embeddings.
We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability performance.
arXiv Detail & Related papers (2023-06-02T14:58:16Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Unlearning Graph Classifiers with Limited Data Resources [39.29148804411811]
Controlled data removal is becoming an important feature of machine learning models for data-sensitive Web applications.
It is still largely unknown how to perform efficient machine unlearning of graph neural networks (GNNs).
Our main contribution is the first known nonlinear approximate graph unlearning method based on GSTs.
Our second contribution is a theoretical analysis of the computational complexity of the proposed unlearning mechanism.
Our third contribution is extensive simulation results showing that, compared to complete retraining of GNNs after each removal request, the new GST-based approach offers, on average, a 10.38x speed-up.
arXiv Detail & Related papers (2022-11-06T20:46:50Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing [0.0]
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data.
Existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs.
This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas.
arXiv Detail & Related papers (2021-12-16T00:37:37Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that, through careful design of the models and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks [8.460826851547294]
Efficient graph analysis using modern machine learning is receiving a growing level of attention.
Deep learning approaches often operate over the entire adjacency matrix.
It is desirable to identify efficient measures to reduce both run-time and memory requirements.
arXiv Detail & Related papers (2020-10-23T19:47:42Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model, which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.