Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE
- URL: http://arxiv.org/abs/2504.10700v1
- Date: Mon, 14 Apr 2025 20:48:19 GMT
- Title: Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE
- Authors: Jesun Firoz, Franco Pellegrini, Mario Geiger, Darren Hsu, Jenna A. Bilbrey, Han-Yi Chou, Maximilian Stadler, Markus Hoehnerbach, Tingyu Wang, Dejun Lin, Emine Kucukbenli, Henry W. Sprueill, Ilyes Batatia, Sotiris S. Xantheas, MalSoon Lee, Chris Mundy, Gabor Csanyi, Justin S. Smith, Ponnuswamy Sadayappan, Sutanay Choudhury
- Abstract summary: Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operate on 3D molecular graph structures. This paper presents optimizations for two critical phases of CFM training: data distribution and model training. We propose an iterative algorithm that provides a highly effective, fast, and practical solution, ensuring efficient data distribution.
- Score: 6.143643133286159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on large homogeneous graphs, GNNs used by CFMs process a large number of geometric graphs of varying sizes, requiring different optimization strategies than those developed for large homogeneous GNNs. This paper presents optimizations for two critical phases of CFM training: data distribution and model training, targeting MACE - a state-of-the-art CFM. We address the challenge of load balancing in data distribution by formulating it as a multi-objective bin packing problem. We propose an iterative algorithm that provides a highly effective, fast, and practical solution, ensuring efficient data distribution. For the training phase, we identify symmetric tensor contraction as the key computational kernel in MACE and optimize this kernel to improve the overall performance. Our combined approach of balanced data distribution and kernel optimization significantly enhances the training process of MACE. Experimental results demonstrate a substantial speedup, reducing per-epoch execution time for training from 12 to 2 minutes on 740 GPUs with a 2.6M sample dataset.
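To make the load-balancing formulation concrete, here is a minimal sketch of a greedy largest-first packing heuristic over multiple per-graph cost measures (atom and edge counts here). This illustrates the multi-objective bin-packing idea only; it is not the authors' implementation, and all names and the cost model are placeholders.

```python
import heapq

def distribute_graphs(graphs, num_ranks, weights=(1.0, 1.0)):
    """Greedily assign molecular graphs to GPU ranks so that a weighted
    combination of per-rank costs (here: atom and edge counts) stays balanced.

    graphs: list of (num_atoms, num_edges) tuples, one per molecular graph.
    Returns one list of graph indices per rank.
    """
    cost = lambda i: weights[0] * graphs[i][0] + weights[1] * graphs[i][1]
    # Place the largest graphs first (LPT heuristic) ...
    order = sorted(range(len(graphs)), key=cost, reverse=True)
    # ... each onto the currently least-loaded rank (min-heap of loads).
    heap = [(0.0, r) for r in range(num_ranks)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_ranks)]
    for i in order:
        load, rank = heapq.heappop(heap)
        assignment[rank].append(i)
        heapq.heappush(heap, (load + cost(i), rank))
    return assignment

# Example: pack six graphs of varying size onto two ranks.
print(distribute_graphs([(10, 40), (50, 300), (8, 20), (30, 120), (12, 50), (45, 280)], 2))
```

The symmetric tensor contraction named as the key kernel can likewise be caricatured in einsum form; the shapes below are illustrative toys, not MACE's actual equivariant layout:

```python
import itertools
import torch

n, d = 128, 16         # illustrative sizes
x = torch.randn(n, d)  # per-node features
W = torch.randn(d, d, d)
W = sum(W.permute(p) for p in itertools.permutations(range(3))) / 6.0  # symmetrize
# The same feature vector enters the contraction three times, so an optimized
# kernel can exploit the symmetry of W instead of a dense 3-way contraction.
out = torch.einsum('ni,nj,nk,ijk->n', x, x, x, W)
```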
Related papers
- GDSG: Graph Diffusion-based Solution Generator for Optimization Problems in MEC Networks [109.17835015018532]
We present a Graph Diffusion-based Solution Generation (GDSG) method. This approach is designed to work with suboptimal datasets while converging to the optimal solution with high probability. We build GDSG as a multi-task diffusion model utilizing a Graph Neural Network (GNN) to acquire the distribution of high-quality solutions.
arXiv Detail & Related papers (2024-12-11T11:13:43Z)
- Physics-informed MeshGraphNets (PI-MGNs): Neural finite element solvers for non-stationary and nonlinear simulations on arbitrary meshes [13.41003911618347]
This work introduces PI-MGNs, a hybrid approach that combines PINNs and MGNs to solve non-stationary and nonlinear partial differential equations (PDEs) on arbitrary meshes.
Results show that the model scales well to large and complex meshes, although it is trained on small generic meshes only.
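As a toy illustration of the physics-informed ingredient (a generic PINN residual loss, not the PI-MGN codebase), the PDE residual can be added to the training loss via automatic differentiation:

```python
import torch

def heat_residual_loss(net, x, t, alpha=0.1):
    """Mean squared residual of the 1-D heat equation u_t = alpha * u_xx,
    computed with autograd; minimizing it drives net toward a PDE solution."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = net(torch.stack([x, t], dim=-1)).squeeze(-1)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return ((u_t - alpha * u_xx) ** 2).mean()

# Example: a small MLP evaluated on random collocation points.
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
print(heat_residual_loss(net, torch.rand(64), torch.rand(64)))
```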
arXiv Detail & Related papers (2024-02-16T13:34:51Z)
- Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the required additional data.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z)
- Entropy Aware Training for Fast and Accurate Distributed GNN [0.0]
Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) on billion-size graphs.
We develop techniques that reduce training time and improve accuracy.
We implement our algorithms on the DistDGL framework and observe that our training techniques scale much better than the existing training approach.
arXiv Detail & Related papers (2023-11-04T13:11:49Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
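The random-projection idea can be sketched roughly as follows (an illustration of the general technique, not RpHGNN's implementation): pre-computed neighbor aggregates are compressed with a random Gaussian matrix, which approximately preserves pairwise distances (Johnson-Lindenstrauss):

```python
import numpy as np

def random_project(features, out_dim, seed=0):
    """Compress high-dimensional pre-computed aggregates with a random
    Gaussian projection; pairwise distances are approximately preserved."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    proj = rng.standard_normal((d, out_dim)) / np.sqrt(out_dim)
    return features @ proj

# Example: compress 10k aggregated neighbor features from 4096 to 128 dims.
agg = np.random.rand(10_000, 4096).astype(np.float32)
print(random_project(agg, 128).shape)  # (10000, 128)
```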
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
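A shared-backbone, multi-head architecture of this kind can be sketched generically as follows (this is not the MEMTL model; sizes and names are placeholders):

```python
import torch
import torch.nn as nn

class MultiHeadEnsemble(nn.Module):
    """Shared backbone feeding several prediction heads; the ensemble
    output is the mean of the heads' predictions."""
    def __init__(self, in_dim=16, hidden=64, num_heads=3, out_dim=1):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, out_dim) for _ in range(num_heads))

    def forward(self, x):
        h = self.backbone(x)
        return torch.stack([head(h) for head in self.heads]).mean(dim=0)

# Example forward pass on a batch of eight feature vectors.
print(MultiHeadEnsemble()(torch.randn(8, 16)).shape)  # torch.Size([8, 1])
```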
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Transfer learning for atomistic simulations using GNNs and kernel mean embeddings [24.560340485988128]
We propose a transfer learning algorithm that leverages the ability of graph neural networks (GNNs) to represent chemical environments together with kernel mean embeddings.
We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability performance.
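A minimal sketch of the kernel-mean-embedding idea (illustrative only, not the paper's code): represent each molecule by the mean of its per-atom GNN features, then compare molecules with a kernel on those means:

```python
import numpy as np

def mean_embedding(atom_features):
    """Mean embedding under a linear feature map: the average of per-atom
    feature vectors, shape [num_atoms, dim] -> [dim]."""
    return atom_features.mean(axis=0)

def rbf(mu_a, mu_b, gamma=0.5):
    """RBF kernel between two molecule-level mean embeddings."""
    diff = mu_a - mu_b
    return np.exp(-gamma * diff @ diff)

# Example: two molecules with different atom counts but a shared feature dim.
mol_a = np.random.rand(12, 64)  # 12 atoms, 64-dim per-atom GNN features
mol_b = np.random.rand(30, 64)
print(rbf(mean_embedding(mol_a), mean_embedding(mol_b)))
```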
arXiv Detail & Related papers (2023-06-02T14:58:16Z)
- Graph Neural Network-Inspired Kernels for Gaussian Processes in Semi-Supervised Learning [4.644263115284322]
Graph neural networks (GNNs) have recently emerged as a promising class of models for graph-structured data in semi-supervised learning.
We introduce this inductive bias into GPs to improve their predictive performance for graph-structured data.
We show that these graph-based kernels lead to competitive classification and regression performance, as well as advantages in time, compared with the respective GNNs.
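One way to inject a GNN-style inductive bias into a kernel, sketched here as a toy under stated assumptions rather than the paper's exact construction, is to propagate a base kernel through the normalized adjacency, mimicking neighborhood averaging:

```python
import numpy as np

def propagated_kernel(adj, base_kernel, hops=2):
    """K' = S^h K (S^h)^T, with S the symmetrically normalized adjacency
    (self-loops added); K' stays positive semi-definite if K is."""
    a = adj + np.eye(adj.shape[0])
    deg = a.sum(axis=1)
    s = a / np.sqrt(np.outer(deg, deg))
    sh = np.linalg.matrix_power(s, hops)
    return sh @ base_kernel @ sh.T

# Example: a 4-node path graph with an RBF base kernel on node features.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
feats = np.random.rand(4, 3)
sq = ((feats[:, None] - feats[None]) ** 2).sum(-1)
print(propagated_kernel(adj, np.exp(-sq)))
```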
arXiv Detail & Related papers (2023-02-12T01:07:56Z)
- Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules [1.8947048356389908]
This work focuses on building GCNN models on HPC systems to predict material properties of millions of molecules.
We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch.
We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap.
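The distributed-data-parallel pattern such training relies on looks roughly like the following generic PyTorch sketch (launched with torchrun; HydraGNN's actual API differs):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, dataset, epochs=1):
    # One process per GPU; torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    # DistributedSampler shards the samples across ranks without overlap.
    sampler = torch.utils.data.distributed.DistributedSampler(dataset)
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(model(x.cuda()), y.cuda())
            loss.backward()  # gradients are all-reduced across ranks here
            opt.step()
    dist.destroy_process_group()
```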
arXiv Detail & Related papers (2022-07-22T20:54:22Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
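To make "gradual pruning" concrete, here is a generic magnitude-pruning schedule (a sketch of the broad technique only; CGP's criterion and regrowth logic differ):

```python
import torch

def prune_step(model, sparsity):
    """Zero out the smallest-magnitude fraction of each weight matrix."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:
                continue  # skip biases and norm parameters
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            thresh = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > thresh).float())

# Example: raise sparsity from 8% to 80% over ten rounds, with ordinary
# training between rounds (training loop omitted for brevity).
model = torch.nn.Linear(256, 256)
for step in range(1, 11):
    prune_step(model, sparsity=0.8 * step / 10)
```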
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested on four types of problems, including compliance minimization, fluid-structure optimization, heat transfer enhancement, and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
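The DNN-FEM alternation can be caricatured as a propose-evaluate loop; in this deliberately simplified sketch the learned surrogate is replaced by perturbation around the incumbent best, and `fem_solve` is a placeholder for an expensive FEM evaluation (both are assumptions, not the paper's API):

```python
import numpy as np

def self_directed_loop(fem_solve, dim=8, rounds=10, batch=16):
    """Alternate between proposing candidate designs and labeling them
    with an expensive solver, keeping the full history for reuse."""
    X = np.random.rand(batch, dim)            # initial random designs
    y = np.array([fem_solve(x) for x in X])   # expensive evaluations
    for _ in range(rounds):
        best = X[np.argmin(y)]
        # Self-directed sampling: concentrate new designs near the optimum.
        cand = best + 0.05 * np.random.randn(batch, dim)
        y_new = np.array([fem_solve(x) for x in cand])
        X, y = np.vstack([X, cand]), np.concatenate([y, y_new])
    return X[np.argmin(y)], y.min()

# Example with a cheap stand-in objective instead of a real FEM solver.
print(self_directed_loop(lambda x: np.sum((x - 0.3) ** 2)))
```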
arXiv Detail & Related papers (2020-02-04T20:00:28Z)