High-performance training and inference for deep equivariant interatomic potentials
- URL: http://arxiv.org/abs/2504.16068v1
- Date: Tue, 22 Apr 2025 17:47:01 GMT
- Title: High-performance training and inference for deep equivariant interatomic potentials
- Authors: Chuin Wei Tan, Marc L. Descoteaux, Mit Kotak, Gabriel de Miranda Nascimento, Seán R. Kavanagh, Laura Zichi, Menghang Wang, Aadit Saluja, Yizhong R. Hu, Tess Smidt, Anders Johansson, William C. Witt, Boris Kozinsky, Albert Musaelian
- Abstract summary: We introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by up to a factor of 18.
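As a rough illustration of the train-time path described above, the sketch below applies the PyTorch 2.0 compiler (`torch.compile`) to a toy model and recovers forces by differentiating the energy. `ToyPotential` is a hypothetical stand-in, not NequIP or Allegro, and the sketch omits the paper's AOT Inductor export pipeline for inference.

```python
import torch

class ToyPotential(torch.nn.Module):
    """Hypothetical stand-in for a deep interatomic potential (not Allegro)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, hidden),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # Per-atom energies summed into a total potential energy.
        return self.mlp(positions).sum()

model = ToyPotential()
compiled = torch.compile(model)  # PyTorch 2.0 compiler at train time

positions = torch.randn(32, 3, requires_grad=True)
energy = compiled(positions)
forces = -torch.autograd.grad(energy, positions)[0]  # forces from autograd
```

For deployment, the paper instead exports the model through the Ahead-of-Time Inductor compiler so inference can run outside the Python interpreter; the exact export entry points vary across PyTorch versions and are not reproduced here.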
Related papers
- NNTile: a machine learning framework capable of training extremely large GPT language models on a single node [83.9328245724548]
NNTile is based on the StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units.
This means that any particular operation needed to train a large neural network can be performed on any of the CPU cores or GPU devices (a toy sketch of this idea follows the entry).
arXiv Detail & Related papers (2025-04-17T16:22:32Z)
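A minimal sketch of the tile-based task parallelism described in the NNTile entry above, assuming nothing about NNTile's or StarPU's actual APIs: a large matrix product is decomposed into independent tile tasks that a pool of workers may execute in any order.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def matmul_tiled(A, B, tile=256, workers=4):
    """Compute A @ B by scheduling independent tile tasks onto workers."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))

    def task(i, j):
        # One task: fill the (i, j) output tile from full strips of A and B.
        C[i:i + tile, j:j + tile] = A[i:i + tile, :] @ B[:, j:j + tile]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, i, j)
                   for i in range(0, m, tile)
                   for j in range(0, n, tile)]
        for f in futures:
            f.result()  # surface any task errors
    return C

A, B = np.random.rand(512, 512), np.random.rand(512, 512)
assert np.allclose(matmul_tiled(A, B), A @ B)
```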
- The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains [4.340917737559795]
We study scaling in Neural Network Interatomic Potentials (NNIPs).
NNIPs act as surrogate models for ab initio quantum mechanical calculations.
We develop an NNIP architecture designed for scaling: the Efficiently Scaled Attention Interatomic Potential (EScAIP).
arXiv Detail & Related papers (2024-10-31T17:35:57Z)
- Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors [1.4609888393206634]
We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation.
We compare a graph neural network and a kernel-based transformer, and demonstrate that realistic reconstruction can be achieved while avoiding computationally expensive operations.
The best graph neural network model shows improvement in the jet transverse momentum resolution by up to 50% compared to the rule-based algorithm.
arXiv Detail & Related papers (2023-09-13T08:16:15Z)
- Symbolic Regression on FPGAs for Fast Machine Learning Inference [2.0920303420933273]
The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs).
We introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR).
We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.
arXiv Detail & Related papers (2023-05-06T17:04:02Z)
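To make the idea concrete, here is a hedged toy version of replacing a small network with a closed-form surrogate: a fitted polynomial stands in for the learned symbolic expression, and the random `tiny_net` stands in for the trained 3-layer network (the paper's FPGA deployment is not modeled).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=8), rng.normal(size=8)
W2, b2 = rng.normal(size=8), rng.normal(size=1)

def tiny_net(x):
    # Fixed random 1-hidden-layer network standing in for the trained model.
    h = np.tanh(np.outer(x, W1) + b1)
    return h @ W2 + b2

x = np.linspace(-2.0, 2.0, 200)
y = tiny_net(x)
coeffs = np.polyfit(x, y, deg=5)     # compact closed-form surrogate
y_hat = np.polyval(coeffs, x)
rel_err = np.abs(y_hat - y).max() / np.abs(y).max()
print(f"max relative error of the surrogate: {rel_err:.3f}")
```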
- PDSketch: Integrated Planning Domain Programming and Learning [86.07442931141637]
We present a new domain definition language, named PDSketch.
It allows users to flexibly define high-level structures in the transition models.
Details of the transition model will be filled in by trainable neural networks.
arXiv Detail & Related papers (2023-03-09T18:54:12Z)
- RAF: Holistic Compilation for Deep Learning Model Training [17.956035630476173]
In this paper, we present RAF, a deep learning compiler for training.
Unlike existing DLCs, RAF accepts a forward model and generates the corresponding training graph itself.
RAF is able to systematically consolidate graph optimizations for performance, memory and distributed training.
arXiv Detail & Related papers (2023-03-08T17:51:13Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
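For flavor, a minimal snnTorch usage sketch (standard CPU build; the IPU-optimized release is the paper's contribution and is not assumed here). The pattern follows snnTorch's documented leaky integrate-and-fire workflow.

```python
import torch
import snntorch as snn

lif = snn.Leaky(beta=0.9)         # leaky integrate-and-fire neuron; beta = decay
mem = lif.init_leaky()            # initial membrane potential

inputs = 2.0 * torch.rand(10, 1)  # 10 time steps of input current
spikes = []
for step in range(inputs.shape[0]):
    spk, mem = lif(inputs[step], mem)  # advance the neuron one time step
    spikes.append(spk)
print(f"{int(torch.stack(spikes).sum().item())} spikes emitted")
```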
- Scalable training of graph convolutional neural networks for fast and accurate predictions of HOMO-LUMO gap in molecules [1.8947048356389908]
This work focuses on building GCNN models on HPC systems to predict material properties of millions of molecules.
We use HydraGNN, our in-house library for large-scale GCNN training, leveraging distributed data parallelism in PyTorch.
We perform parallel training on two open-source large-scale graph datasets to build a GCNN predictor for an important quantum property known as the HOMO-LUMO gap.
arXiv Detail & Related papers (2022-07-22T20:54:22Z)
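A minimal sketch of the PyTorch distributed data parallelism that the HydraGNN entry above leverages; a linear layer stands in for the GCNN, and the script is assumed to be launched with torchrun (e.g. `torchrun --nproc_per_node=2 ddp_sketch.py`).

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="gloo")   # use "nccl" on GPU clusters
    model = DDP(torch.nn.Linear(16, 1))       # gradients sync across ranks
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    data = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    sampler = DistributedSampler(data)        # shard the dataset per rank
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)              # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            torch.nn.functional.mse_loss(model(x), y).backward()
            opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```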
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability (a toy kernel is sketched after this entry).
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
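As a toy instance of the entry's definition of a graph kernel (an inner product on graphs), the sketch below computes a length-p walk-counting kernel k(G1, G2) = 1^T (A1 ⊗ A2)^p 1, which factorizes into a product of per-graph walk counts; this illustrates the kernel notion only, not the paper's architecture.

```python
import numpy as np

def walk_count(A: np.ndarray, p: int) -> float:
    """Total number of length-p walks in the graph with adjacency matrix A."""
    one = np.ones(A.shape[0])
    return one @ np.linalg.matrix_power(A, p) @ one

def walk_kernel(A1: np.ndarray, A2: np.ndarray, p: int = 3) -> float:
    # 1^T (A1 kron A2)^p 1 factorizes because (A1 kron A2)^p = A1^p kron A2^p.
    return walk_count(A1, p) * walk_count(A2, p)

triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]], dtype=float)
square = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
print(walk_kernel(triangle, square))
```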
- ForceNet: A Graph Neural Network for Large-Scale Quantum Calculations [86.41674945012369]
We develop a scalable and expressive Graph Neural Network model, ForceNet, to approximate atomic forces.
Our proposed ForceNet is able to predict atomic forces more accurately than state-of-the-art physics-based GNNs.
arXiv Detail & Related papers (2021-03-02T03:09:06Z)
- hxtorch: PyTorch for BrainScaleS-2 -- Perceptrons on Analog Neuromorphic Hardware [0.0]
We present software facilitating the usage of the BrainScaleS-2 analog neuromorphic hardware system as an inference accelerator.
We provide accelerator support for vector-matrix multiplications and convolutions and corresponding software-based autograd functionality.
As an application of the introduced framework, we present a model that classifies activities of daily living with smartphone sensor data.
arXiv Detail & Related papers (2020-06-23T16:33:49Z)
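The pattern described in the hxtorch entry above, accelerator operations with software-based autograd, can be sketched with a custom torch.autograd.Function; `fake_hardware_matmul` is a hypothetical stand-in for a BrainScaleS-2 call and simply quantizes a CPU matmul.

```python
import torch

def fake_hardware_matmul(x, W):
    # Hypothetical stand-in for an analog accelerator call: coarse
    # quantization mimics the reduced precision of analog hardware.
    return torch.round(x @ W * 8) / 8

class HardwareMatmul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W):
        ctx.save_for_backward(x, W)
        return fake_hardware_matmul(x, W)   # "hardware" forward pass

    @staticmethod
    def backward(ctx, grad_out):
        # Software backward pass through the ideal (non-quantized) linear map.
        x, W = ctx.saved_tensors
        return grad_out @ W.t(), x.t() @ grad_out

x = torch.randn(4, 8, requires_grad=True)
W = torch.randn(8, 2, requires_grad=True)
HardwareMatmul.apply(x, W).sum().backward()
print(x.grad.shape, W.grad.shape)   # torch.Size([4, 8]) torch.Size([8, 2])
```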
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High-speed, low-energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations of Boston house-price prediction and of training a two-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
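The mathematics behind the "one-step" learning in the entry above is the closed-form least-squares solution, which the crosspoint array evaluates physically in a single step; the numeric sketch below (with synthetic data in place of the Boston housing set) performs the same solve.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # features, e.g. house attributes
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=100)  # noisy targets

# One computational step: the closed-form least-squares solution.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_true, atol=0.01))
```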