Related papers: The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains

The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains

URL: http://arxiv.org/abs/2410.24169v1
Date: Thu, 31 Oct 2024 17:35:57 GMT
Title: The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains
Authors: Eric Qu, Aditi S. Krishnapriyan,
Abstract summary: We study scaling in Neural Network Interatomic Potentials (NNIPs) NNIPs act as surrogate models for ab initio quantum mechanical calculations. We develop an NNIP architecture designed for scaling: the Efficiently Scaled Attention Interatomic Potential (EScAIP)
Score: 4.340917737559795
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scaling has been critical in improving model performance and generalization in machine learning. It involves how a model's performance changes with increases in model size or input data, as well as how efficiently computational resources are utilized to support this growth. Despite successes in other areas, the study of scaling in Neural Network Interatomic Potentials (NNIPs) remains limited. NNIPs act as surrogate models for ab initio quantum mechanical calculations. The dominant paradigm here is to incorporate many physical domain constraints into the model, such as rotational equivariance. We contend that these complex constraints inhibit the scaling ability of NNIPs, and are likely to lead to performance plateaus in the long run. In this work, we take an alternative approach and start by systematically studying NNIP scaling strategies. Our findings indicate that scaling the model through attention mechanisms is efficient and improves model expressivity. These insights motivate us to develop an NNIP architecture designed for scalability: the Efficiently Scaled Attention Interatomic Potential (EScAIP). EScAIP leverages a multi-head self-attention formulation within graph neural networks, applying attention at the neighbor-level representations. Implemented with highly-optimized attention GPU kernels, EScAIP achieves substantial gains in efficiency--at least 10x faster inference, 5x less memory usage--compared to existing NNIPs. EScAIP also achieves state-of-the-art performance on a wide range of datasets including catalysts (OC20 and OC22), molecules (SPICE), and materials (MPTrj). We emphasize that our approach should be thought of as a philosophy rather than a specific model, representing a proof-of-concept for developing general-purpose NNIPs that achieve better expressivity through scaling, and continue to scale efficiently with increased computational resources and training data.

Related papers

Enhancing material behavior discovery using embedding-oriented Physically-Guided Neural Networks with Internal Variables [0.0]
Physically Guided Neural Networks with Internal Variables are SciML tools that use only observable data for training and unravel internal state relations.<n>Despite their potential, these models face challenges in scalability when applied to high-dimensional data such as fine-grid spatial fields or time-evolving systems.<n>We propose some enhancements to the PGNNIV framework that address these scalability limitations through reduced-order modeling techniques.
arXiv Detail & Related papers (2025-08-01T12:33:21Z)
Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens [0.5745241788717261]
We empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK)<n>Our findings of standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior.<n>We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models.
arXiv Detail & Related papers (2025-07-07T14:17:44Z)
Leveraging neural network interatomic potentials for a foundation model of chemistry [2.66269503676104]
HackNIP is a two-stage pipeline that leverages pretrained neural network interatomic potentials.<n>It first extracts fixed-length feature vectors from NIP foundation models and then uses these embeddings to train shallow ML models.<n>This study investigates whether such a hybridization approach, by hacking" the NIP, can outperform end-to-end deep neural networks.
arXiv Detail & Related papers (2025-06-23T10:49:19Z)
Stochastic Engrams for Efficient Continual Learning with Binarized Neural Networks [4.014396794141682]
We propose a novel approach that integrates plasticityally-activated engrams as a gating mechanism for metaplastic binarized neural networks (mBNNs) Our findings demonstrate (A) an improved stability vs. a trade-off, (B) a reduced memory intensiveness, and (C) an enhanced performance in binarized architectures.
arXiv Detail & Related papers (2025-03-27T12:21:00Z)
Activation-wise Propagation: A Universal Strategy to Break Timestep Constraints in Spiking Neural Networks for 3D Data Processing [29.279985043923386]
We introduce Activation-wise Membrane Potential Propagation (AMP2), a novel state update mechanism for spiking neurons. Inspired by skip connections in deep networks, AMP2 incorporates the membrane potential of neurons into network, eliminating the need for iterative updates. Our method achieves significant improvements across various 3D modalities, including 3D point clouds and event streams.
arXiv Detail & Related papers (2025-02-18T11:52:25Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Spatiotemporal Graph Learning with Direct Volumetric Information Passing and Feature Enhancement [62.91536661584656]
We propose a dual-module framework, Cell-embedded and Feature-enhanced Graph Neural Network (aka, CeFeGNN) for learning.<n>We embed learnable cell attributions to the common node-edge message passing process, which better captures the spatial dependency of regional features.<n>Experiments on various PDE systems and one real-world dataset demonstrate that CeFeGNN achieves superior performance compared with other baselines.
arXiv Detail & Related papers (2024-09-26T16:22:08Z)
Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity based Pruning [17.454100169491497]
We propose a structured pruning approach based on the activity levels of convolutional kernels named Spiking Channel Activity-based (SCA) network pruning framework. Inspired by synaptic plasticity mechanisms, our method dynamically adjusts the network's structure by pruning and regenerating convolutional kernels during training, enhancing the model's adaptation to the current target task.
arXiv Detail & Related papers (2024-06-03T07:44:37Z)
Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task. Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence. We show that intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NN) We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z)
Data efficiency and extrapolation trends in neural network interatomic potentials [0.0]
We show how architectural and optimization choices influence the generalization of neural network interatomic potentials (NNIPs) We show that test errors in NNIP follow a scaling relation and can be robust to noise, but cannot predict MD stability in the high-accuracy regime. Our work provides a deep learning justification for the extrapolation performance of many common NNIPs.
arXiv Detail & Related papers (2023-02-12T00:34:05Z)
The Hardware Impact of Quantization and Pruning for Weights in Spiking Neural Networks [0.368986335765876]
quantization and pruning of parameters can both compress the model size, reduce memory footprints, and facilitate low-latency execution. We study various combinations of pruning and quantization in isolation, cumulatively, and simultaneously to a state-of-the-art SNN targeting gesture recognition. We show that this state-of-the-art model is amenable to aggressive parameter quantization, not suffering from any loss in accuracy down to ternary weights.
arXiv Detail & Related papers (2023-02-08T16:25:20Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically. Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
Exploiting Spiking Dynamics with Spatial-temporal Feature Normalization in Graph Learning [9.88508686848173]
Biological spiking neurons with intrinsic dynamics underlie the powerful representation and learning capabilities of the brain. Despite recent tremendous progress in spiking neural networks (SNNs) for handling Euclidean-space tasks, it still remains challenging to exploit SNNs in processing non-Euclidean-space data. Here we present a general spike-based modeling framework that enables the direct training of SNNs for graph learning.
arXiv Detail & Related papers (2021-06-30T11:20:16Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
ForceNet: A Graph Neural Network for Large-Scale Quantum Calculations [86.41674945012369]
We develop a scalable and expressive Graph Neural Networks model, ForceNet, to approximate atomic forces. Our proposed ForceNet is able to predict atomic forces more accurately than state-of-the-art physics-based GNNs.
arXiv Detail & Related papers (2021-03-02T03:09:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.