Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework
- URL: http://arxiv.org/abs/2407.09700v1
- Date: Fri, 12 Jul 2024 21:50:19 GMT
- Title: Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework
- Authors: Rui Li, Qiming Sun, Xing Zhang, Garnet Kin-Lic Chan
- Abstract summary: We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF.
Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF.
- Score: 4.368931200886271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF. As a core functionality, this provides a GPU implementation of two-electron repulsion integrals (ERIs) for contracted basis sets comprising up to g functions using Rys quadrature. As an illustration of how this can accelerate a quantum chemistry workflow, we describe how to use the ERIs efficiently in the integral-direct Hartree-Fock Fock build and nuclear gradient construction. Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF, and performance comparable to other GPU-accelerated quantum chemical packages including GAMESS and QUICK on a single NVIDIA A100 GPU.
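The abstract describes using the ERIs in an integral-direct Hartree-Fock Fock build. The following is a rough, CPU-only NumPy sketch of that idea. It is not GPU4PySCF's implementation. Each symmetry-unique integral (ij|kl) is visited once, as if it had just been computed on the fly, and is immediately scattered into the Coulomb (J) and exchange (K) matrices for a closed-shell density D. The closed-shell Fock matrix is then F = h + J - K/2. The functions `toy_eri`, `unique_perms` and `direct_jk` are hypothetical illustrations. `toy_eri` is a made-up stand-in that only mimics the 8-fold permutational symmetry of real-orbital ERIs. A real code would instead evaluate shell quartets (e.g. by Rys quadrature) and screen negligible ones.

```python
import itertools

import numpy as np


def toy_eri(i, j, k, l):
    """Hypothetical stand-in for an ERI evaluator (not a real integral code).

    The value depends only on the unordered pairs {i, j} and {k, l} and is
    symmetric under swapping the pairs, mimicking the 8-fold permutational
    symmetry of real-orbital ERIs (ij|kl).
    """
    ij = max(i, j) * (max(i, j) + 1) // 2 + min(i, j)
    kl = max(k, l) * (max(k, l) + 1) // 2 + min(k, l)
    lo, hi = min(ij, kl), max(ij, kl)
    return 1.0 / (1.0 + lo + 0.5 * hi)


def unique_perms(i, j, k, l):
    """Distinct index tuples related to (i, j, k, l) by the 8-fold symmetry."""
    return {
        (i, j, k, l), (j, i, k, l), (i, j, l, k), (j, i, l, k),
        (k, l, i, j), (l, k, i, j), (k, l, j, i), (l, k, j, i),
    }


def direct_jk(dm):
    """Integral-direct J/K build: each unique quartet i>=j, k>=l, ij>=kl once.

    J[p, q] = sum_rs (pq|rs) D[r, s]
    K[p, r] = sum_qs (pq|rs) D[q, s]
    """
    n = dm.shape[0]
    vj = np.zeros((n, n))
    vk = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j
            for k in range(n):
                for l in range(k + 1):
                    kl = k * (k + 1) // 2 + l
                    if kl > ij:
                        continue  # covered by the (kl|ij) quartet instead
                    v = toy_eri(i, j, k, l)  # "computed on the fly", never stored
                    # Scatter the one value into every equivalent index tuple.
                    for p, q, r, s in unique_perms(i, j, k, l):
                        vj[p, q] += v * dm[r, s]
                        vk[p, r] += v * dm[q, s]
    return vj, vk


rng = np.random.default_rng(0)
n = 4
c_occ = rng.standard_normal((n, 2))  # two hypothetical doubly occupied orbitals
dm = 2.0 * c_occ @ c_occ.T           # closed-shell (total) density matrix
vj, vk = direct_jk(dm)

# Check against a fully stored 4-index tensor (what direct methods avoid).
eri = np.empty((n, n, n, n))
for idx in itertools.product(range(n), repeat=4):
    eri[idx] = toy_eri(*idx)
assert np.allclose(vj, np.einsum("pqrs,rs->pq", eri, dm))
assert np.allclose(vk, np.einsum("pqrs,qs->pr", eri, dm))

hcore = np.eye(n)             # hypothetical one-electron Hamiltonian
fock = hcore + vj - 0.5 * vk  # closed-shell (RHF) Fock matrix
```

The point of the direct scheme is that J and K are accumulated while the integrals are produced, so the O(N^4) ERI tensor never has to be stored. The dense `einsum` contraction is included only as a correctness check.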
Related papers
- Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters [0.0]
opt-UM and opt-Brc introduce significant enhancements to Hartree-Fock calculations up to $f$-type angular momentum functions.
Opt-Brc excels for smaller systems and for highly contracted triple-$\zeta$ basis sets, while opt-UM is advantageous for large molecular systems.
arXiv Detail & Related papers (2024-07-31T08:49:06Z)
- Multi-GPU RI-HF Energies and Analytic Gradients - Towards High Throughput Ab Initio Molecular Dynamics [0.0]
This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock energies and analytic gradients using multiple Graphics Processing Units (GPUs).
The algorithm is especially designed for high-throughput ab initio molecular dynamics simulations of small and medium-sized molecules (10-100 atoms).
arXiv Detail & Related papers (2024-07-29T00:14:10Z)
- GPU-accelerated Auxiliary-field quantum Monte Carlo with multi-Slater determinant trial states [11.514211053741338]
We present an implementation and application of graphics processing unit (GPU)-accelerated phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC).
Using multi-Slater trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems.
Our work significantly enhances the efficiency of multi-Slater determinant AFQMC (MSD-AFQMC) calculations for large, strongly correlated molecules.
arXiv Detail & Related papers (2024-06-12T15:15:17Z)
- Enhancing GPU-acceleration in the Python-based Simulations of Chemistry Framework [6.4347138500286665]
We describe our contribution as industrial stakeholders to the existing open-source GPU4PySCF project.
We have integrated GPU acceleration into other PySCF functionality, including Density Functional Theory (DFT).
GPU4PySCF delivers a 30-fold speedup over a 32-core CPU node, resulting in approximately 90% cost savings for most DFT tasks.
arXiv Detail & Related papers (2024-04-15T04:35:09Z)
- Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs [3.7101665559244874]
This paper presents a SYCL implementation of fully-fused Multi-Layer Perceptrons (MLPs) for the Intel Data Center GPU Max 1550.
We show with a simple model that this results in a significant increase in arithmetic intensity, leading to improved performance, especially for inference.
arXiv Detail & Related papers (2024-03-26T11:38:39Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs), and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
- Fast quantum circuit simulation using hardware accelerated general purpose libraries [69.43216268165402]
The authors use CuPy, a general-purpose GPU array library for linear algebra, to implement GPU-based quantum circuit simulation.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x, compared to state-of-the-art C++-based simulators.
arXiv Detail & Related papers (2021-06-26T10:41:43Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single precision and up to 452x using half precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they have been difficult to apply to large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
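The Multi-GPU RI-HF entry above relies on the resolution-of-the-identity (RI), or density-fitting, approximation. RI replaces the four-index ERIs with three-index integrals (ij|P) over an auxiliary basis and a two-index metric (P|Q). The following minimal NumPy sketch shows the standard RI Coulomb build, J_ij = sum_PQ (ij|P) [(P|Q)^-1] sum_kl (Q|kl) D_kl. It assumes random, symmetric 3-index integrals and a made-up positive-definite metric; these stand in for real integrals and are purely illustrative. The function name `ri_coulomb` is hypothetical, and production codes typically use a Cholesky factor of the metric rather than a plain linear solve.

```python
import numpy as np

rng = np.random.default_rng(1)
nao, naux = 5, 12

# Hypothetical 3-index integrals (ij|P), symmetric in the AO indices i, j.
int3c = rng.standard_normal((nao, nao, naux))
int3c = 0.5 * (int3c + int3c.transpose(1, 0, 2))

# Hypothetical positive-definite 2-index metric (P|Q).
a = rng.standard_normal((naux, naux))
metric = a @ a.T + naux * np.eye(naux)


def ri_coulomb(dm, int3c, metric):
    """RI Coulomb matrix J_ij = sum_PQ (ij|P) [(P|Q)^-1] sum_kl (Q|kl) D_kl."""
    rho = np.einsum("klQ,kl->Q", int3c, dm)      # project density onto aux basis
    coeff = np.linalg.solve(metric, rho)         # fitting coefficients (P|Q)^-1 rho
    return np.einsum("ijP,P->ij", int3c, coeff)  # contract back to the AO basis


c_occ = rng.standard_normal((nao, 2))  # two hypothetical doubly occupied orbitals
dm = 2.0 * c_occ @ c_occ.T
vj = ri_coulomb(dm, int3c, metric)

# Check: same result as contracting the RI-factorized 4-index tensor.
eri_ri = np.einsum("ijP,PQ,klQ->ijkl", int3c, np.linalg.inv(metric), int3c)
vj_ref = np.einsum("ijkl,kl->ij", eri_ri, dm)
assert np.allclose(vj, vj_ref)
```

Only the O(N^2 N_aux) three-index array is ever formed in `ri_coulomb`. The four-index `eri_ri` is built here solely to verify the factorization.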