GPU-accelerated Auxiliary-field quantum Monte Carlo with multi-Slater determinant trial states
- URL: http://arxiv.org/abs/2406.08314v1
- Date: Wed, 12 Jun 2024 15:15:17 GMT
- Title: GPU-accelerated Auxiliary-field quantum Monte Carlo with multi-Slater determinant trial states
- Authors: Yifei Huang, Zhen Guo, Hung Q. Pham, Dingshun Lv
- Abstract summary: We present an implementation and application of graphics processing unit-accelerated ph-AFQMC.
Using multi-Slater determinant trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems.
Our work significantly enhances the efficiency of MSD-AFQMC calculations for large, strongly correlated molecules.
- Score: 11.514211053741338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The accuracy of phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) can be systematically improved with better trial states. Using multi-Slater determinant trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems, while balancing the static and dynamical correlations on an equal footing. This preprint presents an implementation and application of graphics processing unit-accelerated ph-AFQMC, for multi-Slater determinant trial wavefunctions (GPU-accelerated MSD-AFQMC), to enable efficient simulation of large-scale, strongly correlated systems. This approach allows for nearly-exact computation of ground state energies in multi-reference systems. Our GPU-accelerated MSD-AFQMC is implemented in the open-source code \texttt{ipie}, a Python-based AFQMC package [\textit{J. Chem. Theory Comput.}, 2022, 19(1): 109-121]. We benchmark the performance of the GPU code on transition-metal clusters like [Cu$_2$O$_2$]$^{2+}$ and [Fe$_2$S$_2$(SCH$_3$)]$^{2-}$. The GPU code achieves at least sixfold speedup in both cases, comparing the timings of a single A100 GPU to that of a 32-CPU node. For [Fe$_2$S$_2$(SCH$_3$)]$^{2-}$, we demonstrate that our GPU MSD-AFQMC can recover the dynamical correlation necessary for chemical accuracy with an MSD trial, despite the large number of determinants required ($>10^5$). Our work significantly enhances the efficiency of MSD-AFQMC calculations for large, strongly correlated molecules by utilizing GPUs, offering a promising path for exploring the electronic structure of transition metal complexes.
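To make the role of the multi-Slater determinant (MSD) trial concrete, the sketch below evaluates the overlap between an MSD trial state and a single walker determinant, $\langle\Psi_T|\phi\rangle = \sum_i c_i^* \det(D_i^\dagger \Phi)$, for one spin sector. It is a naive illustration under stated assumptions, not the algorithm used in ipie or in the paper: the function name `msd_overlap`, the argument layout, and the brute-force evaluation of one determinant per trial configuration are all hypothetical. The stacked-determinant call hints at why this kind of workload batches well on a GPU.

```python
import numpy as np


def msd_overlap(coeffs, occupations, orbitals, walker):
    """Naive overlap <Psi_T|phi> for an MSD trial (single spin sector).

    Hypothetical helper for illustration only.
    coeffs      : (ndet,) CI coefficients c_i
    occupations : (ndet, nelec) indices of occupied orbitals per determinant
    orbitals    : (nbasis, norb) orthonormal orbital matrix
    walker      : (nbasis, nelec) Slater matrix of the walker
    """
    coeffs = np.asarray(coeffs, dtype=complex)
    occ = np.asarray(occupations, dtype=int)
    # Project the walker onto every orbital once: shape (norb, nelec).
    proj = orbitals.conj().T @ walker
    # Select the occupied rows for each determinant: (ndet, nelec, nelec).
    stacked = proj[occ]
    # One determinant per trial configuration, evaluated as a batch.
    dets = np.linalg.det(stacked)
    # np.vdot conjugates its first argument: sum_i conj(c_i) * det_i.
    return np.vdot(coeffs, dets)


# Toy example: 4 orbitals, 2 electrons, two-determinant trial.
orbitals = np.eye(4)
coeffs = [0.8, 0.6]
occupations = [(0, 1), (0, 2)]
walker = orbitals[:, [0, 1]]  # walker equal to the first determinant
overlap = msd_overlap(coeffs, occupations, orbitals, walker)
```

The cost of this brute-force version grows linearly with the number of determinants, which is why trials with more than $10^5$ determinants, as mentioned in the abstract, call for more careful algorithms and hardware acceleration.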
Related papers
- Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters [0.0]
opt-UM and opt-Brc introduce significant enhancements to Hartree-Fock calculations up to $f$-type angular momentum functions.
Opt-Brc excels for smaller systems and for highly contracted triple-$\zeta$ basis sets, while opt-UM is advantageous for large molecular systems.
arXiv Detail & Related papers (2024-07-31T08:49:06Z) - Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework [4.368931200886271]
We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF.
Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF.
arXiv Detail & Related papers (2024-07-12T21:50:19Z) - A distributed multi-GPU ab initio density matrix renormalization group
algorithm with applications to the P-cluster of nitrogenase [1.7444066202370399]
We present the first distributed multi-GPU (Graphics Processing Unit) ab initio density matrix renormalization group (DMRG) algorithm.
We are able to reach an unprecedentedly large bond dimension $D=14000$ on 48 GPUs.
This is nearly three times larger than the bond dimensions reported in previous DMRG calculations for the same system using only CPUs.
arXiv Detail & Related papers (2023-11-06T04:01:26Z) - On sampling determinantal and Pfaffian point processes on a quantum
computer [49.1574468325115]
DPPs were introduced by Macchi as a model in quantum optics in the 1970s.
Most applications require sampling from a DPP, and given their quantum origin, it is natural to wonder whether sampling a DPP on a quantum computer is easier than on a classical one.
Vanilla sampling consists of two steps, of respective costs $\mathcal{O}(N^3)$ and $\mathcal{O}(Nr^2)$ operations on a classical computer, where $r$ is the rank of the kernel matrix.
arXiv Detail & Related papers (2023-05-25T08:43:11Z) - ipie: A Python-based Auxiliary-Field Quantum Monte Carlo Program with
Flexibility and Efficiency on CPUs and GPUs [0.5735035463793008]
We report the development of a Python-based auxiliary-field quantum Monte Carlo program, ipie, with preliminary timing benchmarks and new AFQMC results.
We demonstrate how implementations for both central and graphical processing units are achieved in ipie.
arXiv Detail & Related papers (2022-09-08T19:50:53Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit.
Compared to the state of the art, the proposed scheme imposes a drastically lower computational load on the system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z) - Performance Evaluation and Acceleration of the QTensor Quantum Circuit
Simulator on GPUs [6.141912076989479]
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU.
Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits that solve the MaxCut problem.
arXiv Detail & Related papers (2022-04-12T19:03:44Z) - SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive
Knowledge Graphs [147.73127662757335]
We present Scalable Multi-hOp REasoning (SMORE), the first general framework for both single-hop and multi-hop reasoning in Knowledge Graphs (KGs).
Using a single machine SMORE can perform multi-hop reasoning in Freebase KG (86M entities, 338M edges), which is 1,500x larger than previously considered KGs.
SMORE increases throughput (i.e., training speed) over prior multi-hop KG frameworks by 2.2x with minimal GPU memory requirements.
arXiv Detail & Related papers (2021-10-28T05:02:33Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical
Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive a new solver, MPLP++, that outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.