Related papers: GPU-accelerated Auxiliary-field quantum Monte Carlo with multi-Slater determinant trial states

URL: http://arxiv.org/abs/2406.08314v1
Date: Wed, 12 Jun 2024 15:15:17 GMT
Title: GPU-accelerated Auxiliary-field quantum Monte Carlo with multi-Slater determinant trial states
Authors: Yifei Huang, Zhen Guo, Hung Q. Pham, Dingshun Lv,
Abstract summary: We present an implementation and application of graphics processing unit-accelerated ph-AFQMC. Using multi-Slater determinant trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems. Our work significantly enhances the efficiency of MSD-AFQMC calculations for large, strongly correlated molecules.
Score: 11.514211053741338
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The accuracy of phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) can be systematically improved with better trial states. Using multi-Slater determinant trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems, while balancing the static and dynamical correlations on an equal footing. This preprint presents an implementation and application of graphics processing unit-accelerated ph-AFQMC, for multi-Slater determinant trial wavefunctions (GPU-accelerated MSD-AFQMC), to enable efficient simulation of large-scale, strongly correlated systems. This approach allows for nearly-exact computation of ground state energies in multi-reference systems. Our GPU-accelerated MSD-AFQMC is implemented in the open-source code \texttt{ipie}, a Python-based AFQMC package [\textit{J. Chem. Theory Comput.}, 2022, 19(1): 109-121]. We benchmark the performance of the GPU code on transition-metal clusters like [Cu$_2$O$_2$]$^{2+}$ and [Fe$_2$S$_2$(SCH$_3$)]$^{2-}$. The GPU code achieves at least sixfold speedup in both cases, comparing the timings of a single A100 GPU to that of a 32-CPU node. For [Fe$_2$S$_2$(SCH$_3$)]$^{2-}$, we demonstrate that our GPU MSD-AFQMC can recover the dynamical correlation necessary for chemical accuracy with an MSD trial, despite the large number of determinants required ($>10^5$). Our work significantly enhances the efficiency of MSD-AFQMC calculations for large, strongly correlated molecules by utilizing GPUs, offering a promising path for exploring the electronic structure of transition metal complexes.

Related papers

ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels [40.94392896555992]
Existing systems mitigate this through compute-communication overlap but often fail to meet theoretical bandwidth across workloads and new accelerators. Instead of operator-specific techniques, we ask whether a small set of simple, reusable principles can guide the optimal performance of workloads. ParallelKittens (PK) kernels achieve up to $2.33\times$ speedup on parallel workloads.
arXiv Detail & Related papers (2025-11-17T21:48:33Z)
GPU-accelerated Effective Hamiltonian Calculator [70.12254823574538]
We present numerical techniques inspired by Nonperturbative Analytical Diagonalization (NPAD) and the Magnus expansion for the efficient calculation of effective Hamiltonians. Our numerical techniques are available as an open-source Python package, $\mathrm{qCH_{eff}}$.
arXiv Detail & Related papers (2024-11-15T06:33:40Z)
Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters [0.0]
opt-UM and opt-Brc introduce significant enhancements to Hartree-Fock calculations up to $f$-type angular momentum functions. Opt-Brc excels for smaller systems and for highly contracted triple-$\zeta$ basis sets, while opt-UM is advantageous for large molecular systems.
arXiv Detail & Related papers (2024-07-31T08:49:06Z)
Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework [4.368931200886271]
We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF. Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF.
arXiv Detail & Related papers (2024-07-12T21:50:19Z)
A distributed multi-GPU ab initio density matrix renormalization group algorithm with applications to the P-cluster of nitrogenase [1.7444066202370399]
We present the first distributed multi-GPU (Graphics Processing Unit) ab initio density matrix renormalization group (DMRG) algorithm. We are able to reach an unprecedentedly large bond dimension $D=14000$ on 48 GPUs. This is nearly three times larger than the bond dimensions reported in previous DMRG calculations for the same system using only CPUs.
arXiv Detail & Related papers (2023-11-06T04:01:26Z)
On sampling determinantal and Pfaffian point processes on a quantum computer [49.1574468325115]
DPPs were introduced by Macchi as a model in quantum optics in the 1970s. Most applications require sampling from a DPP, and given their quantum origin, it is natural to wonder whether sampling a DPP on a quantum computer is easier than on a classical one. Vanilla sampling consists of two steps, of respective costs $\mathcal{O}(N^3)$ and $\mathcal{O}(Nr^2)$ operations on a classical computer, where $r$ is the rank of the kernel matrix.
arXiv Detail & Related papers (2023-05-25T08:43:11Z)
ipie: A Python-based Auxiliary-Field Quantum Monte Carlo Program with Flexibility and Efficiency on CPUs and GPUs [0.5735035463793008]
We report the development of a Python-based auxiliary-field quantum Monte Carlo program, ipie, with preliminary timing benchmarks and new AFQMC results. We demonstrate how implementations for both central and graphical processing units are achieved in ipie.
arXiv Detail & Related papers (2022-09-08T19:50:53Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit. Compared to the state-of-the-art scheme, the proposed scheme imposes drastically lower complexity on the system. The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
Performance Evaluation and Acceleration of the QTensor Quantum Circuit Simulator on GPUs [6.141912076989479]
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU. Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits that solve the MaxCut problem.
arXiv Detail & Related papers (2022-04-12T19:03:44Z)
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.