Performance Evaluation and Acceleration of the QTensor Quantum Circuit
Simulator on GPUs
- URL: http://arxiv.org/abs/2204.06045v1
- Date: Tue, 12 Apr 2022 19:03:44 GMT
- Title: Performance Evaluation and Acceleration of the QTensor Quantum Circuit
Simulator on GPUs
- Authors: Danylo Lykov, Angela Chen, Huaxuan Chen, Kristopher Keipert, Zheng
Zhang, Tom Gibbs, Yuri Alexeev
- Abstract summary: We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU.
Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem.
- Score: 6.141912076989479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work studies the porting and optimization of the tensor network
simulator QTensor on GPUs, with the ultimate goal of simulating quantum
circuits efficiently at scale on large GPU supercomputers. We implement NumPy,
PyTorch, and CuPy backends and benchmark the codes to find the optimal
allocation of tensor simulations to either a CPU or a GPU. We also present a
dynamic mixed backend to achieve optimal performance. To demonstrate the
performance, we simulate QAOA circuits for computing the MaxCut energy
expectation. Our method achieves a $176\times$ speedup on a GPU over the NumPy
baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem
on a 3-regular graph of size 30 with depth $p=4$.
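The abstract's "dynamic mixed backend" idea, dispatching each tensor contraction to a CPU (NumPy) or GPU (CuPy) backend depending on operand size, can be sketched as follows. This is a minimal illustration, not QTensor's actual implementation; the threshold value and the function names `choose_backend` and `contract` are hypothetical, and the real crossover point is what the paper's benchmarks determine.

```python
import numpy as np

# Hypothetical element-count threshold below which the CPU wins; the paper
# finds the real crossover empirically by benchmarking both backends.
GPU_THRESHOLD = 2 ** 16

def choose_backend(shape_a, shape_b, threshold=GPU_THRESHOLD):
    """Pick a backend for a pairwise tensor contraction based on operand size."""
    work = int(np.prod(shape_a)) + int(np.prod(shape_b))
    return "cupy" if work >= threshold else "numpy"

def contract(a, b, subscripts):
    """Contract two tensors, moving to the GPU only when the operands are
    large enough to amortize the transfer; otherwise stay on the CPU."""
    if choose_backend(a.shape, b.shape) == "cupy":
        try:
            import cupy as cp  # optional GPU backend with a NumPy-like API
            return cp.asnumpy(cp.einsum(subscripts, cp.asarray(a), cp.asarray(b)))
        except ImportError:
            pass  # no GPU backend installed; fall back to NumPy below
    return np.einsum(subscripts, a, b)
```

Small contractions (common in the early stages of a tensor-network contraction order) stay on the CPU, while the large tensors that dominate runtime go to the GPU, which is the allocation pattern the benchmarks in the paper motivate.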
Related papers
- Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs [3.7101665559244874]
This paper presents a SYCL implementation of Multi-Layer Perceptrons (MLPs) for the Intel Data Center GPU Max 1550.
We show with a simple model that this results in a significant increase in arithmetic intensity, leading to improved performance, especially for inference.
arXiv Detail & Related papers (2024-03-26T11:38:39Z) - Hybrid quantum programming with PennyLane Lightning on HPC platforms [0.0]
PennyLane's Lightning suite is a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads.
Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures.
arXiv Detail & Related papers (2024-03-04T22:01:03Z) - Fast Simulation of High-Depth QAOA Circuits [10.778538580079365]
We present a simulator for the Quantum Approximate Optimization Algorithm (QAOA)
Our simulator is designed with the goal of reducing the computational cost of QAOA parameter optimization.
We reduce the time for a typical QAOA parameter optimization by eleven times for $n = 26$ qubits compared to a state-of-the-art GPU quantum circuit simulator based on cuQuantum.
arXiv Detail & Related papers (2023-09-09T17:01:29Z) - QCLAB++: Simulating Quantum Circuits on GPUs [0.0]
We introduce qclab++, a light-weight, fully-templated C++ package for GPU-accelerated quantum circuit simulations.
qclab++ is designed for performance and numerical stability through highly optimized gate simulation algorithms.
We also introduce qclab, a quantum circuit toolbox for Matlab with a syntax that mimics qclab++.
arXiv Detail & Related papers (2023-02-28T22:56:48Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - TensorLy-Quantum: Quantum Machine Learning with Tensor Methods [67.29221827422164]
We create a Python library for quantum circuit simulation that adopts the PyTorch API.
TensorLy-Quantum can scale to hundreds of qubits on a single GPU and thousands of qubits on multiple GPUs.
arXiv Detail & Related papers (2021-12-19T19:26:17Z) - Simulation of quantum many-body dynamics with Tensor Processing Units:
Floquet prethermalization [0.3078264203938486]
We demonstrate the usage of TPUs for massively parallel, classical simulations of quantum many-body dynamics on long timescales.
We simulate the dynamics of L=34 qubits for over $10^5$ Floquet periods, corresponding to circuits with millions of two-qubit gates.
Our work demonstrates that TPUs can offer significant advantages for state-of-the-art simulations of quantum many-body dynamics.
arXiv Detail & Related papers (2021-11-15T19:02:54Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - Fast quantum circuit simulation using hardware accelerated general
purpose libraries [69.43216268165402]
CuPy is a general-purpose, GPU-accelerated linear algebra library, applied here to quantum circuit simulation.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
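Because CuPy mirrors the NumPy API, a state-vector simulator written against an interchangeable array module can run on either backend unchanged. The sketch below is a generic illustration of that pattern, not code from the cited paper; the function name and the `xp` parameter convention are assumptions.

```python
import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits, xp=np):
    """Apply a 2x2 gate to the `target` qubit of an n-qubit state vector.
    `xp` is the array module: numpy on the CPU, or cupy on a GPU, since
    both expose the same reshape/moveaxis/tensordot interface."""
    psi = xp.reshape(state, (2,) * n_qubits)
    psi = xp.moveaxis(psi, target, 0)               # bring target axis to front
    psi = xp.tensordot(gate, psi, axes=([1], [0]))  # contract gate into it
    psi = xp.moveaxis(psi, 0, target)               # restore axis order
    return xp.reshape(psi, -1)
```

Swapping `xp=np` for `xp=cupy` (when a GPU is present) is the entire porting effort under this pattern, which is what makes CuPy attractive as a drop-in accelerator for such simulators.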
arXiv Detail & Related papers (2021-06-26T10:41:43Z) - Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.