Hybrid quantum programming with PennyLane Lightning on HPC platforms
- URL: http://arxiv.org/abs/2403.02512v1
- Date: Mon, 4 Mar 2024 22:01:03 GMT
- Title: Hybrid quantum programming with PennyLane Lightning on HPC platforms
- Authors: Ali Asadi, Amintor Dusko, Chae-Yeun Park, Vincent Michaud-Rioux, Isidor Schoch, Shuli Shu, Trevor Vincent, Lee James O'Riordan
- Abstract summary: PennyLane's Lightning suite is a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads.
Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce PennyLane's Lightning suite, a collection of high-performance
state-vector simulators targeting CPU, GPU, and HPC-native architectures and
workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are
implemented to demonstrate the supported classical computing architectures and
showcase the scale of problems that can be simulated using our tooling. We
benchmark the performance of Lightning with backends supporting CPUs, as well
as NVidia and AMD GPUs, and compare the results to other commonly used
high-performance simulator packages, demonstrating where Lightning's
implementations give performance leads. We show improved CPU performance by
employing explicit SIMD intrinsics and multi-threading, batched task-based
execution across multiple GPUs, and distributed forward and gradient-based
quantum circuit executions across multiple nodes. Our data shows we can
comfortably simulate a variety of circuits, giving examples with up to 30
qubits on a single device or node, and up to 41 qubits using multiple nodes.
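The qubit counts above are easiest to read against the memory cost of a dense state vector, which doubles with every added qubit. The sketch below is a minimal pure-Python illustration, not Lightning's implementation or API. It assumes double-precision complex amplitudes (16 bytes each) and a qubit-0-is-most-significant-bit ordering, and all function names (`state_vector_bytes`, `apply_single_qubit_gate`, `apply_cnot`) are hypothetical. Under that assumption, 30 qubits need 16 GiB, within reach of a single device or node, while 41 qubits need 32 TiB, which only the pooled memory of many nodes can hold. The gate loop shows the independent amplitude-pair updates that SIMD intrinsics and multi-threading would accelerate.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: double-precision complex amplitudes (complex128 = 16 bytes).
BYTES_PER_AMPLITUDE = 16

Gate = Tuple[Tuple[complex, complex], Tuple[complex, complex]]


def state_vector_bytes(num_qubits: int, bytes_per_amp: int = BYTES_PER_AMPLITUDE) -> int:
    """Bytes needed to hold all 2**num_qubits amplitudes of a dense state vector."""
    return (2 ** num_qubits) * bytes_per_amp


def zero_state(num_qubits: int) -> List[complex]:
    """Return |0...0>: amplitude 1 at index 0 and 0 everywhere else."""
    state = [0j] * (2 ** num_qubits)
    state[0] = 1 + 0j
    return state


def bit_mask(qubit: int, num_qubits: int) -> int:
    """Index bit for `qubit`; assumes qubit 0 is the most significant bit."""
    return 1 << (num_qubits - 1 - qubit)


def apply_single_qubit_gate(state: Sequence[complex], gate: Gate,
                            target: int, num_qubits: int) -> List[complex]:
    """Apply a 2x2 matrix to `target`.

    Indices that differ only in the target bit form independent pairs.
    This per-pair loop is the part a production simulator would vectorize
    (SIMD) or split across threads.
    """
    mask = bit_mask(target, num_qubits)
    (a, b), (c, d) = gate
    out = list(state)
    for i in range(len(state)):
        if not i & mask:  # i has target bit 0; its partner j has it set
            j = i | mask
            s0, s1 = state[i], state[j]
            out[i] = a * s0 + b * s1
            out[j] = c * s0 + d * s1
    return out


def apply_cnot(state: Sequence[complex], control: int,
               target: int, num_qubits: int) -> List[complex]:
    """Swap amplitudes of |c=1,t=0> and |c=1,t=1> for every other bit pattern."""
    c_mask = bit_mask(control, num_qubits)
    t_mask = bit_mask(target, num_qubits)
    out = list(state)
    for i in range(len(state)):
        if i & c_mask and not i & t_mask:
            j = i | t_mask
            out[i], out[j] = state[j], state[i]
    return out


if __name__ == "__main__":
    for n in (30, 41):
        print(f"{n} qubits: {state_vector_bytes(n) / 2**30:,.0f} GiB")

    # Bell state (|00> + |11>)/sqrt(2) on two qubits.
    h = 1 / math.sqrt(2)
    hadamard: Gate = ((h, h), (h, -h))
    psi = apply_single_qubit_gate(zero_state(2), hadamard, target=0, num_qubits=2)
    psi = apply_cnot(psi, control=0, target=1, num_qubits=2)
    print([round(abs(amp) ** 2, 3) for amp in psi])
```

Running it prints 16 GiB for 30 qubits and 32,768 GiB (32 TiB) for 41 qubits, followed by the Bell-state probabilities [0.5, 0.0, 0.0, 0.5]. Each gate returns a fresh list to keep the example easy to check; an in-place update would avoid the extra copy of the state.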
Related papers
- Multi-GPU RI-HF Energies and Analytic Gradients - Towards High Throughput Ab Initio Molecular Dynamics
This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock energies and analytic gradients using multiple Graphics Processing Units (GPUs).
The algorithm is especially designed for high-throughput ab initio molecular dynamics simulations of small and medium-size molecules (10-100 atoms).
Date: 2024-07-29
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
Date: 2023-08-11
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
Date: 2023-04-25
- QCLAB++: Simulating Quantum Circuits on GPUs
We introduce qclab++, a light-weight, fully-templated C++ package for GPU-accelerated quantum circuit simulations.
qclab++ is designed for performance and numerical stability through highly optimized gate simulation algorithms.
We also introduce qclab, a quantum circuit toolbox for Matlab with a syntax that mimics qclab++.
Date: 2023-02-28
- Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models
We introduce C++-based parallel constructs to enable parallel execution of a quantum kernel.
Preliminary performance results show that running two Bell kernels with 12 threads per kernel in parallel outperforms running the kernels one after the other.
Date: 2023-01-27
- Performance Evaluation and Acceleration of the QTensor Quantum Circuit Simulator on GPUs
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU.
Our method achieves a 176× speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits that solve the MaxCut problem.
Date: 2022-04-12
- PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
Date: 2022-02-25
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these computation and data-movement bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
Date: 2021-10-16
- Fast quantum circuit simulation using hardware accelerated general purpose libraries
CuPy, a general-purpose (linear algebra) library for GPUs, is used to accelerate quantum circuit simulation.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
Date: 2021-06-26
- Quantum Fan-out: Circuit Optimizations and Technology Modeling
We introduce a simultaneous fan-out primitive to optimize circuit synthesis for NISQ workloads.
We also introduce novel quantum memory architectures based on fan-out.
We demonstrate an experimental proof of concept of fan-out with superconducting qubits.
Date: 2020-07-08
- Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits
We introduce the latest release of Intel Quantum Simulator (IQS), formerly known as qHiPSTER.
The high-performance computing capability of the software allows users to leverage the available hardware resources.
IQS allows users to subdivide the computational resources to simulate a pool of related circuits in parallel.
Date: 2020-01-28
This list is automatically generated from the titles and abstracts of the papers on this site.