Related papers: Hybrid quantum programming with PennyLane Lightning on HPC platforms

URL: http://arxiv.org/abs/2403.02512v1
Date: Mon, 4 Mar 2024 22:01:03 GMT
Title: Hybrid quantum programming with PennyLane Lightning on HPC platforms
Authors: Ali Asadi, Amintor Dusko, Chae-Yeun Park, Vincent Michaud-Rioux, Isidor Schoch, Shuli Shu, Trevor Vincent, Lee James O'Riordan
Abstract summary: PennyLane's Lightning suite is a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of Lightning with backends supporting CPUs, as well as NVidia and AMD GPUs, and compare the results to other commonly used high-performance simulator packages, demonstrating where Lightning's implementations give performance leads. We show improved CPU performance by employing explicit SIMD intrinsics and multi-threading, batched task-based execution across multiple GPUs, and distributed forward and gradient-based quantum circuit executions across multiple nodes. Our data shows we can comfortably simulate a variety of circuits, giving examples with up to 30 qubits on a single device or node, and up to 41 qubits using multiple nodes.
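The qubit counts quoted in the abstract can be related to memory with a back-of-the-envelope estimate. The sketch below assumes a dense state vector stored as double-precision complex numbers (16 bytes per amplitude). The abstract does not state the precision, so this is an assumption, and the helper names `statevector_bytes` and `to_gib` are hypothetical, not part of Lightning. It ignores workspace buffers, gradients, and communication overhead.

```python
# Rough memory estimate for a dense state vector.
# Assumption: one complex128 amplitude (16 bytes) per basis state;
# the precision actually used in the paper's benchmarks is not given here.

BYTES_PER_AMPLITUDE = 16  # complex128: two 64-bit floats


def statevector_bytes(num_qubits: int) -> int:
    """Return the bytes needed to store all 2**num_qubits amplitudes."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2 ** 30


for n in (30, 41):
    print(f"{n} qubits: {to_gib(statevector_bytes(n)):,.0f} GiB")
```

Under these assumptions, 30 qubits need 16 GiB, which fits on a single large-memory device or node, while 41 qubits need 32,768 GiB (32 TiB), which is consistent with the abstract's statement that the largest runs are distributed across multiple nodes.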

Related papers

Multi-GPU Quantum Circuit Simulation and the Impact of Network Performance [0.7340017786387767]
We present the introduction of MPI into the QED-C Application-Oriented Benchmarks to facilitate benchmarking on HPC systems. We benchmark using a variety of interconnect paths, including the recent NVIDIA Grace Blackwell NVL72 architecture. We show that while improvements to GPU architecture have led to speedups of over 4.5X, advances in interconnect performance have had a larger impact with over 16X performance improvements in time to solution.
arXiv Detail & Related papers (2025-11-18T17:04:28Z)
Towards Closing the Performance Gap for Cryptographic Kernels Between CPUs and Specialized Hardware [0.07646713951724009]
We develop an optimized implementation of cryptographic kernels for x86 CPUs at the per-core level. We propose a small AVX-512 extension, dubbed multi-word extension (MQX). MQX cuts the slowdown relative to ASICs to as low as 35 times on a single CPU core.
arXiv Detail & Related papers (2025-09-15T22:35:00Z)
Q-GEAR: Improving quantum simulation framework [0.28402080392117757]
We introduce Q-Gear, a software framework that transforms Qiskit quantum circuits into CUDA-Q kernels. Q-Gear accelerates CPU-based and GPU-based simulations by two orders of magnitude and ten times, respectively, with minimal coding effort.
arXiv Detail & Related papers (2025-04-04T22:17:51Z)
Versatile Cross-platform Compilation Toolchain for Schrödinger-style Quantum Circuit Simulation [15.448800194552705]
We propose CAST (Cross-platform Adaptive Schrödinger-style Simulation Toolchain), a novel compilation toolchain with cross-platform (CPU and Nvidia GPU) optimization and high-performance backend support. CAST exploits a novel sparsity-aware gate fusion algorithm that automatically selects the best fusion strategy and backend configuration for targeted hardware platforms. We benchmark CAST against IBM Qiskit, Google QSimCirq, Nvidia cuQuantum backend, and other high-performance simulators.
arXiv Detail & Related papers (2025-03-25T17:53:59Z)
Multi-GPU RI-HF Energies and Analytic Gradients $-$ Towards High Throughput Ab Initio Molecular Dynamics [0.0]
This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock energies and analytic gradients using multiple Graphics Processing Units (GPUs). The algorithm is especially designed for high throughput ab initio molecular dynamics simulations of small and medium size molecules (10-100 atoms).
arXiv Detail & Related papers (2024-07-29T00:14:10Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development into two steps: 1) Expressing the computational core using Tensor Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
QCLAB++: Simulating Quantum Circuits on GPUs [0.0]
We introduce qclab++, a light-weight, fully-templated C++ package for GPU-accelerated quantum circuit simulations. qclab++ is designed for performance and numerical stability through highly optimized gate simulation algorithms. We also introduce qclab, a quantum circuit toolbox for Matlab with a syntax that mimics qclab++.
arXiv Detail & Related papers (2023-02-28T22:56:48Z)
Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models [53.937052213390736]
We introduce C++-based parallel constructs to enable parallel execution of a quantum kernel. Preliminary performance results show that running two Bell kernels with 12 threads per kernel in parallel outperforms running the kernels one after the other.
arXiv Detail & Related papers (2023-01-27T06:48:37Z)
Performance Evaluation and Acceleration of the QTensor Quantum Circuit Simulator on GPUs [6.141912076989479]
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU. Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem.
arXiv Detail & Related papers (2022-04-12T19:03:44Z)
PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning. However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware. PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z)
Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
Fast quantum circuit simulation using hardware accelerated general purpose libraries [69.43216268165402]
The paper reports performance gains from using CuPy, a general-purpose (linear algebra) library developed for GPUs, in quantum circuit simulation. For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
arXiv Detail & Related papers (2021-06-26T10:41:43Z)
Quantum Fan-out: Circuit Optimizations and Technology Modeling [3.4827330067784295]
We introduce a simultaneous fan-out primitive to optimize circuit synthesis for NISQ workloads. We also introduce novel quantum memory architectures based on fan-out. We demonstrate experimental proof-of-concept of fan-out with superconducting qubits.
arXiv Detail & Related papers (2020-07-08T16:38:07Z)
Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits [0.0]
We introduce the latest release of Intel Quantum Simulator (IQS), formerly known as qHiPSTER. The high-performance computing capability of the software allows users to leverage the available hardware resources. IQS allows users to subdivide the computational resources to simulate a pool of related circuits in parallel.
arXiv Detail & Related papers (2020-01-28T19:00:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.