Benchmarking Quantum Red TEA on CPUs, GPUs, and TPUs
- URL: http://arxiv.org/abs/2409.03818v1
- Date: Thu, 5 Sep 2024 18:00:01 GMT
- Title: Benchmarking Quantum Red TEA on CPUs, GPUs, and TPUs
- Authors: Daniel Jaschke, Marco Ballarin, Nora Reinić, Luka Pavešić, Simone Montangero
- Abstract summary: We compare different linear algebra backends, e.g., numpy versus the torch, jax, or tensorflow library, as well as a mixed-precision-inspired approach and optimizations for the target hardware.
We present a way to obtain speedups of a factor of 34 when tuning parameters on the CPU, and an additional factor of 2.76 on top of the best CPU setup when migrating to GPUs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We benchmark simulations of many-body quantum systems on heterogeneous hardware platforms using CPUs, GPUs, and TPUs. We compare different linear algebra backends, e.g., numpy versus the torch, jax, or tensorflow library, as well as a mixed-precision-inspired approach and optimizations for the target hardware. Quantum red TEA out of the Quantum TEA library specifically addresses handling tensors with different libraries or hardware, where the tensors are the building block of tensor network algorithms. The benchmark problem is a variational search of a ground state in an interacting model. This is a ubiquitous problem in quantum many-body physics, which we solve using tensor network methods. This approximate state-of-the-art method compresses quantum correlations which is key to overcoming the exponential growth of the Hilbert space as a function of the number of particles. We present a way to obtain speedups of a factor of 34 when tuning parameters on the CPU, and an additional factor of 2.76 on top of the best CPU setup when migrating to GPUs.
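The backend comparison described in the abstract can be illustrated with a minimal sketch. The `contract` helper and its dispatch logic below are illustrative assumptions, not the Quantum TEA API: a tensor contraction is routed to NumPy or, if available, PyTorch, and the dtype argument mimics the mixed-precision-inspired approach (reduced precision where the hardware rewards it, double precision as the reference).

```python
import numpy as np

def contract(a, b, backend="numpy", dtype=np.complex128):
    """Contract the last axis of a with the first axis of b.

    Hypothetical helper for illustration; quantum red TEA hides this
    kind of backend dispatch inside its tensor classes.
    """
    if backend == "numpy":
        return np.tensordot(a.astype(dtype), b.astype(dtype), axes=1)
    if backend == "torch":
        import torch  # optional dependency, used only if requested
        ta = torch.from_numpy(np.ascontiguousarray(a))
        tb = torch.from_numpy(np.ascontiguousarray(b))
        return torch.tensordot(ta, tb, dims=1).numpy()
    raise ValueError(f"unknown backend {backend!r}")

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8))
b = rng.standard_normal((8, 8))
# Mixed-precision-inspired: single precision on accelerators,
# double precision as the CPU reference.
r64 = contract(a, b, dtype=np.complex128)
r32 = contract(a, b, dtype=np.complex64)
print(np.allclose(r64, r32, atol=1e-4))
```

In this spirit, tuning means picking the backend, precision, and hardware combination that minimizes wall time for the dominant contractions of the tensor network algorithm.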
Related papers
- TensorQC: Towards Scalable Distributed Quantum Computing via Tensor Networks [16.609478015737707]
A quantum processing unit (QPU) must contain a large number of high quality qubits to produce accurate results.
Most scientific and industry classical computation workloads happen in parallel on distributed systems.
This paper demonstrates running benchmarks that are otherwise intractable for a standalone QPU and prior circuit cutting techniques.
arXiv Detail & Related papers (2025-02-05T18:42:07Z)
- GPU-accelerated Effective Hamiltonian Calculator [70.12254823574538]
We present numerical techniques inspired by Nonperturbative Analytical Diagonalization (NPAD) and the Magnus expansion for the efficient calculation of effective Hamiltonians.
Our numerical techniques are available as an open-source Python package, $\rm qCH_{eff}$.
arXiv Detail & Related papers (2024-11-15T06:33:40Z)
- 3D-QAE: Fully Quantum Auto-Encoding of 3D Point Clouds [71.39129855825402]
Existing methods for learning 3D representations are deep neural networks trained and tested on classical hardware.
This paper introduces the first quantum auto-encoder for 3D point clouds.
arXiv Detail & Related papers (2023-11-09T18:58:33Z)
- Performance Evaluation and Acceleration of the QTensor Quantum Circuit Simulator on GPUs [6.141912076989479]
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU.
Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem.
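The CPU-versus-GPU allocation idea can be sketched as a simple size-threshold dispatcher. The threshold and helper names here are illustrative, not QTensor's actual benchmark-driven heuristic: small contractions stay on the CPU, where kernel-launch and transfer overhead dominate, while large ones go to the GPU if CuPy is available.

```python
import numpy as np

GPU_THRESHOLD = 1 << 16  # illustrative cutoff in output elements

def choose_device(shape_a, shape_b):
    """Pick a device for a matmul-like contraction by output size.

    Toy stand-in for a benchmark-driven allocation rule: below the
    threshold, launch overhead makes the GPU slower than the CPU.
    """
    out_elems = shape_a[0] * shape_b[1]
    return "gpu" if out_elems >= GPU_THRESHOLD else "cpu"

def contract(a, b):
    device = choose_device(a.shape, b.shape)
    if device == "gpu":
        try:
            import cupy as cp  # GPU path, only if CuPy is installed
            return cp.asnumpy(cp.asarray(a) @ cp.asarray(b)), "gpu"
        except ImportError:
            pass  # no GPU stack available, fall back to CPU
    return a @ b, "cpu"

a = np.ones((32, 32))
b = np.ones((32, 32))
res, dev = contract(a, b)
print(dev)  # a 32x32 product is far below the cutoff -> "cpu"
```

A production simulator would calibrate the cutoff per machine by timing representative contractions on both devices, rather than hard-coding it.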
arXiv Detail & Related papers (2022-04-12T19:03:44Z)
- RosneT: A Block Tensor Algebra Library for Out-of-Core Quantum Computing Simulation [0.18472148461613155]
We present RosneT, a library for distributed, out-of-core block tensor algebra.
We use the PyCOMPSs programming model to transform tensor operations into a collection of tasks handled by the COMPSs runtime.
We report results validating our approach showing good scalability in simulations of Quantum circuits of up to 53 qubits.
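The block tensor idea can be illustrated with a minimal in-memory sketch (plain NumPy, no PyCOMPSs): a matrix product computed tile by tile, so that each update touches only block-sized slices — exactly the access pattern an out-of-core runtime can schedule as independent tasks over blocks that live on disk or on other nodes.

```python
import numpy as np

def block_matmul(a, b, bs):
    """Blocked matrix product.

    Each (i, j) output tile only ever reads bs-sized slices of a and b;
    an out-of-core runtime turns each innermost update into a task.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                out[i:i+bs, j:j+bs] += a[i:i+bs, p:p+bs] @ b[p:p+bs, j:j+bs]
    return out

rng = np.random.default_rng(1)
a = rng.standard_normal((6, 6))
b = rng.standard_normal((6, 6))
print(np.allclose(block_matmul(a, b, bs=2), a @ b))
```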
arXiv Detail & Related papers (2022-01-17T20:35:40Z)
- TensorLy-Quantum: Quantum Machine Learning with Tensor Methods [67.29221827422164]
We create a Python library for quantum circuit simulation that adopts the PyTorch API.
TensorLy-Quantum can scale to hundreds of qubits on a single GPU and thousands of qubits on multiple GPUs.
arXiv Detail & Related papers (2021-12-19T19:26:17Z)
- Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
- Fast quantum circuit simulation using hardware accelerated general purpose libraries [69.43216268165402]
CuPy is a general-purpose GPU linear algebra library, applied here to quantum circuit simulation.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
arXiv Detail & Related papers (2021-06-26T10:41:43Z)
- Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z)
- Kernel Operations on the GPU, with Autodiff, without Memory Overflows [5.669790037378094]
The KeOps library provides a fast and memory-efficient GPU support for tensors whose entries are given by a mathematical formula.
KeOps alleviates the major bottleneck of tensor-centric libraries for kernel and geometric applications: memory consumption.
KeOps combines optimized C++/CUDA schemes with binders for high-level languages: Python (Numpy and PyTorch), Matlab and R.
arXiv Detail & Related papers (2020-03-27T08:54:10Z)
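The memory bottleneck KeOps targets can be seen in a plain NumPy sketch (not the KeOps API itself): a Gaussian-kernel sum over M points normally materializes the full N x M kernel matrix, whereas a chunked loop — the map-reduce scheme KeOps compiles into fused CUDA kernels — keeps the footprint at O(N x chunk).

```python
import numpy as np

def gauss_kernel_sum(x, y, b, chunk=128):
    """Compute sum_j exp(-|x_i - y_j|^2) * b_j without ever building
    the full N x M kernel matrix, processing y in chunks."""
    out = np.zeros((x.shape[0], b.shape[1]))
    for s in range(0, y.shape[0], chunk):
        ye, be = y[s:s+chunk], b[s:s+chunk]
        # Only an N x chunk distance block is resident at a time.
        d2 = ((x[:, None, :] - ye[None, :, :]) ** 2).sum(-1)
        out += np.exp(-d2) @ be
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((50, 3))
y = rng.standard_normal((300, 3))
b = rng.standard_normal((300, 1))
# Dense reference: allocates the full 50 x 300 kernel matrix at once.
dense = np.exp(-((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)) @ b
print(np.allclose(gauss_kernel_sum(x, y, b), dense))
```

KeOps goes further by never writing the distance block to memory at all: the formula is evaluated symbolically inside the reduction, which is what makes very large point clouds fit on a single GPU.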
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.