Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs (Extended Version)
- URL: http://arxiv.org/abs/2408.09055v2
- Date: Mon, 4 Nov 2024 21:59:24 GMT
- Title: Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs (Extended Version)
- Authors: Mingkuan Xu, Shiyi Cao, Xupeng Miao, Umut A. Acar, Zhihao Jia,
- Abstract summary: This paper presents techniques for theoretically and practically efficient and scalable quantum circuit simulation.
Our approach partitions a quantum circuit into a hierarchy of subcircuits and simulates subcircuits on multi-node GPU.
To minimize communication costs, we formulate an Linear Program that rewards simulation of "nearby" gates on "nearby"
To maximize throughput, we use a dynamic programming algorithm to compute the subcircuit simulated by each kernel at a GPU.
- Score: 9.483321080040131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents techniques for theoretically and practically efficient and scalable Schr\"odinger-style quantum circuit simulation. Our approach partitions a quantum circuit into a hierarchy of subcircuits and simulates the subcircuits on multi-node GPUs, exploiting available data parallelism while minimizing communication costs. To minimize communication costs, we formulate an Integer Linear Program that rewards simulation of "nearby" gates on "nearby" GPUs. To maximize throughput, we use a dynamic programming algorithm to compute the subcircuit simulated by each kernel at a GPU. We realize these techniques in Atlas, a distributed, multi-GPU quantum circuit simulator. Our evaluation on a variety of quantum circuits shows that Atlas outperforms state-of-the-art GPU-based simulators by more than 2$\times$ on average and is able to run larger circuits via offloading to DRAM, outperforming other large-circuit simulators by two orders of magnitude.
Related papers
- Circuit Partitioning and Full Circuit Execution: A Comparative Study of GPU-Based Quantum Circuit Simulation [0.0]
Executing large quantum circuits is not feasible using the currently available NISQ (noisy intermediate-scale quantum) devices.
This study presents a comparative analysis of two simulation methods: circuit-splitting and full-circuit execution using distributed memory.
Results indicate that full-circuit executions are faster than circuit-splitting for simulations performed on a single node.
arXiv Detail & Related papers (2025-02-17T03:04:43Z) - AMARETTO: Enabling Efficient Quantum Algorithm Emulation on Low-Tier FPGAs [0.6553587309274792]
AMARETTO is designed for quantum computing emulation on low-tier Field-Programmable gate arrays (FPGAs)
It simplifies and accelerates the verification of quantum algorithms using a Reduced-Instruction-Set-Computer (RISC)-like structure and efficient handling of sparse quantum gates.
arXiv Detail & Related papers (2024-11-14T10:01:53Z) - Parallel Quantum Computing Simulations via Quantum Accelerator Platform Virtualization [44.99833362998488]
We present a model for parallelizing simulation of quantum circuit executions.
The model can take advantage of its backend-agnostic features, enabling parallel quantum circuit execution over any target backend.
arXiv Detail & Related papers (2024-06-05T17:16:07Z) - Efficient techniques to GPU Accelerations of Multi-Shot Quantum
Computing Simulations [0.0]
Current quantum computers are limited because of computer resources, hardware limits, instability, and noises.
Improving quantum computing simulation performance in classical computers will contribute to the development of quantum computers and their algorithms.
arXiv Detail & Related papers (2023-08-07T08:32:36Z) - On sampling determinantal and Pfaffian point processes on a quantum
computer [49.1574468325115]
DPPs were introduced by Macchi as a model in quantum optics the 1970s.
Most applications require sampling from a DPP, and given their quantum origin, it is natural to wonder whether sampling a DPP on a classical computer is easier than on a classical one.
Vanilla sampling consists in two steps, of respective costs $mathcalO(N3)$ and $mathcalO(Nr2)$ operations on a classical computer, where $r$ is the rank of the kernel matrix.
arXiv Detail & Related papers (2023-05-25T08:43:11Z) - QCLAB++: Simulating Quantum Circuits on GPUs [0.0]
We introduce qclab++, a light-weight, fully-templated C++ package for GPU-accelerated quantum circuit simulations.
qclab++ is designed for performance and numerical stability through highly optimized gate simulation algorithms.
We also introduce qclab, a quantum circuit toolbox for Matlab with a syntax that mimics qclab++.
arXiv Detail & Related papers (2023-02-28T22:56:48Z) - TensorLy-Quantum: Quantum Machine Learning with Tensor Methods [67.29221827422164]
We create a Python library for quantum circuit simulation that adopts the PyTorch API.
Ly-Quantum can scale to hundreds of qubits on a single GPU and thousands of qubits on multiple GPU.
arXiv Detail & Related papers (2021-12-19T19:26:17Z) - Fast quantum circuit simulation using hardware accelerated general
purpose libraries [69.43216268165402]
CuPy is a general purpose library (linear algebra) developed specifically for GPU-based quantum circuits.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
arXiv Detail & Related papers (2021-06-26T10:41:43Z) - Faster Schr\"odinger-style simulation of quantum circuits [2.0940228639403156]
Recent demonstrations of superconducting quantum computers by Google and IBM fueled new research in quantum algorithms.
We advance Schr"odinger-style simulation of quantum circuits that is useful standalone and as a building block in layered simulation algorithms.
arXiv Detail & Related papers (2020-08-01T08:47:24Z) - Machine Learning Optimization of Quantum Circuit Layouts [63.55764634492974]
We introduce a quantum circuit mapping, QXX, and its machine learning version, QXX-MLP.
The latter infers automatically the optimal QXX parameter values such that the layed out circuit has a reduced depth.
We present empiric evidence for the feasibility of learning the layout method using approximation.
arXiv Detail & Related papers (2020-07-29T05:26:19Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.