Simulation of quantum many-body dynamics with Tensor Processing Units:
Floquet prethermalization
- URL: http://arxiv.org/abs/2111.08044v2
- Date: Fri, 11 Feb 2022 20:23:53 GMT
- Title: Simulation of quantum many-body dynamics with Tensor Processing Units:
Floquet prethermalization
- Authors: Alan Morningstar, Markus Hauru, Jackson Beall, Martin Ganahl, Adam G.
M. Lewis, Vedika Khemani, and Guifre Vidal
- Abstract summary: We demonstrate the usage of TPUs for massively parallel, classical simulations of quantum many-body dynamics on long timescales.
We simulate the dynamics of L=34 qubits for over $10^5$ Floquet periods, corresponding to circuits with millions of two-qubit gates.
Our work demonstrates that TPUs can offer significant advantages for state-of-the-art simulations of quantum many-body dynamics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensor Processing Units (TPUs) are specialized hardware accelerators
developed by Google to support large-scale machine-learning tasks, but they can
also be leveraged to accelerate and scale other linear-algebra-intensive
computations. In this paper we demonstrate the usage of TPUs for massively
parallel, classical simulations of quantum many-body dynamics on long
timescales. We apply our methods to study the phenomenon of Floquet
prethermalization, i.e., exponentially slow heating in quantum spin chains
subject to high-frequency periodic driving. We simulate the dynamics of L=34
qubits for over $10^5$ Floquet periods, corresponding to circuits with millions
of two-qubit gates. The circuits simulated have no additional symmetries and
represent a pure-state evolution in the full $2^L$-dimensional Hilbert space.
This is achieved by distributing the computation over 128 TPU cores. On that
size TPU cluster, we find speedups in wall-clock runtime of 230x and 15x when
compared to reference CPU and single-GPU simulations, respectively, for shorter
30-qubit simulations that can be handled by all three platforms. We study the
computational cost of the simulations, as a function of both the number of
qubits and the number of TPU cores used, up to our maximum capacity of L=40
qubits, which requires a "full pod" of 2048 TPU cores with tens of terabytes
of memory in total. For these simulations, an 8-TPU-core machine is comparable
to a single A100 GPU, and thus the full TPU pod is comparable to a machine with
hundreds of GPUs. However, the TPU pod is more energy and cost efficient, and
readily accessible (via Google Cloud), unlike such large many-GPU
configurations. We also study the accumulation of numerical error as a function
of circuit depth in very deep circuits. Our work demonstrates that TPUs can
offer significant advantages for state-of-the-art simulations of quantum
many-body dynamics.
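The kind of brute-force evolution described above can be sketched in a few lines. The following is a minimal single-machine sketch, not the authors' distributed TPU implementation: the gate choices (diagonal ZZ interactions plus a periodic transverse-field kick), the time step, and the small qubit count are illustrative assumptions, whereas the paper shards the $2^L$ state vector over up to 2048 TPU cores.

```python
# Minimal sketch (not the authors' code) of exact state-vector simulation of
# Floquet dynamics: brickwork two-qubit ZZ gates plus a periodic X "kick".
import numpy as np

L = 10  # qubits here; the paper reaches L = 34-40 by sharding across TPU cores

def apply_two_qubit(psi, gate, i):
    """Apply a 4x4 unitary to qubits (i, i+1) of a 2**L state vector."""
    psi = psi.reshape(2**i, 4, 2**(L - i - 2))
    return np.einsum('ab,xby->xay', gate, psi).reshape(-1)

def apply_one_qubit(psi, u, i):
    """Apply a 2x2 unitary to qubit i."""
    psi = psi.reshape(2**i, 2, 2**(L - i - 1))
    return np.einsum('ab,xby->xay', u, psi).reshape(-1)

dt, theta = 0.1, 0.3  # illustrative parameters
zz = np.diag(np.exp(-1j * dt * np.array([1, -1, -1, 1])))  # exp(-i dt Z.Z)
rx = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
               [-1j * np.sin(theta / 2), np.cos(theta / 2)]])  # X rotation

def floquet_period(psi):
    for i in range(0, L - 1, 2):   # even bonds
        psi = apply_two_qubit(psi, zz, i)
    for i in range(1, L - 1, 2):   # odd bonds
        psi = apply_two_qubit(psi, zz, i)
    for i in range(L):             # high-frequency periodic drive
        psi = apply_one_qubit(psi, rx, i)
    return psi

psi = np.zeros(2**L, dtype=complex)
psi[0] = 1.0                       # |00...0> initial product state
for _ in range(100):               # the paper runs > 10^5 periods
    psi = floquet_period(psi)
print(abs(np.vdot(psi, psi)))      # unitary evolution preserves the norm
```

The reshape-then-einsum pattern is what makes this a dense linear-algebra workload: each gate application is a batched small-matrix multiply over the full state vector, which is exactly the kind of operation TPU matrix units accelerate.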
Related papers
- Harnessing Manycore Processors with Distributed Memory for Accelerated
Training of Sparse and Recurrent Models
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
- Efficient Quantum Circuit Simulation by Tensor Network Methods on Modern GPUs
The primary methods for simulating quantum hardware are based on state vectors and tensor networks.
As the number of qubits and quantum gates grows larger, traditional state-vector based quantum circuit simulation methods prove inadequate due to the overwhelming size of the Hilbert space and extensive entanglement.
In this study, we propose general optimization strategies from two aspects: computational efficiency and accuracy.
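The Hilbert-space blow-up this summary alludes to is easy to quantify: an L-qubit state vector holds $2^L$ complex amplitudes. A back-of-envelope estimate, assuming 8-byte single-precision complex amplitudes (double precision doubles these figures):

```python
# Memory needed to store one 2**L state vector, assuming 8 bytes per
# complex64 amplitude; working buffers add a constant factor on top.
def state_vector_gib(L, bytes_per_amp=8):
    return bytes_per_amp * 2**L / 2**30  # GiB

for L in (30, 34, 40):
    print(f"L={L}: {state_vector_gib(L):,.0f} GiB")
# L=30 fits on a single accelerator (8 GiB); L=40 needs ~8 TiB,
# consistent with the full-pod, tens-of-terabytes scale quoted above.
```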
arXiv Detail & Related papers (2023-10-06T02:24:05Z)
- Efficient techniques to GPU Accelerations of Multi-Shot Quantum Computing Simulations
Current quantum computers are limited by computing resources, hardware constraints, instability, and noise.
Improving quantum computing simulation performance in classical computers will contribute to the development of quantum computers and their algorithms.
arXiv Detail & Related papers (2023-08-07T08:32:36Z)
- Tricking AI chips into Simulating the Human Brain: A Detailed Performance Analysis
We evaluate multiple cutting-edge AI chips (Graphcore IPU, GroqChip, Nvidia GPU with Tensor Cores, and Google TPU) for brain simulation.
Our performance analysis reveals that the simulation problem maps extremely well onto the GPU and TPU architectures.
The GroqChip outperforms both platforms for small networks but, due to implementing some floating-point operations at reduced accuracy, is found not yet usable for brain simulation.
arXiv Detail & Related papers (2023-01-31T13:51:37Z)
- Performance Evaluation and Acceleration of the QTensor Quantum Circuit Simulator on GPUs
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU.
Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem.
arXiv Detail & Related papers (2022-04-12T19:03:44Z)
- TensorLy-Quantum: Quantum Machine Learning with Tensor Methods
We create a Python library for quantum circuit simulation that adopts the PyTorch API.
TensorLy-Quantum can scale to hundreds of qubits on a single GPU and thousands of qubits on multiple GPUs.
arXiv Detail & Related papers (2021-12-19T19:26:17Z)
- Simulation of quantum physics with Tensor Processing Units: brute-force computation of ground states and time evolution
Tensor Processing Units (TPUs) were developed by Google exclusively to support large-scale machine learning tasks.
In this paper we repurpose TPUs for the challenging problem of simulating quantum spin systems.
With a TPU v3 pod, with 2048 cores, we simulate wavefunctions $|\Psi\rangle$ of up to $N=38$ qubits.
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
- Parallel Simulation of Quantum Networks with Distributed Quantum State Management
We identify requirements for parallel simulation of quantum networks and develop the first parallel discrete event quantum network simulator.
Our contributions include the design and development of a quantum state manager that maintains shared quantum information distributed across multiple processes.
We release the parallel SeQUeNCe simulator as an open-source tool alongside the existing sequential version.
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
- Fast quantum circuit simulation using hardware accelerated general purpose libraries
We use CuPy, a general-purpose GPU linear-algebra library, to simulate quantum circuits.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
arXiv Detail & Related papers (2021-06-26T10:41:43Z)
- Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z)
- Large Batch Simulation for Deep Reinforcement Learning
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)