Simulation of quantum many-body dynamics with Tensor Processing Units:
Floquet prethermalization
- URL: http://arxiv.org/abs/2111.08044v2
- Date: Fri, 11 Feb 2022 20:23:53 GMT
- Title: Simulation of quantum many-body dynamics with Tensor Processing Units:
Floquet prethermalization
- Authors: Alan Morningstar, Markus Hauru, Jackson Beall, Martin Ganahl, Adam G.
M. Lewis, Vedika Khemani, and Guifre Vidal
- Abstract summary: We demonstrate the usage of TPUs for massively parallel, classical simulations of quantum many-body dynamics on long timescales.
We simulate the dynamics of L=34 qubits for over $10^5$ Floquet periods, corresponding to circuits with millions of two-qubit gates.
Our work demonstrates that TPUs can offer significant advantages for state-of-the-art simulations of quantum many-body dynamics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensor Processing Units (TPUs) are specialized hardware accelerators
developed by Google to support large-scale machine-learning tasks, but they can
also be leveraged to accelerate and scale other linear-algebra-intensive
computations. In this paper we demonstrate the usage of TPUs for massively
parallel, classical simulations of quantum many-body dynamics on long
timescales. We apply our methods to study the phenomenon of Floquet
prethermalization, i.e., exponentially slow heating in quantum spin chains
subject to high-frequency periodic driving. We simulate the dynamics of L=34
qubits for over $10^5$ Floquet periods, corresponding to circuits with millions
of two-qubit gates. The circuits simulated have no additional symmetries and
represent a pure-state evolution in the full $2^L$-dimensional Hilbert space.
This is achieved by distributing the computation over 128 TPU cores. On that
size TPU cluster, we find speedups in wall-clock runtime of 230x and 15x when
compared to reference CPU and single-GPU simulations, respectively, for shorter
30-qubit simulations that can be handled by all three platforms. We study the
computational cost of the simulations, as a function of both the number of
qubits and the number of TPU cores used, up to our maximum capacity of L=40
qubits, which requires a "full pod" of 2048 TPU cores with tens of terabytes
of memory in total. For these simulations, an 8-TPU-core machine is comparable
to a single A100 GPU, and thus the full TPU pod is comparable to a machine with
hundreds of GPUs. However, the TPU pod is more energy and cost efficient, and
readily accessible (via Google Cloud), unlike such large many-GPU
configurations. We also study the accumulation of numerical error as a function
of circuit depth in very deep circuits. Our work demonstrates that TPUs can
offer significant advantages for state-of-the-art simulations of quantum
many-body dynamics.
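The kind of brute-force evolution described above can be sketched in a few lines. The following is a minimal single-machine sketch, not the authors' distributed TPU implementation: the gate choices (diagonal ZZ interactions plus a periodic transverse-field kick), the time step, and the small qubit count are illustrative assumptions, whereas the paper shards the $2^L$ state vector over up to 2048 TPU cores.

```python
# Minimal sketch (not the authors' code) of exact state-vector simulation of
# Floquet dynamics: brickwork two-qubit ZZ gates plus a periodic X "kick".
import numpy as np

L = 10  # qubits here; the paper reaches L = 34-40 by sharding across TPU cores

def apply_two_qubit(psi, gate, i):
    """Apply a 4x4 unitary to qubits (i, i+1) of a 2**L state vector."""
    psi = psi.reshape(2**i, 4, 2**(L - i - 2))
    return np.einsum('ab,xby->xay', gate, psi).reshape(-1)

def apply_one_qubit(psi, u, i):
    """Apply a 2x2 unitary to qubit i."""
    psi = psi.reshape(2**i, 2, 2**(L - i - 1))
    return np.einsum('ab,xby->xay', u, psi).reshape(-1)

dt, theta = 0.1, 0.3  # illustrative parameters
zz = np.diag(np.exp(-1j * dt * np.array([1, -1, -1, 1])))  # exp(-i dt Z.Z)
rx = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
               [-1j * np.sin(theta / 2), np.cos(theta / 2)]])  # X rotation

def floquet_period(psi):
    for i in range(0, L - 1, 2):   # even bonds
        psi = apply_two_qubit(psi, zz, i)
    for i in range(1, L - 1, 2):   # odd bonds
        psi = apply_two_qubit(psi, zz, i)
    for i in range(L):             # high-frequency periodic drive
        psi = apply_one_qubit(psi, rx, i)
    return psi

psi = np.zeros(2**L, dtype=complex)
psi[0] = 1.0                       # |00...0> initial product state
for _ in range(100):               # the paper runs > 10^5 periods
    psi = floquet_period(psi)
print(abs(np.vdot(psi, psi)))      # unitary evolution preserves the norm
```

The reshape-then-einsum pattern is what makes this a dense linear-algebra workload: each gate application is a batched small-matrix multiply over the full state vector, which is exactly the kind of operation TPU matrix units accelerate.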
Related papers
- Harnessing Manycore Processors with Distributed Memory for Accelerated
Training of Sparse and Recurrent Models
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
- Efficient Quantum Circuit Simulation by Tensor Network Methods on Modern GPUs
The primary methods for simulating quantum hardware are based on state vectors and tensor networks.
As the number of qubits and quantum gates grows larger, traditional state-vector based quantum circuit simulation methods prove inadequate due to the overwhelming size of the Hilbert space and extensive entanglement.
In this study, we propose general optimization strategies from two aspects: computational efficiency and accuracy.
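The Hilbert-space blow-up this summary alludes to is easy to quantify: an L-qubit state vector holds $2^L$ complex amplitudes. A back-of-envelope estimate, assuming 8-byte single-precision complex amplitudes (double precision doubles these figures):

```python
# Memory needed to store one 2**L state vector, assuming 8 bytes per
# complex64 amplitude; working buffers add a constant factor on top.
def state_vector_gib(L, bytes_per_amp=8):
    return bytes_per_amp * 2**L / 2**30  # GiB

for L in (30, 34, 40):
    print(f"L={L}: {state_vector_gib(L):,.0f} GiB")
# L=30 fits on a single accelerator (8 GiB); L=40 needs ~8 TiB,
# consistent with the full-pod, tens-of-terabytes scale quoted above.
```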
arXiv Detail & Related papers (2023-10-06T02:24:05Z)
- Efficient techniques to GPU Accelerations of Multi-Shot Quantum Computing Simulations
Current quantum computers are limited by computing resources, hardware constraints, instability, and noise.
Improving quantum computing simulation performance in classical computers will contribute to the development of quantum computers and their algorithms.
arXiv Detail & Related papers (2023-08-07T08:32:36Z)
- Tricking AI chips into Simulating the Human Brain: A Detailed Performance Analysis
We evaluate multiple cutting-edge AI chips (Graphcore IPU, GroqChip, Nvidia GPU with Tensor Cores, and Google TPU) for brain simulation.
Our performance analysis reveals that the simulation problem maps extremely well onto the GPU and TPU architectures.
The GroqChip outperforms both platforms for small networks but, due to implementing some floating-point operations at reduced accuracy, is found not yet usable for brain simulation.
arXiv Detail & Related papers (2023-01-31T13:51:37Z)
- Performance Evaluation and Acceleration of the QTensor Quantum Circuit Simulator on GPUs
We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU.
Our method achieves a $176\times$ speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem.
arXiv Detail & Related papers (2022-04-12T19:03:44Z)
- TensorLy-Quantum: Quantum Machine Learning with Tensor Methods
We create a Python library for quantum circuit simulation that adopts the PyTorch API.
TensorLy-Quantum can scale to hundreds of qubits on a single GPU and thousands of qubits on multiple GPUs.
arXiv Detail & Related papers (2021-12-19T19:26:17Z)
- Simulation of quantum physics with Tensor Processing Units: brute-force computation of ground states and time evolution
Tensor Processing Units (TPUs) were developed by Google exclusively to support large-scale machine learning tasks.
In this paper we repurpose TPUs for the challenging problem of simulating quantum spin systems.
With a TPU v3 pod, with 2048 cores, we simulate wavefunctions $|\Psi\rangle$ of up to $N=38$ qubits.
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
- Parallel Simulation of Quantum Networks with Distributed Quantum State Management
We identify requirements for parallel simulation of quantum networks and develop the first parallel discrete event quantum network simulator.
Our contributions include the design and development of a quantum state manager that maintains shared quantum information distributed across multiple processes.
We release the parallel SeQUeNCe simulator as an open-source tool alongside the existing sequential version.
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
- Fast quantum circuit simulation using hardware accelerated general purpose libraries
We use CuPy, a general-purpose GPU linear-algebra library, to simulate quantum circuits.
For supremacy circuits the speedup is around 2x, and for quantum multipliers almost 22x compared to state-of-the-art C++-based simulators.
arXiv Detail & Related papers (2021-06-26T10:41:43Z)
- Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z)
- Large Batch Simulation for Deep Reinforcement Learning
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)