Massively Parallel Tensor Network State Algorithms on Hybrid CPU-GPU
Based Architectures
- URL: http://arxiv.org/abs/2305.05581v1
- Date: Tue, 9 May 2023 16:15:07 GMT
- Title: Massively Parallel Tensor Network State Algorithms on Hybrid CPU-GPU
Based Architectures
- Authors: Andor Menczer and Örs Legeza
- Abstract summary: We present novel algorithmic solutions together with implementation details to extend current limits of TNS algorithms on HPC infrastructure.
Benchmark results are presented for selected strongly correlated molecular systems addressing problems on Hilbert space dimensions up to $2.88\times10^{36}$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The interplay of quantum and classical simulation and the delicate divide
between them is in the focus of massively parallelized tensor network state
(TNS) algorithms designed for high performance computing (HPC). In this
contribution, we present novel algorithmic solutions together with
implementation details to extend current limits of TNS algorithms on HPC
infrastructure building on state-of-the-art hardware and software technologies.
Benchmark results obtained via large-scale density matrix renormalization group
(DMRG) simulations are presented for selected strongly correlated molecular
systems addressing problems on Hilbert space dimensions up to
$2.88\times10^{36}$.
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data.
Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy.
This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z) - Simulator Demonstration of Large Scale Variational Quantum Algorithm on HPC Cluster [0.0]
This study aims to accelerate quantum simulation using two newly proposed methods.
We achieved 200 times higher speed over VQE simulations and demonstrated 32-qubit ground-state energy calculations in acceptable time.
arXiv Detail & Related papers (2024-02-19T06:34:01Z) - State of practice: evaluating GPU performance of state vector and tensor
network methods [2.7930955543692817]
This article investigates the limits of current state-of-the-art simulation techniques on a test bench made of eight widely used quantum subroutines.
We highlight how to select the best simulation strategy, obtaining a speedup of up to an order of magnitude.
arXiv Detail & Related papers (2024-01-11T09:22:21Z) - Two dimensional quantum lattice models via mode optimized hybrid CPU-GPU density matrix renormalization group method [0.0]
We present a hybrid numerical approach to simulate quantum many body problems on two spatial dimensional quantum lattice models.
We demonstrate for the two dimensional spinless fermion model and for the Hubbard model on torus geometry that several orders of magnitude in computational time can be saved.
arXiv Detail & Related papers (2023-11-23T17:07:47Z) - Boosting the effective performance of massively parallel tensor network
state algorithms on hybrid CPU-GPU based architectures via non-Abelian
symmetries [0.0]
Non-Abelian symmetry related tensor algebra based on the Wigner-Eckart theorem is fully detached from the conventional tensor network layer.
We have achieved an order of magnitude increase in performance with respect to results reported in arXiv:2305.05581 in terms of computational complexity.
Our solution has an estimated effective performance of 250-500 TFLOPS.
arXiv Detail & Related papers (2023-09-23T07:49:53Z) - Quantum Annealing for Single Image Super-Resolution [86.69338893753886]
We propose a quantum computing-based algorithm to solve the single image super-resolution (SISR) problem.
The proposed AQC-based algorithm is demonstrated to achieve improved speed-up over a classical analog while maintaining comparable SISR accuracy.
arXiv Detail & Related papers (2023-04-18T11:57:15Z) - Decomposition of Matrix Product States into Shallow Quantum Circuits [62.5210028594015]
Tensor network (TN) algorithms can be mapped to parametrized quantum circuits (PQCs).
We propose a new protocol for approximating TN states using realistic quantum circuits.
Our results reveal one particular protocol, involving sequential growth and optimization of the quantum circuit, to outperform all other methods.
arXiv Detail & Related papers (2022-09-01T17:08:41Z) - Learning to Beamform in Heterogeneous Massive MIMO Networks [48.62625893368218]
Finding the optimal beamformers in massive multiple-input multiple-output (MIMO) networks is a well-known problem.
We propose a novel deep learning based algorithm to address this problem.
arXiv Detail & Related papers (2020-11-08T12:48:06Z) - Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power-and-area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested in a single FPGA.
arXiv Detail & Related papers (2020-06-22T17:19:09Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
arXiv Detail & Related papers (2020-04-12T13:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.