Many-body computing on Field Programmable Gate Arrays
- URL: http://arxiv.org/abs/2402.06415v2
- Date: Sat, 25 Jan 2025 13:40:00 GMT
- Title: Many-body computing on Field Programmable Gate Arrays
- Authors: Songtai Lv, Yang Liang, Yuchen Meng, Xiaochen Yao, Jincheng Xu, Yang Liu, Qibin Zheng, Haiyuan Zou
- Abstract summary: We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations.
This has resulted in a tenfold speedup compared to CPU-based computation for a Monte Carlo algorithm.
For the first time, an FPGA is used to accelerate a typical tensor network algorithm for many-body ground-state calculations.
- Score: 5.3808713424582395
- Abstract: A new implementation of many-body calculations is of paramount importance in the field of computational physics. In this study, we leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations. Through the design of appropriate schemes for Monte Carlo and tensor network methods, we effectively utilize the parallel processing capabilities provided by FPGAs. This has resulted in a tenfold speedup compared to CPU-based computation for a Monte Carlo algorithm. By using a supercell structure and simulating the FPGA architecture on a CPU with High-Level Synthesis, we achieve $O(1)$ scaling for the time of one sweep, regardless of the overall system size. We also demonstrate, for the first time, the utilization of FPGA to accelerate a typical tensor network algorithm for many-body ground state calculations. Additionally, we show that the current FPGA computing acceleration is on par with that of multi-threaded GPU parallel processing. Our findings unambiguously highlight the significant advantages of hardware implementation and pave the way for novel approaches to many-body calculations.
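The supercell/parallel-update idea behind the reported $O(1)$ per-sweep scaling can be illustrated with a checkerboard Metropolis sweep for the 2D Ising model: sites of one sublattice color share no couplings with each other, so an FPGA can update them all in the same clock cycles. The following is a minimal NumPy sketch of that decomposition, not the paper's actual HLS design; the model, lattice size, and parameters are illustrative.

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising lattice using a checkerboard
    (two-sublattice) update: all sites of one color are mutually
    independent, so in hardware they could be flipped simultaneously --
    the kind of parallelism that makes per-sweep time roughly
    independent of system size."""
    L = spins.shape[0]
    for parity in (0, 1):
        # Sum of the four nearest neighbours (periodic boundaries).
        nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
              + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nn                      # energy cost of flipping each site
        accept = rng.random((L, L)) < np.exp(-beta * dE)
        # Mask selects one checkerboard colour; those flips are independent.
        mask = (np.add.outer(np.arange(L), np.arange(L)) % 2) == parity
        spins = np.where(mask & accept, -spins, spins)
    return spins

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))
for _ in range(100):
    spins = checkerboard_sweep(spins, beta=0.6, rng=rng)
```

On an FPGA, each such independent site update maps to its own pipeline, which is why the sweep time can stay constant as the lattice grows (up to resource limits).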
Related papers
- Design of an FPGA-Based Neutral Atom Rearrangement Accelerator for Quantum Computing [1.003635085077511]
Neutral atoms have emerged as a promising technology for implementing quantum computers.
We propose a novel quadrant-based rearrangement algorithm that employs a divide-and-conquer strategy and also enables the simultaneous movement of multiple atoms.
This is the first hardware acceleration work for atom rearrangement, and it significantly reduces the processing time.
arXiv Detail & Related papers (2024-11-19T10:38:21Z)
- All-to-all reconfigurability with sparse and higher-order Ising machines [0.0]
We introduce a multiplexed architecture that emulates all-to-all network functionality.
We show that the adaptive parallel tempering algorithm achieves competitive algorithmic and prefactor advantages, and that scaled magnetic versions of p-bit Ising machines could deliver orders-of-magnitude improvements over the state of the art for generic optimization.
arXiv Detail & Related papers (2023-11-21T20:27:02Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge-computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- An FPGA-based Solution for Convolution Operation Acceleration [0.0]
This paper proposes an FPGA-based architecture to accelerate the convolution operation.
The project's purpose is to produce an FPGA IP core that can process a convolutional layer at a time.
arXiv Detail & Related papers (2022-06-09T14:12:30Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs [0.0]
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks.
We find that a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing workflows.
arXiv Detail & Related papers (2020-11-30T18:17:43Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M = 3, 5, 7, 9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
arXiv Detail & Related papers (2020-04-12T13:18:25Z)
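The minimal-filtering family referenced in the last entry is exemplified by the classic Winograd F(2,3) algorithm, which produces two outputs of a 3-tap filter with four multiplications instead of six. The sketch below shows that textbook case only; the paper's fully parallel algorithms for M = 3, 5, 7, 9, and 11 follow the same multiplication-minimizing spirit but are not reproduced here.

```python
def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap filter
    (correlation) using 4 multiplications instead of the naive 6.
    d: four consecutive input samples, g: three filter taps.
    Illustrative textbook algorithm, not the paper's own construction."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Four "Winograd" products; the filter-side factors (g0+g1+g2)/2 and
    # (g0-g1+g2)/2 can be precomputed once per filter in hardware.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # Recombine: y0 = d0*g0 + d1*g1 + d2*g2, y1 = d1*g0 + d2*g1 + d3*g2
    return (m1 + m2 + m3, m2 - m3 - m4)

y = winograd_f23([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

Fewer multiplications matter in hardware because embedded multiplier (DSP) blocks are the scarce resource on an FPGA, while the extra additions are cheap.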
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.