Many-body computing on Field Programmable Gate Arrays
- URL: http://arxiv.org/abs/2402.06415v1
- Date: Fri, 9 Feb 2024 14:01:02 GMT
- Title: Many-body computing on Field Programmable Gate Arrays
- Authors: Songtai Lv, Yang Liang, Yuchen Meng, Xiaochen Yao, Jincheng Xu, Yang Liu, Qibin Zheng, Haiyuan Zou
- Abstract summary: We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations.
This has resulted in a tenfold speedup over CPU-based computation for a Monte Carlo algorithm.
- Score: 5.612626580467746
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: A new implementation of many-body calculations is of paramount importance in the field of computational physics. In this study, we leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations. Through the design of appropriate schemes for Monte Carlo and tensor network methods, we effectively utilize the parallel processing capabilities provided by FPGAs. This has resulted in a remarkable tenfold speedup compared to CPU-based computation for a Monte Carlo algorithm. We also demonstrate, for the first time, the utilization of FPGA to accelerate a typical tensor network algorithm. Our findings unambiguously highlight the significant advantages of hardware implementation and pave the way for novel approaches to many-body calculations.
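The abstract does not detail the kernels that were mapped onto the FPGA, so the following is only a minimal Python sketch of the two algorithm families it names: a checkerboard-decomposed Metropolis sweep for a 2D Ising-type Monte Carlo, whose independent sub-lattice updates are the kind of work parallel FPGA logic can absorb, and a single dense tensor contraction of the sort that dominates tensor network methods. The model choice, function names, and parameters (lattice size, `beta`, tensor shapes) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def checkerboard_metropolis_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising model using a checkerboard split.

    Sites of one color share no nearest-neighbour bonds with each other, so
    an entire sub-lattice can be updated simultaneously -- the kind of
    independent, fixed-latency work that FPGA logic can run in parallel.
    Illustrative sketch only; not the paper's actual kernel.
    """
    L = spins.shape[0]
    color_mask = (np.add.outer(np.arange(L), np.arange(L)) % 2) == 0
    for mask in (color_mask, ~color_mask):
        # nearest-neighbour sum with periodic boundaries
        nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
              np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nn                          # cost of flipping each site
        accept = rng.random(spins.shape) < np.exp(-beta * dE)
        spins = np.where(mask & accept, -spins, spins)
    return spins

def contract_site_tensor(env, site):
    """One dense contraction step typical of tensor network methods.

    env  : rank-2 environment tensor E[u, l]
    site : rank-4 site tensor        T[u, l, d, r]
    The multiply-accumulate pattern below is what an FPGA pipeline would
    stream through its arithmetic units.
    """
    return np.einsum('ul,uldr->dr', env, site)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spins = rng.choice([-1, 1], size=(32, 32))
    for _ in range(200):
        spins = checkerboard_metropolis_sweep(spins, beta=0.44, rng=rng)
    print("magnetisation per site:", abs(spins.mean()))
```

In NumPy the two sub-lattices are still updated one after the other; the point of the sketch is that every site of one color could be evaluated in the same clock cycle on an FPGA, which is presumably where the reported Monte Carlo speedup comes from.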
Related papers
- Design of an FPGA-Based Neutral Atom Rearrangement Accelerator for Quantum Computing [1.003635085077511]
Neutral atoms have emerged as a promising technology for implementing quantum computers.
We propose a novel quadrant-based rearrangement algorithm that employs a divide-and-conquer strategy and also enables the simultaneous movement of multiple atoms.
This is the first hardware acceleration work for atom rearrangement, and it significantly reduces the processing time.
arXiv Detail & Related papers (2024-11-19T10:38:21Z)
- All-to-all reconfigurability with sparse and higher-order Ising machines [0.0]
We introduce a multiplexed architecture that emulates all-to-all network functionality.
We show that running the adaptive parallel tempering algorithm on this architecture demonstrates competitive algorithmic and prefactor advantages.
Scaled magnetic versions of p-bit Ising machines (IMs) could lead to orders-of-magnitude improvements over the state of the art for generic optimization.
arXiv Detail & Related papers (2023-11-21T20:27:02Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- An FPGA-based Solution for Convolution Operation Acceleration [0.0]
This paper proposes an FPGA-based architecture to accelerate the convolution operation.
The project's purpose is to produce an FPGA IP core that can process one convolutional layer at a time.
arXiv Detail & Related papers (2022-06-09T14:12:30Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs [0.0]
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks.
We find that a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing.
arXiv Detail & Related papers (2020-11-30T18:17:43Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M = 3, 5, 7, 9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers (a generic minimal-filtering sketch follows this list).
arXiv Detail & Related papers (2020-04-12T13:18:25Z)
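The last entry above states only that multiplications are saved, not how. As a purely generic illustration of the minimal-filtering idea (not taken from that paper, whose fully parallel constructions for the larger filter sizes are more elaborate), the sketch below is Winograd's classic F(2,3) algorithm: two outputs of a 3-tap filter from 4 multiplications instead of 6, which is the kind of trade that reduces the number of embedded multipliers on an FPGA.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap FIR filter
    computed with 4 multiplications instead of the naive 6.

    d : length-4 input segment, g : length-3 filter.
    Returns [y0, y1] with y0 = d0*g0 + d1*g1 + d2*g2
                      and y1 = d1*g0 + d2*g1 + d3*g2.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # The four "expensive" products -- in hardware these map to embedded
    # multipliers; everything else is additions and constant scaling.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, g = rng.random(4), rng.random(3)
    naive = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                      d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
    print(np.allclose(winograd_f23(d, g), naive))  # True
```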
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.