A Heterogeneous Parallel Non-von Neumann Architecture System for
Accurate and Efficient Machine Learning Molecular Dynamics
- URL: http://arxiv.org/abs/2303.15474v1
- Date: Sun, 26 Mar 2023 05:43:49 GMT
- Title: A Heterogeneous Parallel Non-von Neumann Architecture System for
Accurate and Efficient Machine Learning Molecular Dynamics
- Authors: Zhuoying Zhao, Ziling Tan, Pinghui Mo, Xiaonan Wang, Dan Zhao, Xin
Zhang, Ming Tao, and Jie Liu
- Abstract summary: This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations.
The system consists of a field programmable gate array (FPGA) and an application specific integrated circuit (ASIC) that operate in parallel as a heterogeneous system.
- Score: 9.329011150399726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a special-purpose system to achieve high-accuracy and
high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The
system consists of a field programmable gate array (FPGA) and an application
specific integrated circuit (ASIC) that operate in parallel as a heterogeneous system. To
be specific, a multiplication-less neural network (NN) is deployed on the
non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic
forces, which is the most computationally expensive part of MD. All other
calculations of MD are performed on the FPGA (Xilinx XC7Z100). It is shown that, at
a similar level of accuracy, the proposed NvN-based system built on a low-end
fabrication technology (180 nm) is 1.6x faster and 10^2-10^3x more energy
efficient than state-of-the-art vN-based MLMD running on graphics processing units
(GPUs) built on a much more advanced technology (12 nm), indicating the superiority
of the proposed NvN-based heterogeneous parallel architecture.
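The abstract says the ASIC runs a "multiplication-less" neural network but does not say how the multiplications are removed. One common technique, assumed here and not confirmed by this listing, is to quantize every weight to a signed power of two so that each multiply becomes a bit shift followed by an add or subtract. The sketch below shows one such dense layer on fixed-point integers; the names (`quantize_pow2`, `shift_dense`, `FRAC_BITS`), the 8-bit fractional format, the exponent range and the toy weights are all hypothetical.

```python
import math
from typing import List, Tuple

FRAC_BITS = 8  # assumed fixed-point format: real value = integer / 2**FRAC_BITS

def to_fixed(v: float) -> int:
    return int(round(v * (1 << FRAC_BITS)))

def from_fixed(i: int) -> float:
    return i / (1 << FRAC_BITS)

def quantize_pow2(w: float, min_exp: int = -8, max_exp: int = 7) -> Tuple[int, int]:
    """Map a real weight to (sign, exponent) so that w ~ sign * 2**exponent.

    The exponent is rounded in the log2 domain; sign == 0 marks a pruned weight.
    """
    if w == 0.0:
        return 0, 0
    exp = round(math.log2(abs(w)))
    if exp < min_exp:
        return 0, 0  # too small for the assumed exponent range: drop it
    return (1 if w > 0 else -1), min(exp, max_exp)

def shift_mul(x: int, exp: int) -> int:
    """x * 2**exp on integers using only shifts (right shift floors negatives)."""
    return x << exp if exp >= 0 else x >> -exp

def shift_dense(x_fixed: List[int], q_weights: List[List[Tuple[int, int]]],
                bias_fixed: List[int]) -> List[int]:
    """Dense layer y_j = b_j + sum_i s_ij * (x_i * 2**e_ij) using shifts and adds only."""
    out = []
    for row, acc in zip(q_weights, bias_fixed):
        for x, (s, e) in zip(x_fixed, row):
            if s == 0:
                continue  # pruned weight contributes nothing
            term = shift_mul(x, e)
            acc = acc + term if s > 0 else acc - term
        out.append(acc)
    return out

def relu(v: List[int]) -> List[int]:
    return [max(0, a) for a in v]

# Toy 3-input, 2-output layer with made-up weights.
W = [[0.5, -0.25, 1.9], [-2.1, 0.13, 0.0]]
b = [0.1, -0.05]
q_W = [[quantize_pow2(w) for w in row] for row in W]
x = [1.0, -2.0, 0.5]
y = relu(shift_dense([to_fixed(v) for v in x], q_W, [to_fixed(v) for v in b]))
print([from_fixed(v) for v in y])  # [2.1015625, 0.0]
```

The quantized first output (about 2.10) differs from the full-precision value (2.05) because 1.9 is rounded to 2; trading this rounding error for multiplier-free hardware is the usual design choice behind power-of-two weights.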
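The abstract assigns atomic force evaluation, the most expensive part of MD, to the NN on the ASIC, and all other MD calculations to the FPGA. A minimal sketch of that division of labor, assuming a velocity-Verlet integrator (the abstract does not name one) and using a harmonic spring as a stand-in for the NN force model, is shown below. `force_fn` marks the hypothetical offload boundary; the function names are illustrative only.

```python
from typing import Callable, List, Tuple

Vec = List[float]

def velocity_verlet(x: Vec, v: Vec, m: float, dt: float, n_steps: int,
                    force_fn: Callable[[Vec], Vec]) -> Tuple[Vec, Vec, int]:
    """Integrate Newton's equations of motion with velocity Verlet.

    force_fn marks the offload boundary: in the architecture described in the
    abstract it would be the NN force evaluation on the ASIC; every other line
    is the cheaper bookkeeping that the abstract assigns to the FPGA.
    """
    f = force_fn(x)
    n_force_calls = 1
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]             # drift
        f = force_fn(x)                                         # offloaded call
        n_force_calls += 1
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]  # second half kick
    return x, v, n_force_calls

def harmonic_force(x: Vec, k: float = 1.0) -> Vec:
    """Stand-in for the NN force model: independent springs, F = -k x."""
    return [-k * xi for xi in x]

# One 1-D particle with m = k = 1, so the period is 2*pi; 628 steps of 0.01 is about one period.
x, v, calls = velocity_verlet([1.0], [0.0], m=1.0, dt=0.01, n_steps=628,
                              force_fn=harmonic_force)
energy = 0.5 * v[0] ** 2 + 0.5 * x[0] ** 2
print(x[0], energy, calls)
```

Each step makes exactly one call to `force_fn`, so when the force model is an NN the per-step cost is dominated by that call; this is why offloading only the force evaluation to dedicated hardware can pay off.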
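The abstract reports both a speedup and an energy-efficiency gain. If both figures are measured per MD step on the same workload (an assumption; the listing does not say how energy efficiency was measured), energy per step is average power times time per step, so the two numbers together imply a ratio of average power draw:

```latex
\eta \equiv \frac{E_{\mathrm{GPU}}}{E_{\mathrm{NvN}}}
     = \frac{P_{\mathrm{GPU}}\, t_{\mathrm{GPU}}}{P_{\mathrm{NvN}}\, t_{\mathrm{NvN}}}
     = 1.6\,\frac{P_{\mathrm{GPU}}}{P_{\mathrm{NvN}}}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{GPU}}}{P_{\mathrm{NvN}}} = \frac{\eta}{1.6}
\approx 62.5 \text{ to } 625 \quad \text{for } \eta = 10^{2} \text{ to } 10^{3}.
```

This is only a consistency check on the reported figures, not a measured power value.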
Related papers
- MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow [5.310696264367485]
MOFA is an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs).
MOFA addresses key challenges in integrating GPU-accelerated computing for GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screening and filtering AI-generated MOFs.
(arXiv, 2025-01-18T04:10:44Z)
- Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
(arXiv, 2024-11-03T16:42:10Z)
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN) we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
(arXiv, 2024-10-08T14:27:28Z)
- Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA [10.630802853096462]
Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations.
This paper proposes a high-throughput, scalable and energy-efficient non-element-wise matrix multiplication unit on FPGAs.
Our AMU achieves up to 9x higher throughput and 112x higher energy efficiency than the state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
(arXiv, 2024-07-02T15:28:10Z)
- Many-body computing on Field Programmable Gate Arrays [5.3808713424582395]
We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations.
This has resulted in a tenfold speedup compared to CPU-based computation for a Monte Carlo algorithm.
This is the first use of an FPGA to accelerate a typical tensor network algorithm for many-body ground-state calculations.
(arXiv, 2024-02-09T14:01:02Z)
- End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs [49.358119307844035]
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs).
This makes efficient NN implementations in hardware accessible to nonexperts through a single open-source workflow.
We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the Large Hadron Collider (LHC).
We implement an optimized mixed-precision NN for high-momentum particle jets in simulated LHC proton-proton collisions.
(arXiv, 2023-04-13T18:00:01Z)
- Decomposition of Matrix Product States into Shallow Quantum Circuits [62.5210028594015]
Tensor network (TN) algorithms can be mapped to parametrized quantum circuits (PQCs).
We propose a new protocol for approximating TN states using realistic quantum circuits.
Our results reveal one particular protocol, involving sequential growth and optimization of the quantum circuit, to outperform all other methods.
(arXiv, 2022-09-01T17:08:41Z)
- GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access [71.58925117604039]
Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks.
We propose a neural network architecture that combines the advantages of both linear and non-linear processing.
(arXiv, 2022-06-13T09:38:23Z)
- Joint Deep Reinforcement Learning and Unfolding: Beam Selection and Precoding for mmWave Multiuser MIMO with Lens Arrays [54.43962058166702]
Millimeter wave (mmWave) multiuser multiple-input multiple-output (MU-MIMO) systems with discrete lens arrays (DLAs) have received great attention.
In this work, we investigate the joint design of a beam precoding matrix for mmWave MU-MIMO systems with DLAs.
(arXiv, 2021-01-05T03:55:04Z)
- Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
(arXiv, 2020-06-15T02:57:57Z)
- ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning [1.2019888796331233]
Matrix-vector multiplications (MVM) and vector-vector outer products (VVOP) are the two most expensive operations associated with the training of deep neural networks (DNNs).
We introduce efficient stochastic computing (SC) techniques for weight updates in DNNs with the activation functions required by many state-of-the-art networks.
Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling.
Hardware design of ESSOP at the 14 nm technology node shows that, compared to a highly pipelined FP16 multiplier, ESSOP is 82.2% and 93.7% better in energy
(arXiv, 2020-03-25T07:54:42Z)