Implementation of a framework for deploying AI inference engines in
FPGAs
- URL: http://arxiv.org/abs/2305.19455v1
- Date: Tue, 30 May 2023 23:37:51 GMT
- Title: Implementation of a framework for deploying AI inference engines in
FPGAs
- Authors: Ryan Herbst, Ryan Coffee, Nathan Fronk, Kukhee Kim, Kuktae Kim, Larry
Ruckman, and J.J. Russell
- Abstract summary: The goal is to ensure the highest possible framerate while keeping the maximum latency constrained to the needs of the experiment.
The ability to reduce the precision of the implemented networks through quantization is necessary to optimize the use of both DSP and memory resources in the FPGA.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The LCLS2 Free Electron Laser (FEL) will generate X-ray pulses for beamline
experiments at up to 1 MHz. These experiments will require new ultra-high rate
(UHR) detectors that can operate at rates above 100 kHz and generate data
throughputs upwards of 1 TB/s, a data velocity which requires prohibitively large
investments in storage infrastructure. Machine Learning has demonstrated the
potential to digest large datasets to extract relevant insights; however, current
implementations show latencies that are too high for real-time data reduction
objectives. SLAC has embarked on the creation of a software framework which
translates ML structures for deployment on Field Programmable Gate Arrays
(FPGAs) deployed at the edge of the data chain, close to the instrumentation. This
framework leverages Xilinx's HLS framework, presenting an API modeled after the
open-source Keras interface to the TensorFlow library. This SLAC Neural Network
Library (SNL) framework is designed with a streaming data approach, optimizing the
data flow between layers while minimizing the data buffering
requirements. The goal is to ensure the highest possible framerate while keeping
the maximum latency constrained to the needs of the experiment. Our framework is
designed to ensure that the RTL implementation of the network layers supports full
redeployment of weights and biases without requiring resynthesis after training.
The ability to reduce the precision of the implemented networks through
quantization is necessary to optimize the use of both DSP and memory resources
in the FPGA. We currently have a preliminary version of the toolset and are
experimenting with both general-purpose example networks and networks being
designed for specific LCLS2 experiments.
Related papers
- Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging [3.502427552446068]
Deep learning models enable real-time inference, but can be computationally demanding due to complex architectures and large matrix operations.
This makes DL models ill-suited for direct implementation on field-programmable gate array (FPGA)-based camera hardware.
In this work, we focus on compressing recurrent neural networks (RNNs), which are well-suited for FLI time-series data processing, to enable deployment on resource-constrained FPGA boards.
arXiv Detail & Related papers (2024-10-01T17:23:26Z)
- Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework [70.83511997272457]
We propose a semi-federated learning (SemiFL) paradigm to leverage both the base station (BS) and devices for a hybrid implementation of centralized learning (CL) and FL.
We propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers.
arXiv Detail & Related papers (2023-10-04T03:32:39Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Closing the loop: Autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR).
We present solutions that provide an elementary data analysis in real time during the experiment without introducing additional software dependencies into the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
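The allocation idea described above, giving more resources to the most computationally intensive layers of a pipelined NN graph, can be sketched as a toy model. This is a hedged illustration only: the proportional rule, the layer costs, and the function name are assumptions, not the cited paper's algorithm.

```python
# Hedged toy sketch: split a fixed resource budget (e.g. DSP slices)
# across pipeline stages in proportion to each layer's compute cost,
# so the heaviest layers get the most parallelism. Illustrative only.

def allocate_resources(layer_costs, budget):
    """Divide `budget` units across layers by their share of total cost.

    Each layer gets at least one unit; leftover units go to the
    costliest layers first, since the slowest stage bounds throughput.
    """
    total = sum(layer_costs)
    alloc = [max(1, int(budget * c / total)) for c in layer_costs]
    leftover = budget - sum(alloc)
    # Hand remaining units to layers in decreasing cost order.
    for i in sorted(range(len(layer_costs)),
                    key=lambda i: layer_costs[i], reverse=True):
        if leftover <= 0:
            break
        alloc[i] += 1
        leftover -= 1
    return alloc

# A pipeline runs at the rate of its slowest stage, so balancing the
# cost-to-resource ratio across stages maximizes overall frame rate.
costs = [100, 400, 50]   # e.g. relative MACs per layer (made-up numbers)
plan = allocate_resources(costs, 64)
```

With the made-up costs above, the middle (heaviest) layer receives the largest share of the budget, which is the qualitative behavior the summary describes.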
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science [0.6571063542099524]
We present an open source, lightweight, compiler framework for translating high-level representations of deep neural networks to low-level representations.
We show OpenHLS is able to produce an implementation of the network with a throughput of 4.8 $\mu$s/sample, which is approximately a 4$\times$ improvement over the existing implementation.
arXiv Detail & Related papers (2023-02-13T23:25:55Z)
- Hardware-Efficient Deconvolution-Based GAN for Edge Computing [1.5229257192293197]
Generative Adversarial Networks (GAN) are cutting-edge algorithms for generating new data samples based on the learned data distribution.
We proposed an HW/SW co-design approach for training quantized deconvolution GAN (QDCGAN) implemented on FPGA using a scalable streaming dataflow architecture.
Various precisions, datasets, and network scalability were analyzed for low-power inference on resource-constrained platforms.
arXiv Detail & Related papers (2022-01-18T11:16:59Z)
- Accelerating Recurrent Neural Networks for Gravitational Wave Experiments [1.9263019320519579]
We have developed a new architecture capable of accelerating RNN inference for analyzing time-series data from LIGO detectors.
A customizable template for this architecture has been designed, which enables the generation of low-latency FPGA designs.
arXiv Detail & Related papers (2021-06-26T20:44:02Z)
- JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps limitations by querying additional data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z)
- FENXI: Deep-learning Traffic Analytics at the Edge [69.34903175081284]
We present FENXI, a system to run complex analytics by leveraging TPU.
FENXI decouples operations and traffic analytics which operates at different granularities.
Our analysis shows that FENXI can sustain forwarding line rate traffic processing requiring only limited resources.
arXiv Detail & Related papers (2021-05-25T08:02:44Z)
- Device Sampling for Heterogeneous Federated Learning: Theory, Algorithms, and Implementation [24.084053136210027]
We develop a sampling methodology based on graph sequential convolutional networks (GCNs).
We find that our methodology, while sampling less than 5% of all devices, substantially outperforms conventional federated learning (FedL) in terms of both trained model accuracy and required resource utilization.
arXiv Detail & Related papers (2021-01-04T05:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.