Paper overview: Implementation of a framework for deploying AI inference engines in
- arxiv url: http://arxiv.org/abs/2305.19455v1
- Date: Tue, 30 May 2023 23:37:51 GMT
- Status: Processing complete
- Last updated in system: 2023-06-01 19:18:27.307143
- Title: Implementation of a framework for deploying AI inference engines in
- Title (reference translation): Implementation of a Framework for Deploying AI Inference Engines in FPGAs
- Authors: Ryan Herbst, Ryan Coffee, Nathan Fronk, Kukhee Kim, Kuktae Kim, Larry
Ruckman, and J.J. Russell
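- Abstract summary: The goal is to ensure the highest possible frame rate while keeping the maximum latency constrained to the needs of the experiment.
- Reference score (independently computed attention score): 0.0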
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The LCLS2 Free Electron Laser (FEL) will generate X-ray pulses to
beamline experiments at up to 1 MHz. These experiments will require new
ultra-high rate (UHR) detectors that can operate at rates above 100 kHz and
generate data throughputs upwards of 1 TB/s, a data velocity which requires
prohibitively large investments in storage infrastructure. Machine Learning has
demonstrated the potential to digest large datasets to extract relevant
insights; however, current implementations show latencies that are too high for
real-time data reduction objectives. SLAC has undertaken the creation of a
software framework which translates ML structures for deployment on Field
Programmable Gate Arrays (FPGAs) deployed at the Edge of the data chain, close
to the instrumentation. This framework leverages Xilinx's HLS framework,
presenting an API modeled after the open source Keras interface to the
TensorFlow library. This SLAC Neural Network Library (SNL) framework is designed
with a streaming data approach, optimizing the data flow between layers while
minimizing the data buffering requirements. The goal is to ensure the highest
possible frame rate while keeping the maximum latency constrained to the needs
of the experiment. Our framework is designed to ensure the RTL implementation of
the network layers, supporting full redeployment of weights and biases without
requiring resynthesis after training. The ability to reduce the precision of the
implemented networks through quantization is necessary to optimize the use of
both DSP and memory resources in the FPGA. We currently have a preliminary
version of the toolset and are experimenting with both general purpose example
networks and networks being designed for specific LCLS2 experiments.
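The abstract's point about reducing precision through quantization can be illustrated with a minimal sketch of signed fixed-point rounding. This is not taken from SNL: the function name `quantize_fixed_point`, the 8-bit word with 6 fractional bits, and the sample weights are all assumptions chosen for illustration, and the framework's actual quantization scheme may differ.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=6):
    """Round floats to signed fixed-point codes (hypothetical helper, not SNL's API)."""
    scale = 1 << frac_bits                    # one unit in the last place is 1/scale
    q_min = -(1 << (total_bits - 1))          # most negative representable code
    q_max = (1 << (total_bits - 1)) - 1       # most positive representable code
    # Scale, round to the nearest integer code, and saturate to the word range.
    codes = [max(q_min, min(q_max, round(v * scale))) for v in values]
    # Convert the codes back to floats to inspect the precision that was lost.
    dequantized = [c / scale for c in codes]
    return codes, dequantized


# Illustrative weights only; -2.5 falls outside the 8-bit range and saturates.
weights = [0.7312, -0.0421, 1.25, -2.5]
codes, dequantized = quantize_fixed_point(weights, total_bits=8, frac_bits=6)
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(codes, dequantized, max_error)
```

Narrower words need fewer memory bits per weight and smaller multipliers, at the cost of rounding error and, as the last weight shows, saturation when a value exceeds the representable range.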
- Analysis of Hardware Synthesis Strategies for Machine Learning in Collider Trigger and Data Acquisition [0.0] (2024-11-18T15:59:30Z)
- WDMoE: Wireless Distributed Mixture of Experts for Large Language Models [68.45482959423323] (2024-11-11T02:48:00Z)
  This paper proposes a wireless distributed Mixture of Experts (WDMoE) architecture to enable collaborative deployment of LLMs across edge servers at the base station (BS) and mobile devices in wireless networks.
- Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework [70.83511997272457] (2023-10-04T03:32:39Z)
- Closing the loop: Autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments [80.49514665620008] (2023-06-20T21:21:19Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615] (2023-05-24T16:08:55Z)
- OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science [0.6571063542099524] (2023-02-13T23:25:55Z)
- Hardware-Efficient Deconvolution-Based GAN for Edge Computing [1.5229257192293197] (2022-01-18T11:16:59Z)
  Generative Adversarial Networks (GANs) are state-of-the-art algorithms for generating new data samples based on the learned data distribution.
- Accelerating Recurrent Neural Networks for Gravitational Wave Experiments [1.9263019320519579] (2021-06-26T20:44:02Z)
- JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035] (2021-06-02T05:03:38Z)
  We show that it achieves no regret under conditions analogous to GP-UCB.
- FENXI: Deep-learning Traffic Analytics at the Edge [69.34903175081284] (2021-05-25T08:02:44Z)
- Device Sampling for Heterogeneous Federated Learning: Theory, Algorithms, and Implementation [24.084053136210027] (2021-01-04T05:59:50Z)
  The proposed method substantially outperforms federated learning (FedL) in terms of both trained model accuracy and required resource utilization, while sampling less than 5% of all devices.