Transparent FPGA Acceleration with TensorFlow
- URL: http://arxiv.org/abs/2102.06018v1
- Date: Tue, 2 Feb 2021 06:49:33 GMT
- Title: Transparent FPGA Acceleration with TensorFlow
- Authors: Simon Pfenning, Philipp Holzinger, Marc Reichenbach
- Abstract summary: We propose a toolflow for developers who want to make use of a new deep learning accelerator.
On the backend we use an FPGA, which is addressable via a runtime environment.
This can be achieved by our HSA toolflow, since the hardware is not statically configured with the structure of the network.
- Score: 1.0828616610785522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today, artificial neural networks are one of the major innovators pushing the
progress of machine learning. This has particularly affected the development of
neural network accelerating hardware. However, since most of these
architectures require specialized toolchains, there is a certain amount of
additional effort for developers each time they want to make use of a new deep
learning accelerator. Furthermore the flexibility of the device is bound to the
architecture itself, as well as to the functionality of the runtime
environment.
In this paper we propose a toolflow using TensorFlow as frontend, thus
offering developers the opportunity of using a familiar environment. On the
backend we use an FPGA, which is addressable via an HSA runtime environment. In
this way we are able to hide the complexity of controlling new hardware from
the user, while at the same time maintaining a high amount of flexibility. This
can be achieved by our HSA toolflow, since the hardware is not statically
configured with the structure of the network. Instead, it can be dynamically
reconfigured during runtime with the respective kernels executed by the network
and simultaneously from other sources, e.g., OpenCL/OpenMP.
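The dynamic-reconfiguration idea can be pictured with a minimal sketch (plain Python with hypothetical names, not the paper's actual HSA runtime API): a runtime keeps a kernel registry that several frontends populate and dispatch against at run time, instead of baking the network structure into the hardware.

```python
# Illustrative sketch of run-time kernel dispatch; all names are
# hypothetical and only mimic the idea of a dynamically reconfigurable
# backend shared by multiple frontends.

class KernelRuntime:
    """Holds kernels that can be registered and swapped at run time."""

    def __init__(self):
        self._kernels = {}

    def register(self, name, fn):
        # Kernels may arrive from any frontend (a TensorFlow-style graph,
        # OpenCL/OpenMP-style host code, ...) while the runtime keeps running.
        self._kernels[name] = fn

    def dispatch(self, name, *args):
        return self._kernels[name](*args)

runtime = KernelRuntime()

# One frontend registers the kernels its graph needs ...
runtime.register("matmul", lambda a, b: [[sum(x * y for x, y in zip(row, col))
                                          for col in zip(*b)] for row in a])
# ... and another source adds or replaces kernels at any time.
runtime.register("relu", lambda v: [max(0.0, x) for x in v])

out = runtime.dispatch("matmul", [[1, 2]], [[3], [4]])   # [[11]]
act = runtime.dispatch("relu", [-1.0, 0.5])              # [0.0, 0.5]
```

Because dispatch is resolved through the registry at call time, the set of available kernels can change during execution, which is the flexibility the abstract attributes to the non-static hardware configuration.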
Related papers
- Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI [0.0]
Field-Programmable Gate Arrays (FPGAs) are a reconfigurable platform that allows mapping AI algorithms directly into device logic.
Unlike CPU and GPU architectures, an FPGA can be reconfigured in the field to adapt its physical structure to a specific model.
Partial reconfiguration and compilation flows from AI frameworks are shortening the path from prototype to deployment.
arXiv Detail & Related papers (2025-11-04T03:41:42Z)
- HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices [44.99833362998488]
The present work proposes a generic hardware architecture ready to be implemented on FPGA devices.
The inference speed of the design is evaluated over different resource constrained FPGA devices.
We demonstrate that our hardware-aware pruning algorithm achieves a remarkable 45% improvement in inference time compared to a network pruned using the standard algorithm.
arXiv Detail & Related papers (2024-08-26T07:27:12Z)
- SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA [42.170149806080204]
SpikeExplorer is a Python tool for hardware-oriented automatic Design Space Exploration.
It searches the optimal network architecture, neuron model, and internal and training parameters.
It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image.
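Hardware-oriented design space exploration of this kind can be pictured as a search over configurations scored by a multi-objective cost. A minimal sketch (all parameters and numbers are made up for illustration, not SpikeExplorer's API):

```python
# Toy design-space exploration: exhaustively score (neuron model, width)
# configurations by a weighted accuracy/power/latency objective.
from itertools import product

design_space = {
    "neuron": ["LIF", "IF"],
    "width":  [64, 128, 256],
}

def evaluate(neuron, width):
    # Stand-in for synthesis + simulation: wider networks are assumed
    # more accurate but cost more power and latency (invented model).
    acc = 0.90 + 0.0003 * width + (0.01 if neuron == "LIF" else 0.0)
    power_mw = 0.9 * width
    latency_ms = 0.0008 * width
    return acc, power_mw, latency_ms

def score(acc, power_mw, latency_ms, w=(1.0, 0.001, 1.0)):
    # Weighted trade-off: reward accuracy, penalize power and latency.
    return w[0] * acc - w[1] * power_mw - w[2] * latency_ms

best = max(product(design_space["neuron"], design_space["width"]),
           key=lambda cfg: score(*evaluate(*cfg)))
```

With these invented weights the search favors the smallest LIF configuration; real tools replace the `evaluate` stub with synthesis and simulation results and typically use smarter search than exhaustive enumeration.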
arXiv Detail & Related papers (2024-04-04T17:53:08Z)
- Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks [0.08965418284317034]
Spiking Neural Networks (SNNs) offer the potential to enhance energy efficiency through a reduced, low-power hardware footprint.
This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX.
arXiv Detail & Related papers (2024-02-29T09:46:44Z)
- FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
arXiv Detail & Related papers (2023-06-08T13:11:20Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed where sensing happens concurrently rather than in isolation.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
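The early-exit idea behind such schedulers can be sketched in a few lines (illustrative stages and thresholds, not the paper's scheduler):

```python
# Toy early-exit inference: run a network stage by stage and stop at the
# first exit whose confidence clears a threshold, so later stages can be
# skipped and the accelerator freed for newly arriving requests.

def run_with_early_exit(stages, x, threshold=0.9):
    """stages: list of (transform, confidence_fn) pairs, one exit per stage."""
    for i, (transform, confidence) in enumerate(stages):
        x = transform(x)
        if confidence(x) >= threshold:
            return i, x          # exit early at stage i
    return len(stages) - 1, x    # fall through to the final exit

# Hypothetical two-stage model whose first exit is confident enough here.
stages = [
    (lambda v: v * 2, lambda v: 0.95),
    (lambda v: v + 1, lambda v: 0.99),
]
exit_idx, result = run_with_early_exit(stages, 3)   # exits at stage 0
```

A preemptive scheduler adds one more ingredient on top of this: when a new request arrives, an in-flight sample that has already passed an exit can be terminated there instead of running to the final exit.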
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks [0.7614628596146599]
AI frameworks such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, and DL4J provide a high-level scripting API.
Less mainstream CPU, GPU, or accelerator vendors need to invest considerable effort to get their hardware supported by these frameworks.
NEC Laboratories Europe began developing the SOL AI Optimization project several years ago.
arXiv Detail & Related papers (2022-05-19T08:40:46Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Exposing Hardware Building Blocks to Machine Learning Frameworks [4.56877715768796]
We focus on how to design topologies that complement such a view of neurons as unique functions.
We develop a library that supports training a neural network with custom sparsity and quantization.
arXiv Detail & Related papers (2020-04-10T14:26:00Z)
- Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).
It leverages recent advances in various network compression methods and implements some of them, such as sparsity, quantization, and binarization.
The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.