Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors
- URL: http://arxiv.org/abs/2104.09252v1
- Date: Mon, 19 Apr 2021 12:50:27 GMT
- Title: Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors
- Authors: Lukas Baischer, Matthias Wess, Nima TaheriNejad
- Abstract summary: Deep neural networks (DNNs) have the advantage that they can take a large number of parameters into account, which enables them to solve complex tasks.
In computer vision and speech recognition, they achieve higher accuracy than conventional algorithms, and on some tasks they even surpass human experts.
With the progress of DNNs in recent years, many other application fields, such as disease diagnosis and autonomous driving, are taking advantage of them.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have the advantage that they can take a large
number of parameters into account, which enables them to solve complex tasks. In
computer vision and speech recognition, they achieve higher accuracy than
conventional algorithms, and on some tasks they even surpass human experts. With
the progress of DNNs in recent years, many other application fields, such as
disease diagnosis and autonomous driving, are taking advantage of them. The trend
in DNNs is clear: network size is growing exponentially, which leads to an
exponential increase in computational effort and required memory. For this
reason, optimized hardware accelerators are used to increase the performance of
neural network inference. However, there are various neural network hardware
accelerator platforms, such as graphics processing units (GPUs),
application-specific integrated circuits (ASICs), and field-programmable gate
arrays (FPGAs), each of which offers certain advantages and disadvantages. There
are also various methods for reducing the computational effort of DNNs, which
differ in how well they suit each hardware accelerator. This article gives an
overview of existing neural network hardware accelerators and acceleration
methods, discusses their strengths and weaknesses, and recommends suitable
applications for each. In particular, we focus on accelerating the inference of
convolutional neural networks (CNNs) used for image recognition tasks. Among the
many different hardware architectures, FPGA-based implementations are
particularly well suited to showing the effect of DNN optimization methods on
accuracy and throughput; for this reason, this work focuses primarily on
FPGA-based implementations.
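The optimization methods such a tutorial surveys, quantization chief among them, reduce memory footprint and arithmetic cost at a controlled loss of accuracy. As a minimal illustration (our own sketch, not code from the paper), the following Python snippet applies symmetric int8 quantization to a convolution layer's weights, the kind of transformation whose accuracy/throughput trade-off is then measured on hardware:

```python
# A minimal sketch (not from the paper): symmetric uniform quantization of a
# layer's weights to int8 -- one of the optimization methods the tutorial
# surveys for cutting DNN memory and compute cost on accelerators.
import torch

def quantize_int8(w: torch.Tensor):
    """Map float32 weights to int8 values plus one per-tensor scale factor."""
    scale = w.abs().max() / 127.0            # symmetric range [-127, 127]
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(64, 3, 3, 3)                 # e.g. weights of a 3x3 conv layer
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"memory: {w.numel() * 4} B (float32) -> {q.numel()} B (int8)")
print(f"max abs rounding error: {(w - w_hat).abs().max().item():.4f}")
```

On FPGAs in particular, int8 (or narrower) weights map directly onto DSP slices and on-chip BRAM, which is why quantization is usually the first optimization step there.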
Related papers
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude compared to its full-precision software counterpart, with only a 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
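The core computation an SNN accelerator like the one above implements is the leaky integrate-and-fire (LIF) neuron update. A generic sketch of that update in Python (illustrative only; the paper's actual Artix-7 design is not reproduced here):

```python
# Generic leaky integrate-and-fire (LIF) update -- the core computation an
# SNN inference accelerator implements in hardware. A sketch, not the
# paper's actual Artix-7 design.
import numpy as np

def lif_forward(currents, beta=0.9, threshold=1.0):
    """currents: (timesteps, neurons) input currents; returns a spike raster."""
    mem = np.zeros(currents.shape[1])        # membrane potentials
    spikes = np.zeros_like(currents)
    for t, i_t in enumerate(currents):
        mem = beta * mem + i_t               # leaky integration
        spikes[t] = mem >= threshold         # fire on crossing the threshold
        mem -= spikes[t] * threshold         # reset by subtraction
    return spikes

raster = lif_forward(np.random.rand(100, 10) * 0.3)
print("spikes per neuron:", raster.sum(axis=0))
```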
- E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs [6.047137174639418]
The end-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs.
E3NE uses less than 50% of the hardware resources and 20% less power, while reducing latency by an order of magnitude.
arXiv Detail & Related papers (2021-11-19T04:01:19Z)
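E3NE is built around neural encodings, i.e., how analog inputs are turned into spike trains. The summary does not spell out which encodings it supports, so the sketch below shows two standard ones, rate coding and time-to-first-spike coding, purely as illustration:

```python
# Two standard ways to encode an analog input (e.g. a pixel intensity in
# [0, 1]) as spikes -- illustrative only; the summary does not specify which
# encodings E3NE generates logic for.
import numpy as np

def rate_encode(x, timesteps=32, rng=np.random.default_rng(0)):
    """Poisson-style rate coding: spike probability per step equals x."""
    return (rng.random((timesteps,) + x.shape) < x).astype(np.uint8)

def ttfs_encode(x, timesteps=32):
    """Time-to-first-spike: stronger inputs spike earlier, exactly once."""
    t_fire = np.round((1.0 - x) * (timesteps - 1)).astype(int)
    spikes = np.zeros((timesteps,) + x.shape, dtype=np.uint8)
    np.put_along_axis(spikes, t_fire[None, ...], 1, axis=0)
    return spikes

x = np.array([0.1, 0.5, 0.9])
print(rate_encode(x).sum(axis=0))     # roughly 3, 16, 29 spikes
print(ttfs_encode(x).argmax(axis=0))  # fire times 28, 16, 3: earlier = stronger
```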
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
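The idea of the {-1, +1} decomposition, as the summary describes it, is to express a quantized weight tensor as a weighted sum of binary tensors so that one quantized layer becomes several cheap binary branches. The sketch below shows one such decomposition under the assumption of odd integer quantization levels; the paper's exact encoding scheme may differ:

```python
# Sketch of a {-1, +1} decomposition: any odd integer q with |q| <= 2**M - 1
# can be written as q = sum_i 2**i * b_i with b_i in {-1, +1}, so an M-bit
# quantized layer splits into M binary branches. Illustrative only; the
# paper's exact encoding may differ.
import numpy as np

def decompose(q, bits):
    """Return binary tensors B[i] in {-1,+1} with sum_i 2**i * B[i] == q."""
    r = q.astype(np.int64).copy()
    branches = []
    for i in reversed(range(bits)):
        b = np.where(r > 0, 1, -1)
        r = r - b * (1 << i)
        branches.insert(0, b)
    return branches

q = np.array([-7, -3, -1, 1, 5, 7])          # odd 3-bit quantization levels
B = decompose(q, bits=3)
recon = sum((1 << i) * B[i] for i in range(3))
assert np.array_equal(recon, q)
# A conv/matmul distributes over the sum: y = sum_i 2**i * binary_op(x, B[i]),
# so each branch can run on cheap XNOR/popcount hardware.
print(recon)
```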
- Learning from Event Cameras with Sparse Spiking Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are now the de facto solution for computer vision problems.
We propose an end-to-end biologically inspired approach using event cameras and spiking neural networks (SNNs).
Our method enables the training of sparse spiking neural networks directly on event data, using the popular deep learning framework PyTorch.
arXiv Detail & Related papers (2021-04-26T13:52:01Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
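Binarization in this context usually means constraining weights to {-1, +1} with sign() and training through the non-differentiable step using a straight-through estimator (STE). A generic PyTorch sketch of that building block (the paper's GNN-specific strategies go beyond this):

```python
# Generic weight binarization with a straight-through estimator (STE) --
# the standard building block behind binary networks. Illustrative; the
# paper evaluates GNN-specific binarization strategies beyond this.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                 # weights constrained to {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # STE: pass gradients straight through, clipped where |w| > 1
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(torch.nn.Linear):
    def forward(self, x):
        return torch.nn.functional.linear(
            x, BinarizeSTE.apply(self.weight), self.bias)

layer = BinaryLinear(16, 4)
out = layer(torch.randn(8, 16))
out.sum().backward()                         # gradients flow via the STE
print(layer.weight.grad.shape)
```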
- Overview of FPGA deep learning acceleration based on convolutional neural network [0.76146285961466]
In recent years, deep learning has matured considerably, and convolutional neural networks, as one of its most commonly used algorithms, have been widely applied to various visual tasks.
This review article mainly introduces the theory and algorithms related to convolution.
It summarizes the application scenarios of several existing FPGA technologies based on convolutional neural networks, focusing on the use of accelerators.
arXiv Detail & Related papers (2020-12-23T12:44:24Z)
- DANCE: Differentiable Accelerator/Network Co-Exploration [8.540518473228078]
This work presents a differentiable approach towards the co-exploration of the hardware accelerator and network architecture design.
By modeling the hardware evaluation software with a neural network, the relation between the accelerator architecture and the hardware metrics becomes differentiable.
Compared to existing naive approaches, our method performs co-exploration in a significantly shorter time while achieving superior accuracy and hardware cost metrics.
arXiv Detail & Related papers (2020-09-14T07:43:27Z)
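The pattern the DANCE summary describes, making hardware metrics differentiable by fitting a neural-network surrogate to the hardware evaluator, can be sketched in a few lines. The design parameterization below (PE count, buffer size) and the numbers are hypothetical, not DANCE's actual model:

```python
# Sketch of the pattern the summary describes: fit a neural-network surrogate
# that maps accelerator design parameters to a hardware metric, then use its
# gradients to refine the design. Parameterization and data are hypothetical.
import torch

# Hypothetical samples: (PE count, buffer KB) -> measured latency (ms)
designs = torch.tensor([[16., 64.], [32., 128.], [64., 256.], [128., 512.]])
latency = torch.tensor([[8.0], [4.5], [2.6], [1.9]])

surrogate = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
for _ in range(500):                         # fit the surrogate to the samples
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(surrogate(designs / 512), latency)
    loss.backward()
    opt.step()

# The hardware metric is now differentiable w.r.t. the design parameters,
# so a candidate design can be improved by gradient descent.
x = torch.tensor([[48., 200.]], requires_grad=True)
surrogate(x / 512).sum().backward()
print("d(latency)/d(params):", x.grad)
```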
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)