Exploration of Hardware Acceleration Methods for an XNOR Traffic Signs Classifier
- URL: http://arxiv.org/abs/2104.02303v1
- Date: Tue, 6 Apr 2021 06:01:57 GMT
- Title: Exploration of Hardware Acceleration Methods for an XNOR Traffic Signs Classifier
- Authors: Dominika Przewlocka-Rus, Marcin Kowalczyk, Tomasz Kryjak
- Abstract summary: In this work, we explore the possibility of accelerating XNOR networks for traffic sign classification.
We propose a custom HDL accelerator for XNOR networks, which enables inference at almost 450 fps.
Even better results are obtained with the second method, the Xilinx FINN accelerator, which processes input images at around 550 fps.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms are a key component of many state-of-the-art vision
systems, especially as Convolutional Neural Networks (CNNs) outperform most other
solutions in terms of accuracy. To apply such algorithms in real-time
applications, one has to address the challenges of memory and computational
complexity. To deal with the first issue, we use networks with reduced
precision, specifically a binary neural network (also known as XNOR). To
satisfy the computational requirements, we propose to use highly parallel and
low-power FPGA devices. In this work, we explore the possibility of
accelerating XNOR networks for traffic sign classification. The trained binary
networks are implemented on the ZCU104 development board, equipped with a Zynq
UltraScale+ MPSoC device, using two different approaches. First, we propose a
custom HDL accelerator for XNOR networks, which enables inference at almost
450 fps. Even better results are obtained with the second method, the Xilinx
FINN accelerator, which processes input images at around 550 fps. Both
approaches provide over 96% accuracy on the test set.
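In a binary (XNOR) network, weights and activations take only the values -1 and +1, so a multiply-accumulate can be replaced by a bitwise XNOR followed by a population count: for two length-n vectors, a · w = 2 · popcount(XNOR(a, w)) - n. The Python sketch below checks that identity. It assumes the common encoding bit 1 = +1 and bit 0 = -1, and the helper names (`pack_bits`, `xnor_popcount_dot`, `reference_dot`) are hypothetical; it illustrates the arithmetic only and is not the paper's HDL or FINN implementation.

```python
import random


def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i set when values[i] == +1).

    Assumed encoding for illustration: bit 1 -> +1, bit 0 -> -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("binary values must be -1 or +1")
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask  # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")    # population count
    return 2 * matches - n             # (#agreements) - (#disagreements)


def reference_dot(a, w):
    """Plain integer dot product, for comparison."""
    return sum(x * y for x, y in zip(a, w))


a = [1, -1, -1, 1, 1, -1, 1, 1]
w = [1, 1, -1, -1, 1, -1, -1, 1]
print(xnor_popcount_dot(pack_bits(a), pack_bits(w), len(a)))  # 2
print(reference_dot(a, w))                                    # 2

# Randomised check of the identity on 64-element vectors.
rng = random.Random(0)
for _ in range(100):
    ra = [rng.choice((-1, 1)) for _ in range(64)]
    rw = [rng.choice((-1, 1)) for _ in range(64)]
    assert xnor_popcount_dot(pack_bits(ra), pack_bits(rw), 64) == reference_dot(ra, rw)
```

In hardware the packed words correspond to wide bit vectors, so one XNOR and one popcount stand in for n multiplications; in Python the sketch only demonstrates that the two computations agree.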
Related papers
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation (2022-09-30)
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems (2022-04-22)
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC improves overall training performance by 1.6x at 6 nodes, with an estimated 2.5x improvement at 32 nodes, compared to the baseline system using conventional NICs.
- OMPQ: Orthogonal Mixed Precision Quantization (2021-09-16)
Mixed precision quantization takes advantage of hardware support for arithmetic at multiple bit-widths to unleash the full potential of network quantization.
We propose to optimize a proxy metric, network orthogonality, which is highly correlated with the loss of the integer programming.
This approach reduces the search time and the required amount of data by orders of magnitude, with little compromise in quantization accuracy.
- Binary Complex Neural Network Acceleration on FPGA (2021-08-10)
Binarized Complex Neural Networks (BCNNs) show great potential for classifying complex-valued data in real time.
We propose a structural-pruning-based BCNN accelerator that provides more than 5000 frames/s inference throughput on edge devices.
- A quantum algorithm for training wide and deep classical neural networks (2021-07-19)
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration (2021-06-18)
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
- HAO: Hardware-aware neural Architecture Optimization for Efficient Inference (2021-04-26)
We develop an integer programming algorithm to prune the design space of a neural network search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at a frame rate of 50 fps, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
- Binary Graph Neural Networks (2020-12-31)
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming (2020-10-22)
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve $L_\infty$ verified robust accuracy from 1% to 88% and from 6% to 40%, respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
- Fully-parallel Convolutional Neural Network Hardware (2020-06-22)
We propose a new power- and area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested in a single FPGA.
- An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning (2020-05-10)
We propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices.
It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation.
The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform.
This list is automatically generated from the titles and abstracts of the papers on this site.
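The {-1, +1} encoding decomposition listed among the related papers rests on a simple identity: an odd integer q in [-(2^M - 1), 2^M - 1] can be written as q = sum over i = 0..M-1 of 2^i * s_i, with every s_i in {-1, +1}. A layer with M-bit values therefore splits into M binary branches whose results are recombined with weights 2^i. The Python sketch below assumes this particular encoding (odd quantization levels, power-of-two branch weights) and a binary weight vector; it illustrates the general idea rather than reproducing that paper's exact method, and the function names (`decompose`, `recompose`, `branch_dot`) are hypothetical.

```python
def decompose(q, m):
    """Split odd integer q in [-(2**m - 1), 2**m - 1] into m signs s_i in {-1, +1}
    such that q == sum(2**i * s_i for i in range(m))."""
    offset = 2**m - 1
    if not -offset <= q <= offset or (q + offset) % 2:
        raise ValueError("q must be odd and lie in [-(2**m - 1), 2**m - 1]")
    u = (q + offset) // 2  # unsigned code in [0, 2**m - 1]
    return [2 * ((u >> i) & 1) - 1 for i in range(m)]  # bit 1 -> +1, bit 0 -> -1


def recompose(signs):
    """Inverse of decompose: power-of-two weighted sum of the binary branches."""
    return sum((2**i) * s for i, s in enumerate(signs))


def branch_dot(x, w, m):
    """Dot product of quantized x with binary w in {-1, +1}, computed as a
    weighted sum of m binary-by-binary dot products (one per branch)."""
    per_element = [decompose(v, m) for v in x]
    total = 0
    for i in range(m):
        branch = [signs[i] for signs in per_element]  # i-th binary branch of x
        total += (2**i) * sum(s * wv for s, wv in zip(branch, w))
    return total


x = [3, -1, 1, -3]  # 2-bit values from {-3, -1, 1, 3}
w = [1, 1, -1, -1]
print(decompose(3, 2), decompose(-1, 2))  # [1, 1] [1, -1]
print(branch_dot(x, w, 2), sum(a * b for a, b in zip(x, w)))  # 4 4
```

Each branch is a dot product of two {-1, +1} vectors, which is the kind of operation a binary-network accelerator can evaluate with XNOR and popcount; the price is M branch evaluations per quantized layer.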