ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network
- URL: http://arxiv.org/abs/2005.06892v1
- Date: Thu, 14 May 2020 11:54:04 GMT
- Title: ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network
- Authors: David Gschwend
- Abstract summary: This thesis explores the potential of FPGA-based CNN acceleration.
It demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image Understanding is becoming a vital feature in ever more applications
ranging from medical diagnostics to autonomous vehicles. Many applications
demand embedded solutions that integrate into existing systems with tight
real-time and power constraints. Convolutional Neural Networks (CNNs) presently
achieve record-breaking accuracies in all image understanding benchmarks, but
have a very high computational complexity. Embedded CNNs thus call for small
and efficient, yet very powerful computing platforms. This master's thesis
explores the potential of FPGA-based CNN acceleration and demonstrates a fully
functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The
ZynqNet Embedded CNN is designed for image classification on ImageNet and
consists of ZynqNet CNN, an optimized and customized CNN topology, and the
ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation.
ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and
optimization of prior topologies using the custom-designed Netscope CNN
Analyzer have enabled a CNN with 84.5% top-5 accuracy at a computational
complexity of only 530 million multiply-accumulate operations. The topology is
highly regular and consists exclusively of convolutional layers, ReLU
nonlinearities and one global pooling layer. The CNN fits ideally onto the FPGA
accelerator. The ZynqNet FPGA Accelerator allows an efficient evaluation of
ZynqNet CNN. It accelerates the full network based on a nested-loop algorithm
which minimizes the number of arithmetic operations and memory accesses. The
FPGA accelerator has been synthesized using High-Level Synthesis for the Xilinx
Zynq XC-7Z045, and reaches a clock frequency of 200MHz with a device
utilization of 80% to 90%.
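The abstract describes a nested-loop algorithm and quotes a cost of 530 million multiply-accumulate (MAC) operations at a 200MHz clock. The following is a minimal sketch of what a direct, nested-loop convolution and its MAC count can look like. It is not the thesis code: the loop order, the function names (conv2d_nested_loops, conv_macs, min_latency_seconds) and the macs_per_cycle parameter are assumptions made for illustration only.

```python
# Illustrative sketch only; loop order and all names are assumptions,
# not the ZynqNet FPGA Accelerator implementation.

def conv2d_nested_loops(image, weights, stride=1):
    """Direct convolution without padding, written as plain nested loops.

    image:   nested list indexed [in_channel][row][col]
    weights: nested list indexed [out_channel][in_channel][ky][kx]
    Returns (output, mac_count), output indexed [out_channel][row][col].
    """
    ch_in = len(image)
    height, width = len(image[0]), len(image[0][0])
    ch_out = len(weights)
    kernel = len(weights[0][0])
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    output = [[[0.0] * out_w for _ in range(out_h)] for _ in range(ch_out)]
    macs = 0
    for y in range(out_h):                      # output rows
        for x in range(out_w):                  # output columns
            for ci in range(ch_in):             # input channels
                for co in range(ch_out):        # output channels
                    acc = 0.0
                    for ky in range(kernel):    # kernel rows
                        for kx in range(kernel):  # kernel columns
                            pixel = image[ci][y * stride + ky][x * stride + kx]
                            acc += pixel * weights[co][ci][ky][kx]
                            macs += 1           # one multiply-accumulate
                    output[co][y][x] += acc
    return output, macs


def conv_macs(ch_in, ch_out, out_h, out_w, kernel):
    """Closed-form MAC count of one convolutional layer."""
    return ch_in * ch_out * out_h * out_w * kernel * kernel


def min_latency_seconds(total_macs, clock_hz, macs_per_cycle):
    """Lower bound on runtime if macs_per_cycle MACs complete every cycle.

    macs_per_cycle is a hypothetical parameter; the abstract does not
    state how many MAC units the accelerator uses.
    """
    return total_macs / (clock_hz * macs_per_cycle)
```

As a usage example, min_latency_seconds(530e6, 200e6, 1) gives about 2.65 seconds per image for a single MAC unit, which shows why a design at this clock rate would need many MAC operations in flight per cycle to classify images quickly.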
Related papers
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, an autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- Optimization of FPGA-based CNN Accelerators Using Metaheuristics [1.854931308524932]
Convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields.
FPGAs have seen a surge in interest for accelerating CNN inference.
The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs).
arXiv Detail & Related papers (2022-09-22T18:57:49Z)
- Towards Enabling Dynamic Convolution Neural Network Inference for Edge
Intelligence [0.0]
Recent advances in edge intelligence require CNN inference on the edge network to increase throughput and reduce latency.
To provide flexibility, dynamic parameter allocation to different mobile devices is required to implement either a predefined or defined on-the-fly CNN architecture.
We propose a library-based approach to design scalable and dynamic distributed CNN inference on the fly.
arXiv Detail & Related papers (2022-02-18T22:33:42Z)
- Continual 3D Convolutional Neural Networks for Real-time Processing of
Videos [93.73198973454944]
We introduce Continual 3D Convolutional Neural Networks (Co3D CNNs).
Co3D CNNs process videos frame by frame rather than clip by clip.
We show that Co3D CNNs initialised on the weights from preexisting state-of-the-art video recognition models reduce floating point operations for frame-wise computations by 10.0-12.4x while improving accuracy on Kinetics-400 by 2.3-3.8x.
arXiv Detail & Related papers (2021-05-31T18:30:52Z)
- Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA
Accelerator Architecture for Accelerating Convolutional Neural Network
Inference in Cloud/Edge Computing [8.826181951806928]
Systolic-CNN is an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture.
Systolic-CNN is optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing.
arXiv Detail & Related papers (2020-12-06T03:53:11Z)
- 3D CNNs with Adaptive Temporal Feature Resolutions [83.43776851586351]
The Similarity Guided Sampling (SGS) module can be plugged into any existing 3D CNN architecture.
SGS empowers 3D CNNs by learning the similarity of temporal features and grouping similar features together.
Our evaluations show that the proposed module improves the state-of-the-art by reducing the computational cost (GFLOPs) by half while preserving or even improving the accuracy.
arXiv Detail & Related papers (2020-11-17T14:34:05Z)
- CNN2Gate: Toward Designing a General Framework for Implementation of
Convolutional Neural Networks on FPGA [0.3655021726150368]
This paper introduces an integrated framework that supports compilation of a CNN model for an FPGA target.
CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors.
This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms.
arXiv Detail & Related papers (2020-04-06T01:57:53Z)
- Evolutionary Bin Packing for Memory-Efficient Dataflow Inference
Acceleration on FPGA [2.3395728784538767]
Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency.
However, the complex shapes of CNN parameter memories typically do not map well to FPGA on-chip memories (OCM).
We present a design methodology that improves the mapping efficiency of CNN parameters to FPGA OCM.
arXiv Detail & Related papers (2020-03-24T09:55:08Z)
- Computational optimization of convolutional neural networks using
separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, even though they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.