Fully-parallel Convolutional Neural Network Hardware
- URL: http://arxiv.org/abs/2006.12439v1
- Date: Mon, 22 Jun 2020 17:19:09 GMT
- Title: Fully-parallel Convolutional Neural Network Hardware
- Authors: Christiam F. Frasser, Pablo Linares-Serrano, V. Canals, Miquel Roca,
T. Serrano-Gotarredona, Josep L. Rossello
- Abstract summary: We propose a new power- and area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested on a single FPGA.
- Score: 0.7829352305480285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new trans-disciplinary knowledge area, Edge Artificial Intelligence or Edge
Intelligence, is beginning to receive a tremendous amount of interest from the
machine learning community due to the ever-increasing popularization of the
Internet of Things (IoT). Unfortunately, incorporating AI capabilities into
edge computing devices is power- and area-hungry for typical machine learning
techniques such as Convolutional Neural Networks (CNNs). In this work, we
propose a new power- and area-efficient architecture for implementing
Artificial Neural Networks (ANNs) in hardware, based on exploiting the
correlation phenomenon in Stochastic Computing (SC) systems. The proposed
architecture solves the difficult implementation challenges that SC presents
for CNN applications: the high resource usage of binary-to-stochastic
conversion, the inaccuracy produced by undesired correlation between signals,
and the implementation of the stochastic maximum function. Compared with
traditional binary logic implementations, experimental results for the FPGA
implementation showed improvements of 19.6x in speed and 6.3x in energy
efficiency. We have also realized a full VLSI implementation of the proposed
SC-CNN architecture, demonstrating that our optimizations achieve an 18x area
reduction over a previous SC-DNN VLSI implementation in a comparable
technological node. For the first time, a fully-parallel CNN such as LeNet-5
is embedded and tested on a single FPGA, showing the benefits of using
stochastic computing for embedded applications, in contrast to traditional
binary logic implementations.
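The mechanics behind these claims are easy to sketch in software. The toy below is an illustrative NumPy model, not the paper's hardware: it shows unipolar binary-to-stochastic conversion, multiplication as a single AND gate on uncorrelated bitstreams, and how deliberately correlated streams turn the same gates into min/max operators, which is the correlation effect the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bitstream length; estimation error shrinks roughly as 1/sqrt(N)

def to_stochastic(p, source):
    """Binary-to-stochastic conversion: encode p in [0, 1] as a Bernoulli
    bitstream whose average approximates p (unipolar coding)."""
    return (source < p).astype(np.uint8)

a, b = 0.8, 0.5

# Uncorrelated streams: a single AND gate computes the product a*b.
x = to_stochastic(a, rng.random(N))
y = to_stochastic(b, rng.random(N))
print("a*b      ~", (x & y).mean())    # ~0.40

# Maximally correlated streams (shared random source): the same AND gate
# now yields min(a, b) and an OR gate yields max(a, b) -- the correlation
# effect exploited for the stochastic maximum function.
r = rng.random(N)
xc, yc = to_stochastic(a, r), to_stochastic(b, r)
print("min(a,b) ~", (xc & yc).mean())  # ~0.50
print("max(a,b) ~", (xc | yc).mean())  # ~0.80
```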
Related papers
- Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
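For context on the neuron model mentioned above, here is a minimal discrete-time sketch of a 1st-order LIF update (leak, integrate, fire, reset), assuming the common formulation with a membrane decay factor beta; it illustrates the model only, not the paper's hardware-friendly design.

```python
import numpy as np

def lif_step(v, i_in, beta=0.9, v_th=1.0):
    """One discrete-time step of a 1st-order LIF neuron:
    leak the membrane potential, integrate input, fire and reset."""
    v = beta * v + i_in           # leaky integration
    spike = v >= v_th             # fire when the threshold is crossed
    v = np.where(spike, 0.0, v)   # reset-to-zero on spike
    return spike.astype(np.uint8), v

v = np.zeros(4)                          # membrane potentials of 4 neurons
for t in range(10):
    i_in = np.random.rand(4) * 0.3       # toy input currents
    spk, v = lif_step(v, i_in)
```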
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
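To make the "TC" stream concrete, the sketch below turns a 1D signal into a 2D scale-by-time tensor via the CWT, using the PyWavelets library; the Morlet wavelet and scale range are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pywt  # PyWavelets

fs = 128                                   # sampling rate (illustrative)
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Continuous Wavelet Transform: one 1D signal becomes one 2D
# (scale x time) tensor that a convolutional stream can consume.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
print(coeffs.shape)  # (64, 512): a 2D temporal-frequency tensor
```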
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Exploiting On-chip Heterogeneity of Versal Architecture for GNN Inference Acceleration [0.5249805590164902]
Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML) applications, such as social network analysis, bioinformatics, etc.
We leverage the heterogeneous computing capabilities of AMD Versal ACAP architecture to accelerate GNN inference.
For Graph Convolutional Network (GCN) inference, our approach leads to a speedup of 3.9-96.7x compared to designs using PL only on the same ACAP device.
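As background, the kernel such accelerators implement is the standard GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the NumPy toy below sketches that rule and is unrelated to the paper's actual ACAP mapping.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).
    The sparse neighbor aggregation and the dense H @ W are the two
    kernels a heterogeneous accelerator maps to different engines."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy graph
H = np.random.rand(3, 8)                                      # node features
W = np.random.rand(8, 4)                                      # layer weights
print(gcn_layer(A, H, W).shape)  # (3, 4)
```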
arXiv Detail & Related papers (2023-08-04T23:57:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
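As a minimal sketch of the event representation described above (a common simplification, not necessarily the paper's pipeline), the toy below accumulates a stream of (x, y, t, polarity) events into a frame-like 2D tensor that a downstream network can consume.

```python
import numpy as np

H, W = 64, 64
# A toy event stream: each event is (x, y, timestamp_us, polarity in {-1, +1}).
events = np.array([(10, 12, 100, +1), (10, 12, 250, -1), (30, 5, 400, +1)],
                  dtype=[("x", int), ("y", int), ("t", int), ("p", int)])

# Accumulate signed polarities into one frame-like 2D tensor; real
# pipelines usually bin events into short time windows instead.
frame = np.zeros((H, W))
np.add.at(frame, (events["y"], events["x"]), events["p"])
```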
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
- E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs [6.047137174639418]
End-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs.
E3NE uses less than 50% of hardware resources and 20% less power, while reducing the latency by an order of magnitude.
arXiv Detail & Related papers (2021-11-19T04:01:19Z)
- Scaled-Time-Attention Robust Edge Network [2.4417312983418014]
This paper describes a systematic approach towards building a new family of neural networks based on a delay-loop version of a reservoir neural network.
The resulting architecture, called the Scaled-Time-Attention Robust Edge (STARE) network, exploits hyperdimensional space and non-multiply-and-add computation.
We demonstrate that STARE is applicable to a variety of applications with improved performance and lower implementation complexity.
arXiv Detail & Related papers (2021-07-09T21:24:49Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
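A worked toy of the decomposition idea, under the simplifying assumption of 2-bit uniform quantization to the odd levels {-3, -1, 1, 3}: each value splits into binary branches b_i in {-1, +1} with q = sum_i 2^i * b_i, so every branch multiply reduces to sign manipulation. This is an illustration, not the paper's exact encoding scheme.

```python
import numpy as np

def decompose(q, bits=2):
    """Decompose quantized values q (odd levels in [-(2^bits - 1), 2^bits - 1])
    into `bits` binary branches b_i in {-1, +1} with q = sum_i 2^i * b_i."""
    branches = []
    for i in reversed(range(bits)):
        b = np.where(q >= 0, 1, -1)   # sign of the remaining residual
        branches.append((i, b))
        q = q - (2 ** i) * b          # peel off this branch
    return branches

q = np.array([-3, -1, 1, 3])
parts = decompose(q)
recon = sum((2 ** i) * b for i, b in parts)
print(recon)  # [-3 -1  1  3]: the binary branches reconstruct q exactly
```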
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Design Challenges of Neural Network Acceleration Using Stochastic Computing [0.0]
This report evaluates and compares two proposed SC-based NN designs for the Internet of Things (IoT).
We find that BISC outperforms the other architectures when executing the MNIST-5 NN model.
Our analysis and simulation experiments indicate that this architecture is around 50x faster, occupies 5.7x less area, and consumes 7.8x and 1.8x less power than the two ESL designs.
arXiv Detail & Related papers (2020-06-08T16:06:56Z)