Fully-parallel Convolutional Neural Network Hardware
- URL: http://arxiv.org/abs/2006.12439v1
- Date: Mon, 22 Jun 2020 17:19:09 GMT
- Title: Fully-parallel Convolutional Neural Network Hardware
- Authors: Christiam F. Frasser, Pablo Linares-Serrano, V. Canals, Miquel Roca,
T. Serrano-Gotarredona, Josep L. Rossello
- Abstract summary: We propose a new power- and area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested on a single FPGA.
- Score: 0.7829352305480285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new trans-disciplinary knowledge area, Edge Artificial Intelligence or Edge
Intelligence, is beginning to receive a tremendous amount of interest from the
machine learning community due to the ever-increasing popularization of the
Internet of Things (IoT). Unfortunately, incorporating AI capabilities into
edge computing devices is power- and area-hungry for typical machine learning
techniques such as Convolutional Neural Networks (CNNs). In this work, we
propose a new power- and area-efficient architecture for implementing
Artificial Neural Networks (ANNs) in hardware, based on exploiting the
correlation phenomenon in Stochastic Computing (SC) systems. The proposed
architecture solves the difficult implementation challenges that SC presents
for CNN applications: the high resource usage of binary-to-stochastic
conversion, the inaccuracy produced by undesired correlation between signals,
and the implementation of the stochastic maximum function. Compared with
traditional binary logic implementations, experimental results for the FPGA
implementation showed improvements of 19.6x in speed and 6.3x in energy
efficiency. We have also realized a full VLSI implementation of the proposed
SC-CNN architecture, demonstrating that our optimizations achieve an 18x area
reduction over a previous SC-DNN VLSI implementation in a comparable
technological node. For the first time, a fully-parallel CNN such as LeNet-5
is embedded and tested on a single FPGA, showing the benefits of using
stochastic computing for embedded applications, in contrast to traditional
binary logic implementations.
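The mechanics behind these claims are easy to sketch in software. The toy below is an illustrative NumPy model, not the paper's hardware: it shows unipolar binary-to-stochastic conversion, multiplication as a single AND gate on uncorrelated bitstreams, and how deliberately correlated streams turn the same gates into min/max operators, which is the correlation effect the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bitstream length; estimation error shrinks roughly as 1/sqrt(N)

def to_stochastic(p, source):
    """Binary-to-stochastic conversion: encode p in [0, 1] as a Bernoulli
    bitstream whose average approximates p (unipolar coding)."""
    return (source < p).astype(np.uint8)

a, b = 0.8, 0.5

# Uncorrelated streams: a single AND gate computes the product a*b.
x = to_stochastic(a, rng.random(N))
y = to_stochastic(b, rng.random(N))
print("a*b      ~", (x & y).mean())    # ~0.40

# Maximally correlated streams (shared random source): the same AND gate
# now yields min(a, b) and an OR gate yields max(a, b) -- the correlation
# effect exploited for the stochastic maximum function.
r = rng.random(N)
xc, yc = to_stochastic(a, r), to_stochastic(b, r)
print("min(a,b) ~", (xc & yc).mean())  # ~0.50
print("max(a,b) ~", (xc | yc).mean())  # ~0.80
```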
Related papers
- Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
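For context on the neuron model mentioned above, here is a minimal discrete-time sketch of a 1st-order LIF update (leak, integrate, fire, reset), assuming the common formulation with a membrane decay factor beta; it illustrates the model only, not the paper's hardware-friendly design.

```python
import numpy as np

def lif_step(v, i_in, beta=0.9, v_th=1.0):
    """One discrete-time step of a 1st-order LIF neuron:
    leak the membrane potential, integrate input, fire and reset."""
    v = beta * v + i_in           # leaky integration
    spike = v >= v_th             # fire when the threshold is crossed
    v = np.where(spike, 0.0, v)   # reset-to-zero on spike
    return spike.astype(np.uint8), v

v = np.zeros(4)                          # membrane potentials of 4 neurons
for t in range(10):
    i_in = np.random.rand(4) * 0.3       # toy input currents
    spk, v = lif_step(v, i_in)
```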
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
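To make the "TC" stream concrete, the sketch below turns a 1D signal into a 2D scale-by-time tensor via the CWT, using the PyWavelets library; the Morlet wavelet and scale range are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pywt  # PyWavelets

fs = 128                                   # sampling rate (illustrative)
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Continuous Wavelet Transform: one 1D signal becomes one 2D
# (scale x time) tensor that a convolutional stream can consume.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
print(coeffs.shape)  # (64, 512): a 2D temporal-frequency tensor
```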
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Exploiting On-chip Heterogeneity of Versal Architecture for GNN Inference Acceleration [0.5249805590164902]
Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML) applications, such as social network analysis, bioinformatics, etc.
We leverage the heterogeneous computing capabilities of AMD Versal ACAP architecture to accelerate GNN inference.
For Graph Convolutional Network (GCN) inference, our approach leads to a speedup of 3.9-96.7x compared to designs using PL only on the same ACAP device.
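As background, the kernel such accelerators implement is the standard GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the NumPy toy below sketches that rule and is unrelated to the paper's actual ACAP mapping.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).
    The sparse neighbor aggregation and the dense H @ W are the two
    kernels a heterogeneous accelerator maps to different engines."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy graph
H = np.random.rand(3, 8)                                      # node features
W = np.random.rand(8, 4)                                      # layer weights
print(gcn_layer(A, H, W).shape)  # (3, 4)
```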
arXiv Detail & Related papers (2023-08-04T23:57:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
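As a minimal sketch of the event representation described above (a common simplification, not necessarily the paper's pipeline), the toy below accumulates a stream of (x, y, t, polarity) events into a frame-like 2D tensor that a downstream network can consume.

```python
import numpy as np

H, W = 64, 64
# A toy event stream: each event is (x, y, timestamp_us, polarity in {-1, +1}).
events = np.array([(10, 12, 100, +1), (10, 12, 250, -1), (30, 5, 400, +1)],
                  dtype=[("x", int), ("y", int), ("t", int), ("p", int)])

# Accumulate signed polarities into one frame-like 2D tensor; real
# pipelines usually bin events into short time windows instead.
frame = np.zeros((H, W))
np.add.at(frame, (events["y"], events["x"]), events["p"])
```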
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
- E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs [6.047137174639418]
End-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs.
E3NE uses less than 50% of hardware resources and 20% less power, while reducing the latency by an order of magnitude.
arXiv Detail & Related papers (2021-11-19T04:01:19Z)
- Scaled-Time-Attention Robust Edge Network [2.4417312983418014]
This paper describes a systematic approach towards building a new family of neural networks based on a delay-loop version of a reservoir neural network.
The resulting architecture, called the Scaled-Time-Attention Robust Edge (STARE) network, exploits hyperdimensional space and non-multiply-and-add computation.
We demonstrate that STARE is applicable to a variety of applications with improved performance and lower implementation complexity.
arXiv Detail & Related papers (2021-07-09T21:24:49Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
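A worked toy of the decomposition idea, under the simplifying assumption of 2-bit uniform quantization to the odd levels {-3, -1, 1, 3}: each value splits into binary branches b_i in {-1, +1} with q = sum_i 2^i * b_i, so every branch multiply reduces to sign manipulation. This is an illustration, not the paper's exact encoding scheme.

```python
import numpy as np

def decompose(q, bits=2):
    """Decompose quantized values q (odd levels in [-(2^bits - 1), 2^bits - 1])
    into `bits` binary branches b_i in {-1, +1} with q = sum_i 2^i * b_i."""
    branches = []
    for i in reversed(range(bits)):
        b = np.where(q >= 0, 1, -1)   # sign of the remaining residual
        branches.append((i, b))
        q = q - (2 ** i) * b          # peel off this branch
    return branches

q = np.array([-3, -1, 1, 3])
parts = decompose(q)
recon = sum((2 ** i) * b for i, b in parts)
print(recon)  # [-3 -1  1  3]: the binary branches reconstruct q exactly
```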
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Design Challenges of Neural Network Acceleration Using Stochastic Computing [0.0]
This report evaluates and compares two proposed SC-based NN designs for the Internet of Things (IoT).
We find that BISC outperforms the other architectures when executing the MNIST-5 NN model.
Our analysis and simulation experiments indicate that this architecture is around 50x faster, occupies 5.7x less area, and consumes 7.8x and 1.8x less power than the two ESL designs.
arXiv Detail & Related papers (2020-06-08T16:06:56Z)