BPLight-CNN: A Photonics-based Backpropagation Accelerator for Deep
Learning
- URL: http://arxiv.org/abs/2102.10140v1
- Date: Fri, 19 Feb 2021 20:00:21 GMT
- Title: BPLight-CNN: A Photonics-based Backpropagation Accelerator for Deep
Learning
- Authors: D. Dang, S. V. R. Chittamuru, S. Pasricha, R. Mahapatra, D. Sahoo
- Abstract summary: Training deep learning networks involves continuous weight updates across the various layers of the deep network using the backpropagation (BP) algorithm.
BPLight-CNN is a first-of-its-kind photonic and memristor-based CNN architecture for end-to-end training and prediction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training deep learning networks involves continuous weight updates
across the various layers of the deep network using the backpropagation (BP)
algorithm. This results in expensive computational overhead during training.
Consequently, most deep learning accelerators today employ pre-trained weights
and focus only on improving the design of the inference phase. The recent trend
is to build a complete deep learning accelerator by incorporating the training
module. Such efforts require an ultra-fast chip architecture for executing the
BP algorithm. In this article, we propose a novel photonics-based
backpropagation accelerator for high performance deep learning training. We
present the design for a convolutional neural network, BPLight-CNN, which
incorporates the silicon photonics-based backpropagation accelerator.
BPLight-CNN is a first-of-its-kind photonic and memristor-based CNN
architecture for end-to-end training and prediction. We evaluate BPLight-CNN
using a photonic CAD framework (IPKISS) on deep learning benchmark models
including LeNet and VGG-Net. The proposed design achieves (i) at least a 34x
speedup, a 34x improvement in computational efficiency, and 38.5x energy savings
during training, and (ii) a 29x speedup, a 31x improvement in computational
efficiency, and 38.7x energy savings during inference, compared to
state-of-the-art designs. All comparisons are made at 16-bit resolution, and
BPLight-CNN achieves these improvements at the cost of approximately 6% lower
accuracy than the state-of-the-art.
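The expensive step described in the abstract, repeated forward and backward passes with per-layer weight updates, is what BP accelerators such as BPLight-CNN target. The sketch below is a minimal, generic NumPy illustration of that loop, not the paper's photonic/memristor design; the layer sizes, learning rate, and random data are illustrative assumptions.

```python
# Minimal sketch of the backpropagation (BP) training loop that photonic
# accelerators aim to speed up. Generic NumPy illustration only; sizes,
# learning rate, and data are arbitrary assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: 64-dim input -> 32 hidden units -> 10 classes
W1 = rng.normal(0.0, 0.1, (64, 32))
W2 = rng.normal(0.0, 0.1, (32, 10))
lr = 0.05

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(16, 64))        # a batch of 16 inputs
y = rng.integers(0, 10, size=16)     # integer class labels

for step in range(100):
    # Forward pass
    h = np.maximum(0.0, x @ W1)      # ReLU hidden layer
    p = softmax(h @ W2)              # class probabilities

    # Backward pass: propagate the cross-entropy gradient layer by layer
    dlogits = p.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)
    dW2 = h.T @ dlogits
    dh = dlogits @ W2.T
    dh[h <= 0] = 0.0                 # ReLU gradient
    dW1 = x.T @ dh

    # Continuous per-layer weight updates -- the step BP accelerators target
    W1 -= lr * dW1
    W2 -= lr * dW2
```

In the paper's design these matrix operations are carried out by silicon-photonic and memristor-based hardware rather than a conventional digital pipeline; the loop above is only meant to show where the training cost arises.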
Related papers
- Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural
Networks [0.08965418284317034]
Spiking Neural Networks (SNNs) promise improved energy efficiency through a reduced, low-power hardware footprint.
This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX.
arXiv Detail & Related papers (2024-02-29T09:46:44Z)
- Sophisticated deep learning with on-chip optical diffractive tensor
processing [5.081061839052458]
Photonic integrated circuits provide an efficient approach to mitigating the bandwidth limitations and power wall of their electronic counterparts.
We propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the optical convolution unit (OCU).
With OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression.
arXiv Detail & Related papers (2022-12-20T03:33:26Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline that aims to reduce the large training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- H2Learn: High-Efficiency Learning Accelerator for High-Accuracy Spiking
Neural Networks [25.768116231283045]
We propose H2Learn, a novel architecture that can achieve high efficiency for BPTT-based SNN learning.
Compared with the modern NVIDIA V100 GPU, H2Learn achieves 7.38x area saving, 5.74-10.20x speedup, and 5.25-7.12x energy saving on several benchmark datasets.
arXiv Detail & Related papers (2021-07-25T07:37:17Z)
- Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided
Compression [12.37129078618206]
Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks.
Most SNN training frameworks yield large inference latency, which translates to increased spike activity and reduced energy efficiency.
This paper presents a non-iterative SNN training technique that achieves ultra-high compression with reduced spiking activity.
arXiv Detail & Related papers (2021-07-16T18:23:36Z)
- 3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization,
and Ultra-Low Latency Acceleration [8.419854797930668]
Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services.
This paper emphasizes the importance of training, quantization and accelerator design, and calls for more research breakthroughs in the area for AI on the edge.
arXiv Detail & Related papers (2021-05-11T03:22:30Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization to gradually increase the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.