BPLight-CNN: A Photonics-based Backpropagation Accelerator for Deep
Learning
- URL: http://arxiv.org/abs/2102.10140v1
- Date: Fri, 19 Feb 2021 20:00:21 GMT
- Title: BPLight-CNN: A Photonics-based Backpropagation Accelerator for Deep
Learning
- Authors: D. Dang, S. V. R. Chittamuru, S. Pasricha, R. Mahapatra, D. Sahoo
- Abstract summary: Training deep learning networks involves continuous weight updates across the various layers of the deep network using the backpropagation (BP) algorithm.
BPLight-CNN is a first-of-its-kind photonic and memristor-based CNN architecture for end-to-end training and prediction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training deep learning networks involves continuous weight updates
across the various layers of the deep network using the backpropagation (BP)
algorithm. This results in expensive computational overhead during training.
Consequently, most deep learning accelerators today employ pre-trained weights
and focus only on improving the design of the inference phase. The recent trend
is to build a complete deep learning accelerator by incorporating the training
module. Such efforts require an ultra-fast chip architecture for executing the
BP algorithm. In this article, we propose a novel photonics-based
backpropagation accelerator for high performance deep learning training. We
present the design for a convolutional neural network, BPLight-CNN, which
incorporates the silicon photonics-based backpropagation accelerator.
BPLight-CNN is a first-of-its-kind photonic and memristor-based CNN
architecture for end-to-end training and prediction. We evaluate BPLight-CNN
using a photonic CAD framework (IPKISS) on deep learning benchmark models
including LeNet and VGG-Net. The proposed design achieves (i) at least a 34x
speedup, a 34x improvement in computational efficiency, and 38.5x energy savings
during training, and (ii) a 29x speedup, a 31x improvement in computational
efficiency, and 38.7x energy savings during inference, compared to
state-of-the-art designs. All comparisons are made at 16-bit resolution, and
BPLight-CNN achieves these improvements at the cost of approximately 6% lower
accuracy than the state-of-the-art.
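The expensive step described in the abstract, repeated forward and backward passes with per-layer weight updates, is what BP accelerators such as BPLight-CNN target. The sketch below is a minimal, generic NumPy illustration of that loop, not the paper's photonic/memristor design; the layer sizes, learning rate, and random data are illustrative assumptions.

```python
# Minimal sketch of the backpropagation (BP) training loop that photonic
# accelerators aim to speed up. Generic NumPy illustration only; sizes,
# learning rate, and data are arbitrary assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: 64-dim input -> 32 hidden units -> 10 classes
W1 = rng.normal(0.0, 0.1, (64, 32))
W2 = rng.normal(0.0, 0.1, (32, 10))
lr = 0.05

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(16, 64))        # a batch of 16 inputs
y = rng.integers(0, 10, size=16)     # integer class labels

for step in range(100):
    # Forward pass
    h = np.maximum(0.0, x @ W1)      # ReLU hidden layer
    p = softmax(h @ W2)              # class probabilities

    # Backward pass: propagate the cross-entropy gradient layer by layer
    dlogits = p.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)
    dW2 = h.T @ dlogits
    dh = dlogits @ W2.T
    dh[h <= 0] = 0.0                 # ReLU gradient
    dW1 = x.T @ dh

    # Continuous per-layer weight updates -- the step BP accelerators target
    W1 -= lr * dW1
    W2 -= lr * dW2
```

In the paper's design these matrix operations are carried out by silicon-photonic and memristor-based hardware rather than a conventional digital pipeline; the loop above is only meant to show where the training cost arises.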
Related papers
- Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural
Networks [0.08965418284317034]
Spiking Neural Networks (SNNs) promise improved energy efficiency through a reduced, low-power hardware footprint.
This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX.
arXiv Detail & Related papers (2024-02-29T09:46:44Z)
- Sophisticated deep learning with on-chip optical diffractive tensor
processing [5.081061839052458]
Photonic integrated circuits provide an efficient approach to mitigating the bandwidth limitations and power wall of their electronic counterparts.
We propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the optical convolution unit (OCU).
With OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression.
arXiv Detail & Related papers (2022-12-20T03:33:26Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline that aims to reduce the large training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- H2Learn: High-Efficiency Learning Accelerator for High-Accuracy Spiking
Neural Networks [25.768116231283045]
We propose H2Learn, a novel architecture that can achieve high efficiency for BPTT-based SNN learning.
Compared with the modern NVIDIA V100 GPU, H2Learn achieves 7.38x area saving, 5.74-10.20x speedup, and 5.25-7.12x energy saving on several benchmark datasets.
arXiv Detail & Related papers (2021-07-25T07:37:17Z)
- Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided
Compression [12.37129078618206]
Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks.
Most SNN training frameworks yield large inference latency, which translates to increased spike activity and reduced energy efficiency.
This paper presents a non-iterative SNN training technique that achieves ultra-high compression with reduced spiking activity.
arXiv Detail & Related papers (2021-07-16T18:23:36Z)
- 3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization,
and Ultra-Low Latency Acceleration [8.419854797930668]
Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services.
This paper emphasizes the importance of training, quantization and accelerator design, and calls for more research breakthroughs in the area for AI on the edge.
arXiv Detail & Related papers (2021-05-11T03:22:30Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization to gradually increase the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.