Related papers: Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors

Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors

URL: http://arxiv.org/abs/2207.05278v1
Date: Tue, 12 Jul 2022 03:18:00 GMT
Title: Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors
Authors: Sairam Sri Vatsavai, Ishan G Thakkar
Abstract summary: Photonic Microring Resonator (MRR) based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements. Previous MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with mixed-sized tensors. We present a novel way of introducing reconfigurability in the MRR-based CNN accelerators.
Score: 0.22843885788439797
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Photonic Microring Resonator (MRR) based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements for processing deep Convolutional Neural Networks (CNNs). However, previous MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with mixed-sized tensors. One example of such CNNs is depthwise separable CNNs. Performing inferences of CNNs with mixed-sized tensors on such inflexible accelerators often leads to low hardware utilization, which diminishes the achievable performance and energy efficiency from the accelerators. In this paper, we present a novel way of introducing reconfigurability in the MRR-based CNN accelerators, to enable dynamic maximization of the size compatibility between the accelerator hardware components and the CNN tensors that are processed using the hardware components. We classify the state-of-the-art MRR-based CNN accelerators from prior works into two categories, based on the layout and relative placements of the utilized hardware components in the accelerators. We then use our method to introduce reconfigurability in accelerators from these two classes, to consequently improve their parallelism, the flexibility of efficiently mapping tensors of different sizes, speed, and overall energy efficiency. We evaluate our reconfigurable accelerators against three prior works for the area proportionate outlook (equal hardware area for all accelerators). Our evaluation for the inference of four modern CNNs indicates that our designed reconfigurable CNN accelerators provide improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W, compared to an MRR-based accelerator from prior work.

Related papers

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators [0.20971479389679332]
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors. We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings. CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements.
arXiv Detail & Related papers (2024-04-08T10:10:30Z)
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve. We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap. This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation [13.681095158525514]
unzipFPGA is a novel CNN inference system that counteracts the limitations of existing CNN engines. We introduce a weights generator module that enables the on-chip on-the-fly generation of weights. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair.
arXiv Detail & Related papers (2023-07-25T11:19:21Z)
Transferability of Convolutional Neural Networks in Stationary Learning Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems. We show that a CNN trained on small windows of such signals achieves a nearly performance on much larger windows without retraining. Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs [0.0]
A CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations. Several photonic microring resonators (MRRs) based hardware architectures have been proposed to accelerate integer-quantized CNNs. Existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size.
arXiv Detail & Related papers (2023-02-14T13:35:15Z)
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators [12.223778147172107]
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs) These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption. We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
arXiv Detail & Related papers (2022-02-04T18:48:36Z)
unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation [17.142094527372993]
Singlevolution engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs) In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-con stage to decompress the weights at run time. To minimise the negative impact of limited bandwidth on memory-bound layers, we present a novel hardware component that enables the on-the-fly generation of weights.
arXiv Detail & Related papers (2021-03-09T18:19:41Z)
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
L2PF -- Learning to Prune Faster [57.32153461504626]
We present a multi-task, try-and-learn method, discretely learning redundant filters of the CNN and a continuous action of how long the layers have to be fine-tuned. For ResNet20, we have achieved a compression ratio of 3.84 x with minimal accuracy degradation. Compared to the state-of-the-art pruning method, we reduced the GPU hours by 1.71 x.
arXiv Detail & Related papers (2021-01-07T18:13:37Z)
ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN. We show that CNNs now maintain performance with dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.