Hardware Architecture of Embedded Inference Accelerator and Analysis of
Algorithms for Depthwise and Large-Kernel Convolutions
- URL: http://arxiv.org/abs/2104.14125v1
- Date: Thu, 29 Apr 2021 05:45:16 GMT
- Title: Hardware Architecture of Embedded Inference Accelerator and Analysis of
Algorithms for Depthwise and Large-Kernel Convolutions
- Authors: Tse-Wei Chen, Wei Tao, Deyu Wang, Dongchao Wen, Kinya Osa, Masami Kato
- Abstract summary: The proposed architecture supports filter kernels of different sizes with high flexibility.
For image classification, the accuracy is increased by 1% by simply replacing $3 \times 3$ filters with $5 \times 5$ filters in depthwise convolutions.
- Score: 27.141754658998323
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In order to handle modern convolutional neural networks (CNNs) efficiently, a
hardware architecture for a CNN inference accelerator is proposed that handles both
depthwise convolutions and regular convolutions, which are essential
building blocks for embedded-computer-vision algorithms. Unlike related
works, the proposed architecture supports filter kernels of different
sizes with high flexibility, since it requires no extra cost for
intra-kernel parallelism, and it generates convolution results faster than
the architectures of the related works. The experimental results show the
importance of supporting depthwise convolutions and dilated convolutions with
the proposed hardware architecture. In addition to depthwise convolutions with
large kernels, a new structure called the DDC layer, which combines
depthwise convolutions and dilated convolutions, is also analyzed in this
paper. For face detection, the computational costs decrease by 30%, and the
model size decreases by 20% when the DDC layers are applied to the network. For
image classification, the accuracy is increased by 1% by simply replacing $3
\times 3$ filters with $5 \times 5$ filters in depthwise convolutions.
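The paper does not ship code; as a minimal sketch of what a DDC-style block could look like, assuming a PyTorch implementation with a depthwise convolution followed by a dilated depthwise convolution and a pointwise mix (the block name, ordering, and the $1 \times 1$ mix are illustrative assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class DDCBlock(nn.Module):
    """Hypothetical DDC-style layer: a depthwise convolution combined with
    a dilated depthwise convolution, as described in the abstract. The
    structure and names are assumptions, not the paper's implementation."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        # Depthwise convolution: groups == channels, one filter per channel.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        # Dilated depthwise convolution: enlarges the receptive field
        # (effective kernel = dilation * (k - 1) + 1) at no parameter cost.
        self.dilated = nn.Conv2d(channels, channels, kernel_size,
                                 padding=dilation * (kernel_size // 2),
                                 dilation=dilation, groups=channels)
        # Pointwise (1x1) convolution to mix channels across the stack.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.dilated(self.depthwise(x)))

x = torch.randn(1, 32, 56, 56)
print(DDCBlock(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```

Swapping `kernel_size=3` for `kernel_size=5` in the depthwise stage corresponds to the $3 \times 3$ to $5 \times 5$ replacement the abstract reports for image classification.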
Related papers
- ApproxDARTS: Differentiable Neural Architecture Search with Approximate Multipliers [0.24578723416255746]
We present ApproxDARTS, a neural architecture search (NAS) method that enables the popular differentiable architecture search method DARTS to exploit approximate multipliers.
We show that ApproxDARTS can perform a complete architecture search in less than $10$ GPU hours and produce competitive convolutional neural networks (CNNs) containing approximate multipliers in their convolutional layers.
arXiv Detail & Related papers (2024-04-08T09:54:57Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
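For context, the three layer types that SSC generalizes differ only in the `groups` argument of a standard 2D convolution; a quick PyTorch illustration (not from the paper):

```python
import torch.nn as nn

C_in, C_out = 32, 64
# Groupwise convolution: channels are split into independent groups (here 4).
groupwise = nn.Conv2d(C_in, C_out, kernel_size=3, padding=1, groups=4)
# Depthwise convolution: the extreme case groups == C_in, one filter per channel.
depthwise = nn.Conv2d(C_in, C_in, kernel_size=3, padding=1, groups=C_in)
# Pointwise convolution: a 1x1 kernel that only mixes channels.
pointwise = nn.Conv2d(C_in, C_out, kernel_size=1)
```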
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration [76.35307867016336]
This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
arXiv Detail & Related papers (2021-07-08T23:05:39Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
- FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays [2.8583189395674653]
We propose FuSeConv as a drop-in replacement for depth-wise separable convolution.
FuSeConv generalizes the decomposition of convolutions fully to separable 1D convolutions along spatial and depth dimensions.
We achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a systolic array of size 64x64, with comparable accuracy on the ImageNet dataset.
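A sketch of a fully separable convolution in this spirit, factorizing the depthwise $k \times k$ filter into $1 \times k$ and $k \times 1$ depthwise convolutions plus a pointwise mix; the exact wiring of the two 1D branches is an assumption here, not FuSeConv's published design:

```python
import torch
import torch.nn as nn

class FullySeparableConv(nn.Module):
    """Illustrative fully separable convolution: 1D depthwise filters along
    each spatial axis, then a 1x1 pointwise mix along the depth dimension."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.horizontal = nn.Conv2d(in_ch, in_ch, (1, k),
                                    padding=(0, k // 2), groups=in_ch)
        self.vertical = nn.Conv2d(in_ch, in_ch, (k, 1),
                                  padding=(k // 2, 0), groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the two 1D depthwise responses, then mix channels.
        return self.pointwise(self.horizontal(x) + self.vertical(x))
```

The appeal, as the title suggests, is that 1D convolutions map onto systolic arrays more efficiently than $k \times k$ depthwise kernels do.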
arXiv Detail & Related papers (2021-05-27T20:19:39Z)
- Decoupled Dynamic Filter Networks [85.38058820176047]
We propose the Decoupled Dynamic Filter (DDF), which simultaneously tackles the shortcomings of standard convolution (content-agnostic) and of conventional dynamic filters (computation-heavy).
Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.
We observe a significant boost in performance when replacing standard convolution with DDF in classification networks.
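A rough sketch (not the authors' code) of the decoupling idea: one branch predicts a $k \times k$ spatial filter per location, another a $k \times k$ filter per channel, and their product plays the role of a dynamic depthwise kernel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledDynamicFilter(nn.Module):
    """Illustrative decoupled dynamic filtering: the branch structure and
    the multiplicative combination are assumptions for this sketch."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # Spatial branch: one k*k filter per spatial location.
        self.spatial = nn.Conv2d(channels, k * k, kernel_size=1)
        # Channel branch: one k*k filter per channel, from pooled features.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * k * k, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k = self.k
        sp = self.spatial(x).view(b, 1, k * k, h * w)   # per-location filters
        ch = self.channel(x).view(b, c, k * k, 1)       # per-channel filters
        filt = sp * ch                                  # (b, c, k*k, h*w)
        # Gather k*k neighborhoods and apply the dynamic depthwise filter.
        patches = F.unfold(x, k, padding=k // 2).view(b, c, k * k, h * w)
        return (filt * patches).sum(dim=2).view(b, c, h, w)
```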
arXiv Detail & Related papers (2021-04-29T04:55:33Z)
- VolumeNet: A Lightweight Parallel Network for Super-Resolution of Medical Volumetric Data [20.34783243852236]
We propose ParallelNet, a 3D convolutional neural network (CNN) with parallel connections for super-resolution (SR) of medical volumetric data.
We show that its lightweight variant, VolumeNet, significantly reduces the number of model parameters while achieving high-precision results.
arXiv Detail & Related papers (2020-10-16T12:53:15Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications [13.197955183748796]
A novel greedy approach called cluster pruning is proposed, which provides a structured way of removing filters from a CNN.
A low-cost IoT hardware setup built around an Intel Movidius NCS is proposed to deploy an edge-AI application using the proposed pruning methodology.
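One plausible reading of cluster pruning, sketched below with k-means standing in for the paper's clustering step (the grouping criterion and representative selection are assumptions, not the paper's exact greedy algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_prune_indices(weight: np.ndarray, n_keep: int) -> np.ndarray:
    """Group similar filters of one conv layer and keep the filter closest
    to each cluster centre. weight: (out_ch, in_ch, kH, kW) array."""
    flat = weight.reshape(weight.shape[0], -1)
    km = KMeans(n_clusters=n_keep, n_init=10).fit(flat)
    keep = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(flat[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[np.argmin(dists)])  # cluster representative
    return np.sort(np.asarray(keep))

# Example: keep 16 of 64 filters in a 64 x 32 x 3 x 3 layer.
w = np.random.randn(64, 32, 3, 3)
print(cluster_prune_indices(w, 16))
```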
arXiv Detail & Related papers (2020-03-05T06:20:09Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
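The classic building block behind separated-filter approaches is the rank-1 factorization of a 2D kernel: a $k \times k$ filter becomes a $k \times 1$ pass followed by a $1 \times k$ pass, cutting per-pixel multiplies from $k^2$ to $2k$. A small NumPy sketch (illustrative only; the paper's transformation operates on whole CNN layers):

```python
import numpy as np

def separate_filter(f: np.ndarray):
    """Rank-1 separation of a 2D filter via SVD: f ~ v @ h with v of shape
    (k, 1) and h of shape (1, k). Exact when the filter is rank-1."""
    u, s, vt = np.linalg.svd(f)
    v = u[:, :1] * np.sqrt(s[0])   # vertical (k x 1) component
    h = vt[:1, :] * np.sqrt(s[0])  # horizontal (1 x k) component
    return v, h

f = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0])  # a separable Sobel kernel
v, h = separate_filter(f)
print(np.allclose(v @ h, f))  # True: this Sobel kernel is exactly rank-1
```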
arXiv Detail & Related papers (2020-02-18T17:42:13Z)