Hardware Architecture of Embedded Inference Accelerator and Analysis of
Algorithms for Depthwise and Large-Kernel Convolutions
- URL: http://arxiv.org/abs/2104.14125v1
- Date: Thu, 29 Apr 2021 05:45:16 GMT
- Title: Hardware Architecture of Embedded Inference Accelerator and Analysis of
Algorithms for Depthwise and Large-Kernel Convolutions
- Authors: Tse-Wei Chen, Wei Tao, Deyu Wang, Dongchao Wen, Kinya Osa, Masami Kato
- Abstract summary: The proposed architecture supports filter kernels of different sizes with high flexibility.
For image classification, the accuracy is increased by 1% by simply replacing $3 \times 3$ filters with $5 \times 5$ filters in depthwise convolutions.
- Score: 27.141754658998323
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In order to handle modern convolutional neural networks (CNNs) efficiently, a
hardware architecture for a CNN inference accelerator is proposed that handles both
depthwise convolutions and regular convolutions, which are essential
building blocks for embedded-computer-vision algorithms. Unlike related
works, the proposed architecture supports filter kernels of different
sizes with high flexibility, since it requires no extra cost for
intra-kernel parallelism, and it generates convolution results faster than
the architectures of the related works. The experimental results show the
importance of supporting depthwise convolutions and dilated convolutions with
the proposed hardware architecture. In addition to depthwise convolutions with
large kernels, a new structure called the DDC layer, which combines
depthwise convolutions and dilated convolutions, is also analyzed in this
paper. For face detection, the computational costs decrease by 30%, and the
model size decreases by 20% when the DDC layers are applied to the network. For
image classification, the accuracy is increased by 1% by simply replacing $3
\times 3$ filters with $5 \times 5$ filters in depthwise convolutions.
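The paper does not ship code; as a minimal sketch of what a DDC-style block could look like, assuming a PyTorch implementation with a depthwise convolution followed by a dilated depthwise convolution and a pointwise mix (the block name, ordering, and the $1 \times 1$ mix are illustrative assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class DDCBlock(nn.Module):
    """Hypothetical DDC-style layer: a depthwise convolution combined with
    a dilated depthwise convolution, as described in the abstract. The
    structure and names are assumptions, not the paper's implementation."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        # Depthwise convolution: groups == channels, one filter per channel.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        # Dilated depthwise convolution: enlarges the receptive field
        # (effective kernel = dilation * (k - 1) + 1) at no parameter cost.
        self.dilated = nn.Conv2d(channels, channels, kernel_size,
                                 padding=dilation * (kernel_size // 2),
                                 dilation=dilation, groups=channels)
        # Pointwise (1x1) convolution to mix channels across the stack.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.dilated(self.depthwise(x)))

x = torch.randn(1, 32, 56, 56)
print(DDCBlock(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```

Swapping `kernel_size=3` for `kernel_size=5` in the depthwise stage corresponds to the $3 \times 3$ to $5 \times 5$ replacement the abstract reports for image classification.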
Related papers
- ApproxDARTS: Differentiable Neural Architecture Search with Approximate Multipliers [0.24578723416255746]
We present ApproxDARTS, a neural architecture search (NAS) method that enables the popular differentiable architecture search method DARTS to exploit approximate multipliers.
We show that ApproxDARTS can perform a complete architecture search in less than $10$ GPU hours and produce competitive convolutional neural networks (CNNs) containing approximate multipliers in their convolutional layers.
arXiv Detail & Related papers (2024-04-08T09:54:57Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
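For context, the three layer types that SSC generalizes differ only in the `groups` argument of a standard 2D convolution; a quick PyTorch illustration (not from the paper):

```python
import torch.nn as nn

C_in, C_out = 32, 64
# Groupwise convolution: channels are split into independent groups (here 4).
groupwise = nn.Conv2d(C_in, C_out, kernel_size=3, padding=1, groups=4)
# Depthwise convolution: the extreme case groups == C_in, one filter per channel.
depthwise = nn.Conv2d(C_in, C_in, kernel_size=3, padding=1, groups=C_in)
# Pointwise convolution: a 1x1 kernel that only mixes channels.
pointwise = nn.Conv2d(C_in, C_out, kernel_size=1)
```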
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration [76.35307867016336]
This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
arXiv Detail & Related papers (2021-07-08T23:05:39Z)
- Multi-objective Evolutionary Approach for Efficient Kernel Size and Shape for CNN [12.697368516837718]
State-of-the-art CNN topologies, such as VGGNet and ResNet, have become increasingly accurate.
These networks are computationally expensive, involving billions of arithmetic operations and parameters.
This paper considers optimising the computational resource consumption by reducing the size and number of kernels in convolutional layers.
arXiv Detail & Related papers (2021-06-28T14:47:29Z)
- FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays [2.8583189395674653]
We propose FuSeConv as a drop-in replacement for depth-wise separable convolution.
FuSeConv generalizes the decomposition of convolutions fully to separable 1D convolutions along spatial and depth dimensions.
We achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a systolic array of size 64x64, with comparable accuracy on the ImageNet dataset.
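A sketch of a fully separable convolution in this spirit, factorizing the depthwise $k \times k$ filter into $1 \times k$ and $k \times 1$ depthwise convolutions plus a pointwise mix; the exact wiring of the two 1D branches is an assumption here, not FuSeConv's published design:

```python
import torch
import torch.nn as nn

class FullySeparableConv(nn.Module):
    """Illustrative fully separable convolution: 1D depthwise filters along
    each spatial axis, then a 1x1 pointwise mix along the depth dimension."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.horizontal = nn.Conv2d(in_ch, in_ch, (1, k),
                                    padding=(0, k // 2), groups=in_ch)
        self.vertical = nn.Conv2d(in_ch, in_ch, (k, 1),
                                  padding=(k // 2, 0), groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the two 1D depthwise responses, then mix channels.
        return self.pointwise(self.horizontal(x) + self.vertical(x))
```

The appeal, as the title suggests, is that 1D convolutions map onto systolic arrays more efficiently than $k \times k$ depthwise kernels do.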
arXiv Detail & Related papers (2021-05-27T20:19:39Z)
- Decoupled Dynamic Filter Networks [85.38058820176047]
We propose the Decoupled Dynamic Filter (DDF), which simultaneously tackles the shortcomings of standard convolution (content-agnostic) and of conventional dynamic filters (computation-heavy).
Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.
We observe a significant boost in performance when replacing standard convolution with DDF in classification networks.
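A rough sketch (not the authors' code) of the decoupling idea: one branch predicts a $k \times k$ spatial filter per location, another a $k \times k$ filter per channel, and their product plays the role of a dynamic depthwise kernel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledDynamicFilter(nn.Module):
    """Illustrative decoupled dynamic filtering: the branch structure and
    the multiplicative combination are assumptions for this sketch."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # Spatial branch: one k*k filter per spatial location.
        self.spatial = nn.Conv2d(channels, k * k, kernel_size=1)
        # Channel branch: one k*k filter per channel, from pooled features.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * k * k, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k = self.k
        sp = self.spatial(x).view(b, 1, k * k, h * w)   # per-location filters
        ch = self.channel(x).view(b, c, k * k, 1)       # per-channel filters
        filt = sp * ch                                  # (b, c, k*k, h*w)
        # Gather k*k neighborhoods and apply the dynamic depthwise filter.
        patches = F.unfold(x, k, padding=k // 2).view(b, c, k * k, h * w)
        return (filt * patches).sum(dim=2).view(b, c, h, w)
```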
arXiv Detail & Related papers (2021-04-29T04:55:33Z)
- VolumeNet: A Lightweight Parallel Network for Super-Resolution of Medical Volumetric Data [20.34783243852236]
We propose ParallelNet, a 3D convolutional neural network (CNN) with parallel connections for super-resolution (SR) of medical volumetric data.
We show that its lightweight variant, VolumeNet, significantly reduces the number of model parameters while achieving high-precision results.
arXiv Detail & Related papers (2020-10-16T12:53:15Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications [13.197955183748796]
A novel greedy approach called cluster pruning is proposed, which provides a structured way of removing filters from a CNN.
A low-cost IoT hardware setup built around an Intel Movidius NCS is proposed to deploy an edge-AI application using the proposed pruning methodology.
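One plausible reading of cluster pruning, sketched below with k-means standing in for the paper's clustering step (the grouping criterion and representative selection are assumptions, not the paper's exact greedy algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_prune_indices(weight: np.ndarray, n_keep: int) -> np.ndarray:
    """Group similar filters of one conv layer and keep the filter closest
    to each cluster centre. weight: (out_ch, in_ch, kH, kW) array."""
    flat = weight.reshape(weight.shape[0], -1)
    km = KMeans(n_clusters=n_keep, n_init=10).fit(flat)
    keep = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(flat[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[np.argmin(dists)])  # cluster representative
    return np.sort(np.asarray(keep))

# Example: keep 16 of 64 filters in a 64 x 32 x 3 x 3 layer.
w = np.random.randn(64, 32, 3, 3)
print(cluster_prune_indices(w, 16))
```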
arXiv Detail & Related papers (2020-03-05T06:20:09Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
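The classic building block behind separated-filter approaches is the rank-1 factorization of a 2D kernel: a $k \times k$ filter becomes a $k \times 1$ pass followed by a $1 \times k$ pass, cutting per-pixel multiplies from $k^2$ to $2k$. A small NumPy sketch (illustrative only; the paper's transformation operates on whole CNN layers):

```python
import numpy as np

def separate_filter(f: np.ndarray):
    """Rank-1 separation of a 2D filter via SVD: f ~ v @ h with v of shape
    (k, 1) and h of shape (1, k). Exact when the filter is rank-1."""
    u, s, vt = np.linalg.svd(f)
    v = u[:, :1] * np.sqrt(s[0])   # vertical (k x 1) component
    h = vt[:1, :] * np.sqrt(s[0])  # horizontal (1 x k) component
    return v, h

f = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0])  # a separable Sobel kernel
v, h = separate_filter(f)
print(np.allclose(v @ h, f))  # True: this Sobel kernel is exactly rank-1
```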
arXiv Detail & Related papers (2020-02-18T17:42:13Z)