Algorithm-hardware Co-design for Deformable Convolution
- URL: http://arxiv.org/abs/2002.08357v1
- Date: Wed, 19 Feb 2020 01:08:11 GMT
- Title: Algorithm-hardware Co-design for Deformable Convolution
- Authors: Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen
Wu, Kurt Keutzer, John Wawrzynek
- Abstract summary: We build an efficient object detection network with modified deformable convolutions and quantize the network using state-of-the-art quantization methods.
Preliminary experiments show that little accuracy is compromised and that a speedup is achieved with our co-design optimizations for deformable convolution.
- Score: 40.50544352625659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: FPGAs provide a flexible and efficient platform to accelerate
rapidly-changing algorithms for computer vision. The majority of existing work
focuses on accelerating image classification, while other fundamental vision
problems, including object detection and instance segmentation, have not been
adequately addressed. Compared with image classification, detection problems
are more sensitive to the spatial variance of objects, and therefore require
specialized convolutions to aggregate spatial information. To address this,
recent work proposes dynamic deformable convolution to augment regular
convolutions. Regular convolutions process a fixed grid of pixels across all
the spatial locations in an image, while dynamic deformable convolutions may
access arbitrary pixels in the image and the access pattern is input-dependent
and varies per spatial location. These properties lead to inefficient memory
access to the inputs on existing hardware. In this work, we first investigate
the overhead of the deformable convolution on embedded FPGA SoCs, and then show
the accuracy-latency tradeoffs for a set of algorithm modifications including
full versus depthwise convolution, fixed-shape sampling, and limited-range
offsets. These modifications reduce the compute complexity and thus benefit
the energy efficiency of embedded devices in general. We then build an
efficient object detection network
with modified deformable convolutions and quantize the network using
state-of-the-art quantization methods. We implement a unified hardware engine
on FPGA to support all the operations in the network. Preliminary experiments
show that little accuracy is compromised and that a speedup is achieved with
our co-design optimizations for deformable convolution.
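To make the access pattern concrete, below is a minimal, single-channel Python sketch of a deformable 3x3 convolution with two hardware-friendly modifications in the spirit of those above: learned offsets clamped to a small window (limited-range) and rounded to integers, which removes bilinear interpolation. All names, shapes, and the max_range value are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: deformable 3x3 convolution with clamped, rounded offsets.
import numpy as np

def deform_conv3x3(x, weights, offsets, max_range=4):
    """x: (H, W) input; weights: (9,) filter taps; offsets: (H, W, 9, 2)
    input-dependent (dy, dx) displacements for each of the 9 taps."""
    H, W = x.shape
    grid = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]  # regular taps
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (gy, gx) in enumerate(grid):
                # limited-range: clamp the learned offset to +/- max_range
                dy, dx = np.clip(offsets[i, j, k], -max_range, max_range)
                # rounding yields one aligned read instead of four bilinear ones
                sy = int(np.clip(i + gy + round(dy), 0, H - 1))
                sx = int(np.clip(j + gx + round(dx), 0, W - 1))
                acc += weights[k] * x[sy, sx]
            out[i, j] = acc
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
w = rng.standard_normal(9)
offs = rng.standard_normal((16, 16, 9, 2)) * 2.0  # predicted per pixel in practice
y = deform_conv3x3(x, w, offs)  # same spatial shape as x: (16, 16)
```

A regular convolution is the special case where all offsets are zero; it is the input-dependent offsets that make the memory access pattern irregular, and clamping plus rounding restores enough regularity for an on-chip buffer to serve the reads.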
Related papers
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Quantum-Inspired Edge Detection Algorithms Implementation using New Dynamic Visual Data Representation and Short-Length Convolution Computation [6.950510860295866]
This paper studies a new paired-transform-based quantum representation and the computation of convolutions and gradients of one-dimensional and two-dimensional signals.
The new data representation is demonstrated on multiple illustrative examples for quantum edge detection, gradients, and convolution.
arXiv Detail & Related papers (2022-10-31T17:13:27Z)
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with respect to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature-processing phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method: Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the use of transformers for vision tasks; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Adaptive Convolutions with Per-pixel Dynamic Filter Atom [24.691793951360914]
We introduce scalable dynamic convolutions with per-pixel adapted filters.
As plug-and-play replacements to convolutional layers, the introduced adaptive convolutions with per-pixel dynamic atoms enable explicit modeling of intra-image variance.
We present experiments showing that the proposed method delivers comparable or even better performance across tasks.
arXiv Detail & Related papers (2021-08-17T22:04:10Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
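As a concrete reference for the cross-covariance attention described in the XCiT entry above, a minimal NumPy sketch follows; the normalisation axis, the softmax axis, and the temperature tau are assumptions for illustration rather than the authors' exact formulation.

```python
# Minimal sketch of cross-covariance attention (XCA): the attention map is
# d x d over feature channels, so the cost is linear in the number of tokens.
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def xca(q, k, v, tau=1.0):
    """q, k, v: (n_tokens, d). Returns (n_tokens, d)."""
    # l2-normalise each channel across tokens (assumed normalisation axis)
    qh = q / np.linalg.norm(q, axis=0, keepdims=True)
    kh = k / np.linalg.norm(k, axis=0, keepdims=True)
    attn = softmax(kh.T @ qh / tau, axis=0)  # (d, d) channel-to-channel map
    return v @ attn                          # O(n * d^2), not O(n^2 * d)

n, d = 4096, 64  # e.g. a 64x64 feature map flattened into 4096 tokens
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = xca(q, k, v)  # (4096, 64)
```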
- CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs [41.43273142203345]
We harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions.
With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB.
Our model reaches 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters: 20.9x smaller but 10% more accurate than Tiny-YOLO.
arXiv Detail & Related papers (2020-06-12T17:56:47Z)
- Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring [39.92889091819711]
We propose an efficient pixel adaptive and feature attentive design for handling large blur variations across different spatial locations.
We use a patch-hierarchical attentive architecture, composed of the above modules, that implicitly discovers the spatial variations of the blur in the input image.
Our design offers significant improvements over the state-of-the-art in accuracy as well as speed.
arXiv Detail & Related papers (2020-04-11T09:24:00Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
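To illustrate the separated-filters idea in the last entry, here is a small NumPy sketch showing that a rank-1 (separable) k x k kernel can be applied as two 1-D passes, cutting the per-pixel multiply count from k^2 to 2k. Real CNN filters are typically only approximately separable, so this shows the general idea rather than the paper's exact method; all names are illustrative.

```python
# Separable filtering: two 1-D passes reproduce a rank-1 2-D kernel exactly.
import numpy as np

def corr2d_valid(img, ker):
    """Naive 'valid' 2-D correlation for reference: k*k multiplies per output."""
    kh, kw = ker.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * ker)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
col = rng.standard_normal(5)   # vertical 1-D filter
row = rng.standard_normal(5)   # horizontal 1-D filter
ker = np.outer(col, row)       # rank-1 5x5 kernel: 25 multiplies per pixel

# Separated version: two 1-D passes, 5 + 5 = 10 multiplies per pixel.
tmp = np.apply_along_axis(lambda r: np.correlate(r, row, mode="valid"), 1, img)
sep = np.apply_along_axis(lambda c: np.correlate(c, col, mode="valid"), 0, tmp)

assert np.allclose(sep, corr2d_valid(img, ker))
```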