CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on
Embedded FPGAs
- URL: http://arxiv.org/abs/2006.08357v2
- Date: Mon, 25 Jan 2021 22:35:57 GMT
- Title: CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on
Embedded FPGAs
- Authors: Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li,
Bichen Wu, Kurt Keutzer, John Wawrzynek
- Abstract summary: We harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions.
With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB.
Our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
- Score: 41.43273142203345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying deep learning models on embedded systems has been challenging due
to limited computing resources. The majority of existing work focuses on
accelerating image classification, while other fundamental vision problems,
such as object detection, have not been adequately addressed. Compared with
image classification, detection problems are more sensitive to the spatial
variance of objects, and therefore, require specialized convolutions to
aggregate spatial information. To address this need, recent work introduces
dynamic deformable convolution to augment regular convolutions. However, this
will lead to inefficient memory accesses of inputs with existing hardware. In
this work, we harness the flexibility of FPGAs to develop a novel object
detection pipeline with deformable convolutions. We show the speed-accuracy
tradeoffs for a set of algorithm modifications including irregular-access
versus limited-range and fixed-shape. We then co-design a network, CoDeNet, with
the modified deformable convolution and quantize it to 4-bit weights and 8-bit
activations. With our high-efficiency implementation, our solution reaches 26.9
frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50
on the standard object detection dataset, Pascal VOC. With our higher accuracy
implementation, our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of
parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
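The 4-bit-weight / 8-bit-activation scheme described in the abstract can be pictured with symmetric uniform quantization, where each tensor is mapped to a small signed-integer grid and a single scale factor. The sketch below is an illustrative NumPy version under that assumption, not the paper's actual quantization method or FPGA kernel:

```python
import numpy as np

def quantize(x, num_bits):
    """Symmetric uniform quantization to signed integers (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1               # 7 for 4-bit, 127 for 8-bit
    scale = max(np.max(np.abs(x)) / qmax, 1e-8)  # step size between levels
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

np.random.seed(0)
weights = np.random.randn(16, 16).astype(np.float32)  # quantized to 4 bits
acts = np.random.randn(16).astype(np.float32)         # quantized to 8 bits

qw, sw = quantize(weights, num_bits=4)
qa, sa = quantize(acts, num_bits=8)

# On-device arithmetic stays in integers; one float rescale at the end
# recovers an approximation of the full-precision output.
y_int = qw @ qa.astype(np.int64)
y = y_int * (sw * sa)
```

Because the scale is chosen from the tensor's maximum magnitude, the per-element dequantization error is bounded by half a quantization step, which is why 4-bit weights can retain most of the accuracy at a fraction of the model size.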
Related papers
- Global Context Aggregation Network for Lightweight Saliency Detection of
Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Efficient Context Integration through Factorized Pyramidal Learning for
Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages, which enables simple and efficient feature fusion within the module and mitigates the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset, and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, while meeting real-time requirements (FPS >= 30) on edge-computing device Nvidia
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
- Head-Free Lightweight Semantic Segmentation with Linear Transformer [21.38163906180886]
We propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer.
It adopts a parallel architecture that leverages prototype representations as specific learnable local descriptions, which replace the decoder.
Although removing the decoder compresses most of the computation, the accuracy of the parallel structure is still limited by low computational resources.
arXiv Detail & Related papers (2023-01-11T18:59:46Z)
- Optimizing Anchor-based Detectors for Autonomous Driving Scenes [22.946814647030667]
This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in autonomous driving scenes.
Based on the high-performing RCNN-RS and RetinaNet-RS detection frameworks, we study a set of framework improvements to adapt the detectors to better detect small objects in crowd scenes.
arXiv Detail & Related papers (2022-08-11T22:44:59Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for
5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity.
One of the main challenges comes from the real-time implementation of these algorithms.
This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
- Small Object Detection Based on Modified FSSD and Model Compression [7.387639662781843]
This paper proposes a small object detection algorithm based on FSSD.
In order to reduce the computational cost and storage space, pruning is carried out to achieve model compression.
The mean average precision (mAP) of the algorithm reaches 80.4% on PASCAL VOC, and the speed is 59.5 FPS on a GTX 1080 Ti.
arXiv Detail & Related papers (2021-08-24T03:20:32Z)
- Algorithm-hardware Co-design for Deformable Convolution [40.50544352625659]
We build an efficient object detection network with modified deformable convolutions and quantize the network using state-of-the-art quantization methods.
Preliminary experiments show that little accuracy is compromised and that a speedup can be achieved with our co-design optimization for the deformable convolution.
arXiv Detail & Related papers (2020-02-19T01:08:11Z)
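The "limited-range" modification of deformable convolution mentioned above can be illustrated by rounding the learned offsets to integers and clipping them to a bounded window, so sampling becomes a simple local gather with no bilinear interpolation. The following is a minimal NumPy sketch under those assumptions, not the papers' FPGA implementation; the function name and tensor layout are hypothetical:

```python
import numpy as np

def deform_sample(feat, offsets, max_range=2):
    """Gather a 3x3 neighborhood per pixel with learned offsets that are
    rounded to integers and clipped to +/- max_range, keeping all memory
    accesses inside a small, hardware-friendly window."""
    H, W = feat.shape
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros((H, W, 9), dtype=feat.dtype)
    for y in range(H):
        for x in range(W):
            for k, (dy, dx) in enumerate(taps):
                # round + clip: no fractional sampling, bounded reach
                oy = int(np.clip(round(float(offsets[y, x, k, 0])), -max_range, max_range))
                ox = int(np.clip(round(float(offsets[y, x, k, 1])), -max_range, max_range))
                sy = min(max(y + dy + oy, 0), H - 1)  # clamp at borders
                sx = min(max(x + dx + ox, 0), W - 1)
                out[y, x, k] = feat[sy, sx]
    return out

feat = np.arange(25, dtype=np.float32).reshape(5, 5)
offsets = np.zeros((5, 5, 9, 2), dtype=np.float32)
out = deform_sample(feat, offsets)  # zero offsets -> plain 3x3 gather
```

With zero offsets this degenerates to an ordinary 3x3 convolution's sampling pattern, which is what makes the variant cheap to realize on an FPGA: access patterns stay predictable regardless of the learned offsets.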
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all generated content) and is not responsible for any consequences of its use.