CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on
Embedded FPGAs
- URL: http://arxiv.org/abs/2006.08357v2
- Date: Mon, 25 Jan 2021 22:35:57 GMT
- Title: CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on
Embedded FPGAs
- Authors: Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li,
Bichen Wu, Kurt Keutzer, John Wawrzynek
- Abstract summary: We harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions.
With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB.
Our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
- Score: 41.43273142203345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying deep learning models on embedded systems has been challenging due
to limited computing resources. The majority of existing work focuses on
accelerating image classification, while other fundamental vision problems,
such as object detection, have not been adequately addressed. Compared with
image classification, detection problems are more sensitive to the spatial
variance of objects, and therefore, require specialized convolutions to
aggregate spatial information. To address this need, recent work introduces
dynamic deformable convolution to augment regular convolutions. However, this
will lead to inefficient memory accesses of inputs with existing hardware. In
this work, we harness the flexibility of FPGAs to develop a novel object
detection pipeline with deformable convolutions. We show the speed-accuracy
tradeoffs for a set of algorithm modifications including irregular-access
versus limited-range and fixed-shape. We then co-design a network, CoDeNet, with
the modified deformable convolution and quantize it to 4-bit weights and 8-bit
activations. With our high-efficiency implementation, our solution reaches 26.9
frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50
on the standard object detection dataset, Pascal VOC. With our higher accuracy
implementation, our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of
parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
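The 4-bit-weight / 8-bit-activation scheme described in the abstract can be pictured with symmetric uniform quantization, where each tensor is mapped to a small signed-integer grid and a single scale factor. The sketch below is an illustrative NumPy version under that assumption, not the paper's actual quantization method or FPGA kernel:

```python
import numpy as np

def quantize(x, num_bits):
    """Symmetric uniform quantization to signed integers (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1               # 7 for 4-bit, 127 for 8-bit
    scale = max(np.max(np.abs(x)) / qmax, 1e-8)  # step size between levels
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

np.random.seed(0)
weights = np.random.randn(16, 16).astype(np.float32)  # quantized to 4 bits
acts = np.random.randn(16).astype(np.float32)         # quantized to 8 bits

qw, sw = quantize(weights, num_bits=4)
qa, sa = quantize(acts, num_bits=8)

# On-device arithmetic stays in integers; one float rescale at the end
# recovers an approximation of the full-precision output.
y_int = qw @ qa.astype(np.int64)
y = y_int * (sw * sa)
```

Because the scale is chosen from the tensor's maximum magnitude, the per-element dequantization error is bounded by half a quantization step, which is why 4-bit weights can retain most of the accuracy at a fraction of the model size.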
Related papers
- Global Context Aggregation Network for Lightweight Saliency Detection of
Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Efficient Context Integration through Factorized Pyramidal Learning for
Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages, which enables simple and efficient feature fusion within the module and mitigates the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset, and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, while meeting real-time requirements (FPS >= 30) on edge-computing device Nvidia
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
- Head-Free Lightweight Semantic Segmentation with Linear Transformer [21.38163906180886]
We propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer.
It adopts a parallel architecture that leverages prototype representations as specific learnable local descriptions, which replace the decoder.
Although removing the decoder compresses most of the computation, the accuracy of the parallel structure is still limited by low computational resources.
arXiv Detail & Related papers (2023-01-11T18:59:46Z)
- Optimizing Anchor-based Detectors for Autonomous Driving Scenes [22.946814647030667]
This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in autonomous driving scenes.
Based on the high-performing RCNN-RS and RetinaNet-RS detection frameworks, we study a set of framework improvements to adapt the detectors to better detect small objects in crowd scenes.
arXiv Detail & Related papers (2022-08-11T22:44:59Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for
5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity.
One of the main challenges comes from the real-time implementation of these algorithms.
This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
- Small Object Detection Based on Modified FSSD and Model Compression [7.387639662781843]
This paper proposes a small object detection algorithm based on FSSD.
In order to reduce the computational cost and storage space, pruning is carried out to achieve model compression.
The mean average precision (mAP) of the algorithm reaches 80.4% on PASCAL VOC, and the speed is 59.5 FPS on a GTX 1080 Ti.
arXiv Detail & Related papers (2021-08-24T03:20:32Z)
- Algorithm-hardware Co-design for Deformable Convolution [40.50544352625659]
We build an efficient object detection network with modified deformable convolutions and quantize the network using state-of-the-art quantization methods.
Preliminary experiments show that little accuracy is compromised and that a speedup can be achieved with our co-design optimization for the deformable convolution.
arXiv Detail & Related papers (2020-02-19T01:08:11Z)
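The "limited-range" modification of deformable convolution mentioned above can be illustrated by rounding the learned offsets to integers and clipping them to a bounded window, so sampling becomes a simple local gather with no bilinear interpolation. The following is a minimal NumPy sketch under those assumptions, not the papers' FPGA implementation; the function name and tensor layout are hypothetical:

```python
import numpy as np

def deform_sample(feat, offsets, max_range=2):
    """Gather a 3x3 neighborhood per pixel with learned offsets that are
    rounded to integers and clipped to +/- max_range, keeping all memory
    accesses inside a small, hardware-friendly window."""
    H, W = feat.shape
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros((H, W, 9), dtype=feat.dtype)
    for y in range(H):
        for x in range(W):
            for k, (dy, dx) in enumerate(taps):
                # round + clip: no fractional sampling, bounded reach
                oy = int(np.clip(round(float(offsets[y, x, k, 0])), -max_range, max_range))
                ox = int(np.clip(round(float(offsets[y, x, k, 1])), -max_range, max_range))
                sy = min(max(y + dy + oy, 0), H - 1)  # clamp at borders
                sx = min(max(x + dx + ox, 0), W - 1)
                out[y, x, k] = feat[sy, sx]
    return out

feat = np.arange(25, dtype=np.float32).reshape(5, 5)
offsets = np.zeros((5, 5, 9, 2), dtype=np.float32)
out = deform_sample(feat, offsets)  # zero offsets -> plain 3x3 gather
```

With zero offsets this degenerates to an ordinary 3x3 convolution's sampling pattern, which is what makes the variant cheap to realize on an FPGA: access patterns stay predictable regardless of the learned offsets.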
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all generated content) and is not responsible for any consequences of its use.