Related papers: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs

YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs

URL: http://arxiv.org/abs/2110.13713v1
Date: Tue, 26 Oct 2021 14:02:59 GMT
Title: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs
Authors: Prakhar Ganesh, Yao Chen, Yin Yang, Deming Chen, Marianne Winslett
Abstract summary: In order to map deep neural network (DNN) based object detection models to edge devices, one typically needs to compress such models significantly. In this paper, we propose a novel edge GPU friendly module for multi-scale feature interaction. We also propose a novel learning backbone adoption inspired by the changing translational information flow across various tasks.
Score: 14.85882314822983
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Performance of object detection models has been growing rapidly on two major fronts, model accuracy and efficiency. However, in order to map deep neural network (DNN) based object detection models to edge devices, one typically needs to compress such models significantly, thus compromising the model accuracy. In this paper, we propose a novel edge GPU friendly module for multi-scale feature interaction by exploiting missing combinatorial connections between various feature scales in existing state-of-the-art methods. Additionally, we propose a novel transfer learning backbone adoption inspired by the changing translational information flow across various tasks, designed to complement our feature interaction module and together improve both accuracy as well as execution speed on various edge GPU devices available in the market. For instance, YOLO-ReT with MobileNetV2x0.75 backbone runs real-time on Jetson Nano, and achieves 68.75 mAP on Pascal VOC and 34.91 mAP on COCO, beating its peers by 3.05 mAP and 0.91 mAP respectively, while executing faster by 3.05 FPS. Furthermore, introducing our multi-scale feature interaction module in YOLOv4-tiny and YOLOv4-tiny (3l) improves their performance to 41.5 and 48.1 mAP respectively on COCO, outperforming the original versions by 1.3 and 0.9 mAP.

Related papers

A Light Perspective for 3D Object Detection [46.23578780480946]
This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process. Our model, NextBEV, surpasses established feature extractors like ResNet50 and MobileNetV3. By fusing these lightweight proposals, we have enhanced the accuracy of the VoxelNet-based model by 2.93% and improved the F1-score of the PointPillar-based model by approximately 20%.
arXiv Detail & Related papers (2025-03-10T10:03:23Z)
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection [0.0]
We focus on design choices of neural network architectures for efficient object detection based on FLOP. We propose several optimizations to enhance the efficiency of YOLO-based models. This paper contributes to a new scaling paradigm for object detection and YOLO-centric models called LeYOLO.
arXiv Detail & Related papers (2024-06-20T12:08:24Z)
YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5 [19.388112026410045]
YOLO-TLA is an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters.
arXiv Detail & Related papers (2024-02-22T05:55:17Z)
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets. Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection [20.161887223481994]
We propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. StreamPETR achieves significant performance improvements only with negligible cost, compared to the single-frame baseline. The lightweight version realizes 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP and 1.8x faster FPS.
arXiv Detail & Related papers (2023-03-21T15:19:20Z)
EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework. We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects. Our baseline model can reach the accuracy of 50.6% AP50:95 and 69.8% AP50 in MS 2017 dataset, 26.4% AP50:95 and 44.8% AP50 in VisDrone 2019-DET dataset, and it meets real-time requirements (FPS>=30) on edge-computing device Nvidia
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
CNN-transformer mixed model for object detection [3.5897534810405403]
In this paper, I propose a convolutional module with a transformer. It aims to improve the recognition accuracy of the model by fusing the detailed features extracted by CNN with the global features extracted by a transformer. After 100 rounds of training on the Pascal VOC dataset, the accuracy of the results reached 81%, which is 4.6 better than the faster RCNN[4] using resnet101[5] as the backbone.
arXiv Detail & Related papers (2022-12-13T16:35:35Z)
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K. Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection. YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation. YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices [13.62426382827205]
PP-PicoDet family of real-time object detectors achieves superior performance on object detection for mobile devices. Models achieve better trade-offs between accuracy and latency compared to other popular models.
arXiv Detail & Related papers (2021-11-01T12:53:17Z)
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
MobileDets: Searching for Object Detection Architectures for Mobile Accelerators [61.30355783955777]
Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices. Regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators. We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
arXiv Detail & Related papers (2020-04-30T00:21:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.