Related papers: PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

URL: http://arxiv.org/abs/2111.00902v1
Date: Mon, 1 Nov 2021 12:53:17 GMT
Title: PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices
Authors: Guanghua Yu, Qinyao Chang, Wenyu Lv, Chang Xu, Cheng Cui, Wei Ji, Qingqing Dang, Kaipeng Deng, Guanzhong Wang, Yuning Du, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma
Abstract summary: PP-PicoDet family of real-time object detectors achieves superior performance on object detection for mobile devices. Models achieve better trade-offs between accuracy and latency compared to other popular models.
Score: 13.62426382827205
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The better accuracy and efficiency trade-off has been a challenging problem in object detection. In this work, we are dedicated to studying key optimizations and neural network architecture choices for object detection to improve accuracy and efficiency. We investigate the applicability of the anchor-free strategy on lightweight object detection models. We enhance the backbone structure and design the lightweight structure of the neck, which improves the feature extraction ability of the network. We improve label assignment strategy and loss function to make training more stable and efficient. Through these optimizations, we create a new family of real-time object detectors, named PP-PicoDet, which achieves superior performance on object detection for mobile devices. Our models achieve better trade-offs between accuracy and latency compared to other popular models. PicoDet-S with only 0.99M parameters achieves 30.6% mAP, which is an absolute 4.8% improvement in mAP while reducing mobile CPU inference latency by 55% compared to YOLOX-Nano, and is an absolute 7.1% improvement in mAP compared to NanoDet. It reaches 123 FPS (150 FPS using Paddle Lite) on mobile ARM CPU when the input size is 320. PicoDet-L with only 3.3M parameters achieves 40.9% mAP, which is an absolute 3.7% improvement in mAP and 44% faster than YOLOv5s. As shown in Figure 1, our models far outperform the state-of-the-art results for lightweight object detection. Code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.

Related papers

HGO-YOLO: Advancing Anomaly Behavior Detection with Hierarchical Features and Lightweight Optimized Detection [0.0]
This study proposes a model called HGO-YOLO, which integrates the HGNetv2 architecture into YOLOv8. Evaluation results show that the proposed algorithm achieves a mAP@0.5 of 87.4% and a recall rate of 81.1%, with a model size of only 4.6 MB and a frame rate of 56 FPS on the CPU.
arXiv Detail & Related papers (2025-03-10T14:29:12Z)
A Light Perspective for 3D Object Detection [46.23578780480946]
This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process. Our model, NextBEV, surpasses established feature extractors like ResNet50 and MobileNetV3. By fusing these lightweight proposals, we have enhanced the accuracy of the VoxelNet-based model by 2.93% and improved the F1-score of the PointPillar-based model by approximately 20%.
arXiv Detail & Related papers (2025-03-10T10:03:23Z)
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection [0.0]
We focus on design choices of neural network architectures for efficient object detection based on FLOP. We propose several optimizations to enhance the efficiency of YOLO-based models. This paper contributes to a new scaling paradigm for object detection and YOLO-centric models called LeYOLO.
arXiv Detail & Related papers (2024-06-20T12:08:24Z)
YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5 [19.388112026410045]
YOLO-TLA is an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters.
arXiv Detail & Related papers (2024-02-22T05:55:17Z)
Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection [20.886999159134138]
Real-time efficient perception is critical for autonomous navigation and city scale sensing. In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the 3D scene. Our approach improves detection rate by +4.1 $AP_S$ or +39% and in real-time performance by +5.3 $sAP_S$ or +63% for small objects over state-of-the-art (SOTA)
arXiv Detail & Related papers (2023-03-25T00:43:44Z)
EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework. We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects. Our baseline model can reach the accuracy of 50.6% AP50:95 and 69.8% AP50 in MS 2017 dataset, 26.4% AP50:95 and 44.8% AP50 in VisDrone 2019-DET dataset, and it meets real-time requirements (FPS>=30) on edge-computing device Nvidia
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
Fewer is More: Efficient Object Detection in Large Aerial Images [59.683235514193505]
This paper presents an Objectness Activation Network (OAN) to help detectors focus on fewer patches but achieve more efficient inference and more accurate results. Using OAN, all five detectors acquire more than 30.0% speed-up on three large-scale aerial image datasets. We extend our OAN to driving-scene object detection and 4K video object detection, boosting the detection speed by 112.1% and 75.0%, respectively.
arXiv Detail & Related papers (2022-12-26T12:49:47Z)
ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources. We build a unified framework for efficient end-to-end temporal action detection (ETAD) ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
EAutoDet: Efficient Architecture Search for Object Detection [110.99532343155073]
EAutoDet framework can discover practical backbone and FPN architectures for object detection in 1.4 GPU-days. We propose a kernel reusing technique by sharing the weights of candidate operations on one edge and consolidating them into one convolution. In particular, the discovered architectures surpass state-of-the-art object detection NAS methods and achieve 40.1 mAP with 120 FPS and 49.2 mAP with 41.3 FPS on COCO test-dev set.
arXiv Detail & Related papers (2022-03-21T05:56:12Z)
YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs [14.85882314822983]
In order to map deep neural network (DNN) based object detection models to edge devices, one typically needs to compress such models significantly. In this paper, we propose a novel edge GPU friendly module for multi-scale feature interaction. We also propose a novel learning backbone adoption inspired by the changing translational information flow across various tasks.
arXiv Detail & Related papers (2021-10-26T14:02:59Z)
Small Object Detection Based on Modified FSSD and Model Compression [7.387639662781843]
This paper proposes a small object detection algorithm based on FSSD. In order to reduce the computational cost and storage space, pruning is carried out to achieve model compression. The average accuracy (mAP) of the algorithm can reach 80.4% on PASCAL VOC and the speed is 59.5 FPS on GTX1080ti.
arXiv Detail & Related papers (2021-08-24T03:20:32Z)
Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design. Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars. EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
MobileDets: Searching for Object Detection Architectures for Mobile Accelerators [61.30355783955777]
Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices. Regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators. We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
arXiv Detail & Related papers (2020-04-30T00:21:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.