YOLObile: Real-Time Object Detection on Mobile Devices via
Compression-Compilation Co-Design
- URL: http://arxiv.org/abs/2009.05697v2
- Date: Wed, 30 Dec 2020 15:55:43 GMT
- Authors: Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin
Ren, Yanzhi Wang
- Abstract summary: We propose the YOLObile framework, which enables real-time object detection on mobile devices via compression-compilation co-design.
A novel block-punched pruning scheme is proposed for any kernel size.
Under our YOLObile framework, we achieve 17 FPS inference speed using GPU on Samsung Galaxy S20.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development and wide utilization of object detection techniques
have drawn attention to both the accuracy and speed of object detectors. However,
current state-of-the-art object detection works are either accuracy-oriented,
using large models at the cost of high latency, or speed-oriented, using
lightweight models at the cost of accuracy. In this work, we propose the
YOLObile framework for real-time object detection on mobile devices via
compression-compilation co-design. A novel block-punched pruning scheme is
proposed that supports any kernel size. To improve computational efficiency on
mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced
compiler-assisted optimizations. Experimental results indicate that our pruning
scheme achieves a 14$\times$ compression rate on YOLOv4 with 49.0 mAP. Under our
YOLObile framework, we achieve 17 FPS inference speed using the GPU on a Samsung
Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the
inference speed increases to 19.1 FPS, a 5$\times$ speedup over the original
YOLOv4. Source code is at:
\url{https://github.com/nightsnack/YOLObile}.
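The block-punched pruning idea can be illustrated with a short sketch. Below is a minimal NumPy mock-up, not the paper's implementation: it assumes a conv weight of shape (out_ch, in_ch, kh, kw), groups output channels into blocks, and within each block zeroes the same weight positions across all rows (chosen by smallest summed magnitude), which is what keeps the surviving weights in a hardware-friendly regular layout. The function name, block size, and importance criterion are illustrative assumptions.

```python
import numpy as np

def block_punched_prune(weights, block_size=4, sparsity=0.5):
    """Hedged sketch of block-punched pruning (shapes and criterion assumed).

    The 4-D conv weight (out_ch, in_ch, kh, kw) is flattened to 2-D and
    split into groups of `block_size` consecutive rows. Within each block,
    the SAME column positions are pruned across all rows, selected by the
    smallest summed absolute magnitude over the block.
    """
    out_ch, in_ch, kh, kw = weights.shape
    w = weights.reshape(out_ch, in_ch * kh * kw).copy()
    n_prune = int(w.shape[1] * sparsity)
    for start in range(0, out_ch, block_size):
        block = w[start:start + block_size]          # view into the copy
        # importance of each column position, summed over the block
        importance = np.abs(block).sum(axis=0)
        # punch out the same lowest-importance positions in every row
        drop = np.argsort(importance)[:n_prune]
        block[:, drop] = 0.0
    return w.reshape(out_ch, in_ch, kh, kw)
```

Because every row in a block shares one pruning mask, the compiler can pack the remaining weights densely per block, which is the property the paper's compiler-assisted optimizations exploit.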
Related papers
- MODIPHY: Multimodal Obscured Detection for IoT using PHantom Convolution-Enabled Faster YOLO [10.183459286746196]
We introduce YOLO Phantom, one of the smallest YOLO models ever conceived.
YOLO Phantom achieves comparable accuracy to the latest YOLOv8n model while simultaneously reducing both parameters and model size.
Its real-world efficacy is demonstrated on an IoT platform with advanced low-light and RGB cameras, seamlessly connecting to an AWS-based notification endpoint.
arXiv Detail & Related papers (2024-02-12T18:56:53Z)
- YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS.
We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets.
Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
- SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications [98.90623605283564]
We introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications.
We build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
Our small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2x faster compared to MobileViT-v2.
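The linear-cost attention described above can be sketched in a few lines. This is a minimal NumPy illustration of SwiftFormer-style additive attention under stated assumptions, not the authors' code: a learned vector scores each query, the weighted sum of queries forms a single global query, and that vector interacts with every key element-wise, so total cost is O(n·d) rather than O(n²·d). All names (`Wq`, `Wk`, `w_a`) are illustrative.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(z - z.max())
    return e / e.sum()

def additive_attention(x, Wq, Wk, w_a):
    """Hedged sketch of efficient additive attention (linear in n)."""
    q = x @ Wq                              # (n, d) queries
    k = x @ Wk                              # (n, d) keys
    d = q.shape[1]
    alpha = softmax(q @ w_a / np.sqrt(d))   # (n,) per-token scores
    g = (alpha[:, None] * q).sum(axis=0)    # (d,) single global query
    return k * g                            # (n, d) element-wise mixing
```

The key point is that no n-by-n attention matrix is ever formed; the query-key interaction is replaced by one global query vector broadcast across the keys.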
arXiv Detail & Related papers (2023-03-27T17:59:58Z)
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset, and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, while meeting real-time requirements (FPS >= 30) on an Nvidia edge-computing device.
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
- PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices [13.62426382827205]
PP-PicoDet family of real-time object detectors achieves superior performance on object detection for mobile devices.
Models achieve better trade-offs between accuracy and latency compared to other popular models.
arXiv Detail & Related papers (2021-11-01T12:53:17Z)
- YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs [14.85882314822983]
In order to map deep neural network (DNN) based object detection models to edge devices, one typically needs to compress such models significantly.
In this paper, we propose a novel edge GPU friendly module for multi-scale feature interaction.
We also propose a novel learning backbone adoption inspired by the changing translational information flow across various tasks.
arXiv Detail & Related papers (2021-10-26T14:02:59Z)
- Developing a Compressed Object Detection Model based on YOLOv4 for Deployment on Embedded GPU Platform of Autonomous System [0.0]
CNN-based object detection models are quite accurate but require a high-performance GPU to run in real-time.
It is preferable to compress the detection network as light as possible while preserving the detection accuracy.
This paper proposes a new object detection model, referred to as YOffleNet, which is compressed at a high ratio.
arXiv Detail & Related papers (2021-08-01T08:19:51Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- MobileDets: Searching for Object Detection Architectures for Mobile Accelerators [61.30355783955777]
Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices.
Regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators.
We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
arXiv Detail & Related papers (2020-04-30T00:21:30Z)
- DeepSperm: A robust and real-time bull sperm-cell detection in densely populated semen videos [26.494850349599528]
This study proposes an architecture, called DeepSperm, that addresses these challenges and is more accurate and faster than state-of-the-art architectures.
In our experiment, we achieve 86.91 mAP on the test dataset and a processing speed of 50.3 fps.
arXiv Detail & Related papers (2020-03-03T09:05:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.