MobileDets: Searching for Object Detection Architectures for Mobile
Accelerators
- URL: http://arxiv.org/abs/2004.14525v3
- Date: Wed, 31 Mar 2021 01:21:42 GMT
- Title: MobileDets: Searching for Object Detection Architectures for Mobile
Accelerators
- Authors: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender,
Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen
- Abstract summary: Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices.
Regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators.
We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
- Score: 61.30355783955777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inverted bottleneck layers, which are built upon depthwise convolutions, have
been the predominant building blocks in state-of-the-art object detection
models on mobile devices. In this work, we investigate the optimality of this
design pattern over a broad range of mobile accelerators by revisiting the
usefulness of regular convolutions. We discover that regular convolutions are a
potent component to boost the latency-accuracy trade-off for object detection
on accelerators, provided that they are placed strategically in the network via
neural architecture search. By incorporating regular convolutions in the search
space and directly optimizing the network architectures for object detection,
we obtain a family of object detection models, MobileDets, that achieve
state-of-the-art results across mobile accelerators. On the COCO object
detection task, MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at
comparable mobile CPU inference latencies. MobileDets also outperform
MobileNetV2+SSDLite by 1.9 mAP on mobile CPUs, 3.7 mAP on Google EdgeTPU, 3.4
mAP on Qualcomm Hexagon DSP and 2.7 mAP on Nvidia Jetson GPU without increasing
latency. Moreover, MobileDets are comparable with the state-of-the-art MnasFPN
on mobile CPUs even without using the feature pyramid, and achieve better mAP
scores on both EdgeTPUs and DSPs with up to 2x speedup. Code and models are
available in the TensorFlow Object Detection API:
https://github.com/tensorflow/models/tree/master/research/object_detection.
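The abstract contrasts inverted bottleneck blocks (built on depthwise convolutions) with regular convolutions. A minimal sketch of why the trade-off is non-obvious: counting parameters and multiply-adds for each block type shows they can be in the same ballpark, so on accelerators where depthwise layers achieve low arithmetic utilization, a regular convolution placed strategically can win on latency. The layer shapes and expansion factor below are illustrative assumptions, not values from the paper.

```python
# Compare parameter and multiply-add (MAC) counts of a MobileNetV2-style
# inverted bottleneck block vs. a regular k x k convolution.
# Channel count, spatial size, and expansion factor are assumed for illustration.

def conv2d_cost(cin, cout, k, h, w, groups=1):
    """Params and MACs of a k x k convolution on an h x w feature map."""
    params = (cin // groups) * cout * k * k
    macs = params * h * w
    return params, macs

def inverted_bottleneck_cost(c, k, h, w, expansion=6):
    """1x1 expand -> k x k depthwise -> 1x1 project."""
    e = c * expansion
    p1, m1 = conv2d_cost(c, e, 1, h, w)            # pointwise expansion
    p2, m2 = conv2d_cost(e, e, k, h, w, groups=e)  # depthwise convolution
    p3, m3 = conv2d_cost(e, c, 1, h, w)            # pointwise projection
    return p1 + p2 + p3, m1 + m2 + m3

def regular_conv_cost(c, k, h, w):
    """A single full k x k convolution with the same in/out channels."""
    return conv2d_cost(c, c, k, h, w)

if __name__ == "__main__":
    c, k, h, w = 64, 3, 32, 32
    ibn_p, ibn_m = inverted_bottleneck_cost(c, k, h, w)
    reg_p, reg_m = regular_conv_cost(c, k, h, w)
    print(f"inverted bottleneck: {ibn_p:,} params, {ibn_m:,} MACs")
    print(f"regular 3x3 conv:    {reg_p:,} params, {reg_m:,} MACs")
```

At these illustrative settings the two blocks have comparable MAC counts, which is why the paper's point holds: the deciding factor on EdgeTPUs and DSPs is hardware utilization per MAC, not the raw count, and neural architecture search is used to decide where each block type pays off.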
Related papers
- PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model
on Mobile Devices [4.784867435788648]
PP-MobileSeg is a semantic segmentation model that achieves state-of-the-art performance on mobile devices.
VIM reduces model latency by only interpolating classes present in the final prediction.
Experiments show that PP-MobileSeg achieves a superior tradeoff between accuracy, model size, and latency compared to other methods.
arXiv Detail & Related papers (2023-04-11T11:43:10Z)
- FastPillars: A Deployment-friendly Pillar-based 3D Detector [63.0697065653061]
Existing BEV-based (Bird's-Eye-View) detectors favor sparse convolutions (SPConv) to speed up training and inference.
FastPillars delivers state-of-the-art accuracy on the Open dataset, with a 1.8x speedup and a 3.8 mAPH/L2 improvement over the SPConv-based CenterPoint.
arXiv Detail & Related papers (2023-02-05T12:13:27Z)
- Using Detection, Tracking and Prediction in Visual SLAM to Achieve Real-time Semantic Mapping of Dynamic Scenarios [70.70421502784598]
RDS-SLAM can build object-level semantic maps of dynamic scenarios in real time using only a commonly used Intel Core i7 CPU.
We evaluate RDS-SLAM on the TUM RGB-D dataset; experimental results show that it runs at 30.3 ms per frame in dynamic scenarios.
arXiv Detail & Related papers (2022-10-10T11:03:32Z)
- MobileOne: An Improved One millisecond Mobile Backbone [14.041480018494394]
We analyze different metrics by deploying several mobile-friendly networks on a mobile device.
We design an efficient backbone, MobileOne, with variants achieving an inference time under 1 ms on an iPhone 12.
We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile.
arXiv Detail & Related papers (2022-06-08T17:55:11Z)
- PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices [13.62426382827205]
The PP-PicoDet family of real-time object detectors achieves superior performance on object detection for mobile devices.
Models achieve better trade-offs between accuracy and latency compared to other popular models.
arXiv Detail & Related papers (2021-11-01T12:53:17Z)
- YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs [14.85882314822983]
In order to map deep neural network (DNN) based object detection models to edge devices, one typically needs to compress such models significantly.
In this paper, we propose a novel edge GPU friendly module for multi-scale feature interaction.
We also propose a novel backbone-adoption scheme for transfer learning, motivated by how translational information flow changes across tasks.
arXiv Detail & Related papers (2021-10-26T14:02:59Z)
- Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors relies on many conjunctions (coupled design choices) that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)
- Detecting soccer balls with reduced neural networks: a comparison of multiple architectures under constrained hardware scenarios [0.8808021343665321]
This work provides a comparative study of recent proposals of neural networks targeted towards constrained hardware environments.
We train multiple open implementations of MobileNetV2 and MobileNetV3 models with different underlying architectures.
Results show that MobileNetV3 models offer a good trade-off between mAP and inference time only in constrained scenarios, while MobileNetV2 models with high width multipliers are appropriate for server-side inference.
arXiv Detail & Related papers (2020-09-28T23:26:25Z)
- Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves a PR-MOTA score of 12.80 at 120+ fps on the popular UA-DETRAC challenge, delivering better performance while being orders of magnitude faster.
We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.