MobileOne: An Improved One millisecond Mobile Backbone
- URL: http://arxiv.org/abs/2206.04040v2
- Date: Tue, 28 Mar 2023 18:38:47 GMT
- Title: MobileOne: An Improved One millisecond Mobile Backbone
- Authors: Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel
Tuzel and Anurag Ranjan
- Abstract summary: We analyze different metrics by deploying several mobile-friendly networks on a mobile device.
We design an efficient backbone MobileOne, with variants achieving an inference time under 1 ms on an iPhone 12.
We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile.
- Score: 14.041480018494394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient neural network backbones for mobile devices are often optimized for
metrics such as FLOPs or parameter count. However, these metrics may not
correlate well with latency of the network when deployed on a mobile device.
Therefore, we perform extensive analysis of different metrics by deploying
several mobile-friendly networks on a mobile device. We identify and analyze
architectural and optimization bottlenecks in recent efficient neural networks
and provide ways to mitigate these bottlenecks. To this end, we design an
efficient backbone MobileOne, with variants achieving an inference time under 1
ms on an iPhone 12 with 75.9% top-1 accuracy on ImageNet. We show that MobileOne
achieves state-of-the-art performance within the efficient architectures while
being many times faster on mobile. Our best model obtains similar performance
on ImageNet as MobileFormer while being 38x faster. Our model obtains 2.3%
better top-1 accuracy on ImageNet than EfficientNet at similar latency.
Furthermore, we show that our model generalizes to multiple tasks - image
classification, object detection, and semantic segmentation with significant
improvements in latency and accuracy as compared to existing efficient
architectures when deployed on a mobile device. Code and models are available
at https://github.com/apple/ml-mobileone
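The abstract does not spell out the mechanism, but the MobileOne blocks (see the linked repository) are built around structural reparameterization: several parallel conv/BatchNorm branches used at training time are folded into a single convolution for inference, so the deployed model pays for only one branch. Below is a minimal PyTorch sketch of that fold, assuming a simplified two-branch block; all names are illustrative, not the exact MobileOne block.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold a BatchNorm (using its running stats) into the preceding conv,
    returning an equivalent weight and bias."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                              # per output channel
    weight = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = torch.zeros_like(bn.running_mean) if conv.bias is None else conv.bias
    bias = (bias - bn.running_mean) * scale + bn.bias
    return weight, bias

class RepBlock(nn.Module):
    """Illustrative two-branch train-time block; MobileOne uses more branches."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.branch_b = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

    def reparameterize(self) -> nn.Conv2d:
        """Merge both branches into one 3x3 conv with identical output."""
        wa, ba = fuse_conv_bn(*self.branch_a)
        wb, bb = fuse_conv_bn(*self.branch_b)
        fused = nn.Conv2d(wa.shape[1], wa.shape[0], 3, padding=1, bias=True)
        fused.weight.data, fused.bias.data = wa + wb, ba + bb
        return fused

block = RepBlock(16).eval()                  # eval(): BN uses running stats
x = torch.randn(1, 16, 32, 32)
assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)
```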
Related papers
- SwiftFormer: Efficient Additive Attention for Transformer-based
Real-time Mobile Vision Applications [98.90623605283564]
We introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications.
We build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
Our small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2x faster compared to MobileViT-v2.
arXiv Detail & Related papers (2023-03-27T17:59:58Z)
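For the SwiftFormer entry above, the additive attention idea can be sketched as follows: a learned scoring vector pools all queries into one global query, which then modulates the keys element-wise, so cost grows linearly with token count instead of quadratically. This is a hedged sketch, not the paper's exact module; layer names are illustrative.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Linear-complexity attention in the spirit of SwiftFormer: a scoring
    vector pools queries into one global query, which modulates the keys
    element-wise (no NxN query-key matrix product)."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim))    # learned scoring vector
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                            # x: (B, N, dim)
        q, k = self.to_q(x), self.to_k(x)
        scores = (q @ self.w_a) * self.scale         # (B, N): one score per token
        alpha = scores.softmax(dim=-1).unsqueeze(-1)
        q_global = (alpha * q).sum(dim=1, keepdim=True)   # (B, 1, dim)
        return self.proj(q_global * k) + q           # element-wise, O(N * dim)

attn = AdditiveAttention(64)
print(attn(torch.randn(2, 196, 64)).shape)           # torch.Size([2, 196, 64])
```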
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization [14.707312504365376]
We introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off.
We show that our model is 3.5x faster than CMT, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device for the same accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2023-03-24T17:58:32Z)
- Rethinking Vision Transformers for MobileNet Size and Speed [58.01406896628446]
We propose a novel supernet with low latency and high parameter efficiency.
We also introduce a novel fine-grained joint search strategy for transformer models.
This work demonstrates that properly designed and optimized vision transformers can achieve high performance even with MobileNet-level size and speed.
arXiv Detail & Related papers (2022-12-15T18:59:12Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
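For the EdgeNeXt entry above, the channel-splitting idea behind the SDTA encoder can be roughly sketched: the input is chunked into channel groups, and each group (plus the previous group's output) passes through its own depth-wise convolution, growing the receptive field group by group; the paper then applies attention transposed across channels. The group count and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelSplitMixer(nn.Module):
    """Sketch of SDTA-style channel splitting: the tensor is chunked into
    groups; each group, summed with the previous group's output, goes
    through its own depth-wise conv (hierarchical, Res2Net-like mixing)."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        g = channels // groups
        self.groups = groups
        # one depth-wise conv per group after the first
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(groups - 1))

    def forward(self, x):
        chunks = torch.chunk(x, self.groups, dim=1)
        out, prev = [chunks[0]], chunks[0]
        for conv, chunk in zip(self.dwconvs, chunks[1:]):
            prev = conv(chunk + prev)        # mix with previous group's output
            out.append(prev)
        return torch.cat(out, dim=1)         # channel attention would follow

x = torch.randn(1, 64, 32, 32)
print(ChannelSplitMixer(64)(x).shape)        # torch.Size([1, 64, 32, 32])
```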
- Separable Self-attention for Mobile Vision Transformers [34.32399598443582]
This paper introduces a separable self-attention method with linear complexity, i.e. $O(k)$.
The improved model, MobileViTv2, is state-of-the-art on several mobile vision tasks, including ImageNet object classification and MS-COCO object detection.
arXiv Detail & Related papers (2022-06-06T15:31:35Z)
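For the separable self-attention entry above, a minimal sketch of the linear-complexity idea: each token gets a scalar context score, the softmaxed scores pool the keys into a single context vector, and that vector gates the values element-wise, so tokens never interact pairwise. Names and the exact branch layout are assumptions, not the paper's verbatim design.

```python
import torch
import torch.nn as nn

class SeparableSelfAttention(nn.Module):
    """Sketch of separable self-attention: linear in token count, since
    tokens interact only through one pooled context vector."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_scores = nn.Linear(dim, 1)   # scalar context score per token
        self.to_key = nn.Linear(dim, dim)
        self.to_value = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (B, N, dim)
        scores = self.to_scores(x).softmax(dim=1)                      # (B, N, 1)
        context = (scores * self.to_key(x)).sum(dim=1, keepdim=True)   # (B, 1, dim)
        gated = torch.relu(self.to_value(x)) * context                 # broadcast over N
        return self.out(gated)

attn = SeparableSelfAttention(64)
print(attn(torch.randn(2, 196, 64)).shape)   # torch.Size([2, 196, 64])
```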
- EfficientFormer: Vision Transformers at MobileNet Speed [43.93223983817965]
Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks.
However, ViT-based models are generally multiple times slower than lightweight convolutional networks.
Recent efforts try to reduce the complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still unsatisfactory.
arXiv Detail & Related papers (2022-06-02T17:51:03Z)
- TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation [111.8342799044698]
We present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer).
The proposed TopFormer takes tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment the representation.
On the ADE20K dataset, TopFormer achieves 5% higher accuracy in mIoU than MobileNetV3 with lower latency on an ARM-based mobile device.
arXiv Detail & Related papers (2022-04-12T04:51:42Z)
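For the TopFormer entry above, the token-pyramid step can be sketched as pooling feature maps from several backbone stages to one shared low resolution and concatenating them into a single token set, so the transformer runs once over scale-mixed tokens. Shapes and names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def tokens_from_pyramid(features, target_hw=(8, 8)):
    """Pool multi-scale feature maps to one resolution and flatten to tokens.

    features: list of (B, C_i, H_i, W_i) maps from different backbone stages.
    Returns (B, target_h * target_w, sum(C_i)) tokens for a transformer.
    """
    pooled = [F.adaptive_avg_pool2d(f, target_hw) for f in features]
    stacked = torch.cat(pooled, dim=1)               # concat along channels
    return stacked.flatten(2).transpose(1, 2)        # (B, N, C_total)

feats = [torch.randn(1, c, s, s) for c, s in [(32, 64), (64, 32), (128, 16)]]
print(tokens_from_pyramid(feats).shape)              # torch.Size([1, 64, 224])
```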
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobile devices.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
- MobileDets: Searching for Object Detection Architectures for Mobile Accelerators [61.30355783955777]
Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices.
Regular convolutions, when placed strategically in the network, are a potent component for boosting the latency-accuracy trade-off for object detection on accelerators.
We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
arXiv Detail & Related papers (2020-04-30T00:21:30Z)