A Simple Approach to Image Tilt Correction with Self-Attention MobileNet for Smartphones
- URL: http://arxiv.org/abs/2111.00398v1
- Date: Sun, 31 Oct 2021 03:41:46 GMT
- Title: A Simple Approach to Image Tilt Correction with Self-Attention MobileNet for Smartphones
- Authors: Siddhant Garg, Debi Prasanna Mohanty, Siva Prasad Thota, Sukumar Moharana
- Abstract summary: We present a Self-Attention MobileNet that can model long-range dependencies between the image features instead of processing the local region.
We also propose a novel training pipeline for the task of image tilt detection.
We present state-of-the-art results on detecting the image tilt angle on mobile devices, outperforming the MobileNetV3 model.
- Score: 4.989480853499916
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The main contributions of our work are two-fold. First, we present a Self-Attention MobileNet, called SA-MobileNet, that can model long-range dependencies between image features instead of processing only the local region covered by standard convolutional kernels. SA-MobileNet contains self-attention modules integrated with the inverted bottleneck blocks of the MobileNetV3 model, which models both channel-wise and spatial attention over the image features while introducing a novel self-attention architecture for low-resource devices. Second, we propose a novel training pipeline for the task of image tilt detection. We treat the problem as multi-label classification, predicting multiple angles for a tilted input image within a narrow interval of 1-2 degrees, depending on the dataset used. This induces an implicit correlation between labels without the computational overhead of second- or higher-order methods in multi-label learning. Combining this training approach with our architecture, we present state-of-the-art results on detecting the image tilt angle on mobile devices, surpassing the MobileNetV3 model. Finally, we establish that SA-MobileNet is more accurate than MobileNetV3 on the SUN397, NYU-V1, and ADE20K datasets by 6.42, 10.51, and 9.09 percentage points respectively, and faster by at least 4 milliseconds on a Snapdragon 750 octa-core processor.
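The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of how a spatial self-attention module could sit inside a MobileNetV3-style inverted bottleneck alongside the squeeze-excite channel gate, giving both the channel-wise and spatial attention the abstract mentions. The class names (SqueezeExcite, SpatialSelfAttention, SAInvertedBottleneck) and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel-wise attention gate, as used in MobileNetV3's blocks."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class SpatialSelfAttention(nn.Module):
    """Single-head self-attention over all spatial positions, so each
    feature can attend to distant regions (illustrative; the paper's
    exact module may differ)."""
    def __init__(self, channels: int, reduced: int = 8):
        super().__init__()
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)            # (B, HW, C')
        k = self.k(x).flatten(2)                            # (B, C', HW)
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)  # (B, HW, HW)
        v = self.v(x).flatten(2).transpose(1, 2)            # (B, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                         # residual connection

class SAInvertedBottleneck(nn.Module):
    """Inverted bottleneck (expand -> depthwise -> project) carrying both
    the SE channel gate and the spatial self-attention module."""
    def __init__(self, in_ch: int, out_ch: int, expand: int = 4, stride: int = 1):
        super().__init__()
        mid = in_ch * expand
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.Hardswish(),
            nn.Conv2d(mid, mid, 3, stride, 1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.Hardswish(),
            SqueezeExcite(mid),            # channel-wise attention
            SpatialSelfAttention(mid),     # long-range spatial attention
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.skip = stride == 1 and in_ch == out_ch

    def forward(self, x):
        y = self.body(x)
        return x + y if self.skip else y
```

Note that full HW-by-HW attention is quadratic in the number of spatial positions; a deployable mobile variant would apply it only at low-resolution stages or downsample the keys and values first.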
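The multi-label training idea can also be made concrete. Assuming 360 integer angle classes and a +/-2 degree neighborhood (the abstract only says a narrow 1-2 degree interval that depends on the dataset), a hedged sketch of the target construction and loss:

```python
import torch
import torch.nn.functional as F

def tilt_targets(angles_deg: torch.Tensor, num_classes: int = 360,
                 neighborhood: int = 2) -> torch.Tensor:
    """Multi-hot targets: the true tilt angle and its neighbors within
    `neighborhood` degrees are all positive (wrapping around 360)."""
    targets = torch.zeros(angles_deg.shape[0], num_classes)
    for i, theta in enumerate(angles_deg.long()):
        for d in range(-neighborhood, neighborhood + 1):
            targets[i, (theta + d) % num_classes] = 1.0
    return targets

# Sigmoid + binary cross-entropy treats each angle as an independent label;
# because neighboring labels always co-occur in the targets, correlation
# between them is induced implicitly, with no second- or higher-order
# multi-label machinery.
logits = torch.randn(4, 360)                       # stand-in for model output
y = tilt_targets(torch.tensor([3.0, 87.0, 179.0, 359.0]))
loss = F.binary_cross_entropy_with_logits(logits, y)

# At inference, the most probable class is the tilt estimate; rotating the
# image by its negative corrects the tilt.
pred_angle = logits.sigmoid().argmax(dim=1)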
Related papers
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models [40.40784209977589]
This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention.
Starting with a standard Transformer block, we replace its multi-layer perceptron with a mobile convolution block, and further reorder it before the self-attention operation.
Our conceptually simple MOAT networks are surprisingly effective, achieving 89.1% top-1 accuracy on ImageNet-1K with ImageNet-22K pretraining.
arXiv Detail & Related papers (2022-10-04T18:00:06Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- MobileOne: An Improved One millisecond Mobile Backbone [14.041480018494394]
We analyze different metrics by deploying several mobile-friendly networks on a mobile device.
We design an efficient backbone, MobileOne, with variants achieving an inference time under 1 ms on an iPhone 12.
We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile.
arXiv Detail & Related papers (2022-06-08T17:55:11Z)
- Separable Self-attention for Mobile Vision Transformers [34.32399598443582]
This paper introduces a separable self-attention method with linear complexity, i.e. $O(k)$.
The improved model, MobileViTv2, is state-of-the-art on several mobile vision tasks, including ImageNet object classification and MS-COCO object detection.
arXiv Detail & Related papers (2022-06-06T15:31:35Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Robust 2D/3D Vehicle Parsing in CVIS [54.825777404511605]
We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS).
Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters.
In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation.
arXiv Detail & Related papers (2021-03-11T03:35:05Z)
- Detecting soccer balls with reduced neural networks: a comparison of multiple architectures under constrained hardware scenarios [0.8808021343665321]
This work provides a comparative study of recent proposals of neural networks targeted towards constrained hardware environments.
We train multiple open implementations of MobileNetV2 and MobileNetV3 models with different underlying architectures.
Results show that MobileNetV3 models offer a good trade-off between mAP and inference time only in constrained scenarios, while MobileNetV2 models with high width multipliers are appropriate for server-side inference.
arXiv Detail & Related papers (2020-09-28T23:26:25Z)
- ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks [4.143032261649983]
"Ultra-Lightweight Subspace Attention Mechanism" (ULSAM) is end-to-end trainable and can be deployed as a plug-and-play module in compact convolutional neural networks (CNNs)
We achieve $approx$13% and $approx$25% reduction in both the FLOPs and parameter counts of MobileNet-V2 with a 0.27% and more than 1% improvement in top-1 accuracy on the ImageNet-1K and fine-grained image classification datasets (respectively)
arXiv Detail & Related papers (2020-06-26T17:05:43Z)
- MobileDets: Searching for Object Detection Architectures for Mobile Accelerators [61.30355783955777]
Inverted bottleneck layers have been the predominant building blocks in state-of-the-art object detection models on mobile devices.
Regular convolutions are a potent component to boost the latency-accuracy trade-off for object detection on accelerators.
We obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
arXiv Detail & Related papers (2020-04-30T00:21:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.