Rotation Invariant Transformer for Recognizing Object in UAVs
- URL: http://arxiv.org/abs/2311.02559v1
- Date: Sun, 5 Nov 2023 03:55:08 GMT
- Title: Rotation Invariant Transformer for Recognizing Object in UAVs
- Authors: Shuoyi Chen, Mang Ye, Bo Du
- Abstract summary: We propose a novel rotation invariant vision transformer (RotTrans) for recognizing targets of interest from UAVs.
RotTrans greatly outperforms the current state of the art, with mAP and Rank-1 scores 5.9% and 4.8% higher than the previous best.
Our solution wins the first place in the UAV-based person re-recognition track in the Multi-Modal Video Reasoning and Analyzing Competition.
- Score: 66.1564328237299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing a target of interest from UAVs is much more challenging than
existing object re-identification tasks across multiple city cameras. Images
taken by UAVs usually suffer from significant size differences when the object
bounding boxes are generated, as well as uncertain rotation variations.
Existing methods are typically designed for city cameras and cannot handle the
rotation issue in UAV scenarios. A straightforward solution is image-level
rotation augmentation, but it causes a loss of useful information when the
image is fed into a powerful vision transformer as patches. This motivates us
to simulate the rotation operation at the patch feature level, proposing a
novel rotation invariant vision transformer (RotTrans). This
strategy builds on high-level features with the help of the specificity of the
vision transformer structure, which enhances the robustness against large
rotation differences. In addition, we design an invariance constraint to
establish the relationship between the original feature and the rotated
features, achieving stronger rotation invariance. Tested on the latest UAV
datasets, our proposed transformer greatly outperforms the current state of the
art, with mAP and Rank-1 scores 5.9\% and 4.8\% higher than the previous best.
Notably, our model also
performs competitively for the person re-identification task on traditional
city cameras. In particular, our solution wins first place in the UAV-based
person re-recognition track of the Multi-Modal Video Reasoning and Analyzing
Competition held at ICCV 2021. Code is available at
https://github.com/whucsy/RotTrans.
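The core idea in the abstract, simulating rotation at the patch feature level rather than the image level, can be sketched in a few lines. The following is a simplified illustration, not the paper's actual implementation (which is in the linked repository): it assumes patch tokens are stored in row-major grid order, models rotation as a 90-degree permutation of the patch grid (the paper may handle arbitrary angles), and uses a plain MSE between pooled features as a stand-in for the invariance constraint. The function names `rotate_patch_tokens` and `invariance_loss` are hypothetical.

```python
import numpy as np

def rotate_patch_tokens(tokens: np.ndarray, grid: tuple, k: int) -> np.ndarray:
    """Simulate a rotation at the patch feature level.

    tokens: (N, D) array of patch features, laid out row-major on an
            (h, w) patch grid with h * w == N.
    k:      number of 90-degree rotations to apply to the grid.
    Returns the tokens reordered as if the input image had been rotated.
    """
    h, w = grid
    fmap = tokens.reshape(h, w, -1)            # back to spatial layout
    rotated = np.rot90(fmap, k=k, axes=(0, 1)) # rotate the patch grid
    return rotated.reshape(-1, tokens.shape[1])

def invariance_loss(feat_orig: np.ndarray, feat_rot: np.ndarray) -> float:
    """Toy invariance constraint: penalize disagreement between the
    mean-pooled global features of the original and rotated streams."""
    return float(np.mean((feat_orig.mean(axis=0) - feat_rot.mean(axis=0)) ** 2))
```

Because the permutation acts on token features rather than raw pixels, no image content is cropped or padded away, which is the information loss the abstract attributes to image-level rotation augmentation.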
Related papers
- Tiny Multi-Agent DRL for Twins Migration in UAV Metaverses: A Multi-Leader Multi-Follower Stackelberg Game Approach [57.15309977293297]
The synergy between Unmanned Aerial Vehicles (UAVs) and metaverses is giving rise to an emerging paradigm named UAV metaverses.
We propose a tiny machine learning-based Stackelberg game framework based on pruning techniques for efficient UT migration in UAV metaverses.
arXiv Detail & Related papers (2024-01-18T02:14:13Z) - Attention Deficit is Ordered! Fooling Deformable Vision Transformers
with Collaborative Adversarial Patches [3.4673556247932225]
Deformable vision transformers significantly reduce the complexity of attention modeling.
Recent work has demonstrated adversarial attacks against conventional vision transformers.
We develop new collaborative attacks where a source patch manipulates attention to point to a target patch, which contains the adversarial noise to fool the model.
arXiv Detail & Related papers (2023-11-21T17:55:46Z) - SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
arXiv Detail & Related papers (2023-03-08T05:01:00Z) - Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based
Object Re-Identification [38.19907319079833]
We propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT) as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z) - Transformers in Remote Sensing: A Survey [76.95730131233424]
We are the first to present a systematic review of advances based on transformers in remote sensing.
Our survey covers more than 60 recent transformers-based methods for different remote sensing problems.
We conclude the survey by discussing different challenges and open issues of transformers in remote sensing.
arXiv Detail & Related papers (2022-09-02T17:57:05Z) - Vicinity Vision Transformer [53.43198716947792]
We present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.
Our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
arXiv Detail & Related papers (2022-06-21T17:33:53Z) - ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z) - ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z) - UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-identification [21.48667873335246]
Recent development in deep learning allows vision-based counter-UAV systems to detect and track UAVs with a single camera.
The coverage of a single camera is limited, necessitating multi-camera configurations to match UAVs across cameras.
We propose UAV-reID, a new UAV re-identification dataset that facilitates the development of machine learning solutions in this emerging area.
arXiv Detail & Related papers (2021-04-13T14:13:09Z) - ReDet: A Rotation-equivariant Detector for Aerial Object Detection [27.419045245853706]
We propose a Rotation-equivariant Detector (ReDet) to address rotation variations in aerial object detection.
We incorporate rotation-equivariant networks into the detector to extract rotation-equivariant features.
Our method can achieve state-of-the-art performance on the task of aerial object detection.
arXiv Detail & Related papers (2021-03-13T15:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.