SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
- URL: http://arxiv.org/abs/2409.18300v1
- Date: Thu, 26 Sep 2024 21:15:22 GMT
- Title: SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
- Authors: Ruiqi Xian, Xiyang Wu, Tianrui Guan, Xijun Wang, Boqing Gong, Dinesh Manocha
- Abstract summary: We introduce a novel Self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs).
We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance.
- Score: 65.9024395309316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SOAR, a novel Self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs). We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance. This is in contrast to prior works that primarily incorporate object information during the fine-tuning stage. Specifically, we first propose a novel object-aware masking strategy designed to retain the visibility of certain patches related to objects throughout the pretraining phase. Second, we introduce an object-aware loss function that utilizes object information to adjust the reconstruction loss, preventing bias towards less informative background patches. In practice, SOAR with a vanilla ViT backbone outperforms the best UAV action recognition models, recording 9.7% and 21.4% boosts in top-1 accuracy on the NEC-Drone and UAV-Human datasets respectively, while delivering an inference speed of 18.7 ms per video, making it 2x to 5x faster. Additionally, SOAR obtains comparable accuracy to prior self-supervised learning (SSL) methods while requiring 87.5% less pretraining time and 25% less memory usage.
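To make the two mechanisms described in the abstract concrete, the sketch below shows one plausible way an object-aware masking strategy and an object-aware reconstruction loss could be wired together in PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation: the per-patch `object_map` scores, the `keep_object_frac` split between object and random visible patches, and the `alpha` weighting strength are hypothetical, since the abstract does not specify them.

```python
import torch


def object_aware_mask(object_map, mask_ratio=0.75, keep_object_frac=0.5):
    """Sketch of an object-aware masking strategy (assumed, not from the paper).

    object_map: (N,) per-patch object scores in [0, 1], e.g. detector outputs
                projected onto the ViT patch grid.
    Returns a boolean mask of shape (N,) where True = patch is masked.
    """
    N = object_map.shape[0]
    num_keep = int(N * (1 - mask_ratio))            # visible-patch budget
    num_obj_keep = int(num_keep * keep_object_frac)  # hypothetical split

    # Keep the highest-scoring object patches visible...
    obj_order = torch.argsort(object_map, descending=True)
    keep = set(obj_order[:num_obj_keep].tolist())

    # ...and fill the rest of the visible budget with random patches.
    rest = [i for i in torch.randperm(N).tolist() if i not in keep]
    keep.update(rest[: num_keep - len(keep)])

    mask = torch.ones(N, dtype=torch.bool)
    mask[list(keep)] = False
    return mask


def object_aware_reconstruction_loss(pred, target, mask, object_map, alpha=2.0):
    """Per-patch MSE re-weighted by object scores so that object patches
    outweigh background (alpha is a hypothetical weighting strength).

    pred, target: (N, D) patch pixels; mask: (N,) True = masked patch.
    """
    per_patch = ((pred - target) ** 2).mean(dim=-1)   # (N,)
    weights = 1.0 + alpha * object_map                # favor object patches
    weights = weights / weights[mask].mean()          # normalize over masked set
    return (per_patch * weights)[mask].mean()
```

In an MAE-style pipeline, the mask would decide which patch tokens the encoder sees, and the weighted loss would replace the usual uniform mean over masked patches, so background-heavy aerial frames no longer dominate the reconstruction objective.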
Related papers
- Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring [4.303063757163241]
This study recognizes the imperative for real-time data processing in disaster response scenarios and introduces a lightweight and efficient approach for aerial video understanding.
Our methodology identifies redundant portions within the video through policy networks and eliminates this excess information using frame compression techniques.
Compared to the baseline, our approach reduces computation costs by more than 13 times while boosting accuracy by 3%.
arXiv Detail & Related papers (2024-08-31T17:26:53Z)
- ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts [52.1635661239108]
We introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts.
Our experiments demonstrate state-of-the-art results on satellite imagery, even outperforming fully pre-trained and fine-tuned ViTs.
arXiv Detail & Related papers (2024-06-16T15:14:56Z)
- Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery [64.41455104593304]
Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences.
We propose to adapt similar RL-based methods to unsupervised object discovery.
We demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train.
arXiv Detail & Related papers (2023-10-29T17:03:12Z)
- MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition [59.905048445296906]
We present a novel approach for action recognition in UAV videos.
We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain.
In practice, we achieve an 18.9% improvement in Top-1 accuracy over current state-of-the-art methods.
arXiv Detail & Related papers (2023-03-05T04:05:17Z)
- AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
- Small Object Detection using Deep Learning [0.28675177318965034]
The proposed system is built around a custom deep learning model, Tiny YOLOv3, a lightweight variant of the very fast object detection model You Only Look Once (YOLO), which is used for detection.
The proposed architecture has shown significantly better performance as compared to the previous YOLO version.
arXiv Detail & Related papers (2022-01-10T09:58:25Z)
- 3D UAV Trajectory and Data Collection Optimisation via Deep Reinforcement Learning [75.78929539923749]
Unmanned aerial vehicles (UAVs) are now beginning to be deployed for enhancing the network performance and coverage in wireless communication.
It is challenging to obtain an optimal resource allocation scheme for the UAV-assisted Internet of Things (IoT).
In this paper, we design a new UAV-assisted IoT system that relies on the shortest flight path of the UAVs while maximising the amount of data collected from IoT devices.
arXiv Detail & Related papers (2021-06-06T14:08:41Z)