AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal
Reasoning
- URL: http://arxiv.org/abs/2303.01589v1
- Date: Thu, 2 Mar 2023 21:24:19 GMT
- Title: AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal
Reasoning
- Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M.
Nogar, Aniket Bera, Dinesh Manocha
- Abstract summary: We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
- Score: 63.628195002143734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach for aerial video action recognition. Our method
is designed for videos captured using UAVs and can run on edge or mobile
devices. We present a learning-based approach that uses customized auto zoom to
automatically identify the human target and scale it appropriately. This makes
it easier to extract the key features and reduces the computational overhead.
We also present an efficient temporal reasoning algorithm to capture the action
information along the spatial and temporal domains within a controllable
computational cost. Our approach has been implemented and evaluated both on the
desktop with high-end GPUs and on the low-power Robotics RB5 Platform for
robots and drones. In practice, we achieve 6.1-7.4% improvement over SOTA in
Top-1 accuracy on the RoCoG-v2 dataset, 8.3-10.4% improvement on the UAV-Human
dataset and 3.2% improvement on the Drone Action dataset.
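The abstract does not include code, but the auto-zoom idea is easy to picture. Below is a minimal sketch, assuming an off-the-shelf person detector supplies a bounding box per frame; the function name, margin, and output size are illustrative, not the authors' implementation.

```python
# Minimal sketch of the auto-zoom idea (not the authors' code): crop a
# margin around the detected person and rescale the crop to the action
# recognizer's input size, so the human appears at a consistent scale.
import cv2
import numpy as np

def auto_zoom(frame: np.ndarray, bbox, out_size=224, margin=0.3):
    """Crop `frame` around `bbox` = (x1, y1, x2, y2) and resize.

    `margin` expands the box to keep some motion context; the value
    is an assumption, not the paper's setting.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = bbox
    bw, bh = x2 - x1, y2 - y1
    # Expand the box by `margin` on each side, clamped to the image.
    x1 = max(0, int(x1 - margin * bw))
    y1 = max(0, int(y1 - margin * bh))
    x2 = min(w, int(x2 + margin * bw))
    y2 = min(h, int(y2 + margin * bh))
    crop = frame[y1:y2, x1:x2]
    return cv2.resize(crop, (out_size, out_size))

# Usage: zoomed = auto_zoom(frame, detector(frame))
```

Cropping before recognition is what lets the action model run on a small, person-centered input, which is where the claimed edge-device savings would come from.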
Related papers
- VECTOR: Velocity-Enhanced GRU Neural Network for Real-Time 3D UAV Trajectory Prediction [2.1825723033513165]
We propose a new trajectory prediction method using Gated Recurrent Units (GRUs) within sequence-based neural networks.
We employ both synthetic and real-world 3D UAV trajectory data, capturing a wide range of flight patterns, speeds, and agility.
The GRU-based models significantly outperform state-of-the-art RNN approaches, with a mean squared error (MSE) as low as 2 × 10⁻⁸.
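As a rough illustration of this kind of model (not the paper's architecture; the layer sizes and the position-plus-velocity input are assumptions), a GRU trajectory predictor takes only a few lines of PyTorch:

```python
# Hedged sketch of a GRU-based 3D trajectory predictor: a history of
# positions and velocities in, the next 3D position out.
import torch
import torch.nn as nn

class GRUTrajectoryPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Input per step: (x, y, z) position and (vx, vy, vz) velocity.
        self.gru = nn.GRU(input_size=6, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)  # predict the next 3D position

    def forward(self, history):           # history: (batch, steps, 6)
        out, _ = self.gru(history)
        return self.head(out[:, -1])      # (batch, 3)

# Usage: next_pos = GRUTrajectoryPredictor()(torch.randn(8, 20, 6))
```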
arXiv Detail & Related papers (2024-10-24T07:16:42Z)
- SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining [65.9024395309316]
We introduce a novel self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs).
We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and downstream action recognition performance.
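The summary does not say how the object knowledge enters pretraining. One plausible reading, sketched below purely as an illustration, is to bias a masked-video objective toward patches that overlap the detected human; every name and ratio here is an assumption.

```python
# Illustrative object-aware patch masking (not SOAR's actual scheme):
# patches overlapping the human are masked more often, forcing the
# model to reconstruct the person rather than the background.
import numpy as np

def object_aware_mask(grid_hw, person_cells, mask_ratio=0.75,
                      person_boost=3.0, seed=0):
    """Sample a boolean mask over a grid of video patches.

    `person_cells` is a set of (row, col) cells overlapping the human.
    """
    h, w = grid_hw
    rng = np.random.default_rng(seed)
    weights = np.ones((h, w))
    for r, c in person_cells:
        weights[r, c] *= person_boost  # person patches masked more often
    probs = weights.ravel() / weights.sum()
    n_mask = int(mask_ratio * h * w)
    idx = rng.choice(h * w, size=n_mask, replace=False, p=probs)
    mask = np.zeros(h * w, dtype=bool)
    mask[idx] = True
    return mask.reshape(h, w)

# Usage: mask = object_aware_mask((14, 14), {(6, 7), (7, 7)})
```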
arXiv Detail & Related papers (2024-09-26T21:15:22Z)
- Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring [4.303063757163241]
This study recognizes the imperative for real-time data processing in disaster response scenarios and introduces a lightweight and efficient approach for aerial video understanding.
Our methodology identifies redundant portions within the video through policy networks and eliminates this excess information using frame compression techniques.
Compared to the baseline, our approach reduces computation costs by more than 13 times while boosting accuracy by 3%.
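A minimal sketch of the frame-selection idea (not the paper's policy network; the feature size and top-k selection are assumptions):

```python
# Illustrative policy-based frame selection: a small policy scores each
# frame's usefulness and only the top-k frames are passed downstream;
# the rest are treated as redundant.
import torch
import torch.nn as nn

class FramePolicy(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                   nn.Linear(64, 1))

    def forward(self, feats, keep=8):      # feats: (frames, feat_dim)
        s = self.score(feats).squeeze(-1)  # one usefulness score per frame
        keep_idx = torch.topk(s, k=keep).indices.sort().values
        return feats[keep_idx], keep_idx   # preserve temporal order

# Usage: kept, idx = FramePolicy()(torch.randn(64, 512), keep=8)
```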
arXiv Detail & Related papers (2024-08-31T17:26:53Z)
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
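K-Planes-style models factorize a 4D scene into 2D feature planes; the sketch below shows that generic lookup (a simplified illustration, not TK-Planes' tiered variant; channel counts and resolutions are assumptions).

```python
# Generic plane-factorized feature lookup behind K-Planes-style models:
# a 4D point (x, y, z, t) is projected onto six 2D feature planes and
# the bilinear samples are combined by elementwise product.
import torch
import torch.nn.functional as F

def plane_features(points, planes):
    """points: (n, 4) in [-1, 1]; planes: dict of (1, c, res, res) grids."""
    pairs = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # xy .. zt
    feat = 1.0
    for (i, j), grid in zip(pairs, planes.values()):
        uv = points[:, [i, j]].view(1, -1, 1, 2)          # (1, n, 1, 2)
        s = F.grid_sample(grid, uv, align_corners=True)   # (1, c, n, 1)
        feat = feat * s[0, :, :, 0].t()                   # (n, c) product
    return feat

planes = {k: torch.randn(1, 16, 64, 64) for k in
          ["xy", "xz", "yz", "xt", "yt", "zt"]}
feats = plane_features(torch.rand(128, 4) * 2 - 1, planes)
```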
arXiv Detail & Related papers (2024-05-04T21:55:33Z)
- MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition [59.905048445296906]
We present a novel approach for action recognition in UAV videos.
We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain.
In practice, we achieve 18.9% improvement in Top-1 accuracy over current state-of-the-art methods.
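The underlying measure is standard: mutual information between two image regions can score how well a candidate region in one frame matches the action region in another. Below is a textbook histogram estimator, not the paper's method; the bin count is an assumption.

```python
# Histogram-based mutual information between two same-sized grayscale
# regions; higher MI suggests the regions show the same content.
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins=32):
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()              # joint distribution estimate
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                         # avoid log(0)
    return float(np.sum(pxy[nz] *
                        np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

# Align by picking the candidate region with the highest MI:
# best = max(candidates, key=lambda r: mutual_information(ref_region, r))
```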
arXiv Detail & Related papers (2023-03-05T04:05:17Z)
- TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
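A common baseline for this setup, shown here only as a sketch (the grid size and resolution are assumptions, and the paper's learned model goes well beyond it), is to bin posed depth points into a local 2.5D height map:

```python
# Bin depth points (already transformed to the world frame using the
# robot's pose) into a fixed-resolution local height map.
import numpy as np

def update_height_map(points_world, grid_size=128, cell=0.05,
                      origin=(0.0, 0.0)):
    """points_world: (n, 3) array of 3D points in the world frame."""
    hmap = np.full((grid_size, grid_size), np.nan)
    ix = ((points_world[:, 0] - origin[0]) / cell).astype(int)
    iy = ((points_world[:, 1] - origin[1]) / cell).astype(int)
    ok = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)
    for x, y, z in zip(ix[ok], iy[ok], points_world[ok, 2]):
        # Keep the highest observed point per cell; NaN marks blind spots.
        if np.isnan(hmap[y, x]) or z > hmap[y, x]:
            hmap[y, x] = z
    return hmap
```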
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.