AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal
Reasoning
- URL: http://arxiv.org/abs/2303.01589v1
- Date: Thu, 2 Mar 2023 21:24:19 GMT
- Title: AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal
Reasoning
- Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M.
Nogar, Aniket Bera, Dinesh Manocha
- Abstract summary: We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
- Score: 63.628195002143734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach for aerial video action recognition. Our method
is designed for videos captured using UAVs and can run on edge or mobile
devices. We present a learning-based approach that uses customized auto zoom to
automatically identify the human target and scale it appropriately. This makes
it easier to extract the key features and reduces the computational overhead.
We also present an efficient temporal reasoning algorithm to capture the action
information along the spatial and temporal domains within a controllable
computational cost. Our approach has been implemented and evaluated both on the
desktop with high-end GPUs and on the low-power Robotics RB5 Platform for
robots and drones. In practice, we achieve 6.1-7.4% improvement over SOTA in
Top-1 accuracy on the RoCoG-v2 dataset, 8.3-10.4% improvement on the UAV-Human
dataset and 3.2% improvement on the Drone Action dataset.
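The auto-zoom idea (localize the human, crop, and rescale so the network sees the action at a useful resolution) can be illustrated with a minimal sketch. This is not the paper's learned module: it assumes a bounding box is already available from a detector, and the function name, margin parameter, and nearest-neighbor resize are illustrative choices.

```python
import numpy as np

def auto_zoom(frame: np.ndarray, box: tuple, out_size: int = 224,
              margin: float = 0.2) -> np.ndarray:
    """Crop a frame around a person box (x1, y1, x2, y2) with a relative
    margin, then resize to a square network input via nearest-neighbor
    sampling. Illustrative only: the paper's auto-zoom module learns the
    crop rather than taking a detector box as input."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    mw, mh = margin * (x2 - x1), margin * (y2 - y1)
    x1, y1 = max(0, int(x1 - mw)), max(0, int(y1 - mh))
    x2, y2 = min(w, int(x2 + mw)), min(h, int(y2 + mh))
    crop = frame[y1:y2, x1:x2]
    # Nearest-neighbor resize to out_size x out_size via index arrays.
    ys = np.arange(out_size) * crop.shape[0] // out_size
    xs = np.arange(out_size) * crop.shape[1] // out_size
    return crop[ys][:, xs]
```

Feeding the network a tight, fixed-size crop rather than the full aerial frame is what makes the key features easier to extract and keeps the compute budget small, as the abstract describes.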
Related papers
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z)
- MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition [59.905048445296906]
We present a novel approach for action recognition in UAV videos.
We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain.
In practice, we achieve 18.9% improvement in Top-1 accuracy over current state-of-the-art methods.
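The mutual-information score used for alignment can be sketched for the discrete case, where frame features have been quantized into labels. This toy estimator is not MITFAS's actual method; the function name and the plug-in estimation from counts are illustrative assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(X; Y) in nats between two
    equal-length label sequences, e.g. quantized features from two
    frames. A plug-in estimate from empirical counts; illustrative of
    an MI-based alignment score only, not the paper's estimator."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) p(y))) over observed pairs.
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())
```

A higher score indicates that two regions carry more shared information, which is the signal such a method can use to align human-motion regions across the temporal domain.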
arXiv Detail & Related papers (2023-03-05T04:05:17Z)
- TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, our method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z)
- Learning in the Sky: An Efficient 3D Placement of UAVs [0.8399688944263842]
We propose a learning-based mechanism for the three-dimensional deployment of UAVs assisting terrestrial cellular networks in the downlink.
The problem is modeled as a non-cooperative game among UAVs in satisfaction form.
To solve the game, we utilize a low complexity algorithm, in which unsatisfied UAVs update their locations based on a learning algorithm.
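The satisfaction-form dynamics the abstract describes (only unsatisfied UAVs move; satisfied ones keep their positions) can be sketched as a simple loop. This is purely illustrative of the general scheme: the utility model, threshold, and position sampler here are placeholder assumptions, not the paper's learning algorithm.

```python
import random

def satisfaction_dynamics(n_uavs, utility_fn, threshold, sample_position,
                          rounds=200, seed=0):
    """Toy satisfaction-form game dynamics. Each round, every UAV whose
    utility is below the satisfaction threshold resamples its 3D
    position; satisfied UAVs keep theirs. Stops early when all UAVs are
    satisfied (a satisfaction equilibrium). Illustrative only; the
    paper's utility model and update rule are not reproduced here."""
    rng = random.Random(seed)
    pos = [sample_position(rng) for _ in range(n_uavs)]
    for _ in range(rounds):
        unsatisfied = [i for i in range(n_uavs)
                       if utility_fn(pos, i) < threshold]
        if not unsatisfied:
            break  # every UAV meets its satisfaction threshold
        for i in unsatisfied:
            pos[i] = sample_position(rng)
    return pos
```

The key property of the satisfaction form is that agents seek only to meet a threshold rather than to maximize utility, which keeps the per-UAV update rule very cheap.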
arXiv Detail & Related papers (2020-03-02T15:16:00Z)
- MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation [23.54696982881734]
We propose a novel motion and visual perception approach, dubbed MVP, for large-scale, target-driven navigation tasks.
Our MVP-based method can learn faster, and is more accurate and robust to both extreme environmental changes and poor GPS data.
We evaluate our method on two large real-world datasets, Oxford RobotCar and Nordland Railway.
arXiv Detail & Related papers (2020-03-02T05:19:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.