MITFAS: Mutual Information based Temporal Feature Alignment and Sampling
for Aerial Video Action Recognition
- URL: http://arxiv.org/abs/2303.02575v2
- Date: Wed, 15 Nov 2023 23:42:49 GMT
- Title: MITFAS: Mutual Information based Temporal Feature Alignment and Sampling
for Aerial Video Action Recognition
- Authors: Ruiqi Xian, Xijun Wang, Dinesh Manocha
- Abstract summary: We present a novel approach for action recognition in UAV videos.
We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain.
In practice, we achieve an 18.9% improvement in Top-1 accuracy over current state-of-the-art methods.
- Score: 59.905048445296906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel approach for action recognition in UAV videos. Our
formulation is designed to handle occlusion and viewpoint changes caused by the
movement of a UAV. We use the concept of mutual information to compute and
align the regions corresponding to human action or motion in the temporal
domain. This enables our recognition model to learn from the key features
associated with the motion. We also propose a novel frame sampling method that
uses joint mutual information to acquire the most informative frame sequence in
UAV videos. We have integrated our approach with X3D and evaluated the
performance on multiple datasets. In practice, we achieve an 18.9% improvement in
Top-1 accuracy over current state-of-the-art methods on UAV-Human (Li et al.,
2021), a 7.3% improvement on Drone-Action (Perera et al., 2019), and a 7.16%
improvement on NEC Drones (Choi et al., 2020).
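The abstract's two ingredients, mutual information between frames and information-driven frame sampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the histogram-based estimator, the greedy "least redundant frame" rule, and the names `mutual_information` and `greedy_sample` are all assumptions made for illustration, and frames are represented as flat lists of grayscale intensities in [0, 256).

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_val=256):
    """Histogram estimate of mutual information (in nats) between two frames.

    `a` and `b` are equal-length flat sequences of pixel intensities in
    [0, max_val). Hypothetical helper, not taken from the paper.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal length")
    width = max_val / bins
    # Quantize intensities into `bins` buckets.
    qa = [min(int(x / width), bins - 1) for x in a]
    qb = [min(int(x / width), bins - 1) for x in b]
    n = len(qa)
    joint = Counter(zip(qa, qb))  # joint histogram counts
    pa = Counter(qa)              # marginal counts for frame a
    pb = Counter(qb)              # marginal counts for frame b
    mi = 0.0
    for (i, j), c in joint.items():
        # p_ij * log(p_ij / (p_i * p_j)), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (pa[i] * pb[j]))
    return mi


def greedy_sample(frames, k, bins=8):
    """Pick k frame indices that are mutually least redundant.

    Assumed selection rule (illustrative only): start from frame 0, then
    repeatedly add the frame whose largest mutual information with any
    already-selected frame is smallest.
    """
    if not 0 < k <= len(frames):
        raise ValueError("k must be between 1 and the number of frames")
    selected = [0]
    while len(selected) < k:
        candidates = [i for i in range(len(frames)) if i not in selected]
        best = min(
            candidates,
            key=lambda i: max(
                mutual_information(frames[i], frames[s], bins) for s in selected
            ),
        )
        selected.append(best)
    return sorted(selected)
```

With this rule a frame that duplicates an already-selected one is picked last, since it shares the most information with the selection. The same `mutual_information` helper could, under the same assumptions, score candidate crops of a later frame against a reference region to approximate the temporal alignment step.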
Related papers
- PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action
Recognition [52.78234467516168]
We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames.
We present an adaptive frame selection strategy using a shifted leaky ReLU and a cumulative distribution function.
Our method achieves a relative improvement of 2.2 - 13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48 datasets.
arXiv Detail & Related papers (2023-04-14T00:01:11Z)
- AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal
Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Fourier Disentangled Space-Time Attention for Aerial Video Recognition [54.80846279175762]
We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition.
Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent from the background.
We have evaluated our approach on multiple UAV datasets including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone.
arXiv Detail & Related papers (2022-03-21T01:24:53Z)
- UAV-Human: A Large Benchmark for Human Behavior Understanding with
Unmanned Aerial Vehicles [12.210724541266183]
We propose a new benchmark, UAV-Human, for human behavior understanding with UAVs.
Our dataset contains 67,428 multi-modal video sequences and 119 subjects for action recognition.
We propose a fisheye-based action recognition method that mitigates the distortions in fisheye videos via learning transformations guided by flat RGB videos.
arXiv Detail & Related papers (2021-04-02T08:54:04Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations from the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)