Fourier Disentangled Space-Time Attention for Aerial Video Recognition
- URL: http://arxiv.org/abs/2203.10694v1
- Date: Mon, 21 Mar 2022 01:24:53 GMT
- Title: Fourier Disentangled Space-Time Attention for Aerial Video Recognition
- Authors: Divya Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming Lin, Dinesh Manocha
- Abstract summary: We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition.
Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent from the background.
We have evaluated our approach on multiple UAV datasets including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone.
- Score: 54.80846279175762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an algorithm, Fourier Activity Recognition (FAR), for UAV video
activity recognition. Our formulation uses a novel Fourier object
disentanglement method to innately separate out the human agent (which is
typically small) from the background. Our disentanglement technique operates in
the frequency domain to characterize the extent of temporal change of spatial
pixels, and exploits the convolution-multiplication property of the Fourier transform
to map this representation to the corresponding object-background entangled
features obtained from the network. To encapsulate contextual information and
long-range space-time dependencies, we present a novel Fourier Attention
algorithm, which emulates the benefits of self-attention by modeling the
weighted outer product in the frequency domain. Our Fourier attention
formulation requires far fewer computations than self-attention. We have evaluated
our approach on multiple UAV datasets including UAV Human RGB, UAV Human Night,
Drone Action, and NEC Drone. We demonstrate a relative improvement of 8.02%-38.69%
in top-1 accuracy and a speedup of up to 3 times over prior works.
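The disentanglement idea in the abstract rests on two standard facts: a pixel whose intensity never changes over time has its entire temporal spectrum in the zero-frequency (DC) bin, and convolution in one domain becomes element-wise multiplication in the other. The sketch below illustrates both with a naive DFT in plain Python. It is not the FAR implementation: the function names (`temporal_change_map`, `circular_conv`) and the scoring rule (mean magnitude of the non-DC bins) are hypothetical choices made only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns complex values (imaginary parts ~0 for real input)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def temporal_change_map(video):
    """Score each pixel by the energy in its non-zero temporal frequencies.

    video is indexed as video[t][i][j] (frames x height x width, grayscale).
    A static pixel puts all of its energy in the DC bin (k = 0) and scores ~0;
    a pixel whose intensity changes over time scores higher.
    Hypothetical scoring rule, for illustration only.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    scores = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            spectrum = dft([video[t][i][j] for t in range(T)])
            scores[i][j] = sum(abs(c) for c in spectrum[1:]) / T
    return scores


def circular_conv(a, b):
    """Direct circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


if __name__ == "__main__":
    # 4 frames of a 2x2 video: pixel (0, 0) is static, pixel (1, 1) flickers 0,1,0,1.
    video = [[[5, 0], [0, t % 2]] for t in range(4)]
    print(temporal_change_map(video))  # (1, 1) scores about 0.5; (0, 0) about 0

    # Convolution theorem: DFT(a conv b) equals DFT(a) * DFT(b), element-wise.
    a, b = [1, 2, 3, 0], [0, 1, 0, 0]
    via_fft = [round(v.real, 9) for v in idft([x * y for x, y in zip(dft(a), dft(b))])]
    print(via_fft, circular_conv(a, b))
```

A static background pixel and a moving (flickering) pixel separate cleanly in this toy case; real footage from a moving UAV camera would also show background motion, which is one reason the paper maps this representation onto learned network features rather than thresholding raw pixels.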
Related papers
- Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
(2024-03-12T01:28:00Z)
- Neural Fourier Filter Bank [18.52741992605852]
We present a novel method to provide efficient and highly detailed reconstructions.
Inspired by wavelets, we learn a neural field that decomposes the signal both spatially and frequency-wise.
(2022-12-04T03:45:08Z)
- QFF: Quantized Fourier Features for Neural Field Representations [28.82293263445964]
We show that using Quantized Fourier Features (QFF) can result in smaller model size, faster training, and better quality outputs for several applications.
QFF are easy to code, fast to compute, and serve as a simple drop-in addition to many neural field representations.
(2022-12-02T00:11:22Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in the frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
(2022-11-26T01:56:05Z)
- Deep Fourier Up-Sampling [100.59885545206744]
Unlike spatial up-sampling, which relies on local similarity, up-sampling in the Fourier domain is more challenging because it does not have this local property.
We propose a theoretically sound Deep Fourier Up-Sampling (FourierUp) to solve these issues.
(2022-10-11T06:17:31Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
(2022-09-15T22:16:52Z)
- Seeing Implicit Neural Representations as Fourier Series [13.216389226310987]
Implicit Neural Representations (INR) use multilayer perceptrons to represent high-frequency functions in low-dimensional problem domains.
These representations have achieved state-of-the-art results on tasks related to complex 3D objects and scenes.
This work analyzes the connection between Fourier input mappings and SIRENs, showing that a Fourier-mapped perceptron is structurally similar to a one-hidden-layer SIREN.
(2021-09-01T08:40:20Z)
- Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions.
These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results.
(2020-06-18T17:59:11Z)
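The Fourier feature mapping summarized in the last entry can be sketched in a few lines. This assumes the Gaussian variant, γ(v) = [cos(2πBv), sin(2πBv)] with each entry of B drawn from N(0, σ²); the helper names `sample_B` and `fourier_features`, and the values chosen for m and σ, are illustrative assumptions rather than details given in this listing.

```python
import math
import random


def sample_B(m, d, sigma, seed=0):
    """Draw an m x d projection matrix with i.i.d. N(0, sigma^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(m)]


def fourier_features(v, B):
    """Map a d-dim point v to 2m features [cos(2*pi*Bv), sin(2*pi*Bv)]."""
    proj = [2 * math.pi * sum(b_k * v_k for b_k, v_k in zip(row, v)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


if __name__ == "__main__":
    # Larger sigma gives higher-frequency features (hypothetical values).
    B = sample_B(m=8, d=2, sigma=10.0)
    gamma = fourier_features([0.25, 0.75], B)
    print(len(gamma))  # 16 features, fed to an MLP instead of the raw 2-D coordinate
```

Feeding γ(v) rather than v into the multilayer perceptron is what lets it fit high-frequency detail; σ acts as the bandwidth knob, with too small a value giving blurry fits and too large a value giving noisy ones.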