ERA: A Dataset and Deep Learning Benchmark for Event Recognition in
Aerial Videos
- URL: http://arxiv.org/abs/2001.11394v4
- Date: Thu, 25 Jun 2020 10:23:08 GMT
- Authors: Lichao Mou, Yuansheng Hua, Pu Jin, Xiao Xiang Zhu
- Abstract summary: We introduce a novel problem of event recognition in unconstrained aerial videos in the remote sensing community.
We present a large-scale, human-annotated dataset named ERA (Event Recognition in Aerial videos).
The ERA dataset is designed to have significant intra-class variation and inter-class similarity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Along with the increasing use of unmanned aerial vehicles (UAVs), large
volumes of aerial videos have been produced. It is unrealistic for humans to
screen such big data and understand their contents. Hence, methodological
research on the automatic understanding of UAV videos is of paramount
importance. In this paper, we introduce a novel problem of event recognition in
unconstrained aerial videos to the remote sensing community and present a
large-scale, human-annotated dataset, named ERA (Event Recognition in Aerial
videos), consisting of 2,864 videos, each with a label from 25 different classes
corresponding to an event unfolding over 5 seconds. The ERA dataset is designed to
have significant intra-class variation and inter-class similarity, and it
captures dynamic events in various circumstances and at dramatically varying
scales. Moreover, to offer a benchmark for this task, we extensively validate
existing deep networks. We expect that the ERA dataset will facilitate further
progress in automatic aerial video comprehension. The website is
https://lcmou.github.io/ERA_Dataset/
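The abstract describes a single-label classification task over 2,864 five-second clips from 25 classes, but does not prescribe a loading pipeline. As a minimal sketch, assuming a hypothetical class-per-folder layout (not the official ERA distribution format), indexing clips into (path, label) pairs could look like:

```python
from pathlib import Path

def index_video_dataset(root):
    """Index videos laid out as root/<class_name>/<clip>.mp4.

    Returns (samples, class_to_idx), where samples is a list of
    (path, label) pairs and labels follow sorted class-name order.
    """
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_idx = {c: i for i, c in enumerate(classes)}
    samples = [
        (str(f), class_to_idx[c])
        for c in classes
        for f in sorted(Path(root, c).glob("*.mp4"))
    ]
    return samples, class_to_idx
```

Sorting both class names and file names keeps label assignment deterministic across runs, which matters when comparing benchmark results.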
Related papers
- Towards Student Actions in Classroom Scenes: New Dataset and Baseline [43.268586725768465]
We present a new multi-label student action video (SAV) dataset for complex classroom scenes.
The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each clip labeled with one or more of 15 different actions displayed by students in classrooms.
arXiv Detail & Related papers (2024-09-02T03:44:24Z)
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has been paid increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos via cross-modal queries.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
- Anomaly Detection in Aerial Videos with Transformers [49.011385492802674]
We create a new dataset, named DroneAnomaly, for anomaly detection in aerial videos.
There are 87,488 color video frames (51,635 for training and 35,853 for testing) with a size of $640 \times 640$ at 30 frames per second.
We present a new baseline model, ANomaly Detection with Transformers (ANDT), which treats consecutive video frames as a sequence of tubelets.
arXiv Detail & Related papers (2022-09-25T21:24:18Z)
- Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model [0.0]
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media.
Previous work has largely relied on action recognition techniques to tackle this problem.
We design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.
arXiv Detail & Related papers (2022-09-23T08:29:16Z)
- NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels [33.659146748289444]
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information.
We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
arXiv Detail & Related papers (2021-10-13T16:12:18Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence, and we show that self-supervised models trained on our data, despite being automatically constructed, achieve downstream performance similar to models trained on existing video datasets of comparable scale.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events [106.19047816743988]
We present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve.
It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, and one of the largest numbers of long-lasting trajectories.
Based on its diverse annotation, we present two simple baselines for action recognition and pose estimation.
arXiv Detail & Related papers (2020-05-09T18:24:52Z)
- AU-AIR: A Multi-modal Unmanned Aerial Vehicle Dataset for Low Altitude Traffic Surveillance [20.318367304051176]
Unmanned aerial vehicles (UAVs) with mounted cameras have the advantage of capturing aerial (bird's-eye view) images.
Several aerial datasets have been introduced, including visual data with object annotations.
We propose a multi-purpose aerial dataset (AU-AIR) that has multi-modal sensor data collected in real-world outdoor environments.
arXiv Detail & Related papers (2020-01-31T09:45:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.