UAV-Human: A Large Benchmark for Human Behavior Understanding with
Unmanned Aerial Vehicles
- URL: http://arxiv.org/abs/2104.00946v1
- Date: Fri, 2 Apr 2021 08:54:04 GMT
- Title: UAV-Human: A Large Benchmark for Human Behavior Understanding with
Unmanned Aerial Vehicles
- Authors: Tianjiao Li and Jun Liu and Wei Zhang and Yun Ni and Wenqian Wang and
Zhiheng Li
- Abstract summary: We propose a new benchmark, UAV-Human, for human behavior understanding with UAVs.
Our dataset contains 67,428 multi-modal video sequences and 119 subjects for action recognition.
We propose a fisheye-based action recognition method that mitigates the distortions in fisheye videos via learning transformations guided by flat RGB videos.
- Score: 12.210724541266183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human behavior understanding with unmanned aerial vehicles (UAVs) is of great
significance for a wide range of applications, which simultaneously brings an
urgent demand of large, challenging, and comprehensive benchmarks for the
development and evaluation of UAV-based models. However, existing benchmarks
have limitations in terms of the amount of captured data, types of data
modalities, categories of provided tasks, and diversity of subjects and
environments. Here we propose a new benchmark, UAV-Human, for human behavior
understanding with UAVs, which contains 67,428 multi-modal video sequences and
119 subjects for action recognition, 22,476 frames for pose estimation, 41,290
frames and 1,144 identities for person re-identification, and 22,263 frames for
attribute recognition. Our dataset was collected by a flying UAV in multiple
urban and rural districts in both daytime and nighttime over three months,
hence covering extensive diversity w.r.t. subjects, backgrounds,
illumination, weather conditions, occlusions, camera motions, and UAV flight attitudes.
Such a comprehensive and challenging benchmark should promote
research on UAV-based human behavior understanding, including action
recognition, pose estimation, re-identification, and attribute recognition.
Furthermore, we propose a fisheye-based action recognition method that
mitigates the distortions in fisheye videos via learning unbounded
transformations guided by flat RGB videos. Experiments show the efficacy of our
method on the UAV-Human dataset.
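The paper learns its unwarping transformation from flat RGB guidance; as background intuition only, the classical analytic alternative assumes an ideal equidistant fisheye model (r_fish = f * theta) and maps it back to the rectilinear pinhole projection (r_rect = f * tan(theta)). A minimal stdlib-only sketch, with illustrative function names that are not taken from the paper:

```python
import math

def undistort_radius(r_fish: float, f: float) -> float:
    """Map a fisheye radial distance to its rectilinear equivalent,
    assuming an ideal equidistant fisheye model (r_fish = f * theta)."""
    theta = r_fish / f          # incidence angle recovered from the fisheye radius
    return f * math.tan(theta)  # pinhole projection: r_rect = f * tan(theta)

def undistort_point(x: float, y: float, cx: float, cy: float, f: float):
    """Undistort one pixel (x, y) radially around the principal point (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (x, y)  # the principal point is unchanged
    scale = undistort_radius(r, f) / r
    return (cx + dx * scale, cy + dy * scale)
```

Points far from the principal point are pushed outward, which is the stretching any learned transformation must also reproduce; the paper replaces this fixed, hand-specified mapping with one learned from data.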
Related papers
- Real Time Human Detection by Unmanned Aerial Vehicles [0.0]
Thermal infrared (TIR) remote-sensing photos and videos produced by unmanned aerial vehicles (UAVs) are two crucial data sources for public security.
However, object detection in this data remains difficult due to the small scale of targets, complex scene information, low resolution relative to visible-light videos, and the dearth of publicly available labeled datasets and trained models.
This study proposes a UAV TIR object detection framework for images and videos.
arXiv Detail & Related papers (2024-01-06T18:28:01Z) - Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve
Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z) - Aerial-Ground Person Re-ID [43.241435887373804]
We propose a new benchmark dataset - AG-ReID, which performs person re-ID matching in a new setting: across aerial and ground cameras.
Our dataset contains 21,983 images of 388 identities and 15 soft attributes for each identity.
The data was collected by a UAV flying at altitudes between 15 and 45 meters and a ground-based CCTV camera on a university campus.
arXiv Detail & Related papers (2023-03-15T13:07:21Z) - MITFAS: Mutual Information based Temporal Feature Alignment and Sampling
for Aerial Video Action Recognition [59.905048445296906]
We present a novel approach for action recognition in UAV videos.
We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain.
In practice, we achieve 18.9% improvement in Top-1 accuracy over current state-of-the-art methods.
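For intuition, the empirical mutual-information estimate that such alignment schemes build on can be sketched as follows (a generic stdlib-only sketch over discrete sequences, not the paper's actual implementation, which operates on feature regions):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate mutual information (in nats) between two discrete sequences
    of equal length from their empirical joint distribution."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # joint counts over observed (x, y) pairs
    px = Counter(xs)              # marginal counts for x
    py = Counter(ys)              # marginal counts for y
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) * p(y)) )
        mi += pxy * math.log(pxy * n * n / (px[x] * py[y]))
    return mi
```

Identical sequences give I(X;Y) = H(X), while independent sequences give zero; an alignment scheme picks the temporal offset or region correspondence that maximizes this quantity.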
arXiv Detail & Related papers (2023-03-05T04:05:17Z) - Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based
Object Re-Identification [38.19907319079833]
We propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT) as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility [125.77396380698639]
AVisT is a benchmark for visual tracking in diverse scenarios with adverse visibility.
AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios.
We benchmark 17 popular and recent trackers on AVisT with detailed analysis of their tracking performance across attributes.
arXiv Detail & Related papers (2022-08-14T17:49:37Z) - A Multi-viewpoint Outdoor Dataset for Human Action Recognition [3.522154868524807]
We present a multi-viewpoint outdoor action recognition dataset collected from YouTube and our own drone.
The dataset consists of 20 dynamic human action classes, 2,324 video clips and 503,086 frames.
The overall baseline action recognition accuracy is 74.0%.
arXiv Detail & Related papers (2021-10-07T14:50:43Z) - UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-identification [21.48667873335246]
Recent development in deep learning allows vision-based counter-UAV systems to detect and track UAVs with a single camera.
However, the coverage of a single camera is limited, so multi-camera configurations are needed to match UAVs across views.
We propose the first UAV re-identification dataset, UAV-ReID, that facilitates the development of machine learning solutions in this emerging area.
arXiv Detail & Related papers (2021-04-13T14:13:09Z) - A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in
Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem under extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z) - Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.