Video Individual Counting for Moving Drones
- URL: http://arxiv.org/abs/2503.10701v1
- Date: Wed, 12 Mar 2025 07:09:33 GMT
- Title: Video Individual Counting for Moving Drones
- Authors: Yaowu Fan, Jia Wan, Tao Han, Antoni B. Chan, Andy J. Ma
- Abstract summary: Video Individual Counting (VIC) has received increasing attention recently due to its importance in intelligent video surveillance. Previous crowd counting datasets are captured with fixed or rarely moving cameras and relatively sparse individuals. We propose a density map based VIC method together with the new MovingDroneCrowd dataset.
- Score: 51.429771128144964
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video Individual Counting (VIC) has received increasing attention recently due to its importance in intelligent video surveillance. Existing works are limited in two aspects: datasets and methods. Previous crowd counting datasets are captured with fixed or rarely moving cameras and relatively sparse individuals, restricting evaluation under highly varying views over time in crowded scenes. While VIC methods have been proposed based on localization-then-association or localization-then-classification, they may not perform well because accurately localizing crowded and small targets is difficult under challenging scenarios. To address these issues, we collect the MovingDroneCrowd Dataset and propose a density map based VIC method. Different from existing datasets, ours consists of videos captured by fast-moving drones in crowded scenes under diverse illuminations, shooting heights and angles. Rather than localizing individuals, we propose a Depth-wise Cross-Frame Attention (DCFA) module, which directly estimates inflow and outflow density maps by learning shared density maps between consecutive frames. The inflow density maps across frames are summed to obtain the number of unique pedestrians in a video. Experiments on our dataset and publicly available ones show the superiority of our method over the state of the art for VIC in highly dynamic and complex crowded scenes. Our dataset and code will be released publicly.
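The counting scheme in the abstract (summing per-frame inflow density maps to obtain the number of unique pedestrians) can be sketched as follows. This is a minimal illustration of the counting arithmetic only, not the authors' released code: the density maps are synthetic and the DCFA module that would produce them is omitted.

```python
import numpy as np

def vic_count(first_frame_density, inflow_densities):
    """Video Individual Counting from density maps.

    Total unique individuals = the count in the first frame plus, for
    each later frame, the integral of that frame's inflow density map
    (individuals newly entering the view).
    """
    total = first_frame_density.sum()      # everyone visible at t = 0
    for inflow in inflow_densities:        # frames t = 1 .. T-1
        total += inflow.sum()              # newly appeared individuals
    return float(total)

# Synthetic 4x4 density maps: 5 people in frame 0,
# then 2 and 1 newly entering in the next two frames.
f0 = np.full((4, 4), 5 / 16)
inflows = [np.full((4, 4), 2 / 16), np.full((4, 4), 1 / 16)]
total_count = vic_count(f0, inflows)  # 5 + 2 + 1 = 8 unique pedestrians
```

Because each inflow map only counts individuals not already seen in the previous frame, summing the inflows avoids double-counting people who remain in view across frames.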
Related papers
- Panonut360: A Head and Eye Tracking Dataset for Panoramic Video [0.0]
We present a head and eye tracking dataset involving 50 users watching 15 panoramic videos.
The dataset provides details on the viewport and gaze attention locations of users.
Our analysis reveals a consistent downward offset in gaze fixations relative to the Field of View.
arXiv Detail & Related papers (2024-03-26T13:54:52Z)
- Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV [50.616892315086574]
This paper proposes two novel datasets: SlowTV and CribsTV.
These are large-scale datasets curated from publicly available YouTube videos, containing a total of 2M training frames.
We leverage these datasets to tackle the challenging task of zero-shot generalization.
arXiv Detail & Related papers (2024-03-03T17:29:03Z)
- STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes [78.95447086305381]
Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales.
Existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution.
We introduce a large-scale multimodal dataset, STCrowd, to better evaluate pedestrian perception algorithms in crowded scenarios.
arXiv Detail & Related papers (2022-04-03T08:26:07Z)
- CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization [44.97567243883994]
We propose a new benchmark for visual localization in outdoor scenes using crowd-sourced data.
We show that our dataset is very challenging, with all evaluated methods failing on its hardest parts.
As part of the dataset release, we provide the tooling used to generate it, enabling efficient and effective 2D correspondence annotation.
arXiv Detail & Related papers (2021-09-09T19:25:48Z)
- TIMo -- A Dataset for Indoor Building Monitoring with a Time-of-Flight Camera [9.746370805708095]
We present TIMo, a dataset for video-based monitoring of indoor spaces captured using a time-of-flight (ToF) camera.
The resulting depth videos feature people performing a set of different predefined actions.
Person detection for people counting and anomaly detection are the two targeted applications.
arXiv Detail & Related papers (2021-08-27T09:33:11Z)
- Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark [97.07865343576361]
We construct a benchmark with a new drone-captured largescale dataset, named as DroneCrowd.
We annotate 20,800 people trajectories with 4.8 million heads and several video-level attributes.
We design the Space-Time Neighbor-Aware Network (STNNet) as a strong baseline to solve object detection, tracking and counting jointly in dense crowds.
arXiv Detail & Related papers (2021-05-06T04:46:14Z)
- Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting [2.3732259124656903]
We study video crowd counting, which is to estimate the number of objects in all the frames of a video sequence.
We propose Monet, a motion-guided non-local spatial-temporal network for video crowd counting.
Our approach achieves substantially better performance in terms of MAE and MSE as compared with other state-of-the-art approaches.
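MAE and MSE are the standard crowd-counting evaluation metrics behind comparisons like the one above. As a reference for what is being measured, here is a minimal implementation; note that in the crowd-counting literature "MSE" conventionally reports the square root of the mean squared error, so both metrics share the same unit (people). The example counts are made up for illustration.

```python
import math

def counting_errors(predicted, ground_truth):
    """Mean Absolute Error and (root) Mean Squared Error over a test set.

    Each element of `predicted` / `ground_truth` is the total count for
    one test image or frame.
    """
    diffs = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse

# Hypothetical predictions vs. ground-truth counts on three frames.
mae, mse = counting_errors([100, 52, 210], [95, 50, 200])
```

MSE penalizes large per-image errors more heavily than MAE, which is why papers commonly report both.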
arXiv Detail & Related papers (2021-04-28T18:05:13Z)
- Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting [23.336181341124746]
Crowd counting aims to predict the number of people and generate the density map in the image.
There are many challenges, including varying head scales, the diversity of crowd distribution across images and cluttered backgrounds.
We propose a multi-scale context aggregation network (MSCANet) based on single-column encoder-decoder architecture for crowd counting.
arXiv Detail & Related papers (2021-04-06T02:24:06Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle crowd density estimation under extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTA V).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.