Related papers: Video Individual Counting and Tracking from Moving Drones: A Benchmark and Methods


URL: http://arxiv.org/abs/2601.12500v1
Date: Sun, 18 Jan 2026 17:17:31 GMT
Title: Video Individual Counting and Tracking from Moving Drones: A Benchmark and Methods
Authors: Yaowu Fan, Jia Wan, Tao Han, Andy J. Ma, Antoni B. Chan
Abstract summary: We introduce MovingDroneCrowd++, the largest video-level dataset for dense crowd counting and tracking captured by moving drones. We also propose GD3A, a density map-based video individual counting method that avoids explicit localization. Experimental results show that our methods significantly outperform existing approaches under dense crowds and complex motion.
Score: 51.91154554622608
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Counting and tracking dense crowds in large-scale scenes is highly challenging, yet existing methods mainly rely on datasets captured by fixed cameras, which provide limited spatial coverage and are inadequate for large-scale dense crowd analysis. To address this limitation, we propose a flexible solution using moving drones to capture videos and perform video-level crowd counting and tracking of unique pedestrians across entire scenes. We introduce MovingDroneCrowd++, the largest video-level dataset for dense crowd counting and tracking captured by moving drones, covering diverse and complex conditions with varying flight altitudes, camera angles, and illumination. Existing methods fail to achieve satisfactory performance on this dataset. To this end, we propose GD3A (Global Density Map Decomposition via Descriptor Association), a density map-based video individual counting method that avoids explicit localization. GD3A establishes pixel-level correspondences between pedestrian descriptors across consecutive frames via optimal transport with an adaptive dustbin score, enabling the decomposition of global density maps into shared, inflow, and outflow components. Building on this framework, we further introduce DVTrack, which converts descriptor-level matching into instance-level associations through a descriptor voting mechanism for pedestrian tracking. Experimental results show that our methods significantly outperform existing approaches under dense crowds and complex motion, reducing counting error by 47.4 percent and improving tracking performance by 39.2 percent.
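The abstract's core mechanism, optimal transport with a dustbin followed by a split of density into shared, inflow, and outflow mass, can be illustrated with a small NumPy sketch. This is not the GD3A implementation. The SuperGlue-style log-domain Sinkhorn, the fixed scalar dustbin score (a stand-in for GD3A's adaptive one), unit transport mass per descriptor, and the helper names sinkhorn_with_dustbin and decompose are all assumptions made for illustration. Counting a video as first-frame density plus all later inflow is likewise an assumed reading of how the decomposition yields a video-level count.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log-sum-exp along a single axis.
    mx = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(mx, axis=axis) + np.log(np.sum(np.exp(x - mx), axis=axis))


def sinkhorn_with_dustbin(scores, dustbin_score, n_iters=100):
    """Log-domain Sinkhorn on an (m+1) x (n+1) matrix with one dustbin row and column.

    scores: (m, n) descriptor similarities, rows = frame t, columns = frame t+1.
    dustbin_score: scalar "no match" score (fixed stand-in for an adaptive score).
    Returns the (m+1, n+1) soft assignment matrix P.
    """
    m, n = scores.shape
    aug = np.full((m + 1, n + 1), float(dustbin_score))
    aug[:m, :n] = scores
    # Real descriptors carry unit mass; each dustbin can absorb every descriptor
    # of the other frame, so both marginals sum to m + n.
    log_mu = np.log(np.append(np.ones(m), n))
    log_nu = np.log(np.append(np.ones(n), m))
    u, v = np.zeros(m + 1), np.zeros(n + 1)
    for _ in range(n_iters):
        u = log_mu - _logsumexp(aug + v[None, :], axis=1)  # fit row marginals
        v = log_nu - _logsumexp(aug + u[:, None], axis=0)  # fit column marginals
    return np.exp(aug + u[:, None] + v[None, :])


def decompose(P, dens_t, dens_t1):
    """Split per-descriptor density of frames t and t+1 into four components."""
    m, n = len(dens_t), len(dens_t1)
    return {
        # Density of frame-t descriptors matched to some frame-(t+1) descriptor.
        "shared_t": dens_t * P[:m, :n].sum(axis=1),
        # Frame-t density sent to the dustbin column: people leaving the view.
        "outflow": dens_t * P[:m, n],
        "shared_t1": dens_t1 * P[:m, :n].sum(axis=0),
        # Frame-(t+1) density drawn from the dustbin row: people entering the view.
        "inflow": dens_t1 * P[m, :n],
    }


# Toy pair: A and B appear in both frames; C leaves and D enters (hypothetical data).
scores = np.full((3, 3), -10.0)
scores[0, 0] = scores[1, 1] = 10.0
P = sinkhorn_with_dustbin(scores, dustbin_score=0.0, n_iters=500)
dens_t, dens_t1 = np.ones(3), np.ones(3)  # one unit of density per descriptor
parts = decompose(P, dens_t, dens_t1)
# Assumed video-level count: first-frame density plus all later inflow.
video_count = dens_t.sum() + parts["inflow"].sum()
print({k: round(float(x.sum()), 3) for k, x in parts.items()}, round(float(video_count), 3))
```

A wider gap between matched and unmatched scores makes the plan sharper but slows Sinkhorn's convergence, which is why the toy run uses several hundred iterations.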

Related papers

Video Individual Counting for Moving Drones [51.429771128144964]
Video Individual Counting (VIC) has received increasing attention for its importance in intelligent video surveillance. Previous datasets are captured with fixed or rarely moving cameras and contain relatively sparse individuals, which restricts evaluation in crowded scenes with highly varying views and times. To address these issues, we introduce the MovingDroneCrowd dataset, featuring videos captured by fast-moving drones in crowded scenes under diverse illuminations, shooting heights, and angles.
arXiv (2025-03-12T07:09:33Z)
DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild [85.03973683867797]
This paper proposes a concise, elegant, and robust pipeline to estimate smooth camera trajectories and obtain dense point clouds for casual videos in the wild. We show that the proposed method achieves state-of-the-art performance in terms of camera pose estimation even in complex, dynamic, and challenging scenes.
arXiv (2024-11-20T13:01:16Z)
DenseTrack: Drone-based Crowd Tracking via Density-aware Motion-appearance Synergy [33.57923199717605]
Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective. To address these challenges, we present the Density-aware Tracking (DenseTrack) framework. DenseTrack capitalizes on crowd counting to precisely determine object locations, blending visual and motion cues to improve the tracking of small-scale objects.
arXiv (2024-07-24T13:39:07Z)
The Interstate-24 3D Dataset: a new benchmark for 3D multi-camera vehicle tracking [4.799822253865053]
This work presents a novel video dataset recorded from overlapping highway traffic cameras along an urban interstate, enabling multi-camera 3D object tracking in a traffic monitoring context. Data is released from 3 scenes containing video from at least 16 cameras each, totaling 57 minutes in length. 877,000 3D bounding boxes and corresponding object tracklets are fully and accurately annotated for each camera field of view and are combined into a spatially and temporally continuous set of vehicle trajectories for each scene.
arXiv (2023-08-28T18:43:33Z)
SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth [84.64121608109087]
First, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. Second, we design a depth cascading matching (DCM) algorithm, which uses the obtained depth information to convert a dense target set into multiple sparse target subsets. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
arXiv (2023-06-08T14:36:10Z)
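The depth cascading matching idea summarized for SparseTrack can be sketched in plain Python. Everything here is an assumption for illustration rather than SparseTrack's code: pseudo-depth is taken as the distance from a box's bottom edge to the image bottom, levels are equal-width depth intervals matched from near to far, and greedy IoU matching stands in for whatever assignment solver the tracker actually uses. Helper names such as depth_cascade_match are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); y grows downward


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def depth_levels(boxes: List[Box], img_h: float, n_levels: int) -> List[List[int]]:
    """Bucket box indices into equal-width pseudo-depth levels, nearest first."""
    step = img_h / n_levels
    levels: List[List[int]] = [[] for _ in range(n_levels)]
    for i, box in enumerate(boxes):
        # Assumed pseudo-depth: distance from the box bottom to the image bottom.
        depth = img_h - box[3]
        levels[max(0, min(int(depth // step), n_levels - 1))].append(i)
    return levels


def greedy_iou_match(tracks, dets, t_idx, d_idx, thr):
    """Greedy IoU assignment (a stand-in for a Hungarian solver)."""
    pairs = sorted(((iou(tracks[t], dets[d]), t, d) for t in t_idx for d in d_idx), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, t, d in pairs:
        if score < thr:
            break
        if t not in used_t and d not in used_d:
            matches.append((t, d))
            used_t.add(t)
            used_d.add(d)
    return matches, [t for t in t_idx if t not in used_t]


def depth_cascade_match(tracks: List[Box], dets: List[Box], img_h: float,
                        n_levels: int = 3, thr: float = 0.3):
    """Match level by level; unmatched tracks cascade to the next, deeper level."""
    track_levels = depth_levels(tracks, img_h, n_levels)
    det_levels = depth_levels(dets, img_h, n_levels)
    matches, carried = [], []
    for lvl in range(n_levels):
        level_matches, carried = greedy_iou_match(
            tracks, dets, carried + track_levels[lvl], det_levels[lvl], thr)
        matches.extend(level_matches)
    return sorted(matches), carried


tracks = [(10, 60, 30, 95), (50, 10, 60, 30)]  # one near, one far (toy boxes)
dets = [(12, 62, 32, 96), (51, 11, 61, 31)]
print(depth_cascade_match(tracks, dets, img_h=100))  # → ([(0, 0), (1, 1)], [])
```

Splitting by depth keeps each matching subproblem small and sparse, so an occluded far-away target is less likely to steal a detection that belongs to a nearby one.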
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes [78.95447086305381]
Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales. Existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution. We introduce a large-scale multimodal dataset, STCrowd, to better evaluate pedestrian perception algorithms in crowded scenarios.
arXiv (2022-04-03T08:26:07Z)
Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm. We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv (2020-07-18T19:51:53Z)
