Related papers: CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios

CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios

URL: http://arxiv.org/abs/2507.02479v1
Date: Thu, 03 Jul 2025 09:36:44 GMT
Title: CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios
Authors: Teng Fu, Yuwen Chen, Zhuofan Chen, Mengyang Zhao, Bin Li, Xiangyang Xue,
Abstract summary: We propose a large-scale dataset for multi-pedestrian tracking, shot mainly from the first-person view.<n>Our dataset consists of 33 videos, containing a total of 5,185 trajectories.<n>The dataset will provide a platform to facilitate the development of algorithms that remain effective in complex situations.
Score: 41.96441237870711
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-object tracking is a classic field in computer vision. Among them, pedestrian tracking has extremely high application value and has become the most popular research category. Existing methods mainly use motion or appearance information for tracking, which is often difficult in complex scenarios. For the motion information, mutual occlusions between objects often prevent updating of the motion state; for the appearance information, non-robust results are often obtained due to reasons such as only partial visibility of the object or blurred images. Although learning how to perform tracking in these situations from the annotated data is the simplest solution, the existing MOT dataset fails to satisfy this solution. Existing methods mainly have two drawbacks: relatively simple scene composition and non-realistic scenarios. Although some of the video sequences in existing dataset do not have the above-mentioned drawbacks, the number is far from adequate for research purposes. To this end, we propose a difficult large-scale dataset for multi-pedestrian tracking, shot mainly from the first-person view and all from real-life complex scenarios. We name it ``CrowdTrack'' because there are numerous objects in most of the sequences. Our dataset consists of 33 videos, containing a total of 5,185 trajectories. Each object is annotated with a complete bounding box and a unique object ID. The dataset will provide a platform to facilitate the development of algorithms that remain effective in complex situations. We analyzed the dataset comprehensively and tested multiple SOTA models on our dataset. Besides, we analyzed the performance of the foundation models on our dataset. The dataset and project code is released at: https://github.com/loseevaya/CrowdTrack .

Related papers

DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with dense tracking pedestrians. Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available. Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion [56.1428110894411]
We propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation. As the dataset contains mostly group dancing videos, we name it "DanceTrack" We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack when compared against existing benchmarks.
arXiv Detail & Related papers (2021-11-29T16:49:06Z)
Addressing Multiple Salient Object Detection via Dual-Space Long-Range Dependencies [3.8824028205733017]
Salient object detection plays an important role in many downstream tasks. We propose a network architecture incorporating non-local feature information in both the spatial and channel spaces. We show that our approach accurately locates multiple salient regions even in complex scenarios.
arXiv Detail & Related papers (2021-11-04T23:16:53Z)
Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking. Our model, trained jointly on synthetic and real data, outperforms the state of the art on KITTI, and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
Discriminative Appearance Modeling with Multi-track Pooling for Real-time Multi-object Tracking [20.66906781151]
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene. Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory. We propose a training strategy adapted to multi-track pooling which generates hard tracking episodes online.
arXiv Detail & Related papers (2021-01-28T18:12:39Z)
SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars. We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average. We ask annotators to label objects that move at any point in the video, and give names to them post factum. Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.