Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the
Heads
- URL: http://arxiv.org/abs/2304.07705v3
- Date: Mon, 30 Oct 2023 23:41:07 GMT
- Title: Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the
Heads
- Authors: Yu Zhang, Huaming Chen, Wei Bao, Zhongzheng Lai, Zao Zhang, Dong Yuan
- Abstract summary: In this work, we have designed a joint head and body detector in an anchor-free style to boost the detection recall and precision performance of pedestrians.
Our model does not require information on the statistical head-body ratio for common pedestrian detection for training.
We evaluate the model with extensive experiments on different datasets, including MOT20, CrowdHuman, and HT21 datasets.
- Score: 29.80438304958294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid development of deep learning, object detection and tracking
play a vital role in today's society. Being able to identify and track all the
pedestrians in the dense crowd scene with computer vision approaches is a
typical challenge in this field, also known as the Multiple Object Tracking
(MOT) challenge. Modern trackers are required to operate on more and more
complicated scenes. According to the MOT20 challenge results, pedestrians are
4 times denser than in the MOT17 challenge. Hence, improving the ability to detect
and track in extremely crowded scenes is the aim of this work. In light of the
occlusion issue with the human body, the heads are usually easier to identify.
In this work, we have designed a joint head and body detector in an anchor-free
style to boost the detection recall and precision performance of pedestrians in
both small and medium sizes. Innovatively, our model does not require
information on the statistical head-body ratio for common pedestrian detection
for training. Instead, the proposed model learns the ratio dynamically. To
verify the effectiveness of the proposed model, we evaluate the model with
extensive experiments on different datasets, including MOT20, CrowdHuman, and
HT21 datasets. As a result, our proposed method significantly improves both the
recall and precision rate on small & medium sized pedestrians and achieves
state-of-the-art results in these challenging datasets.
Related papers
- MAML MOT: Multiple Object Tracking based on Meta-Learning [7.892321926673001]
MAML MOT is a meta-learning-based training approach for multi-object tracking.
arXiv Detail & Related papers (2024-05-12T12:38:40Z)
- LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data [1.550120821358415]
We introduce an attention-aware neural network addressing challenges inherent in video data and deception dynamics.
We employ a multimodal fusion strategy that enhances accuracy; our approach yields a 92% accuracy rate on a real-life trial dataset.
arXiv Detail & Related papers (2023-09-04T06:22:25Z)
- STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes [78.95447086305381]
Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales.
Existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution.
We introduce a large-scale multimodal dataset, STCrowd, to better evaluate pedestrian perception algorithms in crowded scenarios.
arXiv Detail & Related papers (2022-04-03T08:26:07Z)
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual Observations [75.60524561611008]
This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
arXiv Detail & Related papers (2021-09-28T20:25:10Z)
- Tracking Pedestrian Heads in Dense Crowd [0.0]
We propose to revitalize head tracking with the Crowd of Heads dataset (CroHD).
CroHD consists of 9 sequences of 11,463 frames with over 2,276,838 heads and 5,230 tracks annotated in diverse scenes.
We also propose a new head detector, HeadHunter, which is designed for small head detection in crowded scenes.
arXiv Detail & Related papers (2021-03-24T22:51:17Z)
- LID 2020: The Learning from Imperfect Data Challenge Results [242.86700551532272]
The Learning from Imperfect Data workshop aims to inspire and facilitate research on developing novel approaches.
We organize three challenges to find the state-of-the-art approaches in the weakly supervised learning setting.
This technical report summarizes the highlights from the challenge.
arXiv Detail & Related papers (2020-10-17T13:06:12Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and collecting new data.
Specifically, we adopt a strong human detector to detect the spatial location of each person in each frame.
We then apply action recognition models to learn the temporal information from video frames on both the HIE dataset and new data with diverse scenes from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem in extremely dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
- Tracking in Crowd is Challenging: Analyzing Crowd based on Physical Characteristics [0.0]
An event detection method is developed to identify abnormal behavior intelligently.
The problem is very challenging due to high crowd density in different areas.
We consider a novel method to deal with these challenges.
arXiv Detail & Related papers (2020-08-08T22:42:25Z)
- MOT20: A benchmark for multi object tracking in crowded scenes [73.92443841487503]
We present our MOT20 benchmark, consisting of 8 new sequences depicting very crowded challenging scenes.
The benchmark was presented first at the 4th BMTT MOT Challenge Workshop at the Computer Vision and Pattern Recognition Conference (CVPR).
arXiv Detail & Related papers (2020-03-19T20:08:24Z)