Tracking-by-Counting: Using Network Flows on Crowd Density Maps for
Tracking Multiple Targets
- URL: http://arxiv.org/abs/2007.09509v1
- Date: Sat, 18 Jul 2020 19:51:53 GMT
- Title: Tracking-by-Counting: Using Network Flows on Crowd Density Maps for
Tracking Multiple Targets
- Authors: Weihong Ren, Xinchao Wang, Jiandong Tian, Yandong Tang and Antoni B.
Chan
- Abstract summary: State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
- Score: 96.98888948518815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art multi-object tracking (MOT) methods follow the
tracking-by-detection paradigm, where object trajectories are obtained by
associating per-frame outputs of object detectors. In crowded scenes, however,
detectors often fail to obtain accurate detections due to heavy occlusions and
high crowd density. In this paper, we propose a new MOT paradigm,
tracking-by-counting, tailored for crowded scenes. Using crowd density maps, we
jointly model detection, counting, and tracking of multiple targets as a
network flow program, which simultaneously finds the global optimal detections
and trajectories of multiple targets over the whole video. This is in contrast
to prior MOT methods that either ignore the crowd density and thus are prone to
errors in crowded scenes, or rely on a suboptimal two-step process using
heuristic density-aware point-tracks for matching targets. Our approach yields
promising results on public benchmarks of various domains including people
tracking, cell tracking, and fish tracking.
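A minimal sketch of the min-cost network-flow template this formulation builds on, assuming per-frame candidate detections with confidence costs (in the paper such costs come from crowd density maps; the candidate names, numeric costs, and use of networkx here are illustrative assumptions, not the authors' implementation):

```python
import networkx as nx

# Toy candidates per frame: (name, detection cost); negative = confident.
# In the paper these costs would be derived from a crowd density map.
candidates = {
    0: [("a0", -2.0), ("b0", -1.5)],
    1: [("a1", -2.2), ("b1", -1.0)],
}

K = 2  # number of trajectories to push through the network
G = nx.DiGraph()
G.add_node("S", demand=-K)  # source emits K units of flow
G.add_node("T", demand=K)   # sink absorbs K units

for t, dets in candidates.items():
    for name, cost in dets:
        u, v = f"{name}_in", f"{name}_out"
        # Unit-capacity detection edge: routing flow through it means
        # selecting this candidate as a true detection.
        G.add_edge(u, v, capacity=1, weight=int(cost * 100))
        G.add_edge("S", u, capacity=1, weight=0)  # trajectory may start here
        G.add_edge(v, "T", capacity=1, weight=0)  # trajectory may end here

# Transition edges link detections in consecutive frames; a constant
# association cost here, typically distance- or appearance-based.
for name_t, _ in candidates[0]:
    for name_next, _ in candidates[1]:
        G.add_edge(f"{name_t}_out", f"{name_next}_in", capacity=1, weight=10)

flow = nx.min_cost_flow(G)  # one global optimum over the whole "video"
```

Tracking-by-counting additionally couples the selected detections to the per-region counts implied by the density map, so a single flow optimum jointly respects detection, counting, and association.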
Related papers
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real-time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes (a toy forward-diffusion step on boxes is sketched after this list).
arXiv Detail & Related papers (2024-08-28T05:53:30Z)
- Dense Optical Tracking: Connecting the Dots [82.79642869586587]
DOT is a novel, simple and efficient method for solving the problem of point tracking in a video.
We show that DOT is significantly more accurate than current optical flow techniques, outperforms sophisticated "universal trackers" like OmniMotion, and is on par with, or better than, the best point tracking algorithms like CoTracker.
arXiv Detail & Related papers (2023-12-01T18:59:59Z)
- Iterative Scale-Up ExpansionIoU and Deep Features Association for Multi-Object Tracking in Sports [26.33239898091364]
We propose a novel online and robust multi-object tracking approach named deep ExpansionIoU (Deep-EIoU) for sports scenarios.
Unlike conventional methods, we abandon the Kalman filter and instead leverage iterative scale-up ExpansionIoU and deep features for robust association (a toy sketch of the expanded-box IoU idea appears after this list).
Our proposed method demonstrates remarkable effectiveness in tracking objects with irregular motion, achieving a score of 77.2% on the SportsMOT dataset and 85.4% on the SoccerNet-Tracking dataset.
arXiv Detail & Related papers (2023-06-22T17:47:08Z)
- SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth [84.64121608109087]
First, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images.
Second, we design a depth cascading matching (DCM) algorithm that uses the obtained depth information to convert a dense target set into multiple sparse target subsets (see the pseudo-depth sketch after this list).
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
arXiv Detail & Related papers (2023-06-08T14:36:10Z)
- TopTrack: Tracking Objects By Their Top [13.020122353444497]
TopTrack is a joint detection-and-tracking method that uses the top of the object as a keypoint for detection instead of the center.
We performed experiments showing that using the object top as the detection keypoint can reduce the number of missed detections (see the keypoint-decoding sketch after this list).
arXiv Detail & Related papers (2023-04-12T19:00:12Z)
- Joint Counting, Detection and Re-Identification for Multi-Object Tracking [8.89262850257871]
In crowded scenes, joint detection and tracking usually fail to find accurate object associations due to missed or false detections.
We jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes.
The proposed MOT tracker can perform online and real-time tracking, and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 79.7%), MOT17 (MOTA of 81.3%), and MOT20 (MOTA of 78.9%).
arXiv Detail & Related papers (2022-12-12T12:53:58Z)
- Tracking by Joint Local and Global Search: A Target-aware Attention based Approach [63.50045332644818]
We propose a novel target-aware attention mechanism (termed TANet) to conduct joint local and global search for robust tracking.
Specifically, we extract the features of the target object patch and continuous video frames, then feed them into a decoder network to generate target-aware global attention maps.
In the tracking procedure, we integrate the target-aware attention with multiple trackers by exploring candidate search regions for robust tracking.
arXiv Detail & Related papers (2021-06-09T06:54:15Z)
- Track to Detect and Segment: An Online Multi-Object Tracker [81.15608245513208]
TraDeS is an online joint detection and tracking model, exploiting tracking clues to assist detection end-to-end.
TraDeS infers a tracking offset from a cost volume and uses it to propagate previous object features (see the cost-volume sketch after this list).
arXiv Detail & Related papers (2021-03-16T02:34:06Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
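For the ConsistencyTrack entry above, a toy sketch of the forward (noising) half of a diffusion process on bounding boxes; the noise schedule, box parameterization, and function name are illustrative assumptions, not the paper's model:

```python
import numpy as np

def add_box_noise(boxes, t, T=1000, beta_max=0.02):
    """Forward diffusion step: corrupt normalized [cx, cy, w, h] boxes
    with Gaussian noise whose scale grows with timestep t (toy schedule)."""
    alpha_bar = np.prod(1.0 - np.linspace(1e-4, beta_max, T)[:t])
    eps = np.random.randn(*boxes.shape)
    return np.sqrt(alpha_bar) * boxes + np.sqrt(1.0 - alpha_bar) * eps, eps

# A denoiser network is trained to invert this corruption; at inference,
# tracking starts from noisy boxes and denoises them toward detections,
# with cross-frame consistency supplying the associations.
```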
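For the Deep-EIoU entry, a toy sketch of expanded-box IoU matching: track and detection boxes are enlarged before computing IoU, and the expansion scale is iteratively increased until a match clears a threshold. The specific scales, threshold, and helper names are illustrative assumptions:

```python
import numpy as np

def expand(box, scale):
    """Grow a [x1, y1, x2, y2] box about its center by `scale`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * (1 + scale), (box[3] - box[1]) * (1 + scale)
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def expansion_iou_match(track_box, det_boxes, scales=(0.3, 0.5, 0.7), thresh=0.3):
    """Try matching at growing expansion scales (illustrative values)."""
    if not det_boxes:
        return None, 0.0
    for s in scales:
        t = expand(track_box, s)
        scores = [iou(t, expand(d, s)) for d in det_boxes]
        best = int(np.argmax(scores))
        if scores[best] >= thresh:
            return best, scores[best]
    return None, 0.0
```

Enlarging both boxes lets fast-moving targets that no longer overlap their previous box still be associated without a motion model such as the Kalman filter.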
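For the SparseTrack entry, a sketch of decomposing a dense scene into sparse subsets via pseudo-depth. Using the distance from a box's bottom edge to the image bottom as the depth proxy is a common 2D heuristic; the binning scheme and names below are assumptions, not the paper's exact DCM procedure:

```python
import numpy as np

def pseudo_depth(box, img_h):
    """Distance from the box's bottom edge to the image bottom; smaller
    values mean the target is closer to the camera (a common 2D proxy)."""
    return img_h - box[3]  # box = [x1, y1, x2, y2]

def depth_cascade_subsets(boxes, img_h, n_levels=3):
    """Split a dense box set into sparse subsets by pseudo-depth bins."""
    depths = np.array([pseudo_depth(b, img_h) for b in boxes])
    edges = np.linspace(depths.min(), depths.max() + 1e-6, n_levels + 1)
    return [
        [i for i, d in enumerate(depths) if edges[k] <= d < edges[k + 1]]
        for k in range(n_levels)
    ]

# Matching then proceeds cascade-style from the nearest subset outward,
# so each assignment problem is sparse rather than one dense global match.
```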
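For the TopTrack entry, a minimal sketch of decoding detections from a top-of-object keypoint heatmap, with boxes grown downward from each peak using regressed sizes; the array shapes and naive peak picking are illustrative assumptions:

```python
import numpy as np

def decode_top_keypoints(heatmap, wh, k=10):
    """Pick the k strongest peaks of an (H, W) top-keypoint heatmap and
    build boxes from a (2, H, W) width/height regression map."""
    flat_idx = np.argsort(heatmap.ravel())[-k:][::-1]
    ys, xs = np.unravel_index(flat_idx, heatmap.shape)
    boxes = []
    for x, y in zip(xs, ys):
        w, h = wh[0, y, x], wh[1, y, x]
        # The keypoint lies on the upper edge, so the box grows downward.
        boxes.append([x - w / 2, y, x + w / 2, y + h])
    return boxes
```

In crowds the head/top region is occluded less often than the center, which is why anchoring detection there can reduce misses.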
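Finally, for the TraDeS entry, a sketch of inferring a tracking offset from a feature cost volume; the cosine-similarity construction and argmax decoding are assumptions standing in for the learned components:

```python
import numpy as np

def cost_volume_offset(feat_prev, feat_cur):
    """Cosine-similarity cost volume between two (C, H, W) feature maps;
    the per-location argmax yields a coarse tracking offset."""
    C, H, W = feat_prev.shape
    fp = feat_prev.reshape(C, -1)
    fc = feat_cur.reshape(C, -1)
    fp = fp / (np.linalg.norm(fp, axis=0, keepdims=True) + 1e-9)
    fc = fc / (np.linalg.norm(fc, axis=0, keepdims=True) + 1e-9)
    cv = fc.T @ fp                 # (H*W, H*W): current vs. previous
    best = cv.argmax(axis=1)       # best previous match per current cell
    py, px = np.divmod(best, W)    # matched coords in the previous frame
    cy, cx = np.divmod(np.arange(H * W), W)
    # Offset pointing from each current cell back to its previous match.
    return (py - cy).reshape(H, W), (px - cx).reshape(H, W)
```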
This list is automatically generated from the titles and abstracts of the papers on this site.