Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking
- URL: http://arxiv.org/abs/2407.14086v2
- Date: Tue, 6 Aug 2024 09:56:36 GMT
- Title: Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking
- Authors: Yunfei Zhang, Chao Liang, Jin Gao, Zhipeng Zhang, Weiming Hu, Stephen Maybank, Xue Zhou, Liang Li
- Abstract summary: Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
- Score: 52.04679257903805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks by embedding the Re-Identification (ReID) task into the detector, so that appearance features are extracted as an auxiliary task, achieving a balance between inference speed and tracking performance. However, resolving the competition between the detector and the feature extractor has always been a challenge. Meanwhile, the issue of directly embedding the ReID task into MOT has remained unresolved. The lack of high discriminability in appearance features limits their utility. In this paper, a new learning approach using cross-correlation to capture temporal information of objects is proposed. The feature extraction network is no longer trained solely on appearance features from each frame but learns richer motion features by utilizing feature heatmaps from consecutive frames, which addresses the challenge of inter-class feature similarity. Furthermore, our learning approach is applied to a more lightweight feature extraction network, and we treat the feature matching scores as strong cues rather than auxiliary cues, with an appropriate weight calculation to reflect the compatibility between our obtained features and the MOT task. Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks, i.e., the MOT17, MOT20, and DanceTrack datasets. Specifically, on the DanceTrack test set, we achieve 56.8 HOTA, 58.1 IDF1 and 92.5 MOTA, making it the best online tracker capable of achieving real-time performance. Comparative evaluations with other trackers show that our tracker achieves the best balance between speed, robustness and accuracy. Code is available at https://github.com/yfzhang1214/TCBTrack.
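The abstract's idea of correlating object features with the feature map of a consecutive frame can be illustrated with a minimal sketch. This is not the TCBTrack implementation: the function names `correlation_heatmap` and `peak`, the use of cosine similarity as the correlation score, and the toy data are all assumptions chosen for illustration. It only shows how a per-object embedding from one frame might produce a response heatmap over the next frame.

```python
import math


def correlation_heatmap(query, feature_map):
    """Correlate a C-dim query embedding with a C x H x W feature map.

    Hypothetical helper: returns an H x W map of cosine similarities
    between the query and the feature vector at each spatial location.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    # Fall back to 1.0 for an all-zero vector to avoid division by zero.
    q_norm = math.sqrt(sum(v * v for v in query)) or 1.0
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            vec = [feature_map[c][y][x] for c in range(channels)]
            v_norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            dot = sum(q * v for q, v in zip(query, vec))
            row.append(dot / (q_norm * v_norm))
        heatmap.append(row)
    return heatmap


def peak(heatmap):
    """Return the (row, col) index of the strongest response."""
    best = max(
        (val, y, x)
        for y, row in enumerate(heatmap)
        for x, val in enumerate(row)
    )
    return best[1], best[2]


# Toy data (assumed): a 2-channel, 2 x 3 feature map for frame t and the
# embedding of one object taken from frame t-1.
query = [1.0, 0.0]
frame_t = [
    [[0.0, 0.0, 1.0], [0.0, 0.5, 0.0]],  # channel 0
    [[1.0, 1.0, 0.0], [1.0, 0.5, 0.0]],  # channel 1
]
response = correlation_heatmap(query, frame_t)
print(peak(response))
```

In this toy case the response peaks where the frame-t feature vector points in the same direction as the query. A learned tracker would operate on network feature maps rather than hand-written lists.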
Related papers
- FeatureSORT: Essential Features for Effective Tracking [0.0]
We introduce a novel tracker designed for online multiple object tracking with a focus on being simple yet effective.
By integrating distinct appearance features, including clothing color, style, and target direction, our tracker significantly enhances online tracking accuracy.
arXiv Detail & Related papers (2024-07-05T04:37:39Z) - Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking [51.16677396148247]
Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames.
In this paper, we demonstrate this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues.
Our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack.
arXiv Detail & Related papers (2023-08-01T18:53:24Z) - SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object
Tracking [20.286114226299237]
This paper introduces SMILEtrack, an innovative object tracker with a Siamese network-based Similarity Learning Module (SLM).
The SLM calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding models.
Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames.
arXiv Detail & Related papers (2022-11-16T10:49:48Z) - On the detection-to-track association for online multi-object tracking [30.883165972525347]
We propose a hybrid track association (HTA) algorithm that models the historical appearance distances of a track with an incremental Gaussian mixture model (IGMM).
Experimental results on three MOT benchmarks confirm that HTA effectively improves target identification performance at a small cost in tracking speed.
arXiv Detail & Related papers (2021-07-01T14:44:12Z) - Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT
Philosophy [63.91005999481061]
A practical long-term tracker typically contains three key properties, i.e. an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism.
We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via Dynamic convolutions (d-convs) and Multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT2019LT benchmarks and runs in real-time (3x faster
arXiv Detail & Related papers (2021-04-25T00:59:53Z) - DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z) - Probabilistic Tracklet Scoring and Inpainting for Multiple Object
Tracking [83.75789829291475]
We introduce a probabilistic autoregressive motion model to score tracklet proposals.
This is achieved by training our model to learn the underlying distribution of natural tracklets.
Our experiments demonstrate the superiority of our approach at tracking objects in challenging sequences.
arXiv Detail & Related papers (2020-12-03T23:59:27Z) - Tracklets Predicting Based Adaptive Graph Tracking [51.352829280902114]
We present an accurate, end-to-end learning framework for multi-object tracking, namely TPAGT.
It re-extracts the features of the tracklets in the current frame based on motion prediction, which is key to solving the problem of feature inconsistency.
arXiv Detail & Related papers (2020-10-18T16:16:49Z) - ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking [80.02322563402758]
One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets.
We introduce a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion.
arXiv Detail & Related papers (2020-04-16T06:43:11Z) - Rethinking Convolutional Features in Correlation Filter Based Tracking [0.0]
We revisit a hierarchical deep feature-based visual tracker and find that both the performance and efficiency of the deep tracker are limited by the poor feature quality.
After removing redundant features, our proposed tracker achieves significant improvements in both performance and efficiency.
arXiv Detail & Related papers (2019-12-30T04:39:38Z)