SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking
- URL: http://arxiv.org/abs/2211.08824v4
- Date: Mon, 22 Jan 2024 06:46:27 GMT
- Title: SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking
- Authors: Yu-Hsiang Wang, Jun-Wei Hsieh, Ping-Yang Chen, Ming-Ching Chang, Hung Hin So, Xin Li
- Abstract summary: This paper introduces SMILEtrack, an innovative object tracker with a Siamese network-based Similarity Learning Module (SLM).
The SLM calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models.
The paper also develops a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames.
- Score: 20.286114226299237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent progress in Multiple Object Tracking (MOT), several obstacles
such as occlusions, similar objects, and complex scenes remain open
challenges. Meanwhile, a systematic study of the cost-performance trade-off for
the popular tracking-by-detection paradigm is still lacking. This paper
introduces SMILEtrack, an innovative object tracker that effectively addresses
these challenges by integrating an efficient object detector with a Siamese
network-based Similarity Learning Module (SLM). The technical contributions of
SMILEtrack are twofold. First, we propose an SLM that calculates the appearance
similarity between two objects, overcoming the limitations of feature
descriptors in Separate Detection and Embedding (SDE) models. The SLM
incorporates a Patch Self-Attention (PSA) block inspired by the vision
Transformer, which generates reliable features for accurate similarity
matching. Second, we develop a Similarity Matching Cascade (SMC) module with a
novel GATE function for robust object matching across consecutive video frames,
further enhancing MOT performance. Together, these innovations help SMILEtrack
achieve an improved trade-off between cost (e.g., running speed) and
performance (e.g., tracking accuracy) compared with several existing state-of-the-art
trackers, including the popular BYTETrack method. SMILEtrack outperforms
BYTETrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on the MOT17 and MOT20 datasets.
Code is available at https://github.com/pingyang1117/SMILEtrack_Official
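The abstract names two components: a Siamese SLM that scores the appearance similarity between two objects, and an SMC module that uses a GATE function to match objects across frames. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation (that lives in the official repository linked above). Assumptions: `embed` is a hypothetical linear+ReLU stand-in for the PSA-based SLM encoder; the "gate" is a plain similarity threshold standing in for the GATE function, whose exact form the abstract does not give; and splitting detections by confidence (`high_conf`, `gate_high`, `gate_low`, all hypothetical values) is loosely modeled on BYTETrack's high-score-then-low-score association.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def embed(x: Vector, weights: Sequence[Vector]) -> List[float]:
    # Hypothetical toy encoder: one linear layer + ReLU. SMILEtrack's SLM uses a
    # Patch Self-Attention (PSA) backbone; this only stands in for it.
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; defined as 0.0 when either vector has zero norm.
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def siamese_similarity(x1: Vector, x2: Vector, weights: Sequence[Vector]) -> float:
    # Siamese: both inputs pass through the SAME encoder weights, then are compared.
    return cosine(embed(x1, weights), embed(x2, weights))


def gated_greedy_match(sim: Sequence[Sequence[float]], rows: Sequence[int],
                       cols: Sequence[int], gate: float) -> List[Tuple[int, int]]:
    # Greedy assignment by descending similarity; pairs below `gate` are rejected.
    # (A Hungarian solver is the more common choice; greedy keeps this short.)
    candidates = sorted(((sim[r][c], r, c) for r in rows for c in cols), reverse=True)
    used_rows: Set[int] = set()
    used_cols: Set[int] = set()
    matches: List[Tuple[int, int]] = []
    for score, r, c in candidates:
        if score < gate:
            break  # sorted descending, so every remaining pair also fails the gate
        if r in used_rows or c in used_cols:
            continue
        used_rows.add(r)
        used_cols.add(c)
        matches.append((r, c))
    return matches


def cascade_match(sim: Sequence[Sequence[float]], det_scores: Sequence[float],
                  high_conf: float = 0.6, gate_high: float = 0.5,
                  gate_low: float = 0.7) -> List[Tuple[int, int]]:
    # sim[t][d] = appearance similarity between track t and detection d.
    # Stage 1: match tracks to high-confidence detections.
    # Stage 2: still-unmatched tracks get a second chance against low-confidence
    # detections under a stricter (hypothetical) gate.
    n_tracks = len(sim)
    high = [d for d, s in enumerate(det_scores) if s >= high_conf]
    low = [d for d, s in enumerate(det_scores) if s < high_conf]
    first = gated_greedy_match(sim, range(n_tracks), high, gate_high)
    matched_tracks = {t for t, _ in first}
    leftover = [t for t in range(n_tracks) if t not in matched_tracks]
    second = gated_greedy_match(sim, leftover, low, gate_low)
    return first + second


if __name__ == "__main__":
    sim = [[0.9, 0.1, 0.2],   # track 0 vs detections 0..2
           [0.2, 0.8, 0.4]]   # track 1 vs detections 0..2
    print(cascade_match(sim, det_scores=[0.9, 0.3, 0.8]))  # [(0, 0), (1, 1)]
```

In this example, track 1 misses the first stage (its best high-confidence pairing scores only 0.4) but is recovered in the second stage by the low-confidence detection 1. In a full tracker, `sim` would be filled by calling a learned similarity such as `siamese_similarity` for every track/detection pair; whether SMILEtrack also mixes in motion or IoU cues is not stated in this abstract.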
Related papers
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
(arXiv, 2024-08-28T05:53:30Z)
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
(arXiv, 2024-07-19T07:48:45Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
(arXiv, 2023-11-17T08:17:49Z)
- QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking [73.52284039530261]
We present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.
We find that the resulting distinctive feature space admits a simple nearest neighbor search at inference time for object association.
We show that our similarity learning scheme is not limited to video data, but can learn effective instance similarity even from static input.
(arXiv, 2022-10-12T15:47:36Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
(arXiv, 2022-05-31T01:19:18Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
(arXiv, 2022-03-29T01:38:49Z)
- DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
(arXiv, 2021-02-03T20:00:44Z)
- SMOT: Single-Shot Multi Object Tracking [39.34493475666044]
Single-shot multi-object tracker (SMOT) is a new tracking framework that converts any single-shot detector (SSD) model into an online multiple object tracker.
On three benchmarks of object tracking: Hannah, Music Videos, and MOT17, the proposed SMOT achieves state-of-the-art performance.
(arXiv, 2020-10-30T02:46:54Z)
- Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves a PR-MOTA score of 12.80 at 120+ fps on the popular UA-DETRAC challenge, outperforming existing methods while running orders of magnitude faster.
We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
(arXiv, 2020-08-20T08:05:33Z)
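The SMILEtrack abstract reports its gains over BYTETrack in MOTA and HOTA points on MOT17 and MOT20. MOTA (the CLEAR-MOT accuracy) sums three error types over all frames and normalizes by the number of ground-truth objects: MOTA = 1 - Σ_t(FN_t + FP_t + IDSW_t) / Σ_t GT_t, where FN, FP, and IDSW count misses, false positives, and identity switches. The minimal sketch below computes it from per-frame counts; the `(fn, fp, idsw, gt)` tuple layout is an assumption for illustration, not an official benchmark format. HOTA is a separate metric (the geometric mean of detection accuracy and association accuracy, averaged over localization thresholds) and is not computed here.

```python
from typing import Iterable, Tuple

# Hypothetical per-frame record: (false negatives, false positives, ID switches, GT objects)
FrameCounts = Tuple[int, int, int, int]


def mota(frames: Iterable[FrameCounts]) -> float:
    """MOTA = 1 - (sum FN + sum FP + sum IDSW) / sum GT over all frames.

    Errors are summed across the whole sequence before dividing, so frames with
    more objects weigh more. The result can be negative when errors exceed GT.
    """
    fn = fp = idsw = gt = 0
    for f_fn, f_fp, f_idsw, f_gt in frames:
        fn += f_fn
        fp += f_fp
        idsw += f_idsw
        gt += f_gt
    if gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (fn + fp + idsw) / gt


if __name__ == "__main__":
    # Two frames, 10 GT objects each: 1 miss, then 2 false positives and 1 ID switch.
    print(mota([(1, 0, 0, 10), (0, 2, 1, 10)]))
```

Here 4 errors over 20 ground-truth objects give a MOTA of about 0.8. Benchmarks usually report MOTA as a percentage, so a gain of 0.4 MOTA points corresponds to 0.004 on the fractional scale returned by this function.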