YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID
- URL: http://arxiv.org/abs/2501.13710v1
- Date: Thu, 23 Jan 2025 14:38:40 GMT
- Title: YOLO11-JDE: Fast and Accurate Multi-Object Tracking with Self-Supervised Re-ID
- Authors: Iñaki Erregue, Kamal Nasrollahi, Sergio Escalera
- Abstract summary: We introduce YOLO11-JDE, a fast and accurate multi-object tracking (MOT) solution that combines real-time object detection with self-supervised Re-Identification (Re-ID).
By incorporating a dedicated Re-ID branch into YOLO11s, our model performs Joint Detection and Embedding (JDE), generating appearance features for each detection.
YOLO11-JDE achieves competitive results on MOT17 and MOT20 benchmarks, surpassing existing JDE methods in terms of FPS and using up to ten times fewer parameters.
- Score: 38.27486095404261
- Abstract: We introduce YOLO11-JDE, a fast and accurate multi-object tracking (MOT) solution that combines real-time object detection with self-supervised Re-Identification (Re-ID). By incorporating a dedicated Re-ID branch into YOLO11s, our model performs Joint Detection and Embedding (JDE), generating appearance features for each detection. The Re-ID branch is trained in a fully self-supervised setting while simultaneously training for detection, eliminating the need for costly identity-labeled datasets. The triplet loss, with hard positive and semi-hard negative mining strategies, is used for learning discriminative embeddings. Data association is enhanced with a custom tracking implementation that successfully integrates motion, appearance, and location cues. YOLO11-JDE achieves competitive results on MOT17 and MOT20 benchmarks, surpassing existing JDE methods in terms of FPS and using up to ten times fewer parameters. This makes our method a highly attractive solution for real-world applications.
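The abstract names a triplet loss with hard positive and semi-hard negative mining but gives no formula. What follows is a minimal pure-Python sketch of that mining scheme under stated assumptions, not the authors' implementation. It takes a batch of embeddings plus integer labels that group samples of the same object (how the paper obtains these groupings in its self-supervised setting is not described here, so the labels are simply an input). For each anchor it picks the farthest same-label sample as the hard positive and the closest different-label sample that is still farther than that positive as the semi-hard negative. The Euclidean distance, the margin of 0.3, and the fallback to the farthest negative when no semi-hard negative exists are assumptions borrowed from common practice; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss_hard_pos_semihard_neg(
    embeddings: List[Sequence[float]],
    labels: List[int],
    margin: float = 0.3,  # assumed value, not taken from the paper
) -> float:
    """Mean triplet loss over anchors that have at least one positive and one negative.

    Positive: the hardest (farthest) sample sharing the anchor's label.
    Negative: the semi-hard one, i.e. the closest negative that is still farther
    than the chosen positive; if none exists, fall back to the farthest negative
    (an assumed convention).
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        pos = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet
        d_ap = max(pos)  # hard positive
        semi_hard = [d for d in neg if d > d_ap]
        d_an = min(semi_hard) if semi_hard else max(neg)
        losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

For the batch `[[0, 0], [2, 0], [1, 0], [5, 0]]` with labels `[0, 0, 1, 1]`, only the anchor at `[1, 0]` contributes: its farthest positive is at distance 4 and no negative is farther, so it falls back to the farthest negative at distance 1, giving 4 - 1 + 0.3 = 3.3 and a batch mean of 0.825. Picking a semi-hard rather than the hardest negative is the usual way to avoid collapsed embeddings early in training.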
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
(2024-07-19T07:48:45Z)
- YOLO-World: Real-Time Open-Vocabulary Object Detection [87.08732047660058]
We introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities.
Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency.
YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
(2024-01-30T18:59:38Z)
- Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking [51.16677396148247]
Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames.
In this paper, we demonstrate that this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues.
Our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack.
(2023-08-01T18:53:24Z)
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
(2023-04-25T20:47:29Z)
- SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking [10.969806056391004]
Joint detection and embedding (JDE) based methods estimate bounding boxes and embedding features of objects with a single network in Multi-Object Tracking (MOT).
In the tracking stage, JDE-based methods fuse the target motion information and appearance information by applying the same rule.
We propose a new association matrix, the Embedding and GIoU matrix, which combines the embedding cosine distance and the GIoU distance of objects.
(2022-03-08T10:19:35Z)
- Multi-object Tracking with a Hierarchical Single-branch Network [31.680667324595557]
We propose an online multi-object tracking framework based on a hierarchical single-branch network.
Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance.
Experimental results on MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance.
(2021-01-06T12:14:58Z)
- Multi-object tracking with self-supervised associating network [5.947279761429668]
We propose a novel self-supervised learning method that uses many short, unlabeled videos.
Although the re-identification network is trained in a self-supervised manner, it achieves state-of-the-art performance on the MOT17 test benchmark, with a MOTA of 62.0% and an IDF1 of 62.6%.
(2020-10-26T08:48:23Z)
- Rethinking the competition between detection and ReID in Multi-Object Tracking [44.59367033562385]
One-shot models, which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT).
In this paper, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design to better learn task-dependent representations.
We also introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings.
(2020-10-23T02:44:59Z)
- Towards Precise Intra-camera Supervised Person Re-identification [54.86892428155225]
Intra-camera supervision (ICS) for person re-identification (Re-ID) assumes that identity labels are independently annotated within each camera view.
Lack of inter-camera labels makes the ICS Re-ID problem much more challenging than the fully supervised counterpart.
Our approach even performs comparably to state-of-the-art fully supervised methods on two of the datasets.
(2020-02-12T11:56:30Z)
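The SimpleTrack entry in the list combines embedding cosine distance with GIoU distance in a single association matrix, and the YOLO11-JDE abstract likewise describes fusing appearance and location cues during data association. Neither summary gives an exact formula, so the sketch below assumes a simple weighted sum: each track/detection cost is `w_app` times the cosine distance between embeddings plus `(1 - w_app)` times a GIoU distance rescaled to [0, 1] as `(1 - GIoU) / 2`. The weight `w_app`, the gate `max_cost`, and all helper names are hypothetical. Greedy matching stands in for the Hungarian assignment that trackers commonly use, and motion cues (such as Kalman-filter box predictions) are left out.

```python
import math
from typing import List, Sequence, Tuple

# Axis-aligned box (x1, y1, x2, y2); boxes are assumed to have positive area.
Box = Tuple[float, float, float, float]


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two boxes, in the range [-1, 1]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def fused_cost(track_embs: List[Sequence[float]], track_boxes: List[Box],
               det_embs: List[Sequence[float]], det_boxes: List[Box],
               w_app: float = 0.5) -> List[List[float]]:
    """Cost matrix with one row per track and one column per detection (lower is better)."""
    return [
        [w_app * cosine_distance(te, de) + (1.0 - w_app) * (1.0 - giou(tb, db)) / 2.0
         for de, db in zip(det_embs, det_boxes)]
        for te, tb in zip(track_embs, track_boxes)
    ]


def greedy_match(cost: List[List[float]], max_cost: float) -> List[Tuple[int, int]]:
    """Pair tracks and detections greedily in order of increasing cost."""
    candidates = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row))
    used_tracks, used_dets, matches = set(), set(), []
    for c, i, j in candidates:
        if c > max_cost:
            break  # every remaining pair is too costly to accept
        if i in used_tracks or j in used_dets:
            continue  # track or detection already assigned
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return matches
```

Rescaling GIoU keeps both terms in comparable ranges, and unlike plain IoU it still ranks non-overlapping boxes by how far apart they are. Tracks and detections missing from the returned pairs would be left to the tracker's own birth and termination logic.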