Z-GMOT: Zero-shot Generic Multiple Object Tracking
- URL: http://arxiv.org/abs/2305.17648v4
- Date: Thu, 13 Jun 2024 14:58:23 GMT
- Title: Z-GMOT: Zero-shot Generic Multiple Object Tracking
- Authors: Kim Hoang Tran, Anh Duy Le Dinh, Tien Phat Nguyen, Thinh Phan, Pha Nguyen, Khoa Luu, Donald Adjeroh, Gianfranco Doretto, Ngan Hoang Le
- Abstract summary: Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories.
To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach.
We propose $\mathtt{Z-GMOT}$, a cutting-edge tracking solution capable of tracking objects from \textit{never-seen categories} without the need for initial bounding boxes or predefined categories.
- Score: 8.878331472995498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories and struggles with unseen objects. To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach, requiring less prior information. However, current GMOT methods often rely on initial bounding boxes and struggle to handle variations in factors such as viewpoint, lighting, occlusion, and scale, among others. Our contributions commence with the introduction of the \textit{Referring GMOT dataset}, a collection of videos, each accompanied by detailed textual descriptions of their attributes. Subsequently, we propose $\mathtt{Z-GMOT}$, a cutting-edge tracking solution capable of tracking objects from \textit{never-seen categories} without the need for initial bounding boxes or predefined categories. Within our $\mathtt{Z-GMOT}$ framework, we introduce two novel components: (i) $\mathtt{iGLIP}$, an improved grounded language-image pre-training, for accurately detecting unseen objects with specific characteristics, and (ii) $\mathtt{MA-SORT}$, a novel object association approach that adeptly integrates motion- and appearance-based matching strategies to tackle the complex task of tracking objects with high similarity. Our contributions are benchmarked through extensive experiments conducted on the Referring GMOT dataset for the GMOT task. Additionally, to assess the generalizability of the proposed $\mathtt{Z-GMOT}$, we conduct ablation studies on the DanceTrack and MOT20 datasets for the MOT task. Our dataset, code, and models are released at: https://fsoft-aic.github.io/Z-GMOT.
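The abstract names $\mathtt{MA-SORT}$ as an association step that fuses motion and appearance cues but gives no implementation details. The sketch below is a minimal, generic SORT-style fused association, not the paper's actual method; the fusion weight `alpha`, the cost threshold, and the use of cosine similarity over embedding vectors are all assumptions made for illustration.

```python
# Illustrative sketch of a motion + appearance association step in the
# spirit of MA-SORT. Names, the fusion weight `alpha`, and the 0.7 cost
# threshold are assumptions, not the paper's actual implementation.
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(track_boxes, track_feats, det_boxes, det_feats, alpha=0.5):
    """Match Kalman-predicted track boxes to current-frame detections.

    Combines a motion cost (1 - IoU) with an appearance cost
    (1 - cosine similarity of embedding vectors).
    """
    n_t, n_d = len(track_boxes), len(det_boxes)
    cost = np.zeros((n_t, n_d))
    for i in range(n_t):
        for j in range(n_d):
            motion_cost = 1.0 - iou(track_boxes[i], det_boxes[j])
            cos = np.dot(track_feats[i], det_feats[j]) / (
                np.linalg.norm(track_feats[i]) * np.linalg.norm(det_feats[j]) + 1e-9
            )
            appearance_cost = 1.0 - cos
            cost[i, j] = alpha * motion_cost + (1.0 - alpha) * appearance_cost
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    # Keep only matches whose fused cost is low enough.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 0.7]
```

A tracker built this way matters for GMOT precisely because generic objects of one category look alike: when appearance similarity saturates, the motion term keeps identities apart, and vice versa.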
Related papers
- ClickTrack: Towards Real-time Interactive Single Object Tracking [58.52366657445601]
We propose ClickTrack, a new paradigm for single object tracking that uses clicking interaction for real-time scenarios.
To address ambiguity in certain special scenarios, we designed the Guided Click Refiner (GCR), which accepts a point and optional textual information as inputs.
Experiments on the LaSOT and GOT-10k benchmarks show that a tracker combined with GCR achieves stable performance in real-time interactive scenarios.
arXiv Detail & Related papers (2024-11-20T10:30:33Z) - Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking [0.08333024746293495]
Grounded-GMOT is an innovative tracking paradigm that enables users to track multiple generic objects in videos through natural language descriptors.
Our contributions begin with the introduction of the G2MOT dataset, which includes a collection of videos featuring a wide variety of generic objects.
Following this, we propose a novel tracking method, KAM-SORT, which not only effectively integrates visual appearance with motion cues but also enhances the Kalman filter (a baseline constant-velocity Kalman sketch appears after this related-papers list).
arXiv Detail & Related papers (2024-10-11T20:38:17Z) - TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT [0.0]
Multi-Object Tracking (MOT) has made substantial advancements, but it is limited by heavy reliance on prior knowledge.
Generic Multiple Object Tracking (GMOT), tracking multiple objects with similar appearance, requires less prior information about the targets.
We introduce a novel text prompt-based open-vocabulary GMOT framework, called TP-GMOT.
Our contributions are benchmarked on the Refer-GMOT dataset for the GMOT task.
arXiv Detail & Related papers (2024-09-04T07:33:09Z) - Siamese-DETR for Generic Multi-Object Tracking [16.853363984562602]
Traditional Multi-Object Tracking (MOT) is limited to tracking objects belonging to the pre-defined closed-set categories.
Siamese-DETR is proposed to track objects beyond pre-defined categories with the given text prompt and template image.
Siamese-DETR surpasses existing MOT methods on GMOT-40 dataset by a large margin.
arXiv Detail & Related papers (2023-10-27T03:32:05Z) - UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z) - OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z) - Unifying Tracking and Image-Video Object Detection [54.91658924277527]
TrIVD (Tracking and Image-Video Detection) is the first framework that unifies image OD, video OD, and MOT within one end-to-end model.
To handle the discrepancies and semantic overlaps of category labels, TrIVD formulates detection/tracking as grounding and reasons about object categories.
arXiv Detail & Related papers (2022-11-20T20:30:28Z) - End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, toward class-agnostic tracking that also performs well for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z) - Multi-Object Tracking and Segmentation via Neural Message Passing [0.0]
Graphs offer a natural way to formulate Multiple Object Tracking (MOT) and Multiple Object Tracking and Segmentation (MOTS).
We exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs).
We achieve state-of-the-art results for both tracking and segmentation in several publicly available datasets.
arXiv Detail & Related papers (2022-07-15T13:03:47Z) - Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z) - Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking [102.31092931373232]
We propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution.
The two major novelties: chained structure and paired attentive regression, make CTracker simple, fast and effective.
arXiv Detail & Related papers (2020-07-29T02:38:49Z)
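As background for the KAM-SORT entry above, which enhances the Kalman filter that SORT-family trackers use for motion prediction, here is a minimal constant-velocity Kalman filter sketch. The state layout (measured box parameters plus their velocities) and the noise values are illustrative assumptions; the entry's actual enhancements are not reproduced here.

```python
# Minimal constant-velocity Kalman filter of the kind SORT-family trackers
# build on. The state is the measured box parameters (e.g. center x, center y,
# scale, aspect ratio) plus their velocities. Matrix and noise values are
# illustrative assumptions, not taken from any of the papers above.
import numpy as np


class ConstantVelocityKalman:
    def __init__(self, init_state: np.ndarray):
        dim = len(init_state)                  # measured dims, e.g. 4
        self.x = np.concatenate([init_state, np.zeros(dim)])  # add velocities
        self.P = np.eye(2 * dim) * 10.0        # state covariance
        self.F = np.eye(2 * dim)               # transition: pos += vel (dt = 1)
        self.F[:dim, dim:] = np.eye(dim)
        self.H = np.zeros((dim, 2 * dim))      # we observe positions only
        self.H[:dim, :dim] = np.eye(dim)
        self.Q = np.eye(2 * dim) * 0.01        # process noise (assumed)
        self.R = np.eye(dim) * 1.0             # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        """Advance the state one frame; return the predicted measurement."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.H @ self.x

    def update(self, z: np.ndarray) -> None:
        """Correct the state with a matched detection z."""
        y = z - self.H @ self.x                          # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
```

A SORT-style tracker calls `predict()` once per frame to obtain a box for data association, then `update()` with the matched detection; appearance-adaptive variants like the one described above would modify the gain or noise terms of this baseline.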