Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
- URL: http://arxiv.org/abs/2410.09243v1
- Date: Fri, 11 Oct 2024 20:38:17 GMT
- Title: Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
- Authors: Duy Le Dinh Anh, Kim Hoang Tran, Quang-Thuc Nguyen, Ngan Hoang Le,
- Abstract summary: Grounded-GMOT is an innovative tracking paradigm that enables users to track multiple generic objects in videos through natural language descriptors.
Our contributions begin with the introduction of the G2MOT dataset, which includes a collection of videos featuring a wide variety of generic objects.
Following this, we propose a novel tracking method, KAM-SORT, which not only effectively integrates visual appearance with motion cues but also enhances the Kalman filter.
- Score: 0.08333024746293495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent progress, Multi-Object Tracking (MOT) continues to face significant challenges, particularly its dependence on prior knowledge and predefined categories, complicating the tracking of unfamiliar objects. Generic Multiple Object Tracking (GMOT) emerges as a promising solution, requiring less prior information. Nevertheless, existing GMOT methods, primarily designed as OneShot-GMOT, rely heavily on initial bounding boxes and often struggle with variations in viewpoint, lighting, occlusion, and scale. To overcome the limitations inherent in both MOT and GMOT when it comes to tracking objects with specific generic attributes, we introduce Grounded-GMOT, an innovative tracking paradigm that enables users to track multiple generic objects in videos through natural language descriptors. Our contributions begin with the introduction of the G2MOT dataset, which includes a collection of videos featuring a wide variety of generic objects, each accompanied by detailed textual descriptions of their attributes. Following this, we propose a novel tracking method, KAM-SORT, which not only effectively integrates visual appearance with motion cues but also enhances the Kalman filter. KAM-SORT proves particularly advantageous when dealing with objects of high visual similarity from the same generic category in GMOT scenarios. Through comprehensive experiments, we demonstrate that Grounded-GMOT outperforms existing OneShot-GMOT approaches. Additionally, our extensive comparisons between various trackers highlight KAM-SORT's efficacy in GMOT, further establishing its significance in the field. Project page: https://UARK-AICV.github.io/G2MOT. The source code and dataset will be made publicly available.
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
This issue amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT)
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z) - STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT)
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z) - TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT [0.0]
Multi-Object Tracking (MOT) has made substantial advancements, but it is limited by heavy reliance on prior knowledge.
Generic Multiple Object Tracking (GMOT), tracking multiple objects with similar appearance, requires less prior information about the targets.
We introduce a novel text prompt-based open-vocabulary GMOT framework, called textbftextTP-GMOT.
Our contributions are benchmarked on the textRefer-GMOT dataset for GMOT task.
arXiv Detail & Related papers (2024-09-04T07:33:09Z) - Siamese-DETR for Generic Multi-Object Tracking [16.853363984562602]
Traditional Multi-Object Tracking (MOT) is limited to tracking objects belonging to the pre-defined closed-set categories.
Siamese-DETR is proposed to track objects beyond pre-defined categories with the given text prompt and template image.
Siamese-DETR surpasses existing MOT methods on GMOT-40 dataset by a large margin.
arXiv Detail & Related papers (2023-10-27T03:32:05Z) - Z-GMOT: Zero-shot Generic Multiple Object Tracking [8.878331472995498]
Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories.
To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach.
We propose $mathttZ-GMOT$, a cutting-edge tracking solution capable of tracking objects from textitnever-seen categories without the need of initial bounding boxes or predefined categories.
arXiv Detail & Related papers (2023-05-28T06:44:33Z) - OVTrack: Open-Vocabulary Multiple Object Tracking [64.73379741435255]
OVTrack is an open-vocabulary tracker capable of tracking arbitrary object classes.
It sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark.
arXiv Detail & Related papers (2023-04-17T16:20:05Z) - OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z) - Beyond SOT: Tracking Multiple Generic Objects at Once [141.36900362724975]
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video.
We introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence.
Our approach achieves highly competitive results on single-object GOT datasets, setting a new state of the art on TrackingNet with a success rate AUC of 84.4%.
arXiv Detail & Related papers (2022-12-22T17:59:19Z) - End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.