A Deep Dive into Generic Object Tracking: A Survey
- URL: http://arxiv.org/abs/2507.23251v1
- Date: Thu, 31 Jul 2025 05:19:26 GMT
- Title: A Deep Dive into Generic Object Tracking: A Survey
- Authors: Fereshteh Aghaee Meibodi, Shadi Alijani, Homayoun Najjaran
- Abstract summary: Object tracking remains an important yet challenging task in computer vision due to complex spatio-temporal dynamics. Siamese-based trackers, discriminative trackers, and transformer-based approaches have been introduced to address these challenges.
- Score: 3.7305040207339286
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generic object tracking remains an important yet challenging task in computer vision due to complex spatio-temporal dynamics, especially in the presence of occlusions, similar distractors, and appearance variations. Over the past two decades, a wide range of tracking paradigms, including Siamese-based trackers, discriminative trackers, and, more recently, prominent transformer-based approaches, have been introduced to address these challenges. While a few existing survey papers in this field have either concentrated on a single category or widely covered multiple ones to capture progress, our paper presents a comprehensive review of all three categories, with particular emphasis on the rapidly evolving transformer-based methods. We analyze the core design principles, innovations, and limitations of each approach through both qualitative and quantitative comparisons. Our study introduces a novel categorization and offers a unified visual and tabular comparison of representative methods. Additionally, we organize existing trackers from multiple perspectives and summarize the major evaluation benchmarks, highlighting the fast-paced advancements in transformer-based tracking driven by their robust spatio-temporal modeling capabilities.
Related papers
- IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking [13.977088329815933]
Multi-Object Tracking (MOT) aims to associate multiple objects across video frames.
Most existing approaches train and track within a single domain, resulting in a lack of cross-domain generalizability.
We develop IP-MOT, an end-to-end transformer model for MOT that operates without concrete textual descriptions.
arXiv Detail & Related papers (2024-10-30T14:24:56Z)
- Beyond Traditional Single Object Tracking: A Survey [0.0]
We visit some of the cutting-edge techniques in vision, such as Sequence Models, Generative Models, Self-supervised Learning, Unsupervised Learning, Reinforcement Learning, Meta-Learning, Continual Learning, and Domain Adaptation.
We propose a novel categorization of single object tracking methods based on novel techniques and trends.
We analyze the pros and cons of the presented approaches and present a guide for non-traditional techniques in single object tracking.
arXiv Detail & Related papers (2024-05-16T20:55:31Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- Transformers in Single Object Tracking: An Experimental Survey [1.2526963688768458]
Transformer-based tracking approaches have ushered in a new era in single-object tracking.
We conduct an in-depth literature analysis of Transformer tracking approaches by categorizing them into CNN-Transformer based trackers, Two-stream Two-stage fully-Transformer based trackers, and One-stream One-stage fully-Transformer based trackers.
arXiv Detail & Related papers (2023-02-23T09:12:58Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- An Informative Tracking Benchmark [133.0931262969931]
We develop a small and informative tracking benchmark (ITB) with 7% out of 1.2 M frames of existing and newly collected datasets.
We select the most informative sequences from existing benchmarks, taking into account 1) challenging level, 2) discriminative strength, and 3) density of appearance variations.
By analyzing the results of 15 state-of-the-art trackers re-trained on the same data, we determine the effective methods for robust tracking under each scenario.
arXiv Detail & Related papers (2021-12-13T07:56:16Z)
- Visual Tracking by TridentAlign and Context Embedding [71.60159881028432]
We propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods.
The performance of the proposed tracker is comparable to that of state-of-the-art trackers, while the proposed tracker runs at real-time speed.
arXiv Detail & Related papers (2020-07-14T08:00:26Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
- MOPT: Multi-Object Panoptic Tracking [33.77171216778909]
We introduce a novel perception task denoted as multi-object panoptic tracking (MOPT).
MOPT allows for exploiting pixel-level semantic information of 'thing' and 'stuff' classes, temporal coherence, and pixel-level associations over time.
We present extensive quantitative and qualitative evaluations of both vision-based and LiDAR-based MOPT that demonstrate encouraging results.
arXiv Detail & Related papers (2020-04-17T11:45:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.