SUTrack: Towards Simple and Unified Single Object Tracking
- URL: http://arxiv.org/abs/2412.19138v1
- Date: Thu, 26 Dec 2024 09:41:36 GMT
- Title: SUTrack: Towards Simple and Unified Single Object Tracking
- Authors: Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu
- Abstract summary: We propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack.
It consolidates five SOT tasks into a unified model trained in a single session.
Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks.
- Score: 48.788978093502294
- Abstract: In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, and RGB-Language tracking) into a unified model trained in a single session. Due to the distinct nature of the data, current methods typically design individual architectures and train separate models for each task. This fragmentation results in redundant training processes, repetitive technological innovations, and limited cross-modal knowledge sharing. In contrast, SUTrack demonstrates that a single model with a unified input representation can effectively handle various common SOT tasks, eliminating the need for task-specific designs and separate training sessions. Additionally, we introduce a task-recognition auxiliary training strategy and a soft token type embedding to further enhance SUTrack's performance with minimal overhead. Experiments show that SUTrack outperforms previous task-specific counterparts across 11 datasets spanning five SOT tasks. Moreover, we provide a range of models catering to edge devices as well as high-performance GPUs, striking a good trade-off between speed and accuracy. We hope SUTrack could serve as a strong foundation for further compelling research into unified tracking models. Code and models are available at github.com/chenxin-dlut/SUTrack.
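The abstract names a unified input representation, a soft token type embedding, and a task-recognition auxiliary strategy without giving implementation detail. Below is a minimal PyTorch sketch of how these pieces could fit together; the module name, dimensions, type vocabulary, and the reading of "soft" as a learned weighted mix over type embeddings are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumptions, not the authors' code): a unified token
# sequence with a soft token type embedding and a task-recognition
# auxiliary head, as described at a high level in the abstract.
import torch
import torch.nn as nn

NUM_TASKS = 5   # RGB, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language
NUM_TYPES = 4   # e.g. template/search x RGB/auxiliary -- illustrative

class UnifiedTrackerSketch(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Learnable table of token type embeddings; "soft" here means each
        # token receives a weighted mix of types instead of a hard lookup.
        self.type_table = nn.Parameter(torch.randn(NUM_TYPES, embed_dim) * 0.02)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Auxiliary head: recognize which of the five tasks produced the input.
        self.task_head = nn.Linear(embed_dim, NUM_TASKS)

    def forward(self, tokens, type_weights):
        # tokens: (B, N, D) patch embeddings from all modalities, concatenated
        # into one sequence; type_weights: (B, N, NUM_TYPES), rows sum to 1.
        x = tokens + type_weights @ self.type_table     # soft type embedding
        x = self.encoder(x)
        task_logits = self.task_head(x.mean(dim=1))     # (B, NUM_TASKS)
        return x, task_logits
```

During training, a cross-entropy loss on `task_logits` against the known source task is one way to realize the task-recognition auxiliary strategy; the head adds negligible cost and can be dropped at inference.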
Related papers
- SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking [34.90147791481045]
SynCL is a novel plug-and-play synergistic training strategy designed to jointly facilitate multi-task learning for detection and tracking.
We show that SynCL consistently delivers improvements when integrated with the training stage of various query-based 3D MOT trackers.
Without additional inference costs, SynCL improves the state-of-the-art PF-Track method by +3.9% AMOTA and +2.0% NDS on the nuScenes dataset.
arXiv Detail & Related papers (2024-11-11T08:18:49Z)
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking [19.50096632818305]
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
Recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data.
We propose a novel symmetric multimodal tracking framework called SDSTrack.
arXiv Detail & Related papers (2024-03-24T04:15:50Z)
- Single-Model and Any-Modality for Video Object Tracking [85.83753760853142]
We introduce Un-Track, a Unified Tracker of a single set of parameters for any modality.
To handle any modality, our method learns their common latent space through low-rank factorization and reconstruction techniques (see the sketch after this list).
Our Un-Track achieves a +8.1 absolute F-score gain on the DepthTrack dataset while introducing only +2.14 GFLOPs (over a 21.50 GFLOP baseline) and +6.6M parameters (over 93M).
arXiv Detail & Related papers (2023-11-27T14:17:41Z)
- Visual Prompt Multi-Modal Tracking [71.53972967568251]
Visual Prompt multi-modal Tracking (ViPT) learns the modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multimodal tracking tasks.
ViPT outperforms the full fine-tuning paradigm on multiple downstream tracking tasks including RGB+Depth, RGB+Thermal, and RGB+Event tracking.
arXiv Detail & Related papers (2023-03-20T01:51:07Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Do Different Tracking Tasks Require Different Appearance Models? [118.02175542476367]
We present UniTrack, a unified tracking solution to address five different tasks within the same framework.
UniTrack consists of a single and task-agnostic appearance model, which can be learned in a supervised or self-supervised fashion.
We show that most tracking tasks can be solved within this framework, and that the same appearance model achieves performance competitive with specialised methods on all five tasks considered.
arXiv Detail & Related papers (2021-07-05T17:40:17Z)
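As referenced in the Un-Track entry above, a common latent space across modalities can be learned via low-rank factorization and reconstruction. The sketch below illustrates the general idea only; `LowRankFusionSketch`, the dimensions, and the residual form are assumptions, not Un-Track's actual architecture.

```python
# Minimal sketch (assumed design, not Un-Track's code): a low-rank
# bottleneck that maps tokens from any modality into a shared latent
# space and reconstructs them, adding few parameters and FLOPs.
import torch
import torch.nn as nn

class LowRankFusionSketch(nn.Module):
    def __init__(self, dim=768, rank=16):
        super().__init__()
        self.down = nn.Linear(dim, rank)  # factorize: dim -> rank
        self.up = nn.Linear(rank, dim)    # reconstruct: rank -> dim

    def forward(self, feat):
        # feat: (B, N, dim) tokens from an arbitrary modality; the narrow
        # rank forces a compact, modality-agnostic latent code.
        return feat + self.up(self.down(feat))  # residual reconstruction

fusion = LowRankFusionSketch()
tokens = torch.randn(2, 196, 768)   # e.g. depth or thermal patch tokens
out = fusion(tokens)                # (2, 196, 768)
```

A small rank keeps the added cost modest, in the spirit of the +2.14 GFLOPs and +6.6M parameter figures quoted in the entry above.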