Prompting for Multi-Modal Tracking
- URL: http://arxiv.org/abs/2207.14571v2
- Date: Mon, 1 Aug 2022 02:50:52 GMT
- Title: Prompting for Multi-Modal Tracking
- Authors: Jinyu Yang, Zhe Li, Feng Zheng, Aleš Leonardis, and Jingkuan Song
- Abstract summary: We propose ProTrack, a novel multi-modal prompt tracker.
ProTrack transfers multi-modal inputs to a single modality via the prompt paradigm.
ProTrack achieves high-performance multi-modal tracking by altering only the inputs, even without any extra training on multi-modal data.
- Score: 70.0522146292258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal tracking has gained attention because it is more accurate
and robust in complex scenarios than traditional RGB-based tracking. The key
lies in how to fuse multi-modal data and reduce the gap between modalities.
However, multi-modal tracking still suffers severely from data deficiency, which
leads to insufficient learning of fusion modules. Instead of building such a
fusion module, in this paper we offer a new perspective on multi-modal tracking
by attaching importance to multi-modal visual prompts. We design a novel
multi-modal prompt tracker (ProTrack) that transfers multi-modal inputs to a
single modality through the prompt paradigm. By fully exploiting the tracking
ability of RGB trackers pre-trained at scale, ProTrack achieves high-performance
multi-modal tracking by altering only the inputs, even without any extra
training on multi-modal data. Extensive experiments on five benchmark datasets
demonstrate the effectiveness of the proposed ProTrack.
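As a concrete illustration of the prompt paradigm described above, the sketch below blends the two modalities into a single RGB-like frame so a frozen, pre-trained RGB tracker can consume multi-modal data unchanged. The convex-combination form, the weight `lam`, and the function name are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def multimodal_prompt(x_rgb: torch.Tensor,
                      x_aux: torch.Tensor,
                      lam: float = 0.5) -> torch.Tensor:
    """Blend an RGB frame with an auxiliary modality (depth, thermal, or
    event, replicated to 3 channels) into one RGB-like input tensor."""
    assert x_rgb.shape == x_aux.shape, "expects matching (B, 3, H, W) tensors"
    return lam * x_rgb + (1.0 - lam) * x_aux

# Usage: the prompted frame goes straight into any frozen RGB tracker.
# rgb_tracker.eval()
# with torch.no_grad():
#     prediction = rgb_tracker(multimodal_prompt(rgb_frame, thermal_frame))
```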
Related papers
- SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking [19.50096632818305]
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
Recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data.
We propose a novel symmetric multimodal tracking framework called SDSTrack.
arXiv Detail & Related papers (2024-03-24T04:15:50Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance compared with both full fine-tuning methods and prompt learning-based methods.
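A minimal sketch of what such a light bi-directional adapter could look like, assuming bottleneck MLP adapters and a residual exchange between two feature branches; the dimensions and structure are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Light adapter: down-project, non-linearity, up-project."""
    def __init__(self, dim: int, bottleneck: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class BiDirectionalAdapter(nn.Module):
    """Residual exchange of modality-specific information between branches."""
    def __init__(self, dim: int, bottleneck: int = 8):
        super().__init__()
        self.a_to_b = BottleneckAdapter(dim, bottleneck)
        self.b_to_a = BottleneckAdapter(dim, bottleneck)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # each branch is enriched with adapted features from the other modality
        return feat_a + self.b_to_a(feat_b), feat_b + self.a_to_b(feat_a)
```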
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Single-Model and Any-Modality for Video Object Tracking [85.83753760853142]
We introduce Un-Track, a Unified Tracker of a single set of parameters for any modality.
To handle any modality, our method learns their common latent space through low-rank factorization and reconstruction techniques.
Our Un-Track achieves a +8.1 absolute F-score gain on the DepthTrack dataset while adding only +2.14 GFLOPs (over 21.50) and +6.6M parameters (over 93M).
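A hedged sketch of the low-rank idea: a rank-r bottleneck shared across modalities, with an assumed reconstruction loss tying features to the common latent space. Un-Track's actual architecture is more elaborate; all names and sizes here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankProjector(nn.Module):
    """Rank-r factorization: compress features into a shared r-dimensional
    latent code and expand back, adding very few parameters."""
    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        self.compress = nn.Linear(dim, rank, bias=False)  # V: dim -> r
        self.expand = nn.Linear(rank, dim, bias=False)    # U: r -> dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.expand(self.compress(x))

# Assumed training signal: reconstruct RGB features from auxiliary-modality
# features through the shared low-rank bottleneck.
# projector = LowRankProjector(dim=256)
# loss = F.mse_loss(projector(feat_aux), feat_rgb)
```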
arXiv Detail & Related papers (2023-11-27T14:17:41Z)
- You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is shown to be more accurate than many state-of-the-art multi-modal methods based on the tracking-by-detection (TBD) paradigm.
arXiv Detail & Related papers (2023-04-18T02:45:18Z)
- Visual Prompt Multi-Modal Tracking [71.53972967568251]
Visual Prompt multi-modal Tracking (ViPT) learns the modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multimodal tracking tasks.
ViPT outperforms the full fine-tuning paradigm on multiple downstream tracking tasks including RGB+Depth, RGB+Thermal, and RGB+Event tracking.
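A generic prompt-tuning sketch in the spirit of this approach: learnable tokens are prepended to a frozen encoder's input sequence and are the only trained parameters. ViPT's actual design injects modal-relevant prompts differently; all names and sizes here are assumptions.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Wraps a frozen transformer encoder and prepends a few learnable
    prompt tokens to its token sequence; only the prompts are trained."""
    def __init__(self, encoder: nn.Module, dim: int, n_prompts: int = 8):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False            # foundation model stays frozen
        self.prompts = nn.Parameter(torch.empty(1, n_prompts, dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) patch embeddings of the multi-modal input
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        return self.encoder(torch.cat([prompts, tokens], dim=1))
```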
arXiv Detail & Related papers (2023-03-20T01:51:07Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT) settings.
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Multi-modal Visual Tracking: Review and Experimental Comparison [85.20414397784937]
We summarize the multi-modal tracking algorithms, especially visible-depth (RGB-D) tracking and visible-thermal (RGB-T) tracking.
We conduct experiments to analyze the effectiveness of trackers on five datasets.
arXiv Detail & Related papers (2020-12-08T02:39:38Z)
- Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts [11.72025865314187]
We present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts.
We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
arXiv Detail & Related papers (2020-02-04T09:42:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.