Correlation-Embedded Transformer Tracking: A Single-Branch Framework
- URL: http://arxiv.org/abs/2401.12743v1
- Date: Tue, 23 Jan 2024 13:20:57 GMT
- Title: Correlation-Embedded Transformer Tracking: A Single-Branch Framework
- Authors: Fei Xie, Wankou Yang, Chunyu Wang, Lei Chu, Yue Cao, Chao Ma, Wenjun
Zeng
- Abstract summary: We propose a novel single-branch tracking framework inspired by the transformer.
Unlike the Siamese-like feature extraction, our tracker deeply embeds cross-image feature correlation in multiple layers of the feature network.
The output features can be directly used for predicting target locations without additional correlation steps.
- Score: 72.54388547501499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing robust and discriminative appearance models has been a
long-standing research challenge in visual object tracking. In the prevalent
Siamese-based paradigm, the features extracted by the Siamese-like networks are
often insufficient to model the tracked targets and distractor objects, thereby
hindering them from being robust and discriminative simultaneously. While most
Siamese trackers focus on designing robust correlation operations, we propose a
novel single-branch tracking framework inspired by the transformer. Unlike the
Siamese-like feature extraction, our tracker deeply embeds cross-image feature
correlation in multiple layers of the feature network. By extensively matching
the features of the two images through multiple layers, it can suppress
non-target features, resulting in target-aware feature extraction. The output
features can be directly used for predicting target locations without
additional correlation steps. Thus, we reformulate the two-branch Siamese
tracking as a conceptually simple, fully transformer-based Single-Branch
Tracking pipeline, dubbed SBT. After conducting an in-depth analysis of the SBT
baseline, we summarize many effective design principles and propose an improved
tracker dubbed SuperSBT. SuperSBT adopts a hierarchical architecture with a
local modeling layer to enhance shallow-level features. A unified relation
modeling is proposed to remove complex handcrafted layer pattern designs.
SuperSBT is further improved by masked image modeling pre-training, temporal
modeling, and dedicated prediction heads. As a result, SuperSBT outperforms the
SBT baseline by 4.7%, 3.0%, and 4.5% AUC on LaSOT, TrackingNet, and GOT-10K,
respectively. Notably, SuperSBT raises the speed of SBT from 37 FPS to 81 FPS.
Extensive experiments show that our method achieves
superior results on eight VOT benchmarks.
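For intuition, below is a minimal, self-contained sketch (PyTorch-style Python) of the single-branch idea described in the abstract: template and search-region tokens are concatenated and processed by shared transformer blocks, so cross-image correlation is embedded directly in feature extraction, and the resulting search features can feed a prediction head without a separate correlation step. All class names, dimensions, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a single-branch tracker: template
# and search tokens share one token sequence, so attention in every layer
# performs cross-image correlation as part of feature extraction.
import torch
import torch.nn as nn


class JointAttentionBlock(nn.Module):
    """One transformer block attending jointly over template + search tokens."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Attention over the joint sequence mixes template and search features,
        # i.e. cross-image correlation happens inside the layer itself.
        x = self.norm1(tokens)
        tokens = tokens + self.attn(x, x, x, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens


class SingleBranchTracker(nn.Module):
    """Single branch: shared blocks process both images as one token sequence."""

    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [JointAttentionBlock(dim) for _ in range(depth)]
        )

    def forward(self, z_tokens: torch.Tensor, x_tokens: torch.Tensor) -> torch.Tensor:
        # z_tokens: (B, Nz, C) template tokens; x_tokens: (B, Nx, C) search tokens.
        n_z = z_tokens.shape[1]
        tokens = torch.cat([z_tokens, x_tokens], dim=1)
        for blk in self.blocks:
            tokens = blk(tokens)
        # The search-region features are already target-aware; a prediction head
        # can localize the target directly, with no extra correlation step.
        return tokens[:, n_z:]


if __name__ == "__main__":
    tracker = SingleBranchTracker()
    z = torch.randn(1, 64, 256)   # e.g. 8x8 grid of template patch tokens
    x = torch.randn(1, 256, 256)  # e.g. 16x16 grid of search-region tokens
    print(tracker(z, x).shape)    # torch.Size([1, 256, 256])
```

The sketch uses plain self-attention over the concatenated sequence; the paper's SBT/SuperSBT add hierarchical stages, local modeling layers, unified relation modeling, and temporal modeling on top of this basic joint-attention scheme.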
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - BACTrack: Building Appearance Collection for Aerial Tracking [13.785254511683966]
Building Appearance Collection Tracking builds a dynamic collection of target templates online and performs efficient multi-template matching to achieve robust tracking.
BACTrack achieves top performance on four challenging aerial tracking benchmarks while maintaining an impressive speed of over 87 FPS on a single GPU.
arXiv Detail & Related papers (2023-12-11T05:55:59Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream
Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - TrTr: Visual Tracking with Transformer [29.415900191169587]
We propose a novel tracker network based on a powerful attention mechanism called Transformer encoder-decoder architecture.
We design classification and regression heads on the Transformer output to localize the target based on shape-agnostic anchors.
Our method performs favorably against state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-09T02:32:28Z) - Multiple Convolutional Features in Siamese Networks for Object Tracking [13.850110645060116]
Multiple Features-Siamese Tracker (MFST) is a novel tracking algorithm exploiting several hierarchical feature maps for robust tracking.
MFST achieves high tracking accuracy, while outperforming the standard siamese tracker on object tracking benchmarks.
arXiv Detail & Related papers (2021-03-01T08:02:27Z) - MFST: Multi-Features Siamese Tracker [13.850110645060116]
Multi-Features Siamese Tracker (MFST) is a novel tracking algorithm exploiting several hierarchical feature maps for robust deep similarity tracking.
MFST achieves high tracking accuracy, while outperforming several state-of-the-art trackers, including standard Siamese trackers.
arXiv Detail & Related papers (2021-03-01T07:18:32Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers based on LSTMs (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)