Correlation-Embedded Transformer Tracking: A Single-Branch Framework
- URL: http://arxiv.org/abs/2401.12743v2
- Date: Tue, 3 Sep 2024 06:13:37 GMT
- Title: Correlation-Embedded Transformer Tracking: A Single-Branch Framework
- Authors: Fei Xie, Wankou Yang, Chunyu Wang, Lei Chu, Yue Cao, Chao Ma, Wenjun Zeng,
- Abstract summary: We propose a novel single-branch tracking framework inspired by the transformer.
Unlike the Siamese-like feature extraction, our tracker deeply embeds cross-image feature correlation in multiple layers of the feature network.
The output features can be directly used for predicting target locations without additional correlation steps.
- Score: 69.0798277313574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing robust and discriminative appearance models has been a long-standing research challenge in visual object tracking. In the prevalent Siamese-based paradigm, the features extracted by the Siamese-like networks are often insufficient to model the tracked targets and distractor objects, thereby hindering them from being robust and discriminative simultaneously. While most Siamese trackers focus on designing robust correlation operations, we propose a novel single-branch tracking framework inspired by the transformer. Unlike the Siamese-like feature extraction, our tracker deeply embeds cross-image feature correlation in multiple layers of the feature network. By extensively matching the features of the two images through multiple layers, it can suppress non-target features, resulting in target-aware feature extraction. The output features can be directly used for predicting target locations without additional correlation steps. Thus, we reformulate the two-branch Siamese tracking as a conceptually simple, fully transformer-based Single-Branch Tracking pipeline, dubbed SBT. After conducting an in-depth analysis of the SBT baseline, we summarize many effective design principles and propose an improved tracker dubbed SuperSBT. SuperSBT adopts a hierarchical architecture with a local modeling layer to enhance shallow-level features. A unified relation modeling is proposed to remove complex handcrafted layer pattern designs. SuperSBT is further improved by masked image modeling pre-training, integrating temporal modeling, and equipping with dedicated prediction heads. Thus, SuperSBT outperforms the SBT baseline by 4.7%,3.0%, and 4.5% AUC scores in LaSOT, TrackingNet, and GOT-10K. Notably, SuperSBT greatly raises the speed of SBT from 37 FPS to 81 FPS. Extensive experiments show that our method achieves superior results on eight VOT benchmarks.
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - BACTrack: Building Appearance Collection for Aerial Tracking [13.785254511683966]
Building Appearance Collection Tracking builds a dynamic collection of target templates online and performs efficient multi-template matching to achieve robust tracking.
BACTrack achieves top performance on four challenging aerial tracking benchmarks while maintaining an impressive speed of over 87 FPS on a single GPU.
arXiv Detail & Related papers (2023-12-11T05:55:59Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream
Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - TrTr: Visual Tracking with Transformer [29.415900191169587]
We propose a novel tracker network based on a powerful attention mechanism called Transformer encoder-decoder architecture.
We design the classification and regression heads using the output of Transformer to localize target based on shape-agnostic anchor.
Our method performs favorably against state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-09T02:32:28Z) - Multiple Convolutional Features in Siamese Networks for Object Tracking [13.850110645060116]
Multiple Features-Siamese Tracker (MFST) is a novel tracking algorithm exploiting several hierarchical feature maps for robust tracking.
MFST achieves high tracking accuracy, while outperforming the standard siamese tracker on object tracking benchmarks.
arXiv Detail & Related papers (2021-03-01T08:02:27Z) - MFST: Multi-Features Siamese Tracker [13.850110645060116]
Multi-Features Siamese Tracker (MFST) is a novel tracking algorithm exploiting several hierarchical feature maps for robust deep similarity tracking.
MFST achieves high tracking accuracy, while outperforming several state-of-the-art trackers, including standard Siamese trackers.
arXiv Detail & Related papers (2021-03-01T07:18:32Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers based on LSTMs (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTM, and offer a higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.