SparseTT: Visual Tracking with Sparse Transformers
- URL: http://arxiv.org/abs/2205.03776v1
- Date: Sun, 8 May 2022 04:00:28 GMT
- Title: SparseTT: Visual Tracking with Sparse Transformers
- Authors: Zhihong Fu, Zehua Fu, Qingjie Liu, Wenrui Cai, Yunhong Wang
- Abstract summary: The self-attention mechanism designed to model long-range dependencies is the key to the success of Transformers, but it is easily distracted by the background in the search regions.
In this paper, we relieve this issue with a sparse attention mechanism that focuses on the most relevant information in the search regions.
We introduce a double-head predictor to boost the accuracy of foreground-background classification and regression of target bounding boxes.
- Score: 43.1666514605021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have been successfully applied to the visual tracking task and
significantly promote tracking performance. The self-attention mechanism
designed to model long-range dependencies is the key to the success of
Transformers. However, self-attention does not focus on the most relevant
information in the search regions, which makes it prone to distraction by the
background. In this paper, we relieve this issue with a sparse attention
mechanism that focuses on the most relevant information in the search regions,
enabling much more accurate tracking. Furthermore, we introduce a double-head
predictor to boost the accuracy of foreground-background classification and
the regression of target bounding boxes, which further improves the tracking
performance. Extensive experiments show that, without bells and whistles, our
method significantly outperforms the state-of-the-art approaches on LaSOT,
GOT-10k, TrackingNet, and UAV123, while running at 40 FPS. Notably, the
training time of our method is reduced by 75% compared to that of TransT. The
source code and models are available at https://github.com/fzh0917/SparseTT.
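The sparse attention mechanism is not detailed on this page; below is a minimal sketch, assuming a top-k formulation in which each query token keeps only its k highest-scoring keys in the search region. The function name, tensor shapes, and the choice of k are illustrative assumptions, not the authors' released code.

```python
# Hypothetical top-k sparse attention sketch (PyTorch). Each query attends only
# to its k most relevant keys, suppressing low-scoring (likely background) keys.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=32):
    """q, k, v: (batch, num_tokens, dim). Keeps only the topk keys per query."""
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5   # (B, Nq, Nk)
    # Keep the topk highest scores per query; mask out everything else.
    topk_vals, _ = scores.topk(topk, dim=-1)                   # (B, Nq, topk)
    threshold = topk_vals[..., -1, None]                       # k-th largest score
    scores = scores.masked_fill(scores < threshold, float('-inf'))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Usage: queries and keys/values taken from search-region tokens.
q = torch.randn(2, 256, 64)
kv = torch.randn(2, 256, 64)
out = topk_sparse_attention(q, kv, kv, topk=32)
print(out.shape)  # torch.Size([2, 256, 64])
```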
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, making better use of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich spatio-temporal information is crucial for capturing complicated target appearance variations in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z) - Compact Transformer Tracker with Correlative Masked Modeling [16.234426179567837]
Transformer framework has been showing superior performances in visual object tracking.
Recent advances focus on exploring attention mechanism variants for better information aggregation.
In this paper, we prove that the vanilla self-attention structure is sufficient for information aggregation.
arXiv Detail & Related papers (2023-01-26T04:58:08Z) - AiATrack: Attention in Attention for Transformer Visual Tracking [89.94386868729332]
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role.
We propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors.
Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking.
arXiv Detail & Related papers (2022-07-20T00:44:03Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - TrTr: Visual Tracking with Transformer [29.415900191169587]
We propose a novel tracker network based on a powerful attention mechanism, the Transformer encoder-decoder architecture.
We design the classification and regression heads using the output of the Transformer to localize the target based on shape-agnostic anchors.
Our method performs favorably against state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-09T02:32:28Z) - STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance.
We propose a novel tracking framework built on top of a space-time memory network that makes full use of historical information related to the target.
Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame.
arXiv Detail & Related papers (2021-04-01T08:10:56Z) - Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers.
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention.
Experiments show that our TransT achieves very promising results on six challenging datasets.
arXiv Detail & Related papers (2021-03-29T09:06:55Z)
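The TransT entry above fuses template and search-region features purely with attention; the sketch below illustrates that general idea with a standard multi-head cross-attention layer in which search-region tokens query the template. The module name, dimensions, and residual/normalization choices are assumptions for illustration, not the released implementation.

```python
# Minimal cross-attention fusion sketch (PyTorch) in the spirit of
# attention-based template/search feature fusion; not the official code.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search_feat, template_feat):
        # Search tokens query the template tokens, so each search location
        # aggregates the template information most relevant to it.
        fused, _ = self.cross_attn(search_feat, template_feat, template_feat)
        return self.norm(search_feat + fused)

fusion = CrossAttentionFusion()
template = torch.randn(2, 64, 256)   # template tokens
search = torch.randn(2, 256, 256)    # search-region tokens
out = fusion(search, template)
print(out.shape)  # torch.Size([2, 256, 256])
```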
This list is automatically generated from the titles and abstracts of the papers on this site.