SparseTT: Visual Tracking with Sparse Transformers
- URL: http://arxiv.org/abs/2205.03776v1
- Date: Sun, 8 May 2022 04:00:28 GMT
- Title: SparseTT: Visual Tracking with Sparse Transformers
- Authors: Zhihong Fu, Zehua Fu, Qingjie Liu, Wenrui Cai, Yunhong Wang
- Abstract summary: The self-attention mechanism designed to model long-range dependencies is the key to the success of Transformers.
However, self-attention lacks focus on the most relevant information in the search regions. In this paper, we relieve this issue with a sparse attention mechanism that focuses on the most relevant information in the search regions.
We also introduce a double-head predictor to boost the accuracy of foreground-background classification and target bounding-box regression.
- Score: 43.1666514605021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have been successfully applied to the visual tracking task and
significantly promote tracking performance. The self-attention mechanism
designed to model long-range dependencies is the key to the success of
Transformers. However, self-attention lacks focus on the most relevant
information in the search regions, making it easily distracted by the
background. In this paper, we relieve this issue with a sparse attention
mechanism that focuses on the most relevant information in the search regions,
which enables much more accurate tracking. Furthermore, we introduce a
double-head predictor to boost the accuracy of foreground-background
classification and target bounding-box regression, which further improves
tracking performance. Extensive experiments show that, without bells and
whistles, our method significantly outperforms state-of-the-art approaches on
LaSOT, GOT-10k, TrackingNet, and UAV123, while running at 40 FPS. Notably, the
training time of our method is reduced by 75% compared to that of TransT. The
source code and models are available at https://github.com/fzh0917/SparseTT.
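
The abstract does not spell out the sparse attention formulation; a common way to realize "focusing on the most relevant information" is top-k attention, where each query attends only to its k highest-scoring keys and the rest are masked out. Below is a minimal PyTorch sketch under that assumption; the function name, `topk` value, and tensor shapes are illustrative, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=32):
    """Top-k sparse attention sketch: each query attends only to its
    topk highest-scoring keys; all other positions are masked to -inf
    before the softmax, so they receive (near-)zero attention weight.

    q, k, v: (batch, num_tokens, dim) tensors.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (B, Nq, Nk)
    topk = min(topk, scores.size(-1))
    _, idx = scores.topk(topk, dim=-1)               # indices of k largest scores per query
    mask = torch.full_like(scores, float('-inf'))
    mask.scatter_(-1, idx, 0.0)                      # keep top-k positions, mask the rest
    attn = F.softmax(scores + mask, dim=-1)
    return attn @ v                                  # (B, Nq, dim)

# Example: a search-region feature map flattened to 400 tokens.
q = torch.randn(1, 400, 256)
k = torch.randn(1, 400, 256)
v = torch.randn(1, 400, 256)
out = topk_sparse_attention(q, k, v, topk=32)
print(out.shape)  # torch.Size([1, 400, 256])
```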
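The double-head predictor is likewise described only at a high level: two separate branches, one for foreground-background classification and one for bounding-box regression, rather than a single shared head. A hypothetical layout is sketched below; the channel counts, layer choices, and the 4-channel box-offset output are all assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class DoubleHeadPredictor(nn.Module):
    """Hypothetical double-head predictor: decoupled branches for
    foreground-background classification and box regression."""

    def __init__(self, in_channels=256):
        super().__init__()
        # Classification branch: per-position foreground/background logit.
        self.cls_head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 1, 1),
        )
        # Regression branch: per-position box offsets (4 channels).
        self.reg_head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4, 1),
        )

    def forward(self, x):
        return self.cls_head(x), self.reg_head(x)

feat = torch.randn(1, 256, 20, 20)   # search-region feature map
cls_logits, box_deltas = DoubleHeadPredictor()(feat)
print(cls_logits.shape, box_deltas.shape)  # (1, 1, 20, 20) (1, 4, 20, 20)
```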