AiATrack: Attention in Attention for Transformer Visual Tracking
- URL: http://arxiv.org/abs/2207.09603v2
- Date: Fri, 22 Jul 2022 13:26:14 GMT
- Title: AiATrack: Attention in Attention for Transformer Visual Tracking
- Authors: Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, Junsong Yuan
- Abstract summary: Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role.
We propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors.
Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking.
- Score: 89.94386868729332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer trackers have achieved impressive advancements recently, where
the attention mechanism plays an important role. However, the independent
correlation computation in the attention mechanism could result in noisy and
ambiguous attention weights, which inhibits further performance improvement. To
address this issue, we propose an attention in attention (AiA) module, which
enhances appropriate correlations and suppresses erroneous ones by seeking
consensus among all correlation vectors. Our AiA module can be readily applied
to both self-attention blocks and cross-attention blocks to facilitate feature
aggregation and information propagation for visual tracking. Moreover, we
propose a streamlined Transformer tracking framework, dubbed AiATrack, by
introducing efficient feature reuse and target-background embeddings to make
full use of temporal references. Experiments show that our tracker achieves
state-of-the-art performance on six tracking benchmarks while running at a
real-time speed.
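The consensus idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each key's column of the correlation map acts as one "correlation vector", that an inner attention over those vectors is added back as a residual before the usual softmax, and it omits all learned projections and normalization layers. The function names (`aia_attention`, `scaled_correlation`, and the helpers) are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_correlation(a, b):
    """Dot-product correlation between rows of a and rows of b, scaled by sqrt(dim)."""
    scale = 1.0 / math.sqrt(len(a[0]))
    return [[v * scale for v in row] for row in matmul(a, transpose(b))]


def aia_attention(q, k, v):
    """Toy attention-in-attention: refine the correlation map, then attend.

    Assumption: the inner attention uses the raw correlation vectors as its
    own queries, keys, and values (no learned weights).
    """
    corr = scaled_correlation(q, k)            # n_q x n_k correlation map
    tokens = transpose(corr)                   # one correlation vector per key
    inner = softmax_rows(scaled_correlation(tokens, tokens))  # n_k x n_k
    consensus = matmul(inner, tokens)          # aggregate similar vectors
    refined_tokens = [
        [t + c for t, c in zip(t_row, c_row)]  # residual connection
        for t_row, c_row in zip(tokens, consensus)
    ]
    weights = softmax_rows(transpose(refined_tokens))  # n_q x n_k
    return matmul(weights, v), weights


# Usage: two queries attending over three keys/values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output, attn = aia_attention(queries, keys, values)
print([[round(w, 3) for w in row] for row in attn])
```

In this sketch, correlation vectors that agree with one another reinforce each other through the residual term, which is one simple way to read "seeking consensus among all correlation vectors". The paper's actual module should be consulted for the real design.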
Related papers
- Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting [47.58016750718323]
We propose a novel generative self-supervised method called TimeDART.
TimeDART captures both the global sequence dependence and local detail features within time series data.
Our code is publicly available at https://github.com/Melmaphother/TimeDART.
arXiv Detail & Related papers (2024-10-08T06:08:33Z)
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking [14.382072224997074]
Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness.
We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking.
We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
arXiv Detail & Related papers (2024-07-07T14:10:04Z)
- Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich spatio-temporal information is crucial for modeling the complicated target appearance in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z)
- Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z)
- SparseTT: Visual Tracking with Sparse Transformers [43.1666514605021]
The self-attention mechanism, designed to model long-range dependencies, is the key to the success of Transformers.
In this paper, we relieve this issue with a sparse attention mechanism by focusing on the most relevant information in the search regions.
We introduce a double-head predictor to boost the accuracy of foreground-background classification and regression of target bounding boxes.
arXiv Detail & Related papers (2022-05-08T04:00:28Z)
- Context-aware Visual Tracking with Joint Meta-updating [11.226947525556813]
We propose a context-aware tracking model to optimize the tracker over the representation space, which jointly meta-updates both branches by exploiting information along the whole sequence.
The proposed tracking method achieves an EAO score of 0.514 on VOT2018 at a speed of 40 FPS, demonstrating its capability of improving the accuracy and robustness of the underlying tracker with little speed drop.
arXiv Detail & Related papers (2022-04-04T14:16:00Z)
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy [63.91005999481061]
A practical long-term tracker typically contains three key properties, i.e. an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism.
We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via Dynamic convolutions (d-convs) and Multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT 2019LT benchmarks and runs in real-time (3x faster
arXiv Detail & Related papers (2021-04-25T00:59:53Z)
- Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as an identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)