Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
- URL: http://arxiv.org/abs/2203.13537v1
- Date: Fri, 25 Mar 2022 09:45:27 GMT
- Title: Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
- Authors: Xin Chen, Dong Wang, Dongdong Li, Huchuan Lu
- Abstract summary: We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
- Score: 82.92565582642847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, target tracking has made great progress in accuracy. This
development is mainly attributed to powerful networks (such as transformers)
and additional modules (such as online update and refinement modules). However,
less attention has been paid to tracking speed. Most state-of-the-art trackers
settle for real-time speed on powerful GPUs, whereas practical applications
demand much higher tracking speed, especially on edge platforms with limited
resources. In this work, we present
an efficient tracking method via a hierarchical cross-attention transformer
named HCAT. Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55
fps on the NVIDIA Jetson AGX Xavier edge AI platform. Experiments show that our
HCAT achieves promising results on LaSOT, GOT-10k, TrackingNet, NFS, OTB100,
UAV123, and VOT2020. Code and models are available at
https://github.com/chenxin-dlut/HCAT.
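The abstract names the architecture but not its internals. As a rough illustration of the cross-attention operation at the core of transformer trackers such as HCAT, below is a minimal PyTorch sketch in which search-region tokens attend to template tokens; all names, dimensions, and the block layout are illustrative assumptions, not the authors' implementation (see the repository above for the real code).

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Minimal cross-attention block: search-region tokens (queries) attend
    to template tokens (keys/values). Illustrative sketch, not HCAT's code."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, search, template):
        # search: (B, N_s, C) tokens from the search region
        # template: (B, N_t, C) tokens from the target template
        fused, _ = self.attn(query=search, key=template, value=template)
        x = self.norm1(search + fused)      # residual connection + norm
        return self.norm2(x + self.mlp(x))  # feed-forward + residual

# Toy usage: 400 search tokens attend to 64 template tokens.
block = CrossAttentionBlock()
out = block(torch.randn(1, 400, 256), torch.randn(1, 64, 256))
print(out.shape)  # torch.Size([1, 400, 256])
```

A hierarchical design would stack such blocks over feature maps at several resolutions; the paper itself is the authority on how HCAT arranges them.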
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
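The DyTrack summary above describes input-dependent reasoning routes but not their form. One common way to realize such dynamic computation is early exiting, where a cheap gate after each layer decides whether to stop; the sketch below is a generic, hypothetical illustration under that assumption, not DyTrack's actual routing scheme.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Transformer encoder with per-layer exit gates: after each layer, a
    small head estimates whether the features suffice, so easy frames use
    fewer layers. Hypothetical sketch, not DyTrack's actual mechanism."""

    def __init__(self, dim=256, depth=6, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
             for _ in range(depth)]
        )
        self.gates = nn.ModuleList([nn.Linear(dim, 1) for _ in range(depth)])
        self.threshold = threshold

    def forward(self, x):
        # x: (1, N, C) tokens for a single frame (batch of 1, per-input exit)
        for layer, gate in zip(self.layers, self.gates):
            x = layer(x)
            confidence = torch.sigmoid(gate(x.mean(dim=1)))
            if confidence.item() > self.threshold:
                break  # confident enough: spend no further layers
        return x

encoder = EarlyExitEncoder()
features = encoder(torch.randn(1, 400, 256))  # may exit before layer 6
```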
- Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [87.19164603145056]
We propose LoRAT, a method that unveils the power of large ViT models for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency.
We design an anchor-free head solely based on MLP to adapt PETR, enabling better performance with less computational overhead.
arXiv Detail & Related papers (2024-03-08T11:41:48Z)
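Independently of LoRAT's specifics, LoRA itself works by freezing a pretrained weight and learning a low-rank additive update that can be merged back into the weight for deployment, which is why it adds no inference latency. A minimal sketch follows; the rank, scaling, and initialization are common defaults, not values from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen pretrained weight W and a trainable
    low-rank update scale * (B @ A), rank r << min(in, out). The update
    can be folded into W after training, adding no inference latency."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    def merge(self):
        # Fold the low-rank update into the frozen weight for deployment.
        self.base.weight.data += self.scale * (self.B @ self.A)

layer = LoRALinear(256, 256)
y = layer(torch.randn(4, 256))  # only A and B receive gradients
```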
- Mobile Vision Transformer-based Visual Object Tracking [3.9160947065896803]
We propose a lightweight, accurate, and fast tracking algorithm using MobileViT as the backbone for the first time.
Our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU.
arXiv Detail & Related papers (2023-09-11T21:16:41Z)
- Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z)
- Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z)
- SwinTrack: A Simple and Strong Baseline for Transformer Tracking [81.65306568735335]
We propose a fully attention-based Transformer tracking algorithm, Swin-Transformer Tracker (SwinTrack).
SwinTrack uses Transformer for both feature extraction and feature fusion, allowing full interactions between the target object and the search region for tracking.
In our thorough experiments, SwinTrack sets a new record with 0.717 SUC on LaSOT, surpassing STARK by 4.6% while still running at 45 FPS.
arXiv Detail & Related papers (2021-12-02T05:56:03Z)
- Siamese Transformer Pyramid Networks for Real-Time UAV Tracking [3.0969191504482243]
We introduce the Siamese Transformer Pyramid Network (SiamTPN), which inherits the advantages from both CNN and Transformer architectures.
Experiments on both aerial and popular tracking benchmarks show competitive results while operating at high speed.
Our fastest variant runs at over 30 Hz on a single CPU core and obtains an AUC score of 58.1% on the LaSOT dataset.
arXiv Detail & Related papers (2021-10-17T13:48:31Z)
- STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance.
We propose a novel tracking framework built on top of a space-time memory network that can make full use of historical information related to the target.
Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame.
arXiv Detail & Related papers (2021-04-01T08:10:56Z)
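The memory read described in the STMTrack summary is, at its core, attention from current-frame features over features stored from past frames. The sketch below illustrates that generic read operation; the module names and shapes are assumptions, not STMTrack's implementation.

```python
import torch
import torch.nn as nn

class MemoryRead(nn.Module):
    """Attention-style read of a space-time memory: current-frame tokens
    query tokens stored from past frames, so the output highlights the
    most informative regions. Generic sketch, not STMTrack's code."""

    def __init__(self, dim=256):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # queries from the current frame
        self.to_k = nn.Linear(dim, dim)  # keys from the memory
        self.to_v = nn.Linear(dim, dim)  # values from the memory

    def forward(self, current, memory):
        # current: (B, N, C) tokens of the current frame
        # memory:  (B, T*N, C) tokens accumulated from T past frames
        q, k, v = self.to_q(current), self.to_k(memory), self.to_v(memory)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (B, N, C) memory-guided features

reader = MemoryRead()
out = reader(torch.randn(1, 400, 256), torch.randn(1, 3 * 400, 256))
```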
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.