Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
- URL: http://arxiv.org/abs/2203.13537v1
- Date: Fri, 25 Mar 2022 09:45:27 GMT
- Title: Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
- Authors: Xin Chen, Dong Wang, Dongdong Li, Huchuan Lu
- Abstract summary: We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
- Score: 82.92565582642847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, target tracking has made great progress in accuracy. This
development is mainly attributed to powerful networks (such as transformers)
and additional modules (such as online update and refinement modules). However,
less attention has been paid to tracking speed. Most state-of-the-art trackers
settle for real-time speed on powerful GPUs, whereas practical applications
demand much higher tracking speed, especially on edge platforms with limited
resources. In this work, we present
an efficient tracking method via a hierarchical cross-attention transformer
named HCAT. Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55
fps on the NVIDIA Jetson AGX Xavier edge AI platform. Experiments show that our
HCAT achieves promising results on LaSOT, GOT-10k, TrackingNet, NFS, OTB100,
UAV123, and VOT2020. Code and models are available at
https://github.com/chenxin-dlut/HCAT.
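The abstract names the architecture but not its internals. As a rough illustration of the cross-attention operation at the core of transformer trackers such as HCAT, below is a minimal PyTorch sketch in which search-region tokens attend to template tokens; all names, dimensions, and the block layout are illustrative assumptions, not the authors' implementation (see the repository above for the real code).

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Minimal cross-attention block: search-region tokens (queries) attend
    to template tokens (keys/values). Illustrative sketch, not HCAT's code."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, search, template):
        # search: (B, N_s, C) tokens from the search region
        # template: (B, N_t, C) tokens from the target template
        fused, _ = self.attn(query=search, key=template, value=template)
        x = self.norm1(search + fused)      # residual connection + norm
        return self.norm2(x + self.mlp(x))  # feed-forward + residual

# Toy usage: 400 search tokens attend to 64 template tokens.
block = CrossAttentionBlock()
out = block(torch.randn(1, 400, 256), torch.randn(1, 64, 256))
print(out.shape)  # torch.Size([1, 400, 256])
```

A hierarchical design would stack such blocks over feature maps at several resolutions; the paper itself is the authority on how HCAT arranges them.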
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
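The DyTrack summary above describes input-dependent reasoning routes but not their form. One common way to realize such dynamic computation is early exiting, where a cheap gate after each layer decides whether to stop; the sketch below is a generic, hypothetical illustration under that assumption, not DyTrack's actual routing scheme.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Transformer encoder with per-layer exit gates: after each layer, a
    small head estimates whether the features suffice, so easy frames use
    fewer layers. Hypothetical sketch, not DyTrack's actual mechanism."""

    def __init__(self, dim=256, depth=6, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
             for _ in range(depth)]
        )
        self.gates = nn.ModuleList([nn.Linear(dim, 1) for _ in range(depth)])
        self.threshold = threshold

    def forward(self, x):
        # x: (1, N, C) tokens for a single frame (batch of 1, per-input exit)
        for layer, gate in zip(self.layers, self.gates):
            x = layer(x)
            confidence = torch.sigmoid(gate(x.mean(dim=1)))
            if confidence.item() > self.threshold:
                break  # confident enough: spend no further layers
        return x

encoder = EarlyExitEncoder()
features = encoder(torch.randn(1, 400, 256))  # may exit before layer 6
```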
- Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [87.19164603145056]
We propose LoRAT, a method that unveils the power of large ViT models for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency.
We design an anchor-free head solely based on MLP to adapt PETR, enabling better performance with less computational overhead.
arXiv Detail & Related papers (2024-03-08T11:41:48Z)
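Independently of LoRAT's specifics, LoRA itself works by freezing a pretrained weight and learning a low-rank additive update that can be merged back into the weight for deployment, which is why it adds no inference latency. A minimal sketch follows; the rank, scaling, and initialization are common defaults, not values from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen pretrained weight W and a trainable
    low-rank update scale * (B @ A), rank r << min(in, out). The update
    can be folded into W after training, adding no inference latency."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    def merge(self):
        # Fold the low-rank update into the frozen weight for deployment.
        self.base.weight.data += self.scale * (self.B @ self.A)

layer = LoRALinear(256, 256)
y = layer(torch.randn(4, 256))  # only A and B receive gradients
```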
- Mobile Vision Transformer-based Visual Object Tracking [3.9160947065896803]
We propose a lightweight, accurate, and fast tracking algorithm using MobileViT as the backbone for the first time.
Our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU.
arXiv Detail & Related papers (2023-09-11T21:16:41Z)
- Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z)
- Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z)
- SwinTrack: A Simple and Strong Baseline for Transformer Tracking [81.65306568735335]
We propose a fully attention-based Transformer tracking algorithm, Swin-Transformer Tracker (SwinTrack).
SwinTrack uses Transformer for both feature extraction and feature fusion, allowing full interactions between the target object and the search region for tracking.
In our thorough experiments, SwinTrack sets a new record with 0.717 SUC on LaSOT, surpassing STARK by 4.6% while still running at 45 FPS.
arXiv Detail & Related papers (2021-12-02T05:56:03Z)
- Siamese Transformer Pyramid Networks for Real-Time UAV Tracking [3.0969191504482243]
We introduce the Siamese Transformer Pyramid Network (SiamTPN), which inherits the advantages from both CNN and Transformer architectures.
Experiments on both aerial and popular tracking benchmarks show competitive results while operating at high speed.
Our fastest variant runs at over 30 Hz on a single CPU core and obtains an AUC score of 58.1% on the LaSOT dataset.
arXiv Detail & Related papers (2021-10-17T13:48:31Z)
- STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance.
We propose a novel tracking framework built on top of a space-time memory network that can make full use of historical information related to the target.
Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame.
arXiv Detail & Related papers (2021-04-01T08:10:56Z)
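The memory read described in the STMTrack summary is, at its core, attention from current-frame features over features stored from past frames. The sketch below illustrates that generic read operation; the module names and shapes are assumptions, not STMTrack's implementation.

```python
import torch
import torch.nn as nn

class MemoryRead(nn.Module):
    """Attention-style read of a space-time memory: current-frame tokens
    query tokens stored from past frames, so the output highlights the
    most informative regions. Generic sketch, not STMTrack's code."""

    def __init__(self, dim=256):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # queries from the current frame
        self.to_k = nn.Linear(dim, dim)  # keys from the memory
        self.to_v = nn.Linear(dim, dim)  # values from the memory

    def forward(self, current, memory):
        # current: (B, N, C) tokens of the current frame
        # memory:  (B, T*N, C) tokens accumulated from T past frames
        q, k, v = self.to_q(current), self.to_k(memory), self.to_v(memory)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (B, N, C) memory-guided features

reader = MemoryRead()
out = reader(torch.randn(1, 400, 256), torch.randn(1, 3 * 400, 256))
```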
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.