Mobile Vision Transformer-based Visual Object Tracking
- URL: http://arxiv.org/abs/2309.05829v1
- Date: Mon, 11 Sep 2023 21:16:41 GMT
- Title: Mobile Vision Transformer-based Visual Object Tracking
- Authors: Goutam Yelluru Gopal, Maria A. Amer
- Abstract summary: We propose a lightweight, accurate, and fast tracking algorithm using MobileViT as the backbone for the first time.
Our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU.
- Score: 3.9160947065896803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The introduction of robust backbones, such as Vision Transformers, has
improved the performance of object tracking algorithms in recent years.
However, these state-of-the-art trackers are computationally expensive since
they have a large number of model parameters and rely on specialized hardware
(e.g., GPU) for faster inference. On the other hand, recent lightweight
trackers are fast but are less accurate, especially on large-scale datasets. We
propose a lightweight, accurate, and fast tracking algorithm using Mobile
Vision Transformers (MobileViT) as the backbone for the first time. We also
present a novel approach of fusing the template and search region
representations in the MobileViT backbone, thereby generating superior feature
encoding for target localization. The experimental results show that our
MobileViT-based Tracker, MVT, surpasses the performance of recent lightweight
trackers on the large-scale datasets GOT10k and TrackingNet, and with a high
inference speed. In addition, our method outperforms the popular DiMP-50
tracker despite having 4.7 times fewer model parameters and running at 2.8
times its speed on a GPU. The tracker code and models are available at
https://github.com/goutamyg/MVT
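The fusion of template and search-region features inside the backbone is only described at a high level above. As a rough, hypothetical PyTorch sketch of that general idea (not the authors' implementation from the repository linked above), the block below flattens both feature maps into tokens, runs joint self-attention and an MLP over the concatenated sequence, and returns the updated search-region features; all names and sizes (FusionBlock, embed_dim, num_heads) are illustrative assumptions.

```python
# Illustrative sketch only: joint attention over concatenated template and
# search-region tokens, one plausible way to fuse the two representations
# inside a transformer-style backbone block for target localization.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 2 * embed_dim), nn.GELU(),
            nn.Linear(2 * embed_dim, embed_dim),
        )

    def forward(self, template_feat, search_feat):
        # template_feat: (B, C, Hz, Wz); search_feat: (B, C, Hx, Wx)
        B, C, Hx, Wx = search_feat.shape
        z = template_feat.flatten(2).transpose(1, 2)   # (B, Nz, C) template tokens
        x = search_feat.flatten(2).transpose(1, 2)     # (B, Nx, C) search tokens
        tokens = torch.cat([z, x], dim=1)              # joint token sequence
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        # Keep only the search-region tokens for the localization head.
        out = tokens[:, z.shape[1]:, :].transpose(1, 2).reshape(B, C, Hx, Wx)
        return out

# Example shapes: an 8x8 template and a 16x16 search region with 128 channels.
block = FusionBlock()
fused = block(torch.randn(1, 128, 8, 8), torch.randn(1, 128, 16, 16))  # (1, 128, 16, 16)
```

In a MobileViT-style backbone, a block like this would sit alongside the lightweight convolutional stages; the abstract's key point is that the two representations are fused inside the backbone itself rather than in a separate relation module.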
Related papers
- Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [87.19164603145056]
We propose LoRAT, a method that unveils the power of large ViT models for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency (a minimal illustrative sketch of LoRA is given after this list).
We design an anchor-free head based solely on an MLP to adapt PETR, enabling better performance with less computational overhead.
arXiv Detail & Related papers (2024-03-08T11:41:48Z) - ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual
Tracking [40.13014036490452]
Transformers have enabled speed-oriented trackers to approach state-of-the-art (SOTA) performance at high speed.
We demonstrate that the remaining gap can be narrowed or even closed while keeping a high tracking speed with a smaller input size.
arXiv Detail & Related papers (2023-10-16T05:06:13Z) - LiteTrack: Layer Pruning with Asynchronous Feature Extraction for
Lightweight and Efficient Visual Tracking [4.179339279095506]
LiteTrack is an efficient transformer-based tracking model optimized for high-speed operations across various devices.
It achieves a more favorable trade-off between accuracy and efficiency than the other lightweight trackers.
LiteTrack-B9 reaches a competitive 72.2% AO on GOT-10k and 82.4% AUC on TrackingNet, and runs at 171 fps on an NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2023-09-17T12:01:03Z) - BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity in various aspects, i.e., network design, training objectives, and tracking pipeline, while achieving superior performance.
arXiv Detail & Related papers (2023-09-05T12:42:26Z) - Exploring Lightweight Hierarchical Vision Transformers for Efficient
Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z) - Efficient Visual Tracking via Hierarchical Cross-Attention Transformer [82.92565582642847]
We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
arXiv Detail & Related papers (2022-03-25T09:45:27Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - LightTrack: Finding Lightweight Neural Networks for Object Tracking via
One-Shot Architecture Search [104.84999119090887]
We present LightTrack, which uses neural architecture search (NAS) to design more lightweight and efficient object trackers.
Comprehensive experiments show that our LightTrack is effective.
It can find trackers that achieve superior performance compared to handcrafted SOTA trackers, such as SiamRPN++ and Ocean.
arXiv Detail & Related papers (2021-04-29T17:55:24Z) - STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance.
We propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target.
Specifically, a novel memory mechanism is introduced, which stores historical information about the target to guide the tracker toward the most informative regions in the current frame (a rough sketch of such a memory read is given after this list).
arXiv Detail & Related papers (2021-04-01T08:10:56Z)