Mobile Vision Transformer-based Visual Object Tracking
- URL: http://arxiv.org/abs/2309.05829v1
- Date: Mon, 11 Sep 2023 21:16:41 GMT
- Title: Mobile Vision Transformer-based Visual Object Tracking
- Authors: Goutam Yelluru Gopal, Maria A. Amer
- Abstract summary: We propose a lightweight, accurate, and fast tracking algorithm using MobileViT as the backbone for the first time.
Our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU.
- Score: 3.9160947065896803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The introduction of robust backbones, such as Vision Transformers, has
improved the performance of object tracking algorithms in recent years.
However, these state-of-the-art trackers are computationally expensive since
they have a large number of model parameters and rely on specialized hardware
(e.g., GPU) for faster inference. On the other hand, recent lightweight
trackers are fast but are less accurate, especially on large-scale datasets. We
propose a lightweight, accurate, and fast tracking algorithm using Mobile
Vision Transformers (MobileViT) as the backbone for the first time. We also
present a novel approach of fusing the template and search region
representations in the MobileViT backbone, thereby generating superior feature
encoding for target localization. The experimental results show that our
MobileViT-based Tracker, MVT, surpasses the performance of recent lightweight
trackers on the large-scale datasets GOT10k and TrackingNet, and with a high
inference speed. In addition, our method outperforms the popular DiMP-50
tracker despite having 4.7 times fewer model parameters and running at 2.8
times its speed on a GPU. The tracker code and models are available at
https://github.com/goutamyg/MVT
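The fusion of template and search-region features inside the backbone is only described at a high level above. As a rough, hypothetical PyTorch sketch of that general idea (not the authors' implementation from the repository linked above), the block below flattens both feature maps into tokens, runs joint self-attention and an MLP over the concatenated sequence, and returns the updated search-region features; all names and sizes (FusionBlock, embed_dim, num_heads) are illustrative assumptions.

```python
# Illustrative sketch only: joint attention over concatenated template and
# search-region tokens, one plausible way to fuse the two representations
# inside a transformer-style backbone block for target localization.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 2 * embed_dim), nn.GELU(),
            nn.Linear(2 * embed_dim, embed_dim),
        )

    def forward(self, template_feat, search_feat):
        # template_feat: (B, C, Hz, Wz); search_feat: (B, C, Hx, Wx)
        B, C, Hx, Wx = search_feat.shape
        z = template_feat.flatten(2).transpose(1, 2)   # (B, Nz, C) template tokens
        x = search_feat.flatten(2).transpose(1, 2)     # (B, Nx, C) search tokens
        tokens = torch.cat([z, x], dim=1)              # joint token sequence
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        # Keep only the search-region tokens for the localization head.
        out = tokens[:, z.shape[1]:, :].transpose(1, 2).reshape(B, C, Hx, Wx)
        return out

# Example shapes: an 8x8 template and a 16x16 search region with 128 channels.
block = FusionBlock()
fused = block(torch.randn(1, 128, 8, 8), torch.randn(1, 128, 16, 16))  # (1, 128, 16, 16)
```

In a MobileViT-style backbone, a block like this would sit alongside the lightweight convolutional stages; the abstract's key point is that the two representations are fused inside the backbone itself rather than in a separate relation module.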
Related papers
- Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [87.19164603145056]
We propose LoRAT, a method that unveils the power of large ViT models for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency (a minimal illustrative sketch of LoRA is given after this list).
We design an anchor-free head based solely on an MLP to adapt PETR, enabling better performance with less computational overhead.
arXiv Detail & Related papers (2024-03-08T11:41:48Z) - ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual
Tracking [40.13014036490452]
Transformers have enabled speed-oriented trackers to approach state-of-the-art (SOTA) performance at high speed.
We demonstrate that the remaining gap can be narrowed or even closed while keeping a high tracking speed with a smaller input size.
arXiv Detail & Related papers (2023-10-16T05:06:13Z) - LiteTrack: Layer Pruning with Asynchronous Feature Extraction for
Lightweight and Efficient Visual Tracking [4.179339279095506]
LiteTrack is an efficient transformer-based tracking model optimized for high-speed operations across various devices.
It achieves a more favorable trade-off between accuracy and efficiency than the other lightweight trackers.
LiteTrack-B9 reaches a competitive 72.2% AO on GOT-10k and 82.4% AUC on TrackingNet, and runs at 171 fps on an NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2023-09-17T12:01:03Z) - BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity in various aspects, i.e., network design, training objectives, and tracking pipeline, while achieving superior performance.
arXiv Detail & Related papers (2023-09-05T12:42:26Z) - Exploring Lightweight Hierarchical Vision Transformers for Efficient
Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z) - Efficient Visual Tracking via Hierarchical Cross-Attention Transformer [82.92565582642847]
We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
arXiv Detail & Related papers (2022-03-25T09:45:27Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - LightTrack: Finding Lightweight Neural Networks for Object Tracking via
One-Shot Architecture Search [104.84999119090887]
We present LightTrack, which uses neural architecture search (NAS) to design more lightweight and efficient object trackers.
Comprehensive experiments show that our LightTrack is effective.
It can find trackers that achieve superior performance compared to handcrafted SOTA trackers, such as SiamRPN++ and Ocean.
arXiv Detail & Related papers (2021-04-29T17:55:24Z) - STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance.
We propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target.
Specifically, a novel memory mechanism is introduced, which stores historical information about the target to guide the tracker toward the most informative regions in the current frame (a rough sketch of such a memory read is given after this list).
arXiv Detail & Related papers (2021-04-01T08:10:56Z)