ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual
Tracking
- URL: http://arxiv.org/abs/2310.10071v1
- Date: Mon, 16 Oct 2023 05:06:13 GMT
- Title: ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual
Tracking
- Authors: Yutong Kou, Jin Gao, Bing Li, Gang Wang, Weiming Hu, Yizheng Wang and
Liang Li
- Abstract summary: The transformer has enabled speed-oriented trackers to approach state-of-the-art (SOTA) performance at high speed.
We demonstrate that it is possible to narrow or even close this gap while achieving high tracking speed with a smaller input size.
- Score: 40.13014036490452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the transformer has enabled speed-oriented trackers to
approach state-of-the-art (SOTA) performance at high speed thanks to a smaller
input size or a lighter feature extraction backbone, though they still
substantially lag behind their corresponding performance-oriented versions. In
this paper, we demonstrate that it is possible to narrow or even close this gap
while achieving high tracking speed with a smaller input size. To this end, we
non-uniformly resize the cropped image to a smaller input size such that the
area where the target is more likely to appear is kept at a higher resolution
and vice versa. This resolves the dilemma of attending to a larger visual field
while retaining more raw information for the target despite the smaller input
size. Our formulation of the non-uniform resizing can be efficiently solved
through quadratic programming (QP) and naturally integrated into most
crop-based local trackers. Comprehensive experiments on five challenging
datasets based on two kinds of transformer trackers, i.e., OSTrack and TransT,
demonstrate consistent improvements over them. In particular,
applying our method to the speed-oriented version of OSTrack even outperforms
its performance-oriented counterpart by 0.6% AUC on TNL2K, while running 50%
faster and saving over 55% MACs. Codes and models are available at
https://github.com/Kou-99/ZoomTrack.
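To make the core idea concrete, here is a heavily simplified sketch of target-aware non-uniform resizing, not the authors' exact formulation: each image axis is split into k grid intervals, a small QP assigns larger output widths to intervals near the (assumed known) target box under a fixed total output size, and the resulting piecewise-linear map is used to resample the crop. All function names are illustrative; see the official repository for the real implementation.

```python
import numpy as np
from scipy.optimize import minimize

def solve_interval_widths(importance, out_size, min_width=1.0):
    """QP: minimize ||w - d||^2  s.t.  sum(w) = out_size, w >= min_width,
    where d_i is the importance-proportional desired width of interval i."""
    k = len(importance)
    d = out_size * importance / importance.sum()
    res = minimize(lambda w: ((w - d) ** 2).sum(),
                   x0=np.full(k, out_size / k),
                   bounds=[(min_width, None)] * k,
                   constraints=({'type': 'eq', 'fun': lambda w: w.sum() - out_size},),
                   method='SLSQP')
    return res.x

def nonuniform_resize(img, out_hw, target_box, k=16):
    """img: (H, W[, C]) array; target_box: (x0, y0, x1, y1) in source pixels."""
    H, W = img.shape[:2]
    out_h, out_w = out_hw
    x0, y0, x1, y1 = target_box

    def importance(size, lo, hi):
        # Gaussian bump centred on the target extent along this axis.
        centers = (np.arange(k) + 0.5) * size / k
        mid, scale = (lo + hi) / 2.0, max(hi - lo, 1.0)
        return np.exp(-0.5 * ((centers - mid) / scale) ** 2) + 0.05

    w_cols = solve_interval_widths(importance(W, x0, x1), out_w)
    w_rows = solve_interval_widths(importance(H, y0, y1), out_h)
    # Piecewise-linear map from output pixel centres back to source coords.
    sx = np.interp(np.arange(out_w) + 0.5,
                   np.concatenate([[0.0], np.cumsum(w_cols)]),
                   np.linspace(0.0, W, k + 1))
    sy = np.interp(np.arange(out_h) + 0.5,
                   np.concatenate([[0.0], np.cumsum(w_rows)]),
                   np.linspace(0.0, H, k + 1))
    ix = np.clip(sx.astype(int), 0, W - 1)
    iy = np.clip(sy.astype(int), 0, H - 1)
    return img[iy[:, None], ix[None, :]]  # nearest-neighbour resample
```

Intervals overlapping the target get wider output columns and rows, i.e., higher sampling density, while the periphery is compressed, so a larger visual field fits in the same small input.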
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, making better use of the available computational budget (a generic early-exit sketch follows this entry).
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
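The "reasoning routes" idea above can be illustrated with a generic input-adaptive-depth (early-exit) encoder. This is my own sketch under the assumption of a per-block halting gate; DyTrack's actual routing mechanism may differ.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Input-adaptive depth: after each block a tiny gate estimates whether
    the features are already good enough to stop, so easy frames use fewer
    blocks (illustrative only, not DyTrack's actual design)."""
    def __init__(self, dim=256, heads=8, depth=8, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth))
        self.gates = nn.ModuleList(nn.Linear(dim, 1) for _ in range(depth))
        self.threshold = threshold

    def forward(self, x):  # x: (B, N, dim) tokens
        for block, gate in zip(self.blocks, self.gates):
            x = block(x)
            halt = torch.sigmoid(gate(x.mean(dim=1)))  # per-sample confidence
            if not self.training and halt.min() > self.threshold:
                break  # exit early: save the remaining compute on easy inputs
        return x
```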
- Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [87.19164603145056]
We propose LoRAT, a method that unveils the power of large ViT models for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency (see the sketch after this entry).
We design an anchor-free head solely based on MLP to adapt PETR, enabling better performance with less computational overhead.
arXiv Detail & Related papers (2024-03-08T11:41:48Z)
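For readers unfamiliar with LoRA, the following minimal sketch shows why it adds no inference latency: only the low-rank factors A and B are trained, and their product can be merged into the frozen weight after training. The class and its integration are illustrative, not LoRAT's actual code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    y = base(x) + scale * x A^T B^T. After training, merge() folds B A into
    the frozen weight, so inference costs exactly one matmul again."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the big backbone weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    @torch.no_grad()
    def merge(self):
        # Fold the low-rank update into the frozen weight: zero extra latency.
        self.base.weight += (self.B @ self.A) * self.scale
```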
- Mobile Vision Transformer-based Visual Object Tracking [3.9160947065896803]
We propose a lightweight, accurate, and fast tracking algorithm using MobileViT as the backbone for the first time.
Our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU.
arXiv Detail & Related papers (2023-09-11T21:16:41Z)
- Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking [69.89887818921825]
HiT is a new family of efficient tracking models that can run at high speed on different devices.
HiT achieves 64.6% AUC on the LaSOT benchmark, surpassing all previous efficient trackers.
arXiv Detail & Related papers (2023-08-14T02:51:34Z)
- Efficient Visual Tracking via Hierarchical Cross-Attention Transformer [82.92565582642847]
We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT (a generic cross-attention sketch follows this entry).
Our model runs about 195 fps on GPU, 45 fps on CPU, and 55 fps on the edge AI platform of NVIDIA Jetson AGX Xavier.
arXiv Detail & Related papers (2022-03-25T09:45:27Z)
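A generic template-to-search cross-attention block, the basic ingredient behind trackers like HCAT. This is an illustrative sketch; HCAT's hierarchical design stacks such fusion across feature levels and its exact layers differ.

```python
import torch.nn as nn

class TemplateSearchCrossAttention(nn.Module):
    """Search-region tokens query template tokens, fusing target appearance
    into the search features before localization."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search_tokens, template_tokens):
        # query = search tokens, key/value = template tokens
        fused, _ = self.attn(search_tokens, template_tokens, template_tokens)
        return self.norm(search_tokens + fused)  # residual + norm
```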
- DeepScale: An Online Frame Size Adaptation Framework to Accelerate Visual Multi-object Tracking [8.878656943106934]
DeepScale is a model-agnostic frame size selection approach that increases tracking throughput.
It can find a suitable trade-off between tracking accuracy and speed by adapting frame sizes at run time (a sketch of such a controller follows this entry).
Compared to a state-of-the-art tracker, DeepScale++, a variant of DeepScale, achieves a 1.57x speedup with only moderate degradation of tracking accuracy.
arXiv Detail & Related papers (2021-07-22T00:12:58Z)
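A hypothetical runtime frame-size controller in the spirit of DeepScale's idea, not its actual policy: shrink the input when the tracker falls behind a latency budget and restore resolution when there is slack. The `track_fn` callable stands in for an arbitrary wrapped tracker.

```python
import time
import cv2  # OpenCV, used only for the resize itself

SIZES = [(1088, 608), (864, 480), (640, 352)]  # candidate (width, height), largest first

def adaptive_track(frames, track_fn, target_fps=30.0):
    """Run track_fn (the wrapped tracker, assumed) on each frame, adapting
    the input size at run time to stay within the latency budget."""
    level, budget = 0, 1.0 / target_fps
    for frame in frames:
        t0 = time.perf_counter()
        out = track_fn(cv2.resize(frame, SIZES[level]))
        elapsed = time.perf_counter() - t0
        if elapsed > budget and level < len(SIZES) - 1:
            level += 1  # over budget: shrink the input for the next frame
        elif elapsed < 0.7 * budget and level > 0:
            level -= 1  # comfortable slack: restore resolution
        yield out
```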
- STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance.
We propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target.
Specifically, a novel memory mechanism is introduced, which stores historical information about the target to guide the tracker to focus on the most informative regions in the current frame (an illustrative memory-read sketch follows this entry).
arXiv Detail & Related papers (2021-04-01T08:10:56Z)
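An illustrative attention-style read over a space-time memory, sketching how stored features from past frames can guide the current frame. Shapes and layers are assumptions, not STMTrack's exact architecture.

```python
import torch
import torch.nn as nn

class MemoryRead(nn.Module):
    """Current-frame queries attend to keys/values computed from features of
    past frames, so stored target appearance guides the present prediction."""
    def __init__(self, dim=256):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, query_feat, memory_feat):
        # query_feat: (B, Nq, C) current frame; memory_feat: (B, T*Nm, C) past frames
        q = self.to_q(query_feat)
        k = self.to_k(memory_feat)
        v = self.to_v(memory_feat)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return query_feat + attn @ v  # residual read from the space-time memory
```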
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.