Efficient Visual Tracking with Exemplar Transformers
- URL: http://arxiv.org/abs/2112.09686v1
- Date: Fri, 17 Dec 2021 18:57:54 GMT
- Title: Efficient Visual Tracking with Exemplar Transformers
- Authors: Philippe Blatter, Menelaos Kanakis, Martin Danelljan, Luc Van Gool
- Abstract summary: We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
- Score: 98.62550635320514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The design of more complex and powerful neural network models has
significantly advanced the state-of-the-art in visual object tracking. These
advances can be attributed to deeper networks, or to the introduction of new
building blocks, such as transformers. However, in the pursuit of increased
tracking performance, efficient tracking architectures have received
surprisingly little attention. In this paper, we introduce the Exemplar
Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers,
runs at 47 fps on a CPU. This is up to 8 times faster than other
transformer-based models, making it the only real-time transformer-based
tracker. When compared to lightweight trackers that can operate in real-time on
standard CPUs, E.T.Track consistently outperforms all other methods on the
LaSOT, OTB-100, NFS, TrackingNet and VOT-ST2020 datasets. The code will soon be
released on https://github.com/visionml/pytracking.
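
To make the core idea concrete: the abstract presents Exemplar Transformer layers as an efficient replacement for standard attention in a tracker. The sketch below is a hypothetical illustration of an exemplar-style attention layer, assuming a small learned bank of exemplar keys/values and a single pooled query per image so the attention cost is independent of the token count; the class name, bank size, and tensor shapes are illustrative choices, not the authors' exact layer.

```python
# Illustrative sketch only: a pooled query attends to a small learned
# exemplar bank instead of all tokens attending to each other.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExemplarAttention(nn.Module):
    def __init__(self, dim: int, num_exemplars: int = 4):
        super().__init__()
        # Small learned bank replaces the O(N^2) token-to-token
        # similarity of standard self-attention.
        self.keys = nn.Parameter(torch.randn(num_exemplars, dim))
        self.values = nn.Parameter(torch.randn(num_exemplars, dim))
        self.query_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) flattened feature map.
        q = self.query_proj(x.mean(dim=1))                           # (B, D): one coarse query per image
        attn = F.softmax((q @ self.keys.t()) * self.scale, dim=-1)   # (B, E): mixture over exemplars
        ctx = attn @ self.values                                     # (B, D)
        return x + ctx.unsqueeze(1)                                  # residual, broadcast to every token

x = torch.randn(2, 256, 128)             # e.g. a 16x16 search-region feature map, dim 128
print(ExemplarAttention(128)(x).shape)   # torch.Size([2, 256, 128])
```

The efficiency argument this sketch captures: self-attention over N tokens costs O(N^2 * D), while attending one pooled query against E learned exemplars costs O(E * D), which is the kind of reduction that makes a real-time CPU budget plausible.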
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- Mobile Vision Transformer-based Visual Object Tracking [3.9160947065896803]
We propose a lightweight, accurate, and fast tracking algorithm using MobileViT as the backbone for the first time.
Our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU.
arXiv Detail & Related papers (2023-09-11T21:16:41Z)
- Separable Self and Mixed Attention Transformers for Efficient Object Tracking [3.9160947065896803]
This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking.
With these contributions, the proposed lightweight tracker is the first to deploy a transformer-based backbone and head module together.
Simulations show that our Separable Self and Mixed Attention-based Tracker, SMAT, surpasses the performance of related lightweight trackers on GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets.
arXiv Detail & Related papers (2023-09-07T19:23:02Z)
- Efficient Visual Tracking via Hierarchical Cross-Attention Transformer [82.92565582642847]
We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
arXiv Detail & Related papers (2022-03-25T09:45:27Z)
- Global Tracking Transformers [76.58184022651596]
We present a novel transformer-based architecture for global multi-object tracking.
The core component is a global tracking transformer that operates on objects from all frames in the sequence.
Our framework seamlessly integrates into state-of-the-art large-vocabulary detectors to track any objects.
arXiv Detail & Related papers (2022-03-24T17:58:04Z)
- SwinTrack: A Simple and Strong Baseline for Transformer Tracking [81.65306568735335]
We propose a fully attention-based Transformer tracking algorithm, the Swin-Transformer Tracker (SwinTrack).
SwinTrack uses a Transformer for both feature extraction and feature fusion, allowing full interactions between the target object and the search region for tracking (a minimal sketch of this fusion pattern appears after the list below).
In our thorough experiments, SwinTrack sets a new record with 0.717 SUC on LaSOT, surpassing STARK by 4.6% while still running at 45 FPS.
arXiv Detail & Related papers (2021-12-02T05:56:03Z)
- Siamese Transformer Pyramid Networks for Real-Time UAV Tracking [3.0969191504482243]
We introduce the Siamese Transformer Pyramid Network (SiamTPN), which inherits the advantages from both CNN and Transformer architectures.
Experiments on both aerial and widely used tracking benchmarks show competitive results at high speed.
Our fastest variant runs at over 30 Hz on a single CPU core and obtains an AUC score of 58.1% on the LaSOT dataset.
arXiv Detail & Related papers (2021-10-17T13:48:31Z)
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)
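
Several of the trackers listed above (SwinTrack, HCAT, SMAT) fuse a target template with a search region through attention, as referenced in the SwinTrack entry. The sketch below shows a generic concatenation-based fusion layer using standard multi-head attention; the single fusion block, token counts, and class name are assumptions made for illustration, not any specific paper's architecture.

```python
# Illustrative sketch of template-search fusion via joint attention,
# the common pattern behind trackers like SwinTrack. Shapes and the
# single fusion layer are assumptions, not a paper's exact design.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
        # Concatenating both token sets lets every search token attend to
        # every template token (and vice versa) in a single attention pass.
        tokens = torch.cat([template, search], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + fused)
        # Only the search-region tokens feed the localization head.
        return tokens[:, template.shape[1]:]

template = torch.randn(1, 64, 128)   # e.g. 8x8 target-template tokens
search = torch.randn(1, 256, 128)    # e.g. 16x16 search-region tokens
print(FusionBlock(128)(template, search).shape)  # torch.Size([1, 256, 128])
```

Joint attention over the concatenated token set is the simplest way to obtain the "full interactions between the target object and the search region" that these abstracts describe; real trackers stack several such blocks and add positional encodings.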
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.