Separable Self and Mixed Attention Transformers for Efficient Object
Tracking
- URL: http://arxiv.org/abs/2309.03979v1
- Date: Thu, 7 Sep 2023 19:23:02 GMT
- Title: Separable Self and Mixed Attention Transformers for Efficient Object
Tracking
- Authors: Goutam Yelluru Gopal, Maria A. Amer
- Abstract summary: This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking.
With these contributions, the proposed lightweight tracker deploys a transformer-based backbone and head module concurrently for the first time.
Simulations show that our Separable Self and Mixed Attention-based Tracker, SMAT, surpasses the performance of related lightweight trackers on GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets.
- Score: 3.9160947065896803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deployment of transformers for visual object tracking has shown
state-of-the-art results on several benchmarks. However, the transformer-based
models are under-utilized for Siamese lightweight tracking due to the
computational complexity of their attention blocks. This paper proposes an
efficient self and mixed attention transformer-based architecture for
lightweight tracking. The proposed backbone utilizes the separable mixed
attention transformers to fuse the template and search regions during feature
extraction to generate superior feature encoding. Our prediction head performs
global contextual modeling of the encoded features by leveraging efficient
self-attention blocks for robust target state estimation. With these
contributions, the proposed lightweight tracker deploys a transformer-based
backbone and head module concurrently for the first time. Our ablation study
testifies to the effectiveness of the proposed combination of backbone and head
modules. Simulations show that our Separable Self and Mixed Attention-based
Tracker, SMAT, surpasses the performance of related lightweight trackers on
GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets, while running at
37 fps on CPU, 158 fps on GPU, and having 3.8M parameters. For example, it
significantly surpasses the closely related trackers E.T.Track and
MixFormerV2-S on GOT10k-test by a margin of 7.9% and 5.8%, respectively, in the
AO metric. The tracker code and model are available at
https://github.com/goutamyg/SMAT
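To make the abstract's central idea more concrete, below is a minimal, hypothetical PyTorch sketch of a separable mixed-attention block: template and search tokens are fused into one sequence and attended using per-token context scores instead of a full N x N attention map (in the spirit of MobileViT-style separable self-attention). The class name, dimensions, and the exact score/context formulation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SeparableMixedAttention(nn.Module):
    """Linear-complexity attention over fused template + search tokens (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_scores = nn.Linear(dim, 1)   # per-token context score: avoids the N x N attention map
        self.to_key = nn.Linear(dim, dim)
        self.to_value = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, template: torch.Tensor, search: torch.Tensor) -> torch.Tensor:
        # "Mixed" attention: template and search tokens are concatenated and attended jointly,
        # so target information is injected into the search features during feature extraction.
        tokens = torch.cat([template, search], dim=1)                    # (B, Nt + Ns, C)
        scores = self.to_scores(tokens).softmax(dim=1)                   # (B, N, 1)
        context = (scores * self.to_key(tokens)).sum(1, keepdim=True)    # (B, 1, C) global context
        mixed = self.proj(torch.relu(self.to_value(tokens)) * context)   # broadcast modulation
        return mixed[:, template.shape[1]:, :]                           # updated search tokens


# Toy usage: 64 template tokens, 256 search tokens, 128-dim embeddings
attn = SeparableMixedAttention(dim=128)
z, x = torch.randn(2, 64, 128), torch.randn(2, 256, 128)
print(attn(z, x).shape)  # torch.Size([2, 256, 128])
```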
Related papers
- Mamba-FETrack: Frame-Event Tracking via State Space Model [14.610806117193116]
This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM).
Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams.
Extensive experiments on FELT and FE108 datasets fully validated the efficiency and effectiveness of our proposed tracker.
arXiv Detail & Related papers (2024-04-28T13:12:49Z) - Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance [87.19164603145056]
We propose LoRAT, a method that unveils the power of large ViT model for tracking within laboratory-level resources.
The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency (a minimal illustrative sketch of LoRA appears after this list).
We design an anchor-free head based solely on MLP to adapt PETR, enabling better performance with less computational overhead.
arXiv Detail & Related papers (2024-03-08T11:41:48Z) - MixFormerV2: Efficient Fully Transformer Tracking [49.07428299165031]
Transformer-based trackers have achieved strong accuracy on the standard benchmarks.
But their efficiency remains an obstacle to practical deployment on both GPU and CPU platforms.
We propose a fully transformer tracking framework, coined as MixFormerV2, without any dense convolutional operation and complex score prediction module.
arXiv Detail & Related papers (2023-05-25T09:50:54Z) - Compact Transformer Tracker with Correlative Masked Modeling [16.234426179567837]
Transformer framework has been showing superior performances in visual object tracking.
Recent advances focus on exploring attention mechanism variants for better information aggregation.
In this paper, we prove that the vanilla self-attention structure is sufficient for information aggregation.
arXiv Detail & Related papers (2023-01-26T04:58:08Z) - Strong-TransCenter: Improved Multi-Object Tracking based on Transformers
with Dense Representations [1.2891210250935146]
TransCenter is a transformer-based MOT architecture with dense object queries for accurately tracking all the objects.
This paper presents an improvement to this tracker using a post-processing mechanism based on the Track-by-Detection paradigm.
Our new tracker shows significant improvements in the IDF1 and HOTA metrics and comparable results on the MOTA metric.
arXiv Detail & Related papers (2022-10-24T19:47:58Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers.
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention.
Experiments show that our TransT achieves very promising results on six challenging datasets.
arXiv Detail & Related papers (2021-03-29T09:06:55Z) - Simultaneous Detection and Tracking with Motion Modelling for Multiple
Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association.
DMM-Net achieves a PR-MOTA score of 12.80 at 120+ fps on the popular UA-DETRAC challenge, delivering better performance while being orders of magnitude faster.
We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)
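The LoRAT entry above mentions adapting LoRA so that fine-tuning touches only a small subset of parameters without adding inference latency. Below is a minimal, hypothetical PyTorch sketch of that general idea, not the LoRAT code: the LoRALinear class, rank, and alpha values are assumptions. A frozen pretrained linear layer is augmented with a trainable low-rank update that can be merged back into the weight after training.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: module starts equal to base
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Training-time path: frozen base output plus the low-rank residual
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

    def merge(self) -> nn.Linear:
        # Fold W + scale * (B @ A) into a plain linear layer, so inference latency is unchanged
        merged = nn.Linear(self.base.in_features, self.base.out_features,
                           bias=self.base.bias is not None)
        with torch.no_grad():
            merged.weight.copy_(self.base.weight + self.scale * (self.lora_b @ self.lora_a))
            if self.base.bias is not None:
                merged.bias.copy_(self.base.bias)
        return merged


# Toy usage: the merged layer reproduces the LoRA-augmented forward pass
layer = LoRALinear(nn.Linear(128, 128), rank=4)
x = torch.randn(2, 128)
print(torch.allclose(layer(x), layer.merge()(x), atol=1e-5))  # True
```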