Transformers in Single Object Tracking: An Experimental Survey
- URL: http://arxiv.org/abs/2302.11867v3
- Date: Fri, 23 Jun 2023 08:26:41 GMT
- Title: Transformers in Single Object Tracking: An Experimental Survey
- Authors: Janani Thangavel, Thanikasalam Kokul, Amirthalingam Ramanan, and Subha Fernando
- Abstract summary: Transformer-based tracking approaches have ushered in a new era in single-object tracking.
We conduct an in-depth literature analysis of Transformer tracking approaches by categorizing them into CNN-Transformer based trackers, Two-stream Two-stage fully-Transformer based trackers, and One-stream One-stage fully-Transformer based trackers.
- Score: 1.2526963688768458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Single-object tracking is a well-known and challenging research topic in
computer vision. Over the last two decades, numerous researchers have proposed
various algorithms to solve this problem and achieved promising results.
Recently, Transformer-based tracking approaches have ushered in a new era in
single-object tracking by introducing new perspectives and achieving superior
tracking robustness. In this paper, we conduct an in-depth literature analysis
of Transformer tracking approaches by categorizing them into CNN-Transformer
based trackers, Two-stream Two-stage fully-Transformer based trackers, and
One-stream One-stage fully-Transformer based trackers. In addition, we conduct
experimental evaluations to assess their tracking robustness and computational
efficiency using publicly available benchmark datasets. Furthermore, we measure
their performances on different tracking scenarios to identify their strengths
and weaknesses in particular situations. Our survey provides insights into the
underlying principles of Transformer tracking approaches, the challenges they
encounter, and the future directions they may take.
Related papers
- SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects [2.9803250365852443]
This paper addresses the problem of multi-object tracking in Unmanned Aerial Vehicle (UAV) footage.
It plays a critical role in various UAV applications, including traffic monitoring systems and real-time suspect tracking by the police.
We propose a new tracking strategy, which initiates the tracking of target objects from low-confidence detections.
arXiv Detail & Related papers (2024-10-26T05:09:20Z)
- Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers [18.615714086028632]
New transformer networks have been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks.
This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change.
arXiv Detail & Related papers (2024-06-03T20:13:38Z)
- AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking [17.133735660335343]
We propose an Adaptive ViT Model Prediction tracker (AViTMP) to design a customised tracking method.
This method bridges the single-branch network with discriminative models for the first time.
We show that AViTMP achieves state-of-the-art performance, especially in terms of long-term tracking and robustness.
arXiv Detail & Related papers (2023-10-30T13:48:04Z)
- Leveraging the Power of Data Augmentation for Transformer-based Tracking [64.46371987827312]
We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which makes the model more robust to challenges such as background interference.
arXiv Detail & Related papers (2023-09-15T09:18:54Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Dynamic Attention guided Multi-Trajectory Analysis for Single Object Tracking [62.13213518417047]
We propose to introduce more dynamics by devising a dynamic attention-guided multi-trajectory tracking strategy.
In particular, we construct a dynamic appearance model that contains multiple target templates, each of which provides its own attention for locating the target in the new frame.
After spanning the whole sequence, we introduce a multi-trajectory selection network to find the best trajectory that delivers improved tracking performance.
arXiv Detail & Related papers (2021-03-30T05:36:31Z)
- Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking [47.205979159070445]
We bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking.
Different from classic usage of the transformer in natural language processing tasks, we separate its encoder and decoder into two parallel branches.
Our method sets several new state-of-the-art records on prevalent tracking benchmarks.
arXiv Detail & Related papers (2021-03-22T09:20:05Z)
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
arXiv Detail & Related papers (2021-01-07T18:59:29Z)
- Multi-modal Visual Tracking: Review and Experimental Comparison [85.20414397784937]
We summarize the multi-modal tracking algorithms, especially visible-depth (RGB-D) tracking and visible-thermal (RGB-T) tracking.
We conduct experiments to analyze the effectiveness of trackers on five datasets.
arXiv Detail & Related papers (2020-12-08T02:39:38Z)