Leveraging the Power of Data Augmentation for Transformer-based Tracking
- URL: http://arxiv.org/abs/2309.08264v1
- Date: Fri, 15 Sep 2023 09:18:54 GMT
- Title: Leveraging the Power of Data Augmentation for Transformer-based Tracking
- Authors: Jie Zhao, Johan Edstedt, Michael Felsberg, Dong Wang, Huchuan Lu
- Abstract summary: We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which improves the model's robustness to challenges such as background interference.
- Score: 64.46371987827312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to long-range correlation modeling and powerful pretrained models,
transformer-based methods have initiated a breakthrough in visual object
tracking performance. Previous works focus on designing effective architectures
suited for tracking, but ignore that data augmentation is equally crucial for
training a well-performing model. In this paper, we first explore the impact of
general data augmentations on transformer-based trackers via systematic
experiments, and reveal the limited effectiveness of these common strategies.
Motivated by experimental observations, we then propose two data augmentation
methods customized for tracking. First, we optimize existing random cropping
via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which
improves the model's robustness to challenges such as background interference. Extensive
experiments on two transformer-based trackers and six benchmarks demonstrate
the effectiveness and data efficiency of our methods, especially under
challenging settings, like one-shot tracking and small image resolutions.
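As a loose illustration of the second idea, the sketch below mixes a random subset of search-region token features with tokens drawn from another frame, simulating background interference at the feature level. This is an assumption-laden reconstruction, not the paper's exact formulation; `mix_prob` and the uniform mixing coefficients are illustrative choices.

```python
import numpy as np

def token_feature_mix(search_tokens, distractor_tokens, mix_prob=0.2, rng=None):
    """Blend a fraction of search-region tokens with distractor tokens.

    search_tokens:     (N, D) array of token features from the search region.
    distractor_tokens: (M, D) array of token features from another frame or
                       sequence (assumed source of interference).
    mix_prob:          fraction of tokens to perturb (illustrative hyperparameter).
    """
    if rng is None:
        rng = np.random.default_rng()
    tokens = search_tokens.copy()
    n, _ = tokens.shape
    num_mix = int(n * mix_prob)
    idx = rng.choice(n, size=num_mix, replace=False)            # tokens to perturb
    src = rng.choice(distractor_tokens.shape[0], size=num_mix)  # distractor sources
    lam = rng.uniform(0.5, 1.0, size=(num_mix, 1))              # mixing coefficients
    tokens[idx] = lam * tokens[idx] + (1 - lam) * distractor_tokens[src]
    return tokens
```

Applied during training, such a perturbation forces the tracker to localize the target even when some search tokens carry features from unrelated content.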
Related papers
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- Simple In-place Data Augmentation for Surveillance Object Detection [2.3841361713768077]
We propose a straightforward augmentation technique tailored for object detection datasets.
Our approach focuses on placing objects in the same positions as the originals to ensure its effectiveness.
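A minimal sketch of this in-place idea, assuming NumPy image arrays of equal size (e.g. two frames from the same fixed camera); the function name and interface are illustrative, not from the paper:

```python
import numpy as np

def inplace_object_swap(target_img, source_img, box):
    """Paste the object patch from `source_img` into `target_img` at the
    same bounding-box location, so the object keeps its original position.

    target_img, source_img: (H, W, C) uint8 arrays of identical shape.
    box: (x1, y1, x2, y2) pixel coordinates of the object's bounding box.
    """
    x1, y1, x2, y2 = box
    out = target_img.copy()
    out[y1:y2, x1:x2] = source_img[y1:y2, x1:x2]  # object stays in place
    return out
```

Because the pasted object occupies the same box as the original, the existing annotation remains valid and no label adjustment is needed.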
arXiv Detail & Related papers (2024-04-17T10:20:16Z)
- A Hybrid Model for Traffic Incident Detection based on Generative Adversarial Networks and Transformer Model [0.0]
Traffic incident detection plays an indispensable role in intelligent transportation systems.
Previous research has identified that the effectiveness of detection is significantly influenced by challenges related to acquiring large datasets.
A hybrid model combining transformer and generative adversarial networks (GANs) is proposed to address these challenges.
arXiv Detail & Related papers (2024-03-02T09:28:04Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
This is the first time such a result has been demonstrated.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Transformers in Single Object Tracking: An Experimental Survey [1.2526963688768458]
Transformer-based tracking approaches have ushered in a new era in single-object tracking.
We conduct an in-depth literature analysis of Transformer tracking approaches by categorizing them into CNN-Transformer based trackers, Two-stream Two-stage fully-Transformer based trackers, and One-stream One-stage fully-Transformer based trackers.
arXiv Detail & Related papers (2023-02-23T09:12:58Z)
- Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes [19.810725397641406]
We propose a novel Dyna-Depthformer framework, which predicts scene depth and 3D motion field jointly.
Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers in order to obtain enhanced depth feature representation.
Second, we propose a warping-based Motion Network to estimate the motion field of dynamic objects without using semantic prior.
arXiv Detail & Related papers (2023-01-14T09:43:23Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- Transforming Model Prediction for Tracking [109.08417327309937]
Transformers capture global relations with little inductive bias, allowing them to learn more powerful target model predictions.
We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets.
Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.
arXiv Detail & Related papers (2022-03-21T17:59:40Z)
- Towards Data-Efficient Detection Transformers [77.43470797296906]
We show most detection transformers suffer from significant performance drops on small-size datasets.
We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR.
We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.