Towards Sequence-Level Training for Visual Tracking
- URL: http://arxiv.org/abs/2208.05810v1
- Date: Thu, 11 Aug 2022 13:15:36 GMT
- Title: Towards Sequence-Level Training for Visual Tracking
- Authors: Minji Kim, Seungkwan Lee, Jungseul Ok, Bohyung Han, Minsu Cho
- Abstract summary: This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning.
Four representative tracking models, SiamRPN++, SiamAttn, TransT, and TrDiMP, consistently improve when the proposed methods are incorporated into training.
- Score: 60.95799261482857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the extensive adoption of machine learning for visual
object tracking, recent learning-based approaches have largely overlooked the
fact that visual tracking is by nature a sequence-level task; they rely
heavily on frame-level training, which inevitably induces inconsistency between
training and testing in terms of both data distributions and task objectives.
This work introduces a sequence-level training strategy for visual tracking
based on reinforcement learning and discusses how a sequence-level design of
data sampling, learning objectives, and data augmentation can improve the
accuracy and robustness of tracking algorithms. Our experiments on standard
benchmarks including LaSOT, TrackingNet, and GOT-10k demonstrate that four
representative tracking models, SiamRPN++, SiamAttn, TransT, and TrDiMP,
consistently improve when the proposed methods are incorporated into training,
without modifying their architectures.
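To make the sequence-level idea concrete, here is a minimal PyTorch sketch of REINFORCE-style training over a full tracking sequence. The toy tracker, random features, and IoU reward are hypothetical stand-ins for illustration, not the paper's actual formulation:

```python
# Minimal sketch of sequence-level training with REINFORCE (not the paper's
# exact algorithm). A toy "tracker" predicts a Gaussian over box offsets per
# frame; the per-frame reward is the IoU with the ground-truth box, and the
# policy gradient is accumulated over the whole sequence.
import torch
import torch.nn as nn

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = torch.max(a[0], b[0]), torch.max(a[1], b[1])
    x2, y2 = torch.min(a[2], b[2]), torch.min(a[3], b[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)

class ToyTracker(nn.Module):
    """Maps a frame feature to a distribution over box offsets (hypothetical)."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.mean_head = nn.Linear(feat_dim, 4)    # offset mean
        self.log_std = nn.Parameter(torch.zeros(4))

    def forward(self, feat):
        return torch.distributions.Normal(self.mean_head(feat), self.log_std.exp())

tracker = ToyTracker()
optim = torch.optim.Adam(tracker.parameters(), lr=1e-3)

# One sequence: T frame features and ground-truth boxes (random toy data).
T = 8
feats = torch.randn(T, 16)
gt_boxes = torch.tensor([[10., 10., 50., 50.]]).repeat(T, 1)

box = gt_boxes[0].clone()            # initialize from the first-frame box
log_probs, rewards = [], []
for t in range(1, T):                # roll the tracker over the whole sequence
    dist = tracker(feats[t])
    offset = dist.sample()           # sample an action (box offset)
    box = box + offset
    log_probs.append(dist.log_prob(offset).sum())
    rewards.append(iou(box.detach(), gt_boxes[t]))

# REINFORCE: weight each step's log-probability by its (baselined) reward.
returns = torch.stack(rewards)
baseline = returns.mean()            # simple variance-reduction baseline
loss = -(torch.stack(log_probs) * (returns - baseline)).sum()
optim.zero_grad()
loss.backward()
optim.step()
```

Because the reward is computed on the rolled-out sequence, training sees the same drift and error accumulation that occur at test time, which frame-level losses never expose.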
Related papers
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose TIVE, a high-value data selection approach that eliminates redundancy within visual instruction data and reduces training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
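As a rough illustration of redundancy-oriented selection (TIVE's actual value criterion is not reproduced here), the following sketch greedily keeps the roughly 15% most mutually dissimilar samples via farthest-point selection over stand-in embeddings:

```python
# Generic sketch of high-value data selection (not TIVE's exact criterion):
# greedily keep samples that are far from everything already selected, so
# near-duplicates are dropped. Embeddings are random stand-ins.
import numpy as np

def select_high_value(embeddings, keep_ratio=0.15):
    """Greedy farthest-point selection: prefer the most novel samples."""
    n = len(embeddings)
    k = max(1, int(n * keep_ratio))
    selected = [0]                                   # seed with the first sample
    for _ in range(k - 1):
        # distance of each candidate to its nearest already-selected sample
        d = np.min(
            np.linalg.norm(embeddings[:, None] - embeddings[selected], axis=-1),
            axis=1,
        )
        d[selected] = -np.inf                        # never re-pick a sample
        selected.append(int(np.argmax(d)))           # keep the most novel one
    return selected

emb = np.random.randn(100, 32)                       # toy instruction embeddings
kept = select_high_value(emb)
print(f"kept {len(kept)} of {len(emb)} samples")
```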
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling [9.237399190335598]
We propose a two-stage curriculum learning (TCL) framework specifically designed for sequence labeling tasks.
The framework enhances training by gradually introducing data instances from easy to hard, aiming to improve both performance and training speed.
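A minimal sketch of the easy-to-hard scheduling idea, with a hypothetical difficulty score standing in for the paper's curriculum criteria:

```python
# Sketch of easy-to-hard curriculum scheduling (the paper's actual difficulty
# measures for sequence labeling are not reproduced). Samples are sorted by a
# difficulty score and the training pool widens over the epochs.
import random

def curriculum_batches(samples, difficulty, epochs, batch_size=4):
    """Yield batches, growing the (easy-first) pool each epoch."""
    ordered = [s for _, s in sorted(zip(difficulty, samples))]
    for epoch in range(1, epochs + 1):
        # fraction of the easy-first pool available this epoch
        pool = ordered[: max(batch_size, int(len(ordered) * epoch / epochs))]
        random.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i : i + batch_size]

samples = [f"sent_{i}" for i in range(20)]
difficulty = [len(s) + random.random() for s in samples]   # stand-in score
for epoch, batch in curriculum_batches(samples, difficulty, epochs=3):
    pass  # train_step(batch) would go here
```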
arXiv Detail & Related papers (2024-02-21T05:04:29Z)
- Towards Unified Token Learning for Vision-Language Tracking [65.96561538356315]
We present a vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.
Our proposed framework serializes language description and bounding box into a sequence of discrete tokens.
In this new design paradigm, all token queries are required to perceive the desired target and directly predict spatial coordinates of the target.
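To illustrate the token-generation view of tracking, here is a sketch of quantizing a bounding box into discrete coordinate tokens; the bin count and tokenizer details are illustrative assumptions, not MMTrack's actual vocabulary:

```python
# Sketch of serializing a bounding box into discrete tokens, in the spirit of
# casting tracking as token generation (illustrative, not MMTrack's tokenizer).
def box_to_tokens(box, img_w, img_h, n_bins=1000):
    """Quantize (x1, y1, x2, y2) into n_bins discrete coordinate tokens."""
    x1, y1, x2, y2 = box
    norm = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
    return [min(n_bins - 1, int(v * n_bins)) for v in norm]

def tokens_to_box(tokens, img_w, img_h, n_bins=1000):
    """Invert the quantization (up to bin resolution)."""
    x1, y1, x2, y2 = [(t + 0.5) / n_bins for t in tokens]
    return (x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h)

tokens = box_to_tokens((48, 32, 200, 180), img_w=640, img_h=480)
print(tokens)                          # [75, 66, 312, 375]
print(tokens_to_box(tokens, 640, 480))
```

A decoder can then emit these coordinate tokens autoregressively, exactly as it would emit language tokens, which is what unifies the two modalities under one prediction head.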
arXiv Detail & Related papers (2023-08-27T13:17:34Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
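A minimal sketch of jointly optimizing a policy objective and an inverse-dynamics prediction loss over a shared encoder; the architecture, toy data, and advantage estimates below are placeholder assumptions, not ALP's implementation:

```python
# Sketch of combining a policy-gradient objective with an inverse-dynamics
# loss on a shared representation (placeholders, not ALP's actual networks).
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, n_actions = 32, 4
encoder = nn.Linear(64, feat_dim)                 # shared representation
policy_head = nn.Linear(feat_dim, n_actions)
# inverse dynamics: predict the action taken between consecutive observations
inv_dyn = nn.Linear(2 * feat_dim, n_actions)
params = (list(encoder.parameters()) + list(policy_head.parameters())
          + list(inv_dyn.parameters()))
optim = torch.optim.Adam(params, lr=1e-3)

obs, next_obs = torch.randn(8, 64), torch.randn(8, 64)   # toy observations
actions = torch.randint(0, n_actions, (8,))
advantages = torch.randn(8)                       # stand-in RL advantages

z, z_next = encoder(obs), encoder(next_obs)
logp = F.log_softmax(policy_head(z), dim=-1)
policy_loss = -(logp[torch.arange(8), actions] * advantages).mean()
inv_loss = F.cross_entropy(inv_dyn(torch.cat([z, z_next], dim=-1)), actions)

loss = policy_loss + inv_loss                     # joint objective
optim.zero_grad(); loss.backward(); optim.step()
```

Both losses backpropagate into the same encoder, so the learned features must be useful for acting and for explaining the consequences of actions.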
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences [49.91741677556553]
We propose TempO, a temporal ordering pretext task for pre-training region-level feature representations for perception tasks.
We embed each frame as an unordered set of proposal feature vectors, a representation that is natural for object detection or tracking systems.
Extensive evaluations on the BDD100K, nuImages, and MOT17 datasets show that our TempO pre-training approach outperforms single-frame self-supervised learning methods.
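A simplified stand-in for the temporal-ordering pretext task (TempO's actual formulation is more elaborate): each frame is pooled permutation-invariantly from its proposal set, and a classifier predicts whether a frame pair appears in its original order:

```python
# Sketch of a temporal-ordering pretext task over per-frame proposal sets.
# Frames are unordered sets of proposal features, pooled order-free; the
# model classifies whether two frames are in their original temporal order.
import torch
import torch.nn as nn
import torch.nn.functional as F

prop_dim = 16
pool = lambda props: props.mean(dim=0)            # set -> vector, order-free
classifier = nn.Linear(2 * prop_dim, 2)           # in-order vs. swapped
optim = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# Toy clip: two frames, each with a different number of proposal vectors.
frame_a = torch.randn(5, prop_dim)
frame_b = torch.randn(7, prop_dim)

za, zb = pool(frame_a), pool(frame_b)
in_order = torch.cat([za, zb])                    # label 1: correct order
swapped = torch.cat([zb, za])                     # label 0: reversed
logits = classifier(torch.stack([in_order, swapped]))
loss = F.cross_entropy(logits, torch.tensor([1, 0]))
optim.zero_grad(); loss.backward(); optim.step()
```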
arXiv Detail & Related papers (2023-02-17T18:18:27Z)
- Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking [137.26381337333552]
In this work, we develop the Crop-Transform-Paste operation, which can synthesize sufficient training data.
Since the object state is known in all synthesized data, existing deep trackers can be trained in routine ways without human annotation.
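A minimal sketch of the crop-transform-paste idea, assuming a single horizontal-flip transform (the paper's operation applies richer transformations): crop the target, transform it, paste it onto a background, and the new annotation follows by construction.

```python
# Sketch of a Crop-Transform-Paste style synthesis step. The target patch is
# cropped from one image, flipped, and pasted onto a background image, so the
# new box label is known without any human annotation.
import numpy as np

def crop_transform_paste(image, box, background, paste_xy):
    """Crop `box` from `image`, flip it, paste at `paste_xy` on `background`."""
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2].copy()
    patch = patch[:, ::-1]                        # transform: horizontal flip
    px, py = paste_xy
    h, w = patch.shape[:2]
    out = background.copy()
    out[py:py + h, px:px + w] = patch
    new_box = (px, py, px + w, py + h)            # label known by construction
    return out, new_box

img = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
bg = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
synth, box = crop_transform_paste(img, (40, 30, 90, 80), bg, paste_xy=(10, 20))
print(box)   # (10, 20, 60, 70)
```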
arXiv Detail & Related papers (2021-06-21T07:40:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.