Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking
- URL: http://arxiv.org/abs/2106.10900v1
- Date: Mon, 21 Jun 2021 07:40:34 GMT
- Title: Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking
- Authors: Xin Li, Wenjie Pei, Zikun Zhou, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang
- Abstract summary: In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data.
Since the object state is known in all synthesized data, existing deep trackers can be trained in routine ways without human annotation.
- Score: 137.26381337333552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep-learning based methods for visual tracking have achieved
substantial progress, these schemes entail large-scale and high-quality
annotated data for sufficient training. To eliminate expensive and exhaustive
annotation, we study self-supervised learning for visual tracking. In this
work, we develop the Crop-Transform-Paste operation, which is able to
synthesize sufficient training data by simulating various kinds of scene
variations during tracking, including appearance variations of objects and
background changes. Since the object state is known in all synthesized data,
existing deep trackers can be trained in routine ways without human annotation.
Unlike typical self-supervised learning methods, which perform visual
representation learning as a separate step, the proposed self-supervised
learning mechanism can be seamlessly integrated into any existing tracking
framework for training. Extensive experiments show that our method 1)
performs favorably against supervised learning in few-shot tracking
scenarios; 2) handles various tracking challenges such as object
deformation, occlusion, and background clutter by design; and 3) can be
combined with supervised learning to further boost performance, which is
particularly effective in few-shot tracking scenarios.
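The abstract does not spell out the exact synthesis procedure, but a minimal sketch of one Crop-Transform-Paste step might look as follows, assuming NumPy/OpenCV; the function name, the choice of transformations, and all parameter ranges are illustrative assumptions rather than the authors' configuration.

```python
# Hypothetical sketch of a Crop-Transform-Paste synthesis step (not the
# authors' implementation): crop the target by its known box, apply a few
# appearance transforms, and paste it onto a background at a random location.
import numpy as np
import cv2


def crop_transform_paste(frame, box, background, rng=None):
    """Synthesize a training sample (image, box) with a known object state.

    frame:      HxWx3 uint8 image containing the target object
    box:        (x, y, w, h) target bounding box in `frame`
    background: HxWx3 uint8 image used as the new scene (assumed larger
                than the transformed patch)
    """
    if rng is None:
        rng = np.random.default_rng()

    x, y, w, h = box
    patch = frame[y:y + h, x:x + w].copy()           # Crop: cut out the target

    # Transform: simulate appearance variations (scale, flip, brightness, blur).
    scale = rng.uniform(0.7, 1.3)
    patch = cv2.resize(patch, (max(1, int(w * scale)), max(1, int(h * scale))))
    if rng.random() < 0.5:
        patch = cv2.flip(patch, 1)                    # horizontal flip
    patch = cv2.convertScaleAbs(patch, alpha=rng.uniform(0.8, 1.2),
                                beta=rng.uniform(-20, 20))
    if rng.random() < 0.3:
        patch = cv2.GaussianBlur(patch, (5, 5), 0)    # mild blur as a defocus proxy

    # Paste: place the transformed target at a random location in the background.
    out = background.copy()
    ph, pw = patch.shape[:2]
    H, W = out.shape[:2]
    px = int(rng.integers(0, max(1, W - pw)))
    py = int(rng.integers(0, max(1, H - ph)))
    out[py:py + ph, px:px + pw] = patch

    # The object state of the synthesized sample is known by construction,
    # so it can supervise an existing tracker without human annotation.
    return out, (px, py, pw, ph)
```

In a pipeline along these lines, the returned bounding box serves as annotation-free ground truth, so a standard deep tracker can be trained on the synthesized (image, box) pairs in the usual supervised fashion.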
Related papers
- Exploring the Evolution of Hidden Activations with Live-Update Visualization [12.377279207342735]
We introduce SentryCam, an automated, real-time visualization tool that reveals the progression of hidden representations during training.
Our results show that this visualization offers a more comprehensive view of the learning dynamics compared to basic metrics.
SentryCam could facilitate detailed analyses, such as task transfer and catastrophic forgetting, in a continual learning setting.
arXiv Detail & Related papers (2024-05-24T01:23:20Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Towards Sequence-Level Training for Visual Tracking [60.95799261482857]
This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning.
Four representative tracking models, SiamRPN++, SiamAttn, TransT, and TrDiMP, consistently improve when the proposed methods are incorporated into training.
arXiv Detail & Related papers (2022-08-11T13:15:36Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment to learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Semi-TCL: Semi-Supervised Track Contrastive Representation Learning [40.31083437957288]
We design a new instance-to-track matching objective to learn appearance embeddings.
It compares a candidate detection to the embeddings of the tracks maintained in the tracker.
We implement this learning objective in a unified form following the spirit of contrastive loss.
arXiv Detail & Related papers (2021-07-06T05:23:30Z)
- Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate our approach on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.