Bootstrapped Self-Supervised Training with Monocular Video for Semantic
Segmentation and Depth Estimation
- URL: http://arxiv.org/abs/2103.11031v1
- Date: Fri, 19 Mar 2021 21:28:58 GMT
- Title: Bootstrapped Self-Supervised Training with Monocular Video for Semantic
Segmentation and Depth Estimation
- Authors: Yihao Zhang and John J. Leonard
- Abstract summary: We formalize a bootstrapped self-supervised learning problem where a system is initially bootstrapped with supervised training on a labeled dataset.
In this work, we leverage temporal consistency between frames in monocular video to perform this bootstrapped self-supervised training.
In addition, we show that the bootstrapped self-supervised training framework can help a network learn depth estimation better than pure supervised training or self-supervised training.
- Score: 11.468537169201083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For a robot deployed in the world, it is desirable to have the ability of
autonomous learning to improve its initial pre-set knowledge. We formalize this
as a bootstrapped self-supervised learning problem where a system is initially
bootstrapped with supervised training on a labeled dataset and we look for a
self-supervised training method that can subsequently improve the system over
the supervised training baseline using only unlabeled data. In this work, we
leverage temporal consistency between frames in monocular video to perform this
bootstrapped self-supervised training. We show that a well-trained
state-of-the-art semantic segmentation network can be further improved through
our method. In addition, we show that the bootstrapped self-supervised training
framework can help a network learn depth estimation better than pure supervised
training or self-supervised training.
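The abstract describes the self-supervised signal (temporal consistency between consecutive video frames) only at a high level. As an illustration of the general idea, and not the paper's actual implementation, the sketch below computes a symmetric cross-entropy consistency loss between one frame's per-pixel segmentation probabilities and the next frame's predictions assumed to be already warped into the same view; the warping step itself (via estimated depth and ego-motion) is omitted here, and all names are hypothetical:

```python
import numpy as np

def consistency_loss(probs_t, probs_t1_warped):
    """Symmetric per-pixel cross-entropy between the class probabilities
    predicted for frame t and those predicted for frame t+1 after warping
    into frame t's view. Both arrays have shape (H, W, C) with the class
    dimension summing to 1."""
    eps = 1e-8  # numerical floor to avoid log(0)
    ce = -np.sum(probs_t * np.log(probs_t1_warped + eps), axis=-1)
    ce_rev = -np.sum(probs_t1_warped * np.log(probs_t + eps), axis=-1)
    return float(np.mean((ce + ce_rev) / 2.0))

# Toy check: identical predictions incur a low loss, while predictions
# that disagree across frames are penalized heavily.
H, W, C = 4, 4, 3
agree = np.full((H, W, C), 1.0 / C)          # uniform, self-consistent
flip = np.zeros((H, W, C)); flip[..., 0] = 1.0   # frame t says class 0
other = np.zeros((H, W, C)); other[..., 1] = 1.0 # frame t+1 says class 1

loss_same = consistency_loss(agree, agree)
loss_diff = consistency_loss(flip, other)
assert loss_diff > loss_same
```

In the bootstrapped setting described above, a loss of this kind would be minimized on unlabeled video after the network has first been trained with supervision, so the temporal signal refines rather than replaces the supervised baseline.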
Related papers
- DINO Pre-training for Vision-based End-to-end Autonomous Driving [0.0]
We propose pre-training the visual encoder of a driving agent using the self-distillation with no labels (DINO) method.
Our experiments in the CARLA environment, conducted in accordance with the Leaderboard benchmark, reveal that the proposed pre-training is more efficient than classification-based pre-training.
arXiv Detail & Related papers (2024-07-15T15:18:57Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Analyzing the Sample Complexity of Self-Supervised Image Reconstruction Methods [24.840134419242414]
Supervised training of deep neural networks on pairs of clean image and noisy measurement achieves state-of-the-art performance for many image reconstruction tasks.
Self-supervised methods enable training based on noisy measurements only, without clean images.
We analytically show that a model trained with such self-supervised training is as good as the same model trained in a supervised fashion.
arXiv Detail & Related papers (2023-05-30T14:42:04Z)
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Preserve Pre-trained Knowledge: Transfer Learning With Self-Distillation For Action Recognition [8.571437792425417]
We propose a novel transfer learning approach that combines self-distillation in fine-tuning to preserve knowledge from the pre-trained model learned from the large-scale dataset.
Specifically, we fix the encoder from the last epoch as the teacher model to guide the training of the encoder from the current epoch in the transfer learning.
arXiv Detail & Related papers (2022-05-01T16:31:25Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- Better Self-training for Image Classification through Self-supervision [3.492636597449942]
Self-supervision is learning without manual supervision by solving an automatically-generated pretext task.
This paper investigates three ways of incorporating self-supervision into self-training to improve accuracy in image classification.
arXiv Detail & Related papers (2021-09-02T08:24:41Z)
- Learning Actor-centered Representations for Action Localization in Streaming Videos using Predictive Learning [18.757368441841123]
Event perception tasks, such as recognizing and localizing actions in streaming videos, are essential for visual understanding.
We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning.
Inspired by cognitive theories of event perception, we propose a novel, self-supervised framework.
arXiv Detail & Related papers (2021-04-29T06:06:58Z)
- How Well Self-Supervised Pre-Training Performs with Streaming Data? [73.5362286533602]
In real-world scenarios where data are collected in a streaming fashion, the joint training scheme is usually storage-heavy and time-consuming.
It is unclear how well sequential self-supervised pre-training performs with streaming data.
We find sequential self-supervised learning exhibits almost the same performance as the joint training when the distribution shifts within streaming data are mild.
arXiv Detail & Related papers (2021-04-25T06:56:48Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.