An Unbiased Look at Datasets for Visuo-Motor Pre-Training
- URL: http://arxiv.org/abs/2310.09289v1
- Date: Fri, 13 Oct 2023 17:59:02 GMT
- Title: An Unbiased Look at Datasets for Visuo-Motor Pre-Training
- Authors: Sudeep Dasari, Mohan Kumar Srirama, Unnat Jain, Abhinav Gupta
- Abstract summary: We show that dataset choice is just as important as the pre-training algorithm to the success of visuo-motor pre-training.
We observe that traditional vision datasets are surprisingly competitive options for visuo-motor representation learning.
We show that common simulation benchmarks are not a reliable proxy for real world performance.
- Score: 20.094244564603184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual representation learning holds great promise for robotics, but is
severely hampered by the scarcity and homogeneity of robotics datasets. Recent
works address this problem by pre-training visual representations on
large-scale but out-of-domain data (e.g., videos of egocentric interactions)
and then transferring them to target robotics tasks. While the field is heavily
focused on developing better pre-training algorithms, we find that dataset
choice is just as important to this paradigm's success. After all, the
representation can only learn the structures or priors present in the
pre-training dataset. To this end, we flip the focus on algorithms, and instead
conduct a dataset-centric analysis of robotic pre-training. Our findings call
into question some common wisdom in the field. We observe that traditional
vision datasets (like ImageNet, Kinetics and 100 Days of Hands) are
surprisingly competitive options for visuo-motor representation learning, and
that the pre-training dataset's image distribution matters more than its size.
Finally, we show that common simulation benchmarks are not a reliable proxy for
real world performance and that simple regularization strategies can
dramatically improve real world policy learning.
https://data4robotics.github.io
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI improves on the previous state of the art by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing.
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- Supervised and Contrastive Self-Supervised In-Domain Representation Learning for Dense Prediction Problems in Remote Sensing [0.0]
This paper explores the effectiveness of in-domain representations in both supervised and self-supervised forms to solve the domain difference between remote sensing and the ImageNet dataset.
For self-supervised pre-training, we have utilized the SimSiam algorithm as it is simple and does not need huge computational resources.
Our results have demonstrated that using datasets with a high spatial resolution for self-supervised representation learning leads to high performance in downstream tasks.
arXiv Detail & Related papers (2023-01-29T20:56:51Z)
- Palm up: Playing in the Latent Manifold for Unsupervised Pretraining [31.92145741769497]
We propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.
Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space.
We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data.
arXiv Detail & Related papers (2022-10-19T22:26:12Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.