Unsupervised Feature Learning for Manipulation with Contrastive Domain Randomization
- URL: http://arxiv.org/abs/2103.11144v1
- Date: Sat, 20 Mar 2021 09:54:45 GMT
- Title: Unsupervised Feature Learning for Manipulation with Contrastive Domain Randomization
- Authors: Carmel Rabinovitz, Niko Grupen and Aviv Tamar
- Abstract summary: We show that a naive application of domain randomization to unsupervised learning does not promote invariance.
We propose a simple modification of the contrastive loss to fix this, exploiting the fact that we can control the simulated randomization of visual properties.
- Score: 19.474628552656764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robotic tasks such as manipulation with visual inputs require image features
that capture the physical properties of the scene, e.g., the position and
configuration of objects. Recently, it has been suggested to learn such
features in an unsupervised manner from simulated, self-supervised, robot
interaction; the idea being that high-level physical properties are well
captured by modern physical simulators, and their representation from visual
inputs may transfer well to the real world. In particular, learning methods
based on noise contrastive estimation have shown promising results. To
robustify the simulation-to-real transfer, domain randomization (DR) was
suggested for learning features that are invariant to irrelevant visual
properties such as textures or lighting. In this work, however, we show that a
naive application of DR to unsupervised learning based on contrastive
estimation does not promote invariance, as the loss function maximizes mutual
information between the features and both the relevant and irrelevant visual
properties. We propose a simple modification of the contrastive loss to fix
this, exploiting the fact that we can control the simulated randomization of
visual properties. Our approach learns physical features that are significantly
more robust to visual domain variation, as we demonstrate using both rigid and
non-rigid objects.
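To make the abstract's key idea concrete, here is a minimal sketch of a contrastive objective in which a positive pair shows the same simulated scene state rendered under two independently sampled domain randomizations, so the only information shared within a pair is the physical content. The encoder, batch layout, and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): InfoNCE over pairs of renderings of
# the SAME simulated state under two different domain randomizations, so the
# shared information within each positive pair is only the physical content.
import torch
import torch.nn.functional as F


def contrastive_dr_loss(encoder, views_a, views_b, temperature=0.1):
    """views_a[i] and views_b[i] render simulated state i with independently
    randomized textures/lighting; indices i != j act as negatives."""
    z_a = F.normalize(encoder(views_a), dim=1)    # (N, d)
    z_b = F.normalize(encoder(views_b), dim=1)    # (N, d)
    logits = z_a @ z_b.t() / temperature          # (N, N) similarities
    targets = torch.arange(z_a.size(0), device=logits.device)
    # Diagonal entries are positives; all other renderings are negatives.
    return F.cross_entropy(logits, targets)
```

Because the randomized visual properties differ within every positive pair, the objective cannot be improved by encoding textures or lighting, which is the invariance the abstract argues a naive combination of DR and contrastive estimation fails to provide.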
Related papers
- Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning [28.673952870674146]
We develop a measurement-pretrain-finetune paradigm for Unsupervised Feature Transformation Learning.
For unsupervised feature set utility measurement, we propose a feature value consistency preservation perspective.
For generative transformation finetuning, we regard a feature set as a feature cross sequence and feature transformation as sequential generation.
arXiv Detail & Related papers (2024-05-27T06:50:00Z)
- Learning Action-based Representations Using Invariance [18.1941237781348]
We introduce action-bisimulation encoding, which learns a multi-step controllability metric that discounts distant state features that are relevant for control.
We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments.
arXiv Detail & Related papers (2024-03-25T02:17:54Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation [4.7246285569677315]
We present Sim-to-Real Dense Object Nets (SRDONs), a dense object descriptor that not only understands the object via appropriate representation but also maps simulated and real data to a unified feature space with pixel consistency.
We demonstrate in experiments that pre-trained SRDONs significantly improve performance on unseen objects and unseen visual environments for various robotic tasks with zero real-world training.
arXiv Detail & Related papers (2023-04-18T02:28:55Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
Ours is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
arXiv Detail & Related papers (2022-05-11T17:59:51Z)
- Sim2Real Object-Centric Keypoint Detection and Description [40.58367357980036]
Keypoint detection and description play a central role in computer vision.
We propose an object-centric formulation, which further requires identifying which object each interest point belongs to.
We develop a sim2real contrastive learning mechanism that can generalize the model trained in simulation to real-world applications.
arXiv Detail & Related papers (2022-02-01T15:00:20Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision [117.15012005163322]
AugSelf is an auxiliary self-supervised loss that learns to predict the difference of the augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages preserving augmentation-aware information in the learned representations, which could be beneficial for their transferability.
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost (a minimal sketch of the auxiliary loss appears after this list).
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study these properties via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
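As a concrete illustration of the AugSelf auxiliary loss summarized in the Improving Transferability entry above, the sketch below regresses the difference of augmentation parameters from the features of two views. The head architecture, parameter encoding, and loss weighting are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch (assumed design, not the paper's code): an auxiliary head
# that predicts the difference of augmentation parameters between two views,
# added on top of an existing self-supervised objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AugSelfHead(nn.Module):
    def __init__(self, feat_dim, aug_param_dim):
        super().__init__()
        # Small MLP on concatenated view features predicts omega_1 - omega_2
        # (e.g., differences of crop coordinates or color-jitter strengths).
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, aug_param_dim),
        )

    def forward(self, z1, z2, omega1, omega2):
        pred = self.mlp(torch.cat([z1, z2], dim=1))
        return F.mse_loss(pred, omega1 - omega2)


# Hypothetical usage: total_loss = ssl_loss + lambda_aug * head(z1, z2, w1, w2)
```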