Visual Reinforcement Learning with Self-Supervised 3D Representations
- URL: http://arxiv.org/abs/2210.07241v1
- Date: Thu, 13 Oct 2022 17:59:55 GMT
- Title: Visual Reinforcement Learning with Self-Supervised 3D Representations
- Authors: Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang
- Abstract summary: We present a unified framework for self-supervised learning of 3D representations for motor control.
Our method enjoys improved sample efficiency in simulated manipulation tasks compared to 2D representation learning methods.
- Score: 15.991546692872841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A prominent approach to visual Reinforcement Learning (RL) is to learn an
internal state representation using self-supervised methods, which has the
potential benefit of improved sample-efficiency and generalization through
additional learning signal and inductive biases. However, while the real world
is inherently 3D, prior efforts have largely been focused on leveraging 2D
computer vision techniques as auxiliary self-supervision. In this work, we
present a unified framework for self-supervised learning of 3D representations
for motor control. Our proposed framework consists of two phases: a pretraining
phase where a deep voxel-based 3D autoencoder is pretrained on a large
object-centric dataset, and a finetuning phase where the representation is
jointly finetuned together with RL on in-domain data. We empirically show that
our method enjoys improved sample efficiency in simulated manipulation tasks
compared to 2D representation learning methods. Additionally, our learned
policies transfer zero-shot to a real robot setup with only approximate
geometric correspondence, and successfully solve motor control tasks that
involve grasping and lifting from a single, uncalibrated RGB camera. Code and
videos are available at https://yanjieze.com/3d4rl/ .
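The two-phase recipe described in the abstract (pretrain a voxel-based 3D autoencoder on object-centric data, then jointly finetune the encoder with RL on in-domain data) can be illustrated with a minimal sketch. This is not the authors' released code (see https://yanjieze.com/3d4rl/ for that); the network sizes, the occupancy-style voxel target, and the way the reconstruction and RL objectives are combined below are illustrative assumptions.

```python
# Minimal sketch of the two-phase framework (assumptions, not the paper's code):
# phase 1 pretrains a voxel-based 3D autoencoder with a reconstruction loss,
# phase 2 jointly finetunes the encoder with an RL loss on in-domain data.
import torch
import torch.nn as nn

class VoxelAutoencoder(nn.Module):
    """Encodes an RGB image into a latent vector and decodes a coarse voxel grid."""
    def __init__(self, latent_dim: int = 256, voxel_res: int = 16):
        super().__init__()
        self.voxel_res = voxel_res
        self.encoder = nn.Sequential(                       # 2D conv image encoder
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, latent_dim),
        )
        self.decoder = nn.Sequential(                       # 3D deconv voxel decoder
            nn.Linear(latent_dim, 64 * 4 * 4 * 4), nn.Unflatten(1, (64, 4, 4, 4)),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),  # occupancy logits, 16^3
        )

    def forward(self, img):
        z = self.encoder(img)
        return z, self.decoder(z)

def pretrain_step(model, opt, img, target_voxels):
    """Phase 1: self-supervised 3D reconstruction on object-centric data."""
    _, voxel_logits = model(img)
    loss = nn.functional.binary_cross_entropy_with_logits(voxel_logits, target_voxels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def finetune_step(model, policy, opt, img, target_voxels, rl_loss_fn, recon_weight=0.1):
    """Phase 2: joint finetuning; RL objective plus a down-weighted reconstruction term.
    `policy` and `rl_loss_fn` stand in for whatever RL algorithm is used and are assumptions."""
    z, voxel_logits = model(img)
    recon = nn.functional.binary_cross_entropy_with_logits(voxel_logits, target_voxels)
    loss = rl_loss_fn(policy(z)) + recon_weight * recon
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

if __name__ == "__main__":
    model = VoxelAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    img = torch.rand(2, 3, 64, 64)                         # dummy RGB batch
    vox = (torch.rand(2, 1, 16, 16, 16) > 0.5).float()     # dummy occupancy targets
    print("phase-1 loss:", pretrain_step(model, opt, img, vox))
```

The intent of phase 1 is that the encoder already carries 3D-aware structure before any reward signal is seen; phase 2 then adapts that representation to the task rather than learning it from scratch, which is where the reported sample-efficiency gains come from.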
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Enhancing 2D Representation Learning with a 3D Prior [21.523007105586217]
Learning robust and effective representations of visual data is a fundamental task in computer vision.
Traditionally, this is achieved by training models with labeled data which can be expensive to obtain.
We propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior.
arXiv Detail & Related papers (2024-06-04T17:55:22Z)
- Part-Guided 3D RL for Sim2Real Articulated Object Manipulation [27.422878372169805]
We propose a part-guided 3D RL framework, which can learn to manipulate articulated objects without demonstrations.
We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training.
A single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation.
arXiv Detail & Related papers (2024-04-26T10:18:17Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations [89.1388369229542]
We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects.
We obviate the need for direct supervision by learning the representations with GAN training.
Experiments demonstrate the efficiency of our method and show that GAN-based training enables learning of controllable 3D representations without supervision.
arXiv Detail & Related papers (2022-04-19T12:10:18Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)