VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2202.10324v3
- Date: Fri, 31 Mar 2023 06:41:29 GMT
- Title: VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
- Authors: Che Wang, Xufang Luo, Keith Ross, Dongsheng Li
- Abstract summary: We propose VRL3, a data-driven framework for solving visual deep reinforcement learning (DRL) tasks.
Our framework has three stages: in stage 1, we leverage non-RL datasets to learn task-agnostic visual representations; in stage 2, we use offline RL data; in stage 3, we fine-tune the agent with online RL.
On a set of challenging hand manipulation tasks, VRL3 achieves an average of 780% better sample efficiency than the previous state of the art.
- Score: 14.869611817084015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose VRL3, a powerful data-driven framework with a simple design for
solving challenging visual deep reinforcement learning (DRL) tasks. We analyze
a number of major obstacles in taking a data-driven approach, and present a
suite of design principles, novel findings, and critical insights about
data-driven visual DRL. Our framework has three stages: in stage 1, we leverage
non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations;
in stage 2, we use offline RL data (e.g. a limited number of expert
demonstrations) to convert the task-agnostic representations into more powerful
task-specific representations; in stage 3, we fine-tune the agent with online
RL. On a set of challenging hand manipulation tasks with sparse reward and
realistic visual inputs, compared to the previous SOTA, VRL3 achieves an
average of 780% better sample efficiency. And on the hardest task, VRL3 is
1220% more sample efficient (2440% when using a wider encoder) and solves the
task with only 10% of the computation. These significant results clearly
demonstrate the great potential of data-driven deep reinforcement learning.
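To make the three-stage recipe concrete, below is a minimal PyTorch sketch of the pipeline described in the abstract. All names (ConvEncoder, the stage functions, the env interface) are placeholders rather than the authors' released code, and the per-stage objectives (image classification, behavior cloning, a REINFORCE-style update) are simplified stand-ins for the losses VRL3 actually uses.
```python
# Minimal sketch of the three-stage recipe, assuming placeholder names
# (ConvEncoder, stage1_/stage2_/stage3_ functions, the env interface);
# the per-stage losses are simplified stand-ins, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvEncoder(nn.Module):
    """Small CNN standing in for the paper's visual encoder."""
    def __init__(self, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim),
        )

    def forward(self, x):
        return self.net(x)


def stage1_pretrain(encoder, images, labels, num_classes, steps=10):
    """Stage 1: task-agnostic representation learning on a non-RL dataset,
    shown here as plain image classification (an ImageNet-style objective)."""
    head = nn.Linear(64, num_classes)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = F.cross_entropy(head(encoder(images)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()


def stage2_offline(encoder, actor, demo_obs, demo_act, steps=10):
    """Stage 2: make the features task-specific using offline RL data;
    behavior cloning on expert demonstrations is a simplified stand-in
    for the paper's offline RL update."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(actor.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = F.mse_loss(actor(encoder(demo_obs)), demo_act)
        opt.zero_grad()
        loss.backward()
        opt.step()


def stage3_online(encoder, actor, env, episodes=5):
    """Stage 3: fine-tune encoder and policy with online RL. A REINFORCE-style
    update stands in for the off-policy actor-critic used in practice;
    `env` is assumed to expose reset() -> obs and step(a) -> (obs, reward, done)."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(actor.parameters()), lr=1e-4)
    for _ in range(episodes):
        obs, done, log_probs, rewards = env.reset(), False, [], []
        while not done:
            mean = actor(encoder(obs.unsqueeze(0))).squeeze(0)
            dist = torch.distributions.Normal(mean, 0.1)
            action = dist.sample()
            log_probs.append(dist.log_prob(action).sum())
            obs, reward, done = env.step(action)
            rewards.append(reward)
        loss = -sum(rewards) * torch.stack(log_probs).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    # Smoke test of stages 1 and 2 on random tensors.
    enc, actor = ConvEncoder(), nn.Linear(64, 4)
    stage1_pretrain(enc, torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,)), num_classes=10)
    stage2_offline(enc, actor, torch.randn(8, 3, 64, 64), torch.randn(8, 4))
```
The point of the sketch is the data flow: the same encoder is carried through and updated in all three stages, with each stage drawing on a different data source (non-RL images, offline demonstrations, online interaction).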
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose a high-value data selection approach, TIVE, to eliminate redundancy within visual instruction data and reduce the training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation [0.7564784873669823]
We propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL).
Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms.
We evaluate M2CURL on the Tactile Gym 2 simulator and show that it significantly enhances learning efficiency across different manipulation tasks.
arXiv Detail & Related papers (2024-01-30T14:09:35Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up by simple imitation learning to a new best of 80% single-run success rate on the R2R test split (+11% absolute over the previous SoTA).
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Offline Visual Representation Learning for Embodied Navigation [50.442660137987275]
The approach combines offline pretraining of visual representations with self-supervised learning (SSL) and online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.
arXiv Detail & Related papers (2022-04-27T23:22:43Z)
- X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation [71.51719469058666]
We propose a representation learning framework called X-Learner.
X-Learner learns the universal feature of multiple vision tasks supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities, or computational cost.
arXiv Detail & Related papers (2022-03-16T17:23:26Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)