EUCLID: Towards Efficient Unsupervised Reinforcement Learning with
Multi-choice Dynamics Model
- URL: http://arxiv.org/abs/2210.00498v1
- Date: Sun, 2 Oct 2022 12:11:44 GMT
- Title: EUCLID: Towards Efficient Unsupervised Reinforcement Learning with
Multi-choice Dynamics Model
- Authors: Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Jinyi
Liu, Yingfeng Chen, Changjie Fan
- Abstract summary: Unsupervised reinforcement learning (URL) offers a promising paradigm for learning useful behaviors in a task-agnostic environment.
We introduce a novel model-fused paradigm that jointly pre-trains the dynamics model and the unsupervised exploration policy in the pre-training phase.
We show that EUCLID achieves state-of-the-art performance with high sample efficiency.
- Score: 46.99510778097286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised reinforcement learning (URL) offers a promising paradigm for
learning useful behaviors in a task-agnostic environment, without the guidance
of extrinsic rewards, to facilitate fast adaptation to various downstream
tasks. Previous works focused on model-free pre-training and left transition
dynamics modeling unstudied, leaving considerable room to improve sample
efficiency in downstream tasks. To this end, we
propose an Efficient Unsupervised Reinforcement Learning Framework with
Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused
paradigm to jointly pre-train the dynamics model and unsupervised exploration
policy in the pre-training phase, thus better leveraging environmental
samples and improving sample efficiency in downstream tasks. However,
constructing a generalizable model that captures the local dynamics under
different behaviors remains challenging. We therefore introduce the
multi-choice dynamics model, which covers the local dynamics of different
behaviors concurrently: it uses separate heads to learn the state transitions
under different behaviors during unsupervised pre-training and
selects the most appropriate head for prediction in the downstream task.
Experimental results in the manipulation and locomotion domains demonstrate
that EUCLID achieves state-of-the-art performance with high sample efficiency,
essentially solving the state-based URLB benchmark and reaching a mean normalized
score of 104.0$\pm$1.2$\%$ on downstream tasks within 100k fine-tuning steps,
matching the performance DDPG reaches after 2M interaction steps, i.e., with
20x more data.
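The head mechanism described above can be illustrated with a minimal sketch: a set of per-behavior prediction heads trained on their own transitions, with the best-fitting head chosen from a few downstream transitions. The class name, the linear heads, and the selection-by-error rule are hypothetical simplifications for illustration, not EUCLID's actual implementation.

```python
import numpy as np

class MultiChoiceDynamics:
    """Illustrative multi-headed dynamics model: one head per exploration
    behavior, each fitting next_state ~ W_k @ [state, action]."""

    def __init__(self, state_dim, action_dim, n_heads, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        # One linear head per behavior (a stand-in for separate network heads).
        self.heads = [rng.normal(0.0, 0.1, (state_dim, in_dim))
                      for _ in range(n_heads)]
        self.lr = lr

    def predict(self, head, s, a):
        return self.heads[head] @ np.concatenate([s, a])

    def update(self, head, s, a, s_next):
        # SGD step on squared error for the head tied to this behavior.
        x = np.concatenate([s, a])
        err = self.heads[head] @ x - s_next
        self.heads[head] -= self.lr * np.outer(err, x)

    def select_head(self, transitions):
        # Downstream: pick the head with lowest error on a few transitions.
        losses = [np.mean([np.sum((self.predict(k, s, a) - sn) ** 2)
                           for s, a, sn in transitions])
                  for k in range(len(self.heads))]
        return int(np.argmin(losses))

# Two behaviors induce two local dynamics; each head specializes in one.
rng = np.random.default_rng(1)
model = MultiChoiceDynamics(state_dim=2, action_dim=1, n_heads=2)
for _ in range(3000):
    s, a = rng.normal(size=2), rng.normal(size=1)
    model.update(0, s, a, 0.5 * s)    # behavior 0: contracting dynamics
    model.update(1, s, a, -0.5 * s)   # behavior 1: oscillating dynamics
probe = [(s, a, 0.5 * s) for s, a in
         [(rng.normal(size=2), rng.normal(size=1)) for _ in range(10)]]
best = model.select_head(probe)       # downstream task matches behavior 0
```

In this toy setting the selection step reduces to model comparison by one-step prediction error, which is the same signal a multi-headed model can use to route a downstream task to its closest pre-training behavior.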
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes [1.4451387915783602]
Multi-Scenes Network (aka MS-Net) is a multi-path sparse model trained by an evolutionary process.
MS-Net selectively activates a subset of its parameters during the inference stage to produce prediction results for each scene.
Our experiment results show that MS-Net outperforms existing state-of-the-art methods on well-established pedestrian motion prediction datasets.
arXiv Detail & Related papers (2024-03-01T08:32:12Z) - Dynamic-Resolution Model Learning for Object Pile Manipulation [33.05246884209322]
We investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness.
Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs).
We show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles.
arXiv Detail & Related papers (2023-06-29T05:51:44Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving [103.745551954983]
In this paper, we investigate the transfer performance of various self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Temporal Difference Learning for Model Predictive Control [29.217382374051347]
Data-driven model predictive control has two key advantages over model-free methods.
TD-MPC achieves superior sample efficiency and performance over prior work on both state and image-based continuous control tasks.
arXiv Detail & Related papers (2022-03-09T18:58:28Z) - Sample Efficient Reinforcement Learning via Model-Ensemble Exploration
and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that combines optimistic exploration with weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z) - Trajectory-wise Multiple Choice Learning for Dynamics Generalization in
Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.