Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning
- URL: http://arxiv.org/abs/2303.03787v2
- Date: Sun, 10 Sep 2023 11:41:42 GMT
- Title: Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning
- Authors: Mostafa Kotb, Cornelius Weber, Stefan Wermter
- Abstract summary: We propose Curiosity CEM (CCEM), an improved version of the Cross-Entropy Method (CEM) algorithm that encourages exploration via curiosity.
Our proposed method maximizes the sum of state-action Q-values over the planning horizon, where these Q-values estimate the future extrinsic and intrinsic reward.
Experiments on image-based continuous control tasks from the DeepMind Control suite show that CCEM is more sample-efficient than previous MBRL algorithms by a large margin.
- Score: 21.995159117991278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) with real-time planning has shown
great potential in locomotion and manipulation control tasks. However, the
existing planning methods, such as the Cross-Entropy Method (CEM), do not scale
well to complex high-dimensional environments. One of the key reasons for
underperformance is the lack of exploration, as these planning methods only aim
to maximize the cumulative extrinsic reward over the planning horizon.
Furthermore, planning inside the compact latent space in the absence of
observations makes it challenging to use curiosity-based intrinsic motivation.
We propose Curiosity CEM (CCEM), an improved version of the CEM algorithm that
encourages exploration via curiosity. Our proposed method maximizes the sum of
state-action Q-values over the planning horizon, where these Q-values estimate
the future extrinsic and intrinsic reward, hence encouraging the agent to reach
novel observations. In addition, our model uses contrastive representation
learning to efficiently learn latent representations. Experiments on
image-based continuous control tasks from the DeepMind Control suite show that
CCEM is more sample-efficient than previous MBRL algorithms by a large margin
and compares favorably with the best model-free RL methods.
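The planning objective above can be illustrated with a short sketch. The following is a minimal, hypothetical NumPy version of a curiosity-aware CEM loop, not the authors' implementation: `dynamics` and `q_value` are toy stand-ins for the learned latent transition model and the Q-function trained on combined extrinsic and intrinsic reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned components (not the authors' code).
# In CCEM the Q-function is trained on extrinsic + curiosity-based intrinsic
# reward, so maximizing it already steers plans toward novel observations.
def dynamics(z, a):          # learned latent transition model
    return np.tanh(z + 0.1 * a)

def q_value(z, a):           # Q(s, a) for extrinsic + intrinsic return
    return -np.sum(z ** 2) - 0.01 * np.sum(a ** 2)

def plan_ccem(z0, horizon=12, pop=64, elites=8, iters=5):
    """Curiosity-CEM planning loop: refit a Gaussian over action sequences
    toward the elites that maximize the sum of Q-values over the horizon."""
    dim = z0.shape[0]
    mu = np.zeros((horizon, dim))
    sigma = np.ones((horizon, dim))
    for _ in range(iters):
        seqs = mu + sigma * rng.standard_normal((pop, horizon, dim))
        returns = np.empty(pop)
        for i, seq in enumerate(seqs):
            z, total = z0, 0.0
            for a in seq:
                total += q_value(z, a)   # objective: sum of Q-values
                z = dynamics(z, a)
            returns[i] = total
        elite = seqs[np.argsort(returns)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]  # execute the first action, then replan (MPC style)

print(plan_ccem(np.zeros(4)))
```

Because the Q-values already fold in the intrinsic reward, the planner needs no access to raw observations at planning time, which matches the motivation for moving curiosity out of the latent rollout and into the Q-function.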
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
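A rough illustration of the chunking step, under the assumption (not stated in the summary) that frequent subsequences are found by simple n-gram counting over the top-reward trajectories:

```python
from collections import Counter

def extract_macro_action(trajectories, rewards, top_frac=0.2, length=3):
    """Find the action subsequence most common among high-reward trajectories.
    `trajectories` is a list of action lists; heuristic sketch only."""
    k = max(1, int(len(trajectories) * top_frac))
    best = sorted(range(len(trajectories)), key=lambda i: rewards[i])[-k:]
    counts = Counter()
    for i in best:
        acts = trajectories[i]
        for j in range(len(acts) - length + 1):
            counts[tuple(acts[j:j + length])] += 1
    macro, _ = counts.most_common(1)[0]
    return macro  # added to the action space as a single high-level action

trajs = [[0, 1, 1, 2], [0, 1, 1, 3], [2, 0, 1, 1]]
print(extract_macro_action(trajs, rewards=[1.0, 0.9, 0.2], length=2))
```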
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low-dimensional abstract actions.
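As a hedged sketch of what "various levels of temporal abstraction" can look like, the toy model below pairs a per-step low-level transition with a high-level transition that jumps k steps at once; the linear maps are placeholders, not the paper's architecture:

```python
import numpy as np

class TwoLevelWorldModel:
    """Toy sketch: a low-level model steps the latent once per action,
    a high-level model jumps k steps at a time with an abstract action."""
    def __init__(self, dim=4, k=4, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.A_lo = rng.normal(scale=0.1, size=(dim, dim))
        self.A_hi = rng.normal(scale=0.1, size=(dim, dim))

    def step_low(self, z, a):   # one environment step
        return np.tanh(self.A_lo @ z + a)

    def step_high(self, z, u):  # abstracts k low-level steps
        return np.tanh(self.A_hi @ z + u)

m = TwoLevelWorldModel()
z = np.zeros(4)
z_fast = z
for _ in range(m.k):            # k fine-grained steps...
    z_fast = m.step_low(z_fast, a=np.ones(4) * 0.1)
z_slow = m.step_high(z, u=np.ones(4) * 0.1)  # ...vs one coarse step
print(np.round(z_fast, 2), np.round(z_slow, 2))
```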
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
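One common way to obtain an intrinsic reward without extra parameters, and a plausible reading of the summary (an assumption, not a quote from the paper), is to reuse the predictive model's own error as the novelty signal:

```python
import numpy as np

def intrinsic_reward(model, s, a, s_next):
    """Prediction error of the already-learned dynamics model, reused as a
    curiosity bonus; no additional parameters beyond the model itself."""
    return float(np.mean((model(s, a) - s_next) ** 2))

# Toy stand-in dynamics model (hypothetical):
model = lambda s, a: s + 0.1 * a
r_int = intrinsic_reward(model, np.zeros(3), np.ones(3), np.full(3, 0.2))
print(r_int)  # larger where the model predicts poorly -> explore there
```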
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Diffused Task-Agnostic Milestone Planner [13.042155799536657]
We propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space.
The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control.
arXiv Detail & Related papers (2023-12-06T10:09:22Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
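A minimal sketch of such a PRM-guided greedy search, with `propose` and `prm_score` as hypothetical stand-ins for the LLM step generator and the trained process reward model:

```python
import random

def greedy_prm_search(prompt, propose, prm_score, max_steps=6, beam=4):
    """At each reasoning step, sample `beam` candidate next steps and keep
    the one the process reward model (PRM) scores highest."""
    steps = []
    for _ in range(max_steps):
        candidates = [propose(prompt, steps) for _ in range(beam)]
        best = max(candidates, key=lambda c: prm_score(prompt, steps, c))
        steps.append(best)
        if best.endswith("[DONE]"):
            break
    return steps

# Toy stand-ins for illustration only:
propose = lambda p, s: random.choice(["step A", "step B", "answer [DONE]"])
prm_score = lambda p, s, c: 1.0 if "[DONE]" in c else random.random()
print(greedy_prm_search("2 + 2 = ?", propose, prm_score))
```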
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
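A toy sketch of the mixture idea, assuming one Gaussian component per task (the real model's parameterization may differ):

```python
import numpy as np

class MixtureLatentPrior:
    """Sketch of a mixture-of-Gaussians prior: one (mu, sigma) per task so
    task-specific dynamics priors don't overwrite each other."""
    def __init__(self, n_tasks, dim):
        self.mu = np.zeros((n_tasks, dim))
        self.log_sigma = np.zeros((n_tasks, dim))

    def sample(self, task_id, rng):
        s = np.exp(self.log_sigma[task_id])
        return self.mu[task_id] + s * rng.standard_normal(self.mu.shape[1])

prior = MixtureLatentPrior(n_tasks=3, dim=8)
z = prior.sample(task_id=1, rng=np.random.default_rng(0))
print(np.round(z, 2))
```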
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
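A small sketch of latent-space collocation under an assumed linear latent model z' = Az + Ba: the state sequence itself is optimized, with dynamics treated as a soft penalty rather than rolled out:

```python
import numpy as np

def collocate(z0, goal, A, B, T=10, lam=10.0, lr=0.01, iters=500):
    """Collocation sketch: optimize the *state* sequence (and actions)
    directly, softly penalizing violations of the latent dynamics."""
    Z = np.linspace(z0, goal, T + 1)               # straight-line init
    U = np.zeros((T, B.shape[1]))
    for _ in range(iters):
        resid = Z[1:] - (Z[:-1] @ A.T + U @ B.T)   # dynamics violation
        gZ = np.zeros_like(Z)
        gZ[1:] += 2 * lam * resid                  # d/dz_{t+1}
        gZ[:-1] += -2 * lam * resid @ A            # d/dz_t
        gZ[-1] += 2 * (Z[-1] - goal)               # terminal goal cost
        gU = -2 * lam * resid @ B                  # d/da_t
        Z[1:] -= lr * gZ[1:]                       # z0 stays fixed
        U -= lr * gU
    return Z, U

d, m = 4, 2
rng = np.random.default_rng(0)
A = np.diag([0.9, 0.95, 0.8, 0.85])                # stand-in latent model
B = rng.normal(scale=0.3, size=(d, m))
Z, U = collocate(np.zeros(d), np.ones(d), A, B)
print(np.round(Z[-1], 2))  # should move toward the goal latent
```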
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time.
We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex but irrelevant information to the planning task.
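Temporal predictive coding is often instantiated as a CPC-style InfoNCE objective; the sketch below (an assumption about the general technique, not the paper's exact loss) trains each latent to identify its true successor among in-batch negatives:

```python
import numpy as np

def temporal_infonce(z_t, z_next, W, temperature=0.1):
    """CPC-style sketch: each predicted code W @ z_t must pick out its own
    successor z_{t+1} from the batch; other successors act as negatives.
    Returns the mean cross-entropy (InfoNCE) loss over the batch."""
    pred = z_t @ W.T                               # predict next latent
    logits = pred @ z_next.T / temperature         # (batch, batch) scores
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # true pairs on diagonal

rng = np.random.default_rng(0)
z_t = rng.standard_normal((16, 8))
z_next = z_t + 0.01 * rng.standard_normal((16, 8))  # nearby successors
print(temporal_infonce(z_t, z_next, W=np.eye(8)))
```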
arXiv Detail & Related papers (2021-06-14T04:31:15Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show that this phenomenon is related to exploration, and that some of the lower-scoring models on standard benchmarks perform on par with the best-performing models when trained on the same data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Dream and Search to Control: Latent Space Planning for Continuous Control [24.991127785736364]
We show that the bootstrapping benefits previously demonstrated for discrete action spaces can also be achieved in continuous ones.
In particular, the approach achieves improved sample efficiency and performance on a majority of challenging continuous-control benchmarks.
arXiv Detail & Related papers (2020-10-19T20:10:51Z)