Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning
- URL: http://arxiv.org/abs/2303.03787v2
- Date: Sun, 10 Sep 2023 11:41:42 GMT
- Title: Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning
- Authors: Mostafa Kotb, Cornelius Weber, Stefan Wermter
- Abstract summary: We propose Curiosity CEM (CCEM), an improved version of the Cross-Entropy Method (CEM) algorithm that encourages exploration via curiosity.
Our proposed method maximizes the sum of state-action Q-values over the planning horizon, where these Q-values estimate the future extrinsic and intrinsic reward.
Experiments on image-based continuous control tasks from the DeepMind Control suite show that CCEM is more sample-efficient than previous MBRL algorithms by a large margin.
- Score: 21.995159117991278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) with real-time planning has shown
great potential in locomotion and manipulation control tasks. However, the
existing planning methods, such as the Cross-Entropy Method (CEM), do not scale
well to complex high-dimensional environments. One of the key reasons for
underperformance is the lack of exploration, as these planning methods only aim
to maximize the cumulative extrinsic reward over the planning horizon.
Furthermore, planning inside the compact latent space in the absence of
observations makes it challenging to use curiosity-based intrinsic motivation.
We propose Curiosity CEM (CCEM), an improved version of the CEM algorithm that
encourages exploration via curiosity. Our proposed method maximizes the sum of
state-action Q-values over the planning horizon, where these Q-values estimate
the future extrinsic and intrinsic reward, hence encouraging the agent to reach
novel observations. In addition, our model uses contrastive representation
learning to efficiently learn latent representations. Experiments on
image-based continuous control tasks from the DeepMind Control suite show that
CCEM is more sample-efficient than previous MBRL algorithms by a large margin
and compares favorably with the best model-free RL methods.
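The planning objective above can be illustrated with a short sketch. The following is a minimal, hypothetical NumPy version of a curiosity-aware CEM loop, not the authors' implementation: `dynamics` and `q_value` are toy stand-ins for the learned latent transition model and the Q-function trained on combined extrinsic and intrinsic reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned components (not the authors' code).
# In CCEM the Q-function is trained on extrinsic + curiosity-based intrinsic
# reward, so maximizing it already steers plans toward novel observations.
def dynamics(z, a):          # learned latent transition model
    return np.tanh(z + 0.1 * a)

def q_value(z, a):           # Q(s, a) for extrinsic + intrinsic return
    return -np.sum(z ** 2) - 0.01 * np.sum(a ** 2)

def plan_ccem(z0, horizon=12, pop=64, elites=8, iters=5):
    """Curiosity-CEM planning loop: refit a Gaussian over action sequences
    toward the elites that maximize the sum of Q-values over the horizon."""
    dim = z0.shape[0]
    mu = np.zeros((horizon, dim))
    sigma = np.ones((horizon, dim))
    for _ in range(iters):
        seqs = mu + sigma * rng.standard_normal((pop, horizon, dim))
        returns = np.empty(pop)
        for i, seq in enumerate(seqs):
            z, total = z0, 0.0
            for a in seq:
                total += q_value(z, a)   # objective: sum of Q-values
                z = dynamics(z, a)
            returns[i] = total
        elite = seqs[np.argsort(returns)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]  # execute the first action, then replan (MPC style)

print(plan_ccem(np.zeros(4)))
```

Because the Q-values already fold in the intrinsic reward, the planner needs no access to raw observations at planning time, which matches the motivation for moving curiosity out of the latent rollout and into the Q-function.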
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
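A rough illustration of the chunking step, under the assumption (not stated in the summary) that frequent subsequences are found by simple n-gram counting over the top-reward trajectories:

```python
from collections import Counter

def extract_macro_action(trajectories, rewards, top_frac=0.2, length=3):
    """Find the action subsequence most common among high-reward trajectories.
    `trajectories` is a list of action lists; heuristic sketch only."""
    k = max(1, int(len(trajectories) * top_frac))
    best = sorted(range(len(trajectories)), key=lambda i: rewards[i])[-k:]
    counts = Counter()
    for i in best:
        acts = trajectories[i]
        for j in range(len(acts) - length + 1):
            counts[tuple(acts[j:j + length])] += 1
    macro, _ = counts.most_common(1)[0]
    return macro  # added to the action space as a single high-level action

trajs = [[0, 1, 1, 2], [0, 1, 1, 3], [2, 0, 1, 1]]
print(extract_macro_action(trajs, rewards=[1.0, 0.9, 0.2], length=2))
```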
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low-dimensional abstract actions.
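As a hedged sketch of what "various levels of temporal abstraction" can look like, the toy model below pairs a per-step low-level transition with a high-level transition that jumps k steps at once; the linear maps are placeholders, not the paper's architecture:

```python
import numpy as np

class TwoLevelWorldModel:
    """Toy sketch: a low-level model steps the latent once per action,
    a high-level model jumps k steps at a time with an abstract action."""
    def __init__(self, dim=4, k=4, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.A_lo = rng.normal(scale=0.1, size=(dim, dim))
        self.A_hi = rng.normal(scale=0.1, size=(dim, dim))

    def step_low(self, z, a):   # one environment step
        return np.tanh(self.A_lo @ z + a)

    def step_high(self, z, u):  # abstracts k low-level steps
        return np.tanh(self.A_hi @ z + u)

m = TwoLevelWorldModel()
z = np.zeros(4)
z_fast = z
for _ in range(m.k):            # k fine-grained steps...
    z_fast = m.step_low(z_fast, a=np.ones(4) * 0.1)
z_slow = m.step_high(z, u=np.ones(4) * 0.1)  # ...vs one coarse step
print(np.round(z_fast, 2), np.round(z_slow, 2))
```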
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
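One common way to obtain an intrinsic reward without extra parameters, and a plausible reading of the summary (an assumption, not a quote from the paper), is to reuse the predictive model's own error as the novelty signal:

```python
import numpy as np

def intrinsic_reward(model, s, a, s_next):
    """Prediction error of the already-learned dynamics model, reused as a
    curiosity bonus; no additional parameters beyond the model itself."""
    return float(np.mean((model(s, a) - s_next) ** 2))

# Toy stand-in dynamics model (hypothetical):
model = lambda s, a: s + 0.1 * a
r_int = intrinsic_reward(model, np.zeros(3), np.ones(3), np.full(3, 0.2))
print(r_int)  # larger where the model predicts poorly -> explore there
```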
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Diffused Task-Agnostic Milestone Planner [13.042155799536657]
We propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space.
The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control.
arXiv Detail & Related papers (2023-12-06T10:09:22Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
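A minimal sketch of such a PRM-guided greedy search, with `propose` and `prm_score` as hypothetical stand-ins for the LLM step generator and the trained process reward model:

```python
import random

def greedy_prm_search(prompt, propose, prm_score, max_steps=6, beam=4):
    """At each reasoning step, sample `beam` candidate next steps and keep
    the one the process reward model (PRM) scores highest."""
    steps = []
    for _ in range(max_steps):
        candidates = [propose(prompt, steps) for _ in range(beam)]
        best = max(candidates, key=lambda c: prm_score(prompt, steps, c))
        steps.append(best)
        if best.endswith("[DONE]"):
            break
    return steps

# Toy stand-ins for illustration only:
propose = lambda p, s: random.choice(["step A", "step B", "answer [DONE]"])
prm_score = lambda p, s, c: 1.0 if "[DONE]" in c else random.random()
print(greedy_prm_search("2 + 2 = ?", propose, prm_score))
```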
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
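A toy sketch of the mixture idea, assuming one Gaussian component per task (the real model's parameterization may differ):

```python
import numpy as np

class MixtureLatentPrior:
    """Sketch of a mixture-of-Gaussians prior: one (mu, sigma) per task so
    task-specific dynamics priors don't overwrite each other."""
    def __init__(self, n_tasks, dim):
        self.mu = np.zeros((n_tasks, dim))
        self.log_sigma = np.zeros((n_tasks, dim))

    def sample(self, task_id, rng):
        s = np.exp(self.log_sigma[task_id])
        return self.mu[task_id] + s * rng.standard_normal(self.mu.shape[1])

prior = MixtureLatentPrior(n_tasks=3, dim=8)
z = prior.sample(task_id=1, rng=np.random.default_rng(0))
print(np.round(z, 2))
```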
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, to the image-based setting by utilizing learned latent state space models.
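A small sketch of latent-space collocation under an assumed linear latent model z' = Az + Ba: the state sequence itself is optimized, with dynamics treated as a soft penalty rather than rolled out:

```python
import numpy as np

def collocate(z0, goal, A, B, T=10, lam=10.0, lr=0.01, iters=500):
    """Collocation sketch: optimize the *state* sequence (and actions)
    directly, softly penalizing violations of the latent dynamics."""
    Z = np.linspace(z0, goal, T + 1)               # straight-line init
    U = np.zeros((T, B.shape[1]))
    for _ in range(iters):
        resid = Z[1:] - (Z[:-1] @ A.T + U @ B.T)   # dynamics violation
        gZ = np.zeros_like(Z)
        gZ[1:] += 2 * lam * resid                  # d/dz_{t+1}
        gZ[:-1] += -2 * lam * resid @ A            # d/dz_t
        gZ[-1] += 2 * (Z[-1] - goal)               # terminal goal cost
        gU = -2 * lam * resid @ B                  # d/da_t
        Z[1:] -= lr * gZ[1:]                       # z0 stays fixed
        U -= lr * gU
    return Z, U

d, m = 4, 2
rng = np.random.default_rng(0)
A = np.diag([0.9, 0.95, 0.8, 0.85])                # stand-in latent model
B = rng.normal(scale=0.3, size=(d, m))
Z, U = collocate(np.zeros(d), np.ones(d), A, B)
print(np.round(Z[-1], 2))  # should move toward the goal latent
```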
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time.
We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex but irrelevant information to the planning task.
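Temporal predictive coding is often instantiated as a CPC-style InfoNCE objective; the sketch below (an assumption about the general technique, not the paper's exact loss) trains each latent to identify its true successor among in-batch negatives:

```python
import numpy as np

def temporal_infonce(z_t, z_next, W, temperature=0.1):
    """CPC-style sketch: each predicted code W @ z_t must pick out its own
    successor z_{t+1} from the batch; other successors act as negatives.
    Returns the mean cross-entropy (InfoNCE) loss over the batch."""
    pred = z_t @ W.T                               # predict next latent
    logits = pred @ z_next.T / temperature         # (batch, batch) scores
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # true pairs on diagonal

rng = np.random.default_rng(0)
z_t = rng.standard_normal((16, 8))
z_next = z_t + 0.01 * rng.standard_normal((16, 8))  # nearby successors
print(temporal_infonce(z_t, z_next, W=np.eye(8)))
```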
arXiv Detail & Related papers (2021-06-14T04:31:15Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show that this phenomenon is related to exploration, and that some of the lower-scoring models on standard benchmarks perform on par with the best-performing models when trained on the same data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Dream and Search to Control: Latent Space Planning for Continuous Control [24.991127785736364]
We show that the bootstrapping benefits previously demonstrated for discrete action spaces can also be achieved in continuous ones.
In particular, the approach achieves improved sample efficiency and performance on a majority of challenging continuous-control benchmarks.
arXiv Detail & Related papers (2020-10-19T20:10:51Z)