Sample-efficient Reinforcement Learning Representation Learning with
Curiosity Contrastive Forward Dynamics Model
- URL: http://arxiv.org/abs/2103.08255v1
- Date: Mon, 15 Mar 2021 10:08:52 GMT
- Title: Sample-efficient Reinforcement Learning Representation Learning with
Curiosity Contrastive Forward Dynamics Model
- Authors: Thanh Nguyen, Tung M. Luu, Thang Vu and Chang D. Yoo
- Abstract summary: This paper considers a learning framework for the Curiosity Contrastive Forward Dynamics Model (CCFDM) to achieve more sample-efficient reinforcement learning (RL).
CCFDM incorporates a forward dynamics model (FDM) and performs contrastive learning to train its deep convolutional neural network-based image encoder (IE).
During training, CCFDM provides intrinsic rewards, produced from the FDM prediction error, that encourage the curiosity of the RL agent and improve exploration.
- Score: 17.41484483119774
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Developing an agent in reinforcement learning (RL) that is capable of
performing complex control tasks directly from high-dimensional observations
such as raw pixels remains a challenge, as efforts continue towards improving
sample efficiency and generalization. This paper considers a learning framework
for the Curiosity Contrastive Forward Dynamics Model (CCFDM) to achieve more
sample-efficient RL based directly on raw pixels. CCFDM incorporates a forward
dynamics model (FDM) and performs contrastive learning to train its deep
convolutional neural network-based image encoder (IE) to extract conducive
spatial and temporal information, thereby achieving greater sample efficiency for RL.
In addition, during training, CCFDM provides intrinsic rewards, produced from
the FDM prediction error, that encourage the curiosity of the RL agent and improve
exploration. The diverse and less-repetitive observations provided by both our
exploration strategy and data augmentation available in contrastive learning
improve not only the sample efficiency but also the generalization. Existing
model-free RL methods such as Soft Actor-Critic, when built on top of CCFDM,
outperform prior state-of-the-art pixel-based RL methods on the DeepMind
Control Suite benchmark.
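The abstract combines two mechanisms: a contrastive objective that trains the image encoder using the forward dynamics model, and an intrinsic curiosity reward derived from the FDM prediction error. One plausible instantiation, sketched below, uses the FDM's latent prediction as the query in an InfoNCE-style loss against the momentum-encoded next observation. This is a minimal, hedged illustration only: the class names, layer sizes, bilinear similarity matrix `W`, and reward scale are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Convolutional encoder mapping a stack of frames to a latent vector."""
    def __init__(self, in_channels=9, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)  # infers the flattened size at first call

    def forward(self, obs):
        return self.fc(self.conv(obs))


class ForwardDynamicsModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim=50, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))


def contrastive_fdm_loss(encoder, target_encoder, fdm, W, obs, action, next_obs):
    """InfoNCE-style loss: the FDM prediction acts as the query, the target-encoded
    next observation is the positive key, other batch elements are negatives."""
    query = fdm(encoder(obs), action)            # predicted next latent (gradients flow)
    with torch.no_grad():
        keys = target_encoder(next_obs)          # momentum-encoded positives/negatives
    logits = query @ W @ keys.t()                # bilinear similarity matrix (B x B)
    labels = torch.arange(obs.size(0), device=obs.device)
    return F.cross_entropy(logits, labels)


def intrinsic_reward(encoder, target_encoder, fdm, obs, action, next_obs, scale=0.1):
    """Curiosity bonus: squared FDM prediction error in latent space."""
    with torch.no_grad():
        pred = fdm(encoder(obs), action)
        target = target_encoder(next_obs)
    return scale * (pred - target).pow(2).mean(dim=-1)
```

In an actual training loop, the intrinsic reward would be added to the environment reward stored in the replay buffer before the Soft Actor-Critic update, and `target_encoder` would be refreshed as an exponential moving average of `encoder`, in the spirit of momentum-contrast methods.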
Related papers
- The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning [8.36595587335589]
Visual Reinforcement Learning methods often require extensive amounts of data.
Model-based RL (MBRL) offers a potential solution with efficient data utilization through planning.
However, MBRL lacks generalization capabilities for real-world tasks.
arXiv Detail & Related papers (2024-11-15T13:21:26Z)
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Mask-based Latent Reconstruction for Reinforcement Learning [58.43247393611453]
Mask-based Latent Reconstruction (MLR) is proposed to predict the complete state representations in the latent space from the observations with spatially and temporally masked pixels.
Extensive experiments show that our MLR significantly improves the sample efficiency in deep reinforcement learning.
arXiv Detail & Related papers (2022-01-28T13:07:11Z)
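As a side note on the MLR entry above, its masking-then-latent-reconstruction idea can be sketched roughly as follows: random patches of the pixel observation are zeroed out, and an online encoder is trained to predict the latent representation of the complete observation. The patch size, mask ratio, and mean-squared loss below are illustrative assumptions, not the paper's exact design (MLR also masks along the temporal axis of the frame stack).

```python
import torch
import torch.nn.functional as F

def random_patch_mask(obs, patch=8, ratio=0.5):
    """Zero out random spatial patches of the pixel observation."""
    b, c, h, w = obs.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=obs.device) > ratio).float()
    mask = F.interpolate(keep, size=(h, w), mode="nearest")
    return obs * mask

def latent_reconstruction_loss(online_encoder, target_encoder, predictor, obs):
    """Predict the complete latent state from a masked observation."""
    masked = random_patch_mask(obs)
    pred = predictor(online_encoder(masked))     # latent predicted from masked pixels
    with torch.no_grad():
        target = target_encoder(obs)             # latent of the unmasked observation
    return F.mse_loss(pred, target)
```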
This list is automatically generated from the titles and abstracts of the papers on this site.