HarmonyDream: Task Harmonization Inside World Models
- URL: http://arxiv.org/abs/2310.00344v3
- Date: Wed, 5 Jun 2024 16:21:40 GMT
- Title: HarmonyDream: Task Harmonization Inside World Models
- Authors: Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long,
- Abstract summary: Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
- Score: 93.07314830304193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Motivated by these insights and discoveries, we propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization, i.e. a dynamic equilibrium between the two tasks in world model learning. Our experiments show that the base MBRL method equipped with HarmonyDream gains 10%-69% absolute performance boosts on visual robotic tasks and sets a new state-of-the-art result on the Atari 100K benchmark. Code is available at https://github.com/thuml/HarmonyDream.
Related papers
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z) - Learning Latent Dynamic Robust Representations for World Models [9.806852421730165]
Visual Model-Based Reinforcement Learning (MBL) promises to agent's knowledge about the underlying dynamics of the environment.
Top-temporal agents such as Dreamer often struggle with visual pixel-based inputs in the presence of irrelevant noise in the observation space.
We apply a-temporal masking strategy, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models.
arXiv Detail & Related papers (2024-05-10T06:28:42Z) - Building Minimal and Reusable Causal State Abstractions for
Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
arXiv Detail & Related papers (2024-01-23T05:43:15Z) - ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z) - Leveraging World Model Disentanglement in Value-Based Multi-Agent
Reinforcement Learning [18.651307543537655]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.
We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.
arXiv Detail & Related papers (2023-09-08T22:12:43Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - MoDem: Accelerating Visual Model-Based Reinforcement Learning with
Demonstrations [36.44386146801296]
Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications.
We find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL.
We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks.
arXiv Detail & Related papers (2022-12-12T04:28:50Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - Model-Invariant State Abstractions for Model-Based Reinforcement
Learning [54.616645151708994]
We introduce a new type of state abstraction called textitmodel-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.