Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
- URL: http://arxiv.org/abs/2010.13303v1
- Date: Mon, 26 Oct 2020 03:20:42 GMT
- Title: Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
- Authors: Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, Pieter Abbeel
- Abstract summary: We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
- Score: 137.39196753245105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (RL) has shown great potential in various
control tasks in terms of both sample-efficiency and final performance.
However, learning a generalizable dynamics model robust to changes in dynamics
remains a challenge since the target transition dynamics follow a multi-modal
distribution. In this paper, we present a new model-based RL algorithm, coined
trajectory-wise multiple choice learning, that learns a multi-headed dynamics
model for dynamics generalization. The main idea is to update only the most
accurate prediction head on each trajectory, so that each head specializes in
environments with similar dynamics, i.e., the heads implicitly cluster the
environments. Moreover, we incorporate context
learning, which encodes dynamics-specific information from past experiences
into the context latent vector, enabling the model to perform online adaptation
to unseen environments. Finally, to utilize the specialized prediction heads
more effectively, we propose an adaptive planning method that selects the
prediction head with the lowest error on recent experience. Our method exhibits
superior zero-shot generalization performance across a variety of control
tasks, compared to state-of-the-art RL methods. Source code and videos are
available at https://sites.google.com/view/trajectory-mcl.
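The abstract describes three components: a multi-headed dynamics model trained with a trajectory-wise winner-take-all update, a context encoder that summarizes past transitions, and adaptive head selection for planning. The sketch below illustrates how these pieces could fit together in PyTorch; all module sizes, the GRU context encoder, and the function names are illustrative assumptions based only on the abstract, not the authors' released implementation (see the project site above for that).

```python
import torch
import torch.nn as nn

class MultiHeadDynamicsModel(nn.Module):
    """Multi-headed dynamics model: a shared backbone feeds H prediction
    heads, each meant to specialize in a cluster of environments with
    similar dynamics. All sizes here are illustrative assumptions."""

    def __init__(self, state_dim, action_dim, context_dim=16, hidden=200, num_heads=3):
        super().__init__()
        # Context encoder (assumed GRU): summarizes recent transitions
        # into a latent vector carrying dynamics-specific information.
        self.context_encoder = nn.GRU(2 * state_dim + action_dim,
                                      context_dim, batch_first=True)
        self.backbone = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, state_dim) for _ in range(num_heads)])

    def encode_context(self, past_transitions):
        # past_transitions: (1, K, 2*state_dim + action_dim) recent (s, a, s') steps
        _, h = self.context_encoder(past_transitions)
        return h.squeeze(0)  # (1, context_dim); expand to (T, context_dim) before use

    def forward(self, states, actions, context):
        # states: (T, state_dim), actions: (T, action_dim), context: (T, context_dim)
        feats = self.backbone(torch.cat([states, actions, context], dim=-1))
        return torch.stack([head(feats) for head in self.heads])  # (H, T, state_dim)

def trajectory_wise_mcl_loss(model, states, actions, next_states, context):
    """Winner-take-all at the trajectory level: score every head over the
    whole trajectory, then backpropagate only through the best head, so
    heads specialize per environment rather than per transition."""
    preds = model(states, actions, context)                              # (H, T, state_dim)
    errors = ((preds - next_states.unsqueeze(0)) ** 2).mean(-1).sum(-1)  # (H,)
    return errors.min()  # gradient flows only to the most accurate head

@torch.no_grad()
def select_head(model, states, actions, next_states, context):
    """Adaptive planning: pick the head with the lowest prediction error
    on recent experience, then plan with that head only."""
    preds = model(states, actions, context)
    errors = ((preds - next_states.unsqueeze(0)) ** 2).mean(dim=(-1, -2))
    return errors.argmin().item()
```

Assigning the winner per trajectory, rather than per transition, matters because every transition in a trajectory comes from the same environment; the winning head therefore absorbs whole environments at a time, which is what yields the environment clustering the abstract describes.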
Related papers
- Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting [15.916325272109454]
We propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts.
A learned routing function, trained concurrently with the deep learning model, dynamically selects the most reliable prediction based on the input scenario.
This work highlights the potential of hybrid approaches for robust and generalizable motion prediction in autonomous driving.
arXiv Detail & Related papers (2024-07-12T17:57:00Z)
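The learned routing function in the entry above can be pictured as a small gating network scoring each expert's output. The sketch below is a generic illustration of such a router under assumed shapes and names, not APE's actual architecture.

```python
import torch
import torch.nn as nn

class PredictionRouter(nn.Module):
    """Generic learned router: scores each expert's trajectory forecast
    from scene features and returns the highest-scoring one. Shapes and
    names are assumptions for illustration only."""

    def __init__(self, feature_dim, num_experts, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, scene_features, expert_preds):
        # scene_features: (B, feature_dim)
        # expert_preds:   (B, E, horizon, 2) -- one (x, y) forecast per expert
        scores = self.scorer(scene_features)        # (B, E)
        choice = scores.argmax(dim=-1)              # (B,) index of chosen expert
        idx = choice.view(-1, 1, 1, 1).expand(-1, 1, *expert_preds.shape[2:])
        return torch.gather(expert_preds, 1, idx).squeeze(1), scores
```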
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains.
Recent approaches involve pre-training transformer models on vast amounts of unlabeled data.
In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
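A world model with mixture-of-Gaussians dynamics priors, as in the entry above, can be approximated by a mixture-density output head. The sketch below is a generic mixture-density head for next-state prediction, with assumed names and sizes, not the paper's model.

```python
import torch.nn as nn
import torch.nn.functional as F
import torch.distributions as D

class MixtureDynamicsHead(nn.Module):
    """Mixture-density head: outputs weights, means, and scales of K
    Gaussians over the next state, so multi-modal, task-specific
    dynamics can be represented. Generic sketch for illustration."""

    def __init__(self, hidden_dim, state_dim, num_components=5):
        super().__init__()
        self.K, self.S = num_components, state_dim
        self.out = nn.Linear(hidden_dim, num_components * (1 + 2 * state_dim))

    def forward(self, h):
        p = self.out(h).view(-1, self.K, 1 + 2 * self.S)
        logits = p[..., 0]                                # (B, K) mixture weights
        means = p[..., 1:1 + self.S]                      # (B, K, S)
        scales = F.softplus(p[..., 1 + self.S:]) + 1e-4   # (B, K, S), positive
        comp = D.Independent(D.Normal(means, scales), 1)
        return D.MixtureSameFamily(D.Categorical(logits=logits), comp)

def dynamics_nll(head, h, next_state):
    # Train by maximizing likelihood of the observed next state.
    return -head(h).log_prob(next_state).mean()
```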
- Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that the learned model can represent complex dynamics for both the Spot robot and a radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
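Gradient-based trajectory optimization with a learned differentiable model, as in the entry above, amounts to backpropagating a cost through rolled-out predictions. The sketch below shows a minimal first-order version under assumed interfaces (`dynamics` and `cost_fn` are hypothetical callables), not the paper's optimizer.

```python
import torch

def optimize_actions(dynamics, s0, horizon, action_dim,
                     cost_fn, steps=200, lr=0.05):
    """Roll an action sequence through a differentiable learned dynamics
    model and minimize the summed cost by gradient descent on the
    actions themselves."""
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        s, total = s0, torch.zeros(())
        for t in range(horizon):
            s = dynamics(s, actions[t])       # predicted next state
            total = total + cost_fn(s, actions[t])
        total.backward()                      # gradients flow through the model
        opt.step()
    return actions.detach()
```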
- DyNODE: Neural Ordinary Differential Equations for Dynamics Modeling in Continuous Control [0.0]
We present a novel approach that captures the underlying dynamics of a system by incorporating control in a neural ordinary differential equation framework.
Results indicate that a simple DyNODE architecture, when combined with an actor-critic reinforcement learning algorithm, outperforms canonical neural-network dynamics models.
arXiv Detail & Related papers (2020-09-09T12:56:58Z)
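Capturing dynamics with a controlled neural ODE, as in the entry above, means parameterizing the state derivative with a network and integrating it. The sketch below uses fixed-step Euler integration as a stand-in for a proper ODE solver; it is an illustrative reading of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ControlledODEDynamics(nn.Module):
    """Neural-ODE-style dynamics with control: a network parameterizes
    the state derivative f(x, u), and next states come from numerical
    integration. Sizes and the integrator choice are assumptions."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x, u, dt=0.05, substeps=4):
        # Integrate dx/dt = f(x, u) over one environment step with a few
        # Euler substeps (a fixed-step stand-in for an ODE solver).
        h = dt / substeps
        for _ in range(substeps):
            x = x + h * self.f(torch.cat([x, u], dim=-1))
        return x
```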
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it.
In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics.
The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
arXiv Detail & Related papers (2020-05-14T08:10:54Z)
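The forward-backward objective described in the entry above can be written down directly: the same context latent must help predict the next state and recover the previous one. The sketch below assumes simple MLP dynamics models and a GRU context encoder; all names and shapes are assumptions based on the summary, not the paper's code.

```python
import torch
import torch.nn.functional as F

def forward_backward_loss(context_encoder, f_model, b_model,
                          past_transitions, s, a, s_next):
    """Encourage the context latent to carry dynamics-specific
    information by using it for both forward prediction (s, a -> s')
    and backward prediction (s', a -> s)."""
    # past_transitions: (1, K, transition_dim) recent steps from the
    # same environment; context_encoder is assumed to be an nn.GRU.
    _, h = context_encoder(past_transitions)
    c = h.squeeze(0).expand(s.shape[0], -1)                 # (B, context_dim)
    pred_next = f_model(torch.cat([s, a, c], dim=-1))       # forward dynamics
    pred_prev = b_model(torch.cat([s_next, a, c], dim=-1))  # backward dynamics
    return F.mse_loss(pred_next, s_next) + F.mse_loss(pred_prev, s)
```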
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.