Learn Dynamic-Aware State Embedding for Transfer Learning
- URL: http://arxiv.org/abs/2101.02230v1
- Date: Wed, 6 Jan 2021 19:07:31 GMT
- Title: Learn Dynamic-Aware State Embedding for Transfer Learning
- Authors: Kaige Yang
- Abstract summary: We consider the setting where all tasks (MDPs) share the same environment dynamic and differ only in their reward functions.
In this setting, the MDP dynamic is useful knowledge to transfer, and it can be inferred with a uniformly random policy.
We observe that the binary MDP dynamic can be inferred from trajectories of any policy, which removes the need for a uniformly random policy.
- Score: 0.8756822885568589
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer reinforcement learning aims to improve the sample efficiency of solving unseen new tasks by leveraging experience obtained from previous tasks. We consider the setting where all tasks (MDPs) share the same environment dynamic and differ only in their reward functions. In this setting, the MDP dynamic is useful knowledge to transfer, and it can be inferred with a uniformly random policy. However, trajectories generated by a uniformly random policy are not useful for policy improvement, which severely impairs sample efficiency. Instead, we observe that the binary MDP dynamic can be inferred from trajectories of any policy, which removes the need for a uniformly random policy. Since the binary MDP dynamic captures the state structure shared across all tasks, we believe it is well suited to transfer. Building on this observation, we introduce a method that infers the binary MDP dynamic online and, at the same time, uses it to guide state embedding learning; the learned embedding is then transferred to new tasks. We keep state embedding learning and policy learning separate, so the learned state embedding is task- and policy-agnostic, which makes it ideal for transfer learning. In addition, to facilitate exploration over the state space, we propose a novel intrinsic reward based on the inferred binary MDP dynamic. Our method can be used out of the box in combination with model-free RL algorithms; we show two instances built on DQN and A2C. Empirical results from extensive experiments demonstrate the advantage of our proposed method on various transfer learning tasks.
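The abstract describes the method only at a high level. As a minimal illustrative sketch (an assumption of how such a scheme could look, not the paper's actual algorithm: the class name `BinaryDynamicsTracker`, the first-visit transition bonus, and the Laplacian-style embedding update are all invented here), one way to infer a binary tabular dynamic online from any policy's trajectories, derive an intrinsic reward from it, and fit a dynamics-aware state embedding is:

```python
import numpy as np

class BinaryDynamicsTracker:
    """Illustrative tracker of a binary tabular MDP dynamic (assumed design,
    not the paper's implementation).

    It records which (state, next_state) transitions have ever been observed,
    regardless of which policy generated the trajectory, and exposes
    (i) an intrinsic reward paid the first time a transition is seen and
    (ii) a simple state embedding smoothed over the inferred connectivity,
    which depends only on the shared dynamic, not on any task's reward.
    """

    def __init__(self, num_states: int, embed_dim: int = 8, bonus: float = 1.0):
        self.adj = np.zeros((num_states, num_states), dtype=bool)  # binary dynamic
        self.bonus = bonus
        rng = np.random.default_rng(0)
        self.embedding = rng.normal(scale=0.1, size=(num_states, embed_dim))

    def intrinsic_reward(self, s: int, s_next: int) -> float:
        """Return a bonus for transitions not yet recorded, then record them."""
        novel = not self.adj[s, s_next]
        self.adj[s, s_next] = True  # online update of the inferred dynamic
        return self.bonus if novel else 0.0

    def update_embedding(self, lr: float = 0.05, steps: int = 50) -> None:
        """Pull each state's embedding toward the mean of its observed neighbours
        (a plain Laplacian-smoothing stand-in for dynamics-aware embedding)."""
        sym = self.adj | self.adj.T  # treat observed connectivity as undirected
        for _ in range(steps):
            for s in range(sym.shape[0]):
                nbrs = np.flatnonzero(sym[s])
                if nbrs.size == 0:
                    continue
                target = self.embedding[nbrs].mean(axis=0)
                self.embedding[s] += lr * (target - self.embedding[s])

if __name__ == "__main__":
    tracker = BinaryDynamicsTracker(num_states=5)
    trajectory = [(0, 1), (1, 2), (2, 1), (1, 2)]  # from any behaviour policy
    bonuses = [tracker.intrinsic_reward(s, s2) for s, s2 in trajectory]
    tracker.update_embedding()
    print(bonuses)                  # [1.0, 1.0, 1.0, 0.0]: repeated transition earns nothing
    print(tracker.embedding.shape)  # (5, 8)
```

In combination with a model-free learner such as DQN or A2C, the bonus would simply be added to the environment reward (e.g. r_total = r_env + beta * intrinsic), while the embedding, which depends only on the shared binary dynamic and not on any task's reward, is the component that would be carried over to new tasks.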
Related papers
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - Goal-Conditioned Imitation Learning using Score-based Diffusion Policies [3.49482137286472]
We propose a new policy representation based on score-based diffusion models (SDMs)
We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL)
We show how BESO can even be used to learn a goal-independent policy from play data using classifier-free guidance.
arXiv Detail & Related papers (2023-04-05T15:52:34Z) - Hypernetworks for Zero-shot Transfer in Reinforcement Learning [21.994654567458017]
Hypernetworks are trained to generate behaviors across a range of unseen task conditions.
This work relates to meta RL, contextual RL, and transfer learning.
Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
arXiv Detail & Related papers (2022-11-28T15:48:35Z) - Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Fast Adaptation via Policy-Dynamics Value Functions [41.738462615120326]
We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training.
PD-VF explicitly estimates the cumulative reward in a space of policies and environments.
We show that our method can rapidly adapt to new dynamics on a set of MuJoCo domains.
arXiv Detail & Related papers (2020-07-06T16:47:56Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z) - Contextual Policy Transfer in Reinforcement Learning Domains via Deep
Mixtures-of-Experts [24.489002406693128]
We introduce a novel mixture-of-experts formulation for learning state-dependent beliefs over source task dynamics.
We show how this model can be incorporated into standard policy reuse frameworks.
arXiv Detail & Related papers (2020-02-29T07:58:36Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask
Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)