Unsupervised Domain Adaptation with Dynamics-Aware Rewards in
Reinforcement Learning
- URL: http://arxiv.org/abs/2110.12997v2
- Date: Tue, 26 Oct 2021 01:34:43 GMT
- Title: Unsupervised Domain Adaptation with Dynamics-Aware Rewards in
Reinforcement Learning
- Authors: Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian
- Abstract summary: Unsupervised reinforcement learning aims to acquire skills without prior goal representations.
The intuitive approach of training in another, interaction-rich environment disrupts the trained skills in the target environment due to dynamics shifts.
We propose an unsupervised domain adaptation method to identify and acquire skills across dynamics.
- Score: 28.808933152885874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised reinforcement learning aims to acquire skills without prior goal
representations, where an agent automatically explores an open-ended
environment to represent goals and learn the goal-conditioned policy. However,
this procedure is often time-consuming, limiting rollouts in potentially
expensive target environments. The intuitive alternative of training in
another, interaction-rich environment disrupts the reproducibility of trained
skills in the target environment due to dynamics shifts, and thus inhibits
direct transfer. Assuming free access to a source environment, we propose an
unsupervised domain adaptation method to identify and acquire skills across
dynamics. In particular, we introduce a KL-regularized objective to encourage
the emergence of skills, rewarding the agent both for discovering skills and
for aligning its behaviors with respect to the dynamics shift. This means that
both dynamics (source and target) shape the reward to facilitate the learning
of adaptive skills. Empirical experiments demonstrate that our method
effectively learns skills that can be smoothly deployed in the target
environment.
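Concretely, the shaped reward can be read as a skill-discovery term plus a
dynamics-alignment term. Below is a minimal sketch in that spirit, assuming a
DIAYN-style discriminator reward log q(z|s') - log p(z) and a learned estimate
of the log-ratio between target and source transition probabilities; the
function names and the lam weight are illustrative, not the paper's code.

```python
import numpy as np

def dynamics_aware_reward(s, a, s_next, z,
                          skill_discriminator,  # q(z | s'), vector over skills
                          p_next_target,        # estimate of p_target(s'|s,a)
                          p_next_source,        # estimate of p_source(s'|s,a)
                          n_skills=8, lam=1.0):
    """Hypothetical reward: skill discovery plus dynamics alignment."""
    # Skill-discovery term: log q(z|s') - log p(z), with p(z) uniform.
    q_z = skill_discriminator(s_next)           # shape: (n_skills,)
    r_skill = np.log(q_z[z] + 1e-8) - np.log(1.0 / n_skills)

    # Dynamics-alignment term: log p_target(s'|s,a) - log p_source(s'|s,a);
    # in practice this ratio is estimated with learned classifiers rather
    # than exact densities.
    r_align = np.log(p_next_target(s, a, s_next) + 1e-8) \
            - np.log(p_next_source(s, a, s_next) + 1e-8)

    return r_skill + lam * r_align
```

Training on source-environment transitions with this shaped reward penalizes
behaviors that exploit source-only dynamics, so the discovered skills remain
reproducible in the target environment.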
Related papers
- SLIM: Skill Learning with Multiple Critics [8.645929825516818]
Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment.
Latent variable models, based on mutual information, have been successful in this task but still struggle in the context of robotic manipulation.
We introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation.
arXiv Detail & Related papers (2024-02-01T18:07:33Z)
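As a concrete illustration of SLIM's multi-critic idea, here is a hedged
sketch: one critic is trained per reward stream (e.g., a skill-discovery
reward and a manipulation reward), and the actor ascends a weighted sum of
their value estimates. The weighting scheme and function names are
assumptions, not SLIM's actual implementation.

```python
import torch

def actor_loss_multi_critic(actor, critics, weights, states):
    """Hypothetical multi-critic actor update: each critic estimates the
    value of one reward stream; the policy maximizes their weighted sum."""
    actions = actor(states)
    total_q = sum(w * critic(states, actions)
                  for w, critic in zip(weights, critics))
    return -total_q.mean()  # minimize the negative for gradient ascent
```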
- Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model.
Choreographer is able to learn skills both from offline data, and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
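The discretizing bottleneck above can be instantiated, for example, as a
vector-quantization layer over goal embeddings. The sketch below is one such
instantiation under that assumption (a straight-through estimator keeps the
layer differentiable); it is not necessarily the construction the paper uses.

```python
import torch

def discretize_goal(goal_embedding, codebook):
    """Hypothetical discretizing bottleneck: snap each continuous goal
    embedding (B, D) to its nearest entry in a learned codebook (K, D)."""
    dists = torch.cdist(goal_embedding, codebook)  # (B, K) distances
    codes = dists.argmin(dim=-1)                   # nearest code index
    quantized = codebook[codes]                    # (B, D) discrete goals
    # Straight-through: forward pass uses the discrete code, backward
    # pass treats the layer as the identity so gradients still flow.
    return goal_embedding + (quantized - goal_embedding).detach()
```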
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
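One way to read the distillation connection is that the frozen target network
acts as the teacher, providing bootstrapped values that the student Q-network
matches jointly across a batch of (relabeled) goals. The sketch below assumes
discrete actions and goal-relabeled rewards; it is an interpretation of the
summary, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def distilled_q_loss(q_net, target_net, s, a, r, s_next, goals, gamma=0.99):
    """Hypothetical distillation view of goal-conditioned Q-learning."""
    with torch.no_grad():
        # Teacher signal: bootstrapped TD target, one per sampled goal.
        teacher = r + gamma * target_net(s_next, goals).max(dim=-1).values
    # Student: Q-values of the taken actions under the same goals.
    student = q_net(s, goals).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    return F.mse_loss(student, teacher)
```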
- Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning [10.971411555103574]
An intelligent agent should be prepared to anticipate a change in its belief of the environment state.
This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount.
We show that our approach results in robust, safe behaviour in a partially observable, stochastic environment, generalizing well over environments not seen during training.
arXiv Detail & Related papers (2022-04-13T16:47:00Z)
- Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning [15.698612710580447]
We propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization.
In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation.
We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv Detail & Related papers (2022-02-28T09:05:14Z)
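For the Spatial Transform AutoEncoder mentioned above, one plausible
construction predicts affine pose parameters alongside appearance features
and applies them with a differentiable spatial transformer in the decoder;
the split into pose and appearance, and all sizes, are assumptions made for
illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformAutoEncoder(nn.Module):
    """Hypothetical STAE sketch: pose is an explicit, controllable part
    of the latent code, applied via a spatial transformer on decoding."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, dim))
        self.to_pose = nn.Linear(dim, 6)          # 2x3 affine parameters
        self.decoder = nn.Linear(dim, 32 * 32)

    def forward(self, x):                         # x: (B, 1, 32, 32)
        h = self.encoder(x)
        theta = self.to_pose(h).view(-1, 2, 3)
        canvas = self.decoder(h).view(-1, 1, 32, 32)
        # Warp the decoded canvas by the predicted pose.
        grid = F.affine_grid(theta, canvas.size(), align_corners=False)
        return F.grid_sample(canvas, grid, align_corners=False)
```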
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
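Since H(s) = -E[log p(s)], minimizing state-visitation entropy is equivalent
to maximizing the expected log-density of visited states. A minimal reading
of the objective above is therefore an intrinsic reward equal to the state's
log-probability under a learned density model (here a hypothetical stand-in
for the paper's latent state-space model).

```python
import numpy as np

def visitation_entropy_reward(state, density_model):
    """Hypothetical intrinsic reward: log-density of the visited state.
    Maximizing it steers the agent toward predictable, well-modeled
    states, i.e., toward low state-visitation entropy."""
    return np.log(density_model(state) + 1e-8)
```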
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z)
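A hedged sketch of AMIGo's adversarially motivated teacher: the teacher is
paid for proposed goals the student eventually reaches, but only when
reaching them takes longer than a difficulty threshold, which keeps goals
just beyond the student's current ability. The constants below are
illustrative, not the paper's values.

```python
def teacher_reward(steps_to_goal, reached, t_star=10, alpha=1.0, beta=1.0):
    """Hypothetical AMIGo-style teacher reward."""
    if reached and steps_to_goal >= t_star:
        return alpha   # challenging but achievable: reinforce this goal
    return -beta       # too easy, or unachievable: discourage it
```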
- Generating Automatic Curricula via Self-Supervised Active Domain Randomization [11.389072560141388]
We extend the self-play framework to jointly learn a goal and environment curriculum.
Our method generates a coupled goal-task curriculum, where agents learn through progressively more difficult tasks and environment variations.
Our results show that a curriculum of co-evolving the environment difficulty together with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.
arXiv Detail & Related papers (2020-02-18T22:45:29Z)
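In the self-play setting above, a goal-setting agent proposes both a goal and
environment parameters, and is scored against a task-solving agent's
performance. The scoring rule below is a hypothetical illustration of that
coupling, not the paper's formulation.

```python
def curriculum_score(env_params, goal, solve):
    """Hypothetical self-play curriculum step: reward proposals that the
    solver completes slowly (hard but feasible); penalize unsolvable ones."""
    steps = solve(env_params, goal)     # steps needed, or None on failure
    if steps is None:
        return -1.0                     # too hard: discourage this proposal
    return min(steps / 100.0, 1.0)      # harder (slower) solves score higher
```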