Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
- URL: http://arxiv.org/abs/2411.09891v1
- Date: Fri, 15 Nov 2024 02:35:20 GMT
- Title: Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
- Authors: Yihong Guo, Yixuan Wang, Yuanyuan Shi, Pan Xu, Anqi Liu
- Abstract summary: We propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain.
Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation.
- Score: 19.37193250533054
- License:
- Abstract: Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching distributions between the source and the target optimal trajectories. However, pure modified rewards only ensure the behavior of the learned policy in the source domain resembles trajectories produced by the target optimal policies, which does not guarantee optimal performance when the learned policy is actually deployed to the target domain. In this work, we propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain so that the new policy can generate the same trajectories in the target domain. Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation and follows the general framework of generative adversarial imitation learning from observation (GAIfO) by applying a reward augmented estimator for the policy optimization step. Theoretically, we present an error bound for our method under a mild assumption regarding the dynamics shift to justify the motivation of our method. Empirically, our method outperforms the pure modified reward method without imitation learning and also outperforms other baselines in benchmark off-dynamics environments.
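The abstract does not give the exact form of the reward augmented estimator, so the following is a minimal sketch under assumptions: a DARC-style dynamics-gap correction estimated from two domain classifiers, a GAIfO-style discriminator reward on state transitions, and a simple weighted sum with coefficient `eta`. All names, conventions, and the additive combination are illustrative, not the paper's equations.

```python
import numpy as np

def darc_delta_reward(q_sas, q_sa, eps=1e-8):
    """DARC-style reward correction, estimating
    log p_target(s'|s,a) - log p_source(s'|s,a) from two domain classifiers.

    q_sas: classifier probability that (s, a, s') came from the target domain
    q_sa:  classifier probability that (s, a) came from the target domain
    """
    logit = lambda p: np.log(p + eps) - np.log(1.0 - p + eps)
    return logit(q_sas) - logit(q_sa)

def reward_augmented_estimate(r_env, q_sas, q_sa, d_score, eta=1.0, eps=1e-8):
    """Hypothetical reward-augmented signal for the policy optimization step:
    environment reward + dynamics-gap correction + a GAIfO-style discriminator
    reward that encourages matching the reward-modified policy's state-transition
    occupancy in the target domain. The additive form and eta are assumptions.

    d_score: discriminator probability that (s, s') was generated by the
             current policy (one common GAIfO convention).
    """
    imitation_reward = -np.log(d_score + eps)
    return r_env + darc_delta_reward(q_sas, q_sa, eps) + eta * imitation_reward
```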
Related papers
- Cross-Domain Policy Adaptation by Capturing Representation Mismatch [53.087413751430255]
In reinforcement learning (RL), it is vital to learn effective policies that can be transferred to domains with dynamics discrepancies.
In this paper, we consider dynamics adaptation settings where there exists dynamics mismatch between the source domain and the target domain.
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain.
arXiv Detail & Related papers (2024-05-24T09:06:12Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets (see the sketch after this list).
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
- DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training [13.866486498822228]
We propose discriminative reward co-training as an extension to deep reinforcement learning algorithms.
A discriminator network is trained concurrently with the policy to distinguish between trajectories generated by the current policy and beneficial trajectories generated by previous policies.
Our results show that DIRECT outperforms state-of-the-art algorithms in sparse- and shifting-reward environments.
arXiv Detail & Related papers (2023-01-18T10:42:00Z)
- A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
arXiv Detail & Related papers (2021-06-12T20:21:38Z)
- MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation [84.90801699807426]
This paper proposes an effective meta-optimization based strategy dubbed MetaAlign.
We treat the domain alignment objective and the classification objective as the meta-train and meta-test tasks in a meta-learning scheme.
Experimental results demonstrate the effectiveness of our proposed method on top of various alignment-based baseline approaches.
arXiv Detail & Related papers (2021-03-25T03:16:05Z)
- Gradient Regularized Contrastive Learning for Continual Domain Adaptation [86.02012896014095]
We study the problem of continual domain adaptation, where the model is presented with a labelled source domain and a sequence of unlabelled target domains.
We propose Gradient Regularized Contrastive Learning (GRCL) to solve the obstacles.
Experiments on Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach.
arXiv Detail & Related papers (2021-03-23T04:10:42Z)
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers [138.68213707587822]
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics.
arXiv Detail & Related papers (2020-06-24T17:47:37Z)
- Adversarial Weighting for Domain Adaptation in Regression [4.34858896385326]
We present a novel instance-based approach to handle regression tasks in the context of supervised domain adaptation.
We develop an adversarial network algorithm which learns both the source weighting scheme and the task in one feed-forward gradient descent.
arXiv Detail & Related papers (2020-06-15T09:44:04Z)
- Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning [5.476958867922322]
A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training.
We propose a novel learning algorithm, Invariant Policy Optimization (IPO), that implements this principle and learns an invariant policy during training.
arXiv Detail & Related papers (2020-06-01T17:28:19Z)
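For the Value-Guided Data Filtering entry above, here is a minimal sketch of the filtering idea as described in its summary. It assumes a learned value function and a hypothetical `target_model(s, a)` that imagines a next state under the target dynamics; the quantile-based cutoff is also an assumption, not the paper's criterion.

```python
def vgdf_filter(batch, value_fn, target_model, gamma=0.99, keep_fraction=0.25):
    """Keep the source-domain transitions whose value target is closest to the
    value target computed with a next state imagined under the target dynamics.
    `target_model` and the quantile cutoff are illustrative assumptions."""
    scored = []
    for s, a, r, s_next in batch:
        y_source = r + gamma * value_fn(s_next)              # target with the real source-domain next state
        y_target = r + gamma * value_fn(target_model(s, a))  # target with an imagined target-domain next state
        scored.append(((s, a, r, s_next), abs(y_source - y_target)))
    scored.sort(key=lambda item: item[1])                    # smallest value-target gap first
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [transition for transition, _ in scored[:n_keep]]
```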