Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain
- URL: http://arxiv.org/abs/2110.11443v1
- Date: Thu, 21 Oct 2021 19:23:15 GMT
- Title: Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain
- Authors: Yachen Kang, Jinxin Liu, Xin Cao and Donglin Wang
- Abstract summary: We propose an approach for inverse reinforcement learning from hetero-domain which learns a reward function in the simulator, drawing on the demonstrations from the real world.
The intuition behind the method is that the reward function should not only be oriented to imitate the experts, but should encourage actions adjusted for the dynamics difference between the simulator and the real world.
- Score: 11.075036222901417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an approach for inverse reinforcement learning from hetero-domain
which learns a reward function in the simulator, drawing on the demonstrations
from the real world. The intuition behind the method is that the reward
function should not only be oriented to imitate the experts, but should
encourage actions adjusted for the dynamics difference between the simulator
and the real world. To achieve this, the widely used GAN-inspired IRL method is
adopted, and its discriminator, recognizing policy-generating trajectories, is
modified with the quantification of dynamics difference. The training process
of the discriminator can yield the transferable reward function suitable for
simulator dynamics, which can be guaranteed by derivation. Effectively, our
method assigns higher rewards for demonstration trajectories which do not
exploit discrepancies between the two domains. With extensive experiments on
continuous control tasks, our method shows its effectiveness and demonstrates
its scalability to high-dimensional tasks.
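As a rough illustration of the abstract above: in the standard adversarial IRL (AIRL) formulation, the discriminator is D = exp(f) / (exp(f) + pi(a|s)), and the recovered reward is log D - log(1 - D) = f - log pi(a|s). The Python sketch below shows that structure and adds a hypothetical dynamics term, the log-ratio of real-world to simulator transition probabilities, to the discriminator logit. The function names, the placement of the term, and its sign are assumptions made for illustration; the abstract does not state the paper's exact formulation.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def airl_discriminator(f_value, log_pi):
    """Standard AIRL-style discriminator: exp(f) / (exp(f) + pi(a|s))."""
    return sigmoid(f_value - log_pi)


def dynamics_aware_discriminator(f_value, log_pi, log_p_real, log_p_sim):
    """Hypothetical variant (assumption, not the paper's formula):
    shift the logit by log p_real(s'|s,a) - log p_sim(s'|s,a)."""
    delta = log_p_real - log_p_sim
    return sigmoid(f_value + delta - log_pi)


def recovered_reward(d):
    """Reward implied by a discriminator output d: log d - log(1 - d)."""
    return math.log(d) - math.log(1.0 - d)
```

When the two transition log-probabilities are equal, the hypothetical variant reduces to the standard discriminator, so the dynamics term only matters for transitions on which the simulator and the real world disagree.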
Related papers
- Learning Causally Invariant Reward Functions from Diverse Demonstrations [6.351909403078771]
Inverse reinforcement learning methods aim to retrieve the reward function of a Markov decision process based on a dataset of expert demonstrations.
This adaptation often exhibits overfitting to the expert data set when a policy is trained on the obtained reward function under distribution shift of the environment dynamics.
In this work, we explore a novel regularization approach for inverse reinforcement learning methods based on the causal invariance principle with the goal of improved reward function generalization.
arXiv Detail & Related papers (2024-09-12T12:56:24Z)
- Conditional Neural Expert Processes for Learning Movement Primitives from Demonstration [1.9336815376402723]
Conditional Neural Expert Processes (CNEP) learns to assign demonstrations from different modes to distinct expert networks.
CNEP does not require supervision on which mode the trajectories belong to.
Our system is capable of on-the-fly adaptation to environmental changes via an online conditioning mechanism.
arXiv Detail & Related papers (2024-02-13T12:52:02Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations [25.536792010283566]
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations.
We introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework.
Our framework demonstrates significant performance improvements over previous SOTA methods.
arXiv Detail & Related papers (2023-10-13T02:38:35Z)
- Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation [0.0]
We propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation.
Our experiments show that DATI outperforms baseline methods for imitation learning and optimal control in this setting.
Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic.
arXiv Detail & Related papers (2023-04-19T15:53:48Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning method for modelling user behavioral preferences.
Our model can automatically learn rewards from users' actions, based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers [138.68213707587822]
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics.
arXiv Detail & Related papers (2020-06-24T17:47:37Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
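The "Off-Dynamics Reinforcement Learning" entry above describes compensating for a dynamics difference by modifying the reward with domain classifiers. A minimal Python sketch of one such correction follows, assuming two binary classifiers whose outputs are the probability that a transition (s, a, s') or a state-action pair (s, a) comes from the target domain. The function name and its arguments are hypothetical, and the classifiers themselves are not shown; only the arithmetic of the correction term is illustrated.

```python
import math


def darc_reward_correction(p_target_sas, p_target_sa):
    """Classifier-based reward correction (illustrative sketch).

    p_target_sas: classifier probability that (s, a, s') is from the target domain.
    p_target_sa:  classifier probability that (s, a) is from the target domain.

    Computes
        log p(target|s,a,s') - log p(source|s,a,s')
      - log p(target|s,a)    + log p(source|s,a),
    with p(source|.) = 1 - p(target|.).
    """
    logit_sas = math.log(p_target_sas) - math.log(1.0 - p_target_sas)
    logit_sa = math.log(p_target_sa) - math.log(1.0 - p_target_sa)
    return logit_sas - logit_sa
```

The correction is zero when observing the next state does not change the classifier's belief about the domain, and negative when the next state makes the transition look more like the source domain, which penalizes transitions that are unlikely under the target dynamics.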
This list is automatically generated from the titles and abstracts of the papers in this site.