OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via
Distribution Matching
- URL: http://arxiv.org/abs/2109.04307v1
- Date: Thu, 9 Sep 2021 14:32:26 GMT
- Title: OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via
Distribution Matching
- Authors: Hana Hoshino, Kei Ota, Asako Kanezaki, Rio Yokota
- Abstract summary: Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious.
Prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance.
We present Off-Policy Inverse Reinforcement Learning (OPIRL), which adopts an off-policy data distribution instead of an on-policy one.
- Score: 12.335788185691916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward
engineering can be tedious. However, prior IRL algorithms use on-policy
transitions, which require intensive sampling from the current policy for
stable and optimal performance. This limits IRL applications in the real world,
where environment interactions can become highly expensive. To tackle this
problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which
(1) adopts an off-policy data distribution instead of an on-policy one, enabling a
significant reduction in the number of interactions with the environment, (2)
learns a stationary reward function that is transferable with high
generalization capabilities on changing dynamics, and (3) leverages
mode-covering behavior for faster convergence. In our experiments, we demonstrate
that our method is considerably more sample efficient and generalizes to novel
environments. It achieves policy performance better than or comparable to the
baselines with significantly fewer environment interactions. Furthermore, we
empirically show that the recovered reward function generalizes to different
tasks where prior methods are prone to fail.
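
To make the off-policy distribution-matching idea concrete, below is a minimal sketch of a discriminator-style reward update that contrasts expert transitions with transitions drawn from a replay buffer (i.e., off-policy data). This is not the authors' OPIRL implementation: the logistic (AIRL-style) loss, the network sizes, and the stand-in batches are illustrative assumptions.

```python
# Sketch of off-policy, discriminator-based IRL. Assumptions: logistic loss,
# small MLP reward network, random stand-in batches instead of real demos/buffer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Maps (state, action) pairs to a scalar logit used as a learned reward."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def irl_discriminator_step(reward_net, optimizer, expert_batch, replay_batch):
    """One distribution-matching update: raise reward logits on expert
    transitions, lower them on transitions sampled from an off-policy buffer."""
    exp_obs, exp_act = expert_batch
    pol_obs, pol_act = replay_batch
    logits_exp = reward_net(exp_obs, exp_act)
    logits_pol = reward_net(pol_obs, pol_act)
    # Binary logistic loss: expert transitions labeled 1, buffer transitions 0.
    loss = F.binary_cross_entropy_with_logits(
        logits_exp, torch.ones_like(logits_exp)
    ) + F.binary_cross_entropy_with_logits(
        logits_pol, torch.zeros_like(logits_pol)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data (replace with expert demos and a replay buffer).
obs_dim, act_dim, batch = 8, 2, 32
reward_net = RewardNet(obs_dim, act_dim)
opt = torch.optim.Adam(reward_net.parameters(), lr=3e-4)
expert_batch = (torch.randn(batch, obs_dim), torch.randn(batch, act_dim))
replay_batch = (torch.randn(batch, obs_dim), torch.randn(batch, act_dim))
print(irl_discriminator_step(reward_net, opt, expert_batch, replay_batch))
```

Because the non-expert batch comes from a replay buffer rather than fresh rollouts of the current policy, the reward can be updated many times per environment interaction, which is the source of the sample-efficiency gain the abstract describes.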