Primal Wasserstein Imitation Learning
- URL: http://arxiv.org/abs/2006.04678v2
- Date: Wed, 17 Mar 2021 11:43:36 GMT
- Title: Primal Wasserstein Imitation Learning
- Authors: Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin
- Abstract summary: We propose a new Imitation Learning (IL) method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL).
We show that we can recover expert behavior on a variety of continuous control tasks of the MuJoCo domain in a sample efficient manner in terms of agent interactions and of expert interactions with the environment.
- Score: 44.87651595571687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation Learning (IL) methods seek to match the behavior of an agent with
that of an expert. In the present work, we propose a new IL method based on a
conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL),
which ties to the primal form of the Wasserstein distance between the expert
and the agent state-action distributions. We present a reward function which is
derived offline, as opposed to recent adversarial IL algorithms that learn a
reward function through interactions with the environment, and which requires
little fine-tuning. We show that we can recover expert behavior on a variety of
continuous control tasks of the MuJoCo domain in a sample efficient manner in
terms of agent interactions and of expert interactions with the environment.
Finally, we show that the behavior of the agent we train matches the behavior
of the expert with the Wasserstein distance, rather than the commonly used
proxy of performance.
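As a concrete illustration of the idea in the abstract, the sketch below shows a simplified, hypothetical PWIL-style reward: expert state-action pairs carry mass that the agent consumes through a greedy coupling, and the resulting transport cost is turned into a bounded reward. The Euclidean metric, the exponential shaping, and the hyperparameters alpha and beta are illustrative assumptions, not the paper's exact choices.

    import numpy as np

    class GreedyWassersteinReward:
        """Simplified PWIL-style reward derived offline from expert demonstrations."""

        def __init__(self, expert_sa, horizon, alpha=5.0, beta=5.0):
            self.expert_sa = np.asarray(expert_sa, dtype=np.float64)  # (N, d) expert state-action vectors
            self.horizon = horizon                                     # episode length T
            self.alpha = alpha                                         # assumed reward scale
            self.beta = beta                                           # assumed cost sensitivity
            self.reset()

        def reset(self):
            # Each expert point starts the episode with equal mass summing to one.
            self.weights = np.full(len(self.expert_sa), 1.0 / len(self.expert_sa))

        def reward(self, agent_sa):
            agent_sa = np.asarray(agent_sa, dtype=np.float64)
            budget = 1.0 / self.horizon  # mass the agent must place at this step
            cost = 0.0
            dists = np.linalg.norm(self.expert_sa - agent_sa, axis=1)
            while budget > 1e-12 and np.any(self.weights > 1e-12):
                i = int(np.argmin(np.where(self.weights > 1e-12, dists, np.inf)))
                moved = min(budget, self.weights[i])
                cost += moved * dists[i]  # transport cost of the greedy coupling
                self.weights[i] -= moved
                budget -= moved
            # Exponential shaping keeps the reward positive, bounded, and dense.
            scale = self.beta * self.horizon / np.sqrt(self.expert_sa.shape[1])
            return self.alpha * float(np.exp(-scale * cost))

In use, reset() would be called at the start of every episode and reward() once per environment step, so the reward needs no further expert or environment interaction to compute, which is the offline property the abstract emphasizes.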
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
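A minimal sketch of the idea, assuming (for illustration only) that an intervention is treated as a fixed negative reward; the paper's exact reward definition may differ:

    def intervention_reward(intervened: bool) -> float:
        # Assumed relabeling rule: penalize the step where the expert intervened,
        # give zero reward otherwise. Logged transitions (s, a, s', intervened)
        # can then be relabeled with this value and passed to any off-policy RL
        # algorithm, with no reward engineering beyond the intervention signal.
        return -1.0 if intervened else 0.0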
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching [111.78179839856293]
We propose Primal Wasserstein DICE to minimize the primal Wasserstein distance between the learner and expert state occupancies.
Our framework is a generalization of SMODICE, and is the first work that unifies $f$-divergence and Wasserstein minimization.
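For reference, the primal (Kantorovich) form of the Wasserstein distance between the learner's and expert's state occupancies $\rho_\pi$ and $\rho_E$, under a ground cost $c$, is the standard coupling-based objective (the notation here is generic, not specific to this paper):

$$W_c(\rho_\pi, \rho_E) = \inf_{\gamma \in \Gamma(\rho_\pi, \rho_E)} \mathbb{E}_{(s, s') \sim \gamma}\big[c(s, s')\big],$$

where $\Gamma(\rho_\pi, \rho_E)$ is the set of joint distributions (couplings) whose marginals are $\rho_\pi$ and $\rho_E$.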
arXiv Detail & Related papers (2023-11-02T15:41:57Z)
- Sample Efficient Imitation Learning via Reward Function Trained in Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning.
arXiv Detail & Related papers (2021-11-23T08:06:09Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates matters by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
arXiv Detail & Related papers (2021-05-31T23:08:05Z)
- Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning [2.3226893628361682]
In this paper, we focus on training a single agent to score goals with a binary success/failure reward function.
The proposed method can be utilized to train agents in environments with fairly complex state and action spaces.
arXiv Detail & Related papers (2021-05-02T16:01:34Z)
- f-IRL: Inverse Reinforcement Learning via State Marginal Matching [13.100127636586317]
We propose a method for learning the reward function (and the corresponding policy) to match the expert state density.
We present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent.
Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories.
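In standard notation, the state-marginal matching objective described above amounts to choosing the reward parameters $\theta$ so that the state marginal of the induced policy matches the expert's:

$$\min_{\theta} \; D_f\big(\rho_E(s) \,\|\, \rho_{\pi_{r_\theta}}(s)\big),$$

where $\rho_E$ is the expert state density, $\pi_{r_\theta}$ is the policy trained under the learned reward $r_\theta$, and $D_f$ is an $f$-divergence; f-IRL follows the gradient of this objective with respect to $\theta$, as the summary states.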
arXiv Detail & Related papers (2020-11-09T19:37:48Z)
- Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration [21.870750931559915]
We propose a new algorithm named Wasserstein Distance guided Adversarial Imitation Learning (WDAIL) for promoting the performance of imitation learning (IL).
The experimental results show that the learning procedure remains remarkably stable and achieves strong performance on the complex continuous control tasks of MuJoCo.
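As a point of contrast with the primal formulation used by PWIL above, WGAN-style adversarial imitation methods such as WDAIL typically estimate the Wasserstein-1 distance through its dual (Kantorovich-Rubinstein) form with a learned 1-Lipschitz critic $f$:

$$W_1(\rho_E, \rho_\pi) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{(s,a) \sim \rho_E}\big[f(s, a)\big] - \mathbb{E}_{(s,a) \sim \rho_\pi}\big[f(s, a)\big],$$

which requires training the critic through interaction with the environment, whereas the primal form allows the reward to be derived offline.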
arXiv Detail & Related papers (2020-06-05T15:10:00Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
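A rough sketch of an episodic k-nearest-neighbour novelty bonus in this spirit is given below; the kernel and constants are assumptions made for illustration and are not the paper's exact formulation.

    import numpy as np

    def episodic_novelty_bonus(memory, embedding, k=10, eps=1e-3):
        """memory: (M, d) embeddings stored this episode; embedding: (d,) query."""
        if len(memory) == 0:
            return 1.0  # an empty memory makes every state maximally novel
        dists = np.linalg.norm(np.asarray(memory) - np.asarray(embedding), axis=1)
        knn = np.sort(dists)[:k]
        # Inverse-quadratic kernel: nearby neighbours give high similarity,
        # so frequently revisited states receive a small intrinsic reward.
        similarity = np.sum(eps / (knn ** 2 + eps))
        return float(1.0 / np.sqrt(similarity + eps))

The embeddings themselves would come from the self-supervised inverse dynamics model mentioned above, which biases the bonus towards features of the state the agent can control.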
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.