Adversarial Imitation Learning via Random Search
- URL: http://arxiv.org/abs/2008.09450v1
- Date: Fri, 21 Aug 2020 12:40:03 GMT
- Title: Adversarial Imitation Learning via Random Search
- Authors: MyungJae Shin, Joongheon Kim
- Abstract summary: We propose an imitation learning method that takes advantage of derivative-free optimization with simple linear policies.
Experiments in this paper show that the proposed model, without a direct reward signal from the environment, obtains competitive performance on the MuJoCo locomotion tasks.
- Score: 15.475463516901938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing agents that can perform challenging, complex tasks is the goal of
reinforcement learning. Model-free reinforcement learning has been considered a feasible
solution, but state-of-the-art research has relied on increasingly complicated techniques.
This growing complexity makes results difficult to reproduce, and the problem of reward
dependency still remains. As a result, imitation learning, which learns a policy from
expert demonstrations, has begun to attract attention. Imitation learning learns a policy
directly from data on expert behavior, without the explicit reward signal provided by the
environment. However, most imitation learning methods optimize policies with deep
reinforcement learning algorithms such as trust region policy optimization, so deep
reinforcement learning based imitation learning inherits the same reproducibility crisis.
The complexity of model-free methods has received considerable critical attention:
derivative-free optimization combined with simplified policies obtains competitive
performance on dynamic, complex tasks, keeps the algorithm simple, and makes research
demos easy to reproduce. In this paper, we propose an imitation learning method that
takes advantage of derivative-free optimization with simple linear policies. The proposed
method performs simple random search in the parameter space of the policies and is
computationally efficient. Experiments show that the proposed model, without a direct
reward signal from the environment, obtains competitive performance on the MuJoCo
locomotion tasks.
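To make the idea concrete, below is a minimal sketch (not the authors' code) of one random-search update over a linear policy in which the per-step reward comes from a discriminator rather than the environment, in the spirit of the abstract. The `env` object and its `reset`/`step` interface, the `discriminator(s, a)` callable, and all hyperparameter values are illustrative assumptions; training the discriminator itself is omitted.

```python
# Sketch of adversarial imitation via random search over a linear policy.
# Assumptions (hypothetical, not from the paper's code):
#   - `env` exposes reset() -> state and step(a) -> (state, reward, done, info),
#   - `discriminator(s, a)` returns a score in (0, 1) for how expert-like (s, a) is,
#   - the surrogate reward is log D(s, a); the environment reward is ignored.
import numpy as np

def rollout_return(theta, env, discriminator, horizon=1000):
    """Run one episode with the linear policy a = theta @ s,
    scoring each step with the discriminator instead of the env reward."""
    s = env.reset()
    total = 0.0
    for _ in range(horizon):
        a = theta @ s                        # simple linear policy
        s, _, done, _ = env.step(a)          # environment reward is discarded
        total += np.log(discriminator(s, a) + 1e-8)
        if done:
            break
    return total

def random_search_step(theta, env, discriminator, n_dirs=8, sigma=0.03, lr=0.02):
    """One ARS-style update: evaluate +/- random perturbations of the policy
    parameters and move along the reward-weighted average direction."""
    deltas = [np.random.randn(*theta.shape) for _ in range(n_dirs)]
    update = np.zeros_like(theta)
    for delta in deltas:
        r_plus = rollout_return(theta + sigma * delta, env, discriminator)
        r_minus = rollout_return(theta - sigma * delta, env, discriminator)
        update += (r_plus - r_minus) * delta
    return theta + lr * update / (n_dirs * sigma)
```

In a full adversarial loop, the discriminator would be refit on expert versus policy state-action pairs between parameter updates; the sketch only shows the derivative-free policy step.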
Related papers
- Amortized nonmyopic active search via deep imitation learning [16.037812098340343]
Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class.
We study the amortization of this policy by training a neural network to learn to search.
Our network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions.
arXiv Detail & Related papers (2024-05-23T20:10:29Z) - Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z) - Coherent Soft Imitation Learning [17.345411907902932]
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMitation with PLANning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of reinforcement learning (RL) algorithms that imitates an expert without an explicit reward from the environment.
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample efficient as well as gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.