Related papers: Imitation Learning by Reinforcement Learning

URL: http://arxiv.org/abs/2108.04763v1
Date: Tue, 10 Aug 2021 16:14:41 GMT
Title: Imitation Learning by Reinforcement Learning
Authors: Kamil Ciosek
Abstract summary: We show that for deterministic experts, imitation learning can be done by reduction to reinforcement learning. We conduct experiments which confirm that our reduction works well in practice for a continuous control task.
Score: 16.62889844853729
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Imitation Learning algorithms learn a policy from demonstrations of expert behavior. Somewhat counterintuitively, we show that, for deterministic experts, imitation learning can be done by reduction to reinforcement learning, which is commonly considered more difficult. We conduct experiments which confirm that our reduction works well in practice for a continuous control task.

Related papers

Latent Action Priors for Locomotion with Deep Reinforcement Learning [42.642008092347986]
Deep Reinforcement Learning (DRL) enables robots to learn complex behaviors through interaction with the environment. We propose an inductive bias for learning locomotion that is especially useful for torque control. We observe that the agent is not restricted to the reward levels of the demonstration, and performance in transfer tasks is improved significantly.
arXiv Detail & Related papers (2024-10-04T09:10:56Z)
RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
Sample-efficient Adversarial Imitation Learning [45.400080101596956]
We propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations. We show a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs.
arXiv Detail & Related papers (2023-03-14T12:36:01Z)
Causal Imitation Learning with Unobserved Confounders [82.22545916247269]
We study imitation learning when sensory inputs of the learner and the expert differ. We show that imitation could still be feasible by exploiting quantitative knowledge of the expert trajectories.
arXiv Detail & Related papers (2022-08-12T13:29:53Z)
Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space. The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
The Difficulty of Passive Learning in Deep Reinforcement Learning [26.124032923011328]
Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. We propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning.
arXiv Detail & Related papers (2021-10-26T20:50:49Z)
Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks. We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z)
Action Advising with Advice Imitation in Deep Reinforcement Learning [0.5185131234265025]
Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm. We present an approach to enable the student agent to imitate previously acquired advice and reuse it directly in its exploration policy.
arXiv Detail & Related papers (2021-04-17T04:24:04Z)
Rehearsal revealed: The limits and merits of revisiting samples in continual learning [43.40531878205344]
We provide insight into the limits and merits of rehearsal, one of continual learning's most established methods. We show that models trained sequentially with rehearsal tend to stay in the same low-loss region after a task has finished, but are at risk of overfitting on its sample memory.
arXiv Detail & Related papers (2021-04-15T13:28:14Z)
Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning. We propose 'Adaptive Insubordination' (ADVISOR) to address this gap. ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.