Augmented Behavioral Cloning from Observation
- URL: http://arxiv.org/abs/2004.13529v1
- Date: Tue, 28 Apr 2020 13:56:36 GMT
- Title: Augmented Behavioral Cloning from Observation
- Authors: Juarez Monteiro, Nathan Gavenski, Roger Granada, Felipe Meneguzzi and
Rodrigo Barros
- Abstract summary: Imitation from observation is a technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states from the expert demonstrations.
We show empirically that our approach outperforms the state-of-the-art approaches in four different environments by a large margin.
- Score: 14.45796459531414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation from observation is a computational technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states from the expert demonstrations. Recent approaches learn the inverse dynamics of the environment and an imitation policy by interleaving epochs of both models while changing the demonstration data. However, such approaches often get stuck in sub-optimal solutions that are distant from the expert, limiting their imitation effectiveness. We address this problem with a novel approach that avoids bad local minima by exploring: (i) a self-attention mechanism that better captures global features of the states; and (ii) a sampling strategy that regulates the observations that are used for learning. We show empirically that our approach outperforms the state-of-the-art approaches in four different environments by a large margin.
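The two additions named in the abstract, a self-attention mechanism over state features and a sampling strategy that regulates which observations are used for learning, can be pictured roughly as below. This is a minimal sketch under stated assumptions: the SAGAN-style attention layer, the tensor shapes, and the fixed mixing rule between agent-collected and pre-demonstration transitions are illustrative choices, not the paper's exact architecture or sampling criterion.

```python
import random

import torch
import torch.nn as nn


class SelfAttention2d(nn.Module):
    """Spatial self-attention over a convolutional feature map (SAGAN-style).

    Illustrative assumption: the "self-attention mechanism that better captures
    global features of the states" is sketched as attention across all H*W
    positions of the state encoder's feature map.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, c//8)
        k = self.key(x).flatten(2)                      # (b, c//8, h*w)
        v = self.value(x).flatten(2)                    # (b, c, h*w)
        attn = torch.softmax(q @ k, dim=-1)             # every position attends to every position
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                     # residual: starts close to the plain encoder


def mix_transitions(agent_buffer, pre_demo_buffer, batch_size, agent_fraction=0.5):
    """Hypothetical sampling strategy: each inverse-dynamics update draws a
    regulated mixture of transitions collected by the current (possibly poor)
    policy and transitions from a fixed pre-demonstration pool, so early bad
    policies do not dominate the training data. The 50/50 default is an
    assumption for illustration only.
    """
    n_agent = min(int(agent_fraction * batch_size), len(agent_buffer))
    batch = random.sample(agent_buffer, n_agent)
    batch += random.sample(pre_demo_buffer, min(batch_size - n_agent, len(pre_demo_buffer)))
    return batch
```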
Related papers
- Offline Imitation Learning with Model-based Reverse Augmentation [48.64791438847236]
We propose a novel model-based framework, called offline Imitation Learning with Self-paced Reverse Augmentation.
Specifically, we build a reverse dynamics model from the offline demonstrations, which can efficiently generate trajectories leading to the expert-observed states.
We then use reinforcement learning to learn from the augmented trajectories and transition from expert-unobserved states to expert-observed states.
arXiv Detail & Related papers (2024-06-18T12:27:02Z)
- IL-flOw: Imitation Learning from Observation using Normalizing Flows [28.998176144874193]
We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only.
Our approach decouples reward modelling from policy learning, unlike state-of-the-art adversarial methods.
arXiv Detail & Related papers (2022-05-19T00:05:03Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMPLANT is a new meta-algorithm for imitation learning based on decision-time planning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z)
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
- Imitation by Predicting Observations [17.86983397979034]
We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks.
Our method, which we call FORM, is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations.
We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
arXiv Detail & Related papers (2021-07-08T14:09:30Z)
- Imitating Unknown Policies via Exploration [18.78730427200346]
Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations.
Recent approaches use self-supervision over fully-observable, unlabeled snapshots of the states to decode state pairs into actions.
We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration (a minimal sketch of this inverse-dynamics recipe appears after this list).
arXiv Detail & Related papers (2020-08-13T03:03:35Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
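Several of the entries above, notably "Imitating Unknown Policies via Exploration" and "State-Only Imitation Learning for Dexterous Manipulation", as well as the interleaved baseline described in the main abstract, share one basic recipe: fit an inverse dynamics model on the agent's own (state, action, next-state) transitions, use it to infer the missing actions of the state-only expert demonstrations, and then behaviorally clone the inferred actions. The sketch below is a minimal, non-interleaved rendition of that recipe; the network sizes, the continuous-action/MSE setup, and the function names are assumptions for illustration, not any single paper's implementation.

```python
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    # Small fully connected network; sizes are illustrative assumptions.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


def imitate_from_observation(agent_s, agent_a, agent_s2, expert_s, expert_s2,
                             epochs: int = 50, lr: float = 1e-3) -> nn.Module:
    """agent_s, agent_a, agent_s2: 2-D tensors of self-collected (s, a, s') transitions.
    expert_s, expert_s2: consecutive expert states with no action labels.
    Continuous actions trained with an MSE loss are assumed purely for illustration.
    """
    state_dim, action_dim = agent_s.shape[1], agent_a.shape[1]
    idm = mlp(2 * state_dim, action_dim)      # inverse dynamics model: (s, s') -> a
    policy = mlp(state_dim, action_dim)       # imitation policy:        s -> a
    idm_opt = torch.optim.Adam(idm.parameters(), lr=lr)
    pol_opt = torch.optim.Adam(policy.parameters(), lr=lr)

    # Phase 1: fit the inverse dynamics model on the agent's own transitions.
    for _ in range(epochs):
        pred_a = idm(torch.cat([agent_s, agent_s2], dim=1))
        idm_loss = nn.functional.mse_loss(pred_a, agent_a)
        idm_opt.zero_grad()
        idm_loss.backward()
        idm_opt.step()

    # Phase 2: infer the expert's missing actions, then behaviorally clone them.
    with torch.no_grad():
        expert_a = idm(torch.cat([expert_s, expert_s2], dim=1))
    for _ in range(epochs):
        bc_loss = nn.functional.mse_loss(policy(expert_s), expert_a)
        pol_opt.zero_grad()
        bc_loss.backward()
        pol_opt.step()
    return policy
```

In the iterative variants, the two phases are interleaved: the partially trained policy collects fresh transitions, the inverse dynamics model is refit, and the expert actions are re-inferred before the next round of cloning.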