Action Inference by Maximising Evidence: Zero-Shot Imitation from
Observation with World Models
- URL: http://arxiv.org/abs/2312.02019v1
- Date: Mon, 4 Dec 2023 16:43:36 GMT
- Title: Action Inference by Maximising Evidence: Zero-Shot Imitation from
Observation with World Models
- Authors: Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian
Karl
- Abstract summary: We propose Action Inference by Maximising Evidence (AIME) to replicate this behaviour using world models.
AIME consists of two distinct phases. In the first phase, the agent learns a world model from its past experience to understand its own body by maximising the ELBO.
In the second phase, the agent is given observation-only demonstrations of an expert performing a novel task and tries to imitate the expert's behaviour.
Our method is "zero-shot" in the sense that it requires neither further training of the world model nor online interactions with the environment after being given the demonstrations.
- Score: 9.583751440005118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlike most reinforcement learning agents which require an unrealistic amount
of environment interactions to learn a new behaviour, humans excel at learning
quickly by merely observing and imitating others. This ability highly depends
on the fact that humans have a model of their own embodiment that allows them
to infer the most likely actions that led to the observed behaviour. In this
paper, we propose Action Inference by Maximising Evidence (AIME) to replicate
this behaviour using world models. AIME consists of two distinct phases. In the
first phase, the agent learns a world model from its past experience to
understand its own body by maximising the ELBO. In the second phase, the
agent is given observation-only demonstrations of an expert performing a
novel task and tries to imitate the expert's behaviour. AIME achieves this by
defining a policy as an inference model and maximising the evidence of the
demonstration under the policy and world model. Our method is "zero-shot" in
the sense that it requires neither further training of the world model nor
online interactions with the environment after being given the demonstrations. We
empirically validate the zero-shot imitation performance of our method on the
Walker and Cheetah embodiments of the DeepMind Control Suite and find that it
outperforms the state-of-the-art baselines. Code is available at:
https://github.com/argmax-ai/aime.
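To make the two phases concrete, here is a minimal sketch of how such a training loop could look. It is an illustration only: the network sizes, the diagonal-Gaussian parameterisations, the single-step latent transition, and the action-independent encoder are simplifying assumptions and do not reflect the released implementation, where the posterior also conditions on actions.
```python
# Minimal sketch of the two AIME phases (illustrative assumptions throughout).
import torch
import torch.nn as nn
import torch.distributions as D

OBS_DIM, ACT_DIM, LATENT_DIM = 24, 6, 32  # assumed Walker-like dimensions


def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 128), nn.ELU(), nn.Linear(128, out))


class WorldModel(nn.Module):
    """q(z_t | o_t), p(z_t | z_{t-1}, a_{t-1}) and p(o_t | z_t) as Gaussians."""

    def __init__(self):
        super().__init__()
        self.encoder = mlp(OBS_DIM, 2 * LATENT_DIM)
        self.transition = mlp(LATENT_DIM + ACT_DIM, 2 * LATENT_DIM)
        self.decoder = mlp(LATENT_DIM, OBS_DIM)

    @staticmethod
    def _gauss(params):
        mean, log_std = params.chunk(2, dim=-1)
        return D.Normal(mean, log_std.clamp(-5, 2).exp())

    def elbo(self, obs, actions):
        """obs: (T, OBS_DIM), actions: (T-1, ACT_DIM); returns a scalar ELBO."""
        post = self._gauss(self.encoder(obs))                  # q(z_t | o_t)
        z = post.rsample()                                     # reparameterised latents
        recon = D.Normal(self.decoder(z), 1.0).log_prob(obs)   # log p(o_t | z_t)
        prior = self._gauss(self.transition(torch.cat([z[:-1], actions], dim=-1)))
        post_next = D.Normal(post.mean[1:], post.stddev[1:])   # q(z_t | o_t), t >= 1
        kl = D.kl_divergence(post_next, prior)                 # regularise towards prior
        return recon.sum() - kl.sum()


class Policy(nn.Module):
    """pi(a_t | o_t), used in phase 2 as an inference model for the missing actions."""

    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM, 2 * ACT_DIM)

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        return D.Normal(mean, log_std.clamp(-5, 2).exp())


def train_world_model(model, obs, actions, steps=1000):
    """Phase 1: maximise the ELBO on the agent's own past experience."""
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    for _ in range(steps):
        loss = -model.elbo(obs, actions)
        opt.zero_grad(); loss.backward(); opt.step()


def imitate_from_observation(model, policy, demo_obs, steps=1000):
    """Phase 2: freeze the world model and maximise the evidence of the
    observation-only demonstration with actions sampled from the policy."""
    for p in model.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for _ in range(steps):
        actions = torch.tanh(policy(demo_obs[:-1]).rsample())  # differentiable actions
        loss = -model.elbo(demo_obs, actions)
        opt.zero_grad(); loss.backward(); opt.step()
```
In this simplified form the encoder ignores actions, so the policy only receives gradient through the KL term against the action-conditioned prior; the recurrent state-space model used in the paper couples actions to the evidence more directly.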
Related papers
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation [49.03287909942888]
We propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations of a given task.
We generate an example of an execution of the task conditioned on images of a novel scene, and use this synthesized execution directly to control the robot.
arXiv Detail & Related papers (2024-06-24T17:59:45Z)
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z)
- Play with Emotion: Affect-Driven Reinforcement Learning [3.611888922173257]
This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning process.
We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior.
arXiv Detail & Related papers (2022-08-26T12:28:24Z)
- Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
arXiv Detail & Related papers (2022-02-02T21:23:19Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Mastering Atari with Discrete World Models [61.7688353335468]
We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.
arXiv Detail & Related papers (2020-10-05T17:52:14Z)
- Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps [0.0]
Humans learn by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks.
We suggest a simple but effective unsupervised model which develops such characteristics.
We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
arXiv Detail & Related papers (2020-07-03T12:29:11Z)
- Intrinsic Reward Driven Imitation Learning via Generative Model [48.97800481338626]
Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in a high-dimensional environment.
We propose a novel reward learning module to generate intrinsic reward signals via a generative model.
Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with one-life demonstration.
arXiv Detail & Related papers (2020-06-26T15:39:40Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
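Several of the entries above, including the last one, build on the same standard idea: fit an inverse dynamics model f(s_t, s_{t+1}) -> a_t on the agent's own transitions, then apply it to the expert's state-only trajectory to recover pseudo-actions for behaviour cloning. The sketch below illustrates that idea under assumed state and action dimensions; it is a generic illustration, not the interface of any of the listed papers.
```python
# Illustrative sketch (not from any listed paper): learn an inverse dynamics
# model on the agent's own transitions, then label a state-only expert
# demonstration with predicted actions.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 24, 6  # assumed dimensions


class InverseDynamics(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM, 128), nn.ELU(), nn.Linear(128, ACT_DIM))

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))


def fit_inverse_dynamics(model, states, actions, steps=1000):
    """states: (T, STATE_DIM), actions: (T-1, ACT_DIM) from the agent's own data."""
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    for _ in range(steps):
        pred = model(states[:-1], states[1:])
        loss = ((pred - actions) ** 2).mean()        # simple regression objective
        opt.zero_grad(); loss.backward(); opt.step()


def label_demonstration(model, demo_states):
    """Predict pseudo-actions for a state-only expert demonstration."""
    with torch.no_grad():
        return model(demo_states[:-1], demo_states[1:])
```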
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.