Action Inference by Maximising Evidence: Zero-Shot Imitation from
Observation with World Models
- URL: http://arxiv.org/abs/2312.02019v1
- Date: Mon, 4 Dec 2023 16:43:36 GMT
- Title: Action Inference by Maximising Evidence: Zero-Shot Imitation from
Observation with World Models
- Authors: Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian
Karl
- Abstract summary: We propose Action Inference by Maximising Evidence (AIME) to replicate this behaviour using world models.
AIME consists of two distinct phases. In the first phase, the agent learns a world model from its past experience to understand its own body by maximising the ELBO.
In the second phase, the agent is given observation-only demonstrations of an expert performing a novel task and tries to imitate the expert's behaviour.
Our method is "zero-shot" in the sense that it requires no further training of the world model and no online interaction with the environment after being given the demonstrations.
- Score: 9.583751440005118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlike most reinforcement learning agents which require an unrealistic amount
of environment interactions to learn a new behaviour, humans excel at learning
quickly by merely observing and imitating others. This ability highly depends
on the fact that humans have a model of their own embodiment that allows them
to infer the most likely actions that led to the observed behaviour. In this
paper, we propose Action Inference by Maximising Evidence (AIME) to replicate
this behaviour using world models. AIME consists of two distinct phases. In the
first phase, the agent learns a world model from its past experience to
understand its own body by maximising the ELBO. In the second phase, the
agent is given some observation-only demonstrations of an expert performing a
novel task and tries to imitate the expert's behaviour. AIME achieves this by
defining a policy as an inference model and maximising the evidence of the
demonstration under the policy and world model. Our method is "zero-shot" in
the sense that it does not require further training of the world model or
online interactions with the environment after being given the demonstration. We
empirically validate the zero-shot imitation performance of our method on the
Walker and Cheetah embodiments of the DeepMind Control Suite and find that it
outperforms the state-of-the-art baselines. Code is available at:
https://github.com/argmax-ai/aime.
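Below is a minimal, hypothetical sketch of the two phases described in the abstract, written in PyTorch against a toy Gaussian latent state-space model. Everything in it (the ToyWorldModel class, the network sizes, the diagonal-Gaussian parameterisations, and conditioning the policy on raw observations rather than latent states) is an illustrative assumption, not the authors' implementation; see https://github.com/argmax-ai/aime for the official code.

```python
# Hypothetical sketch of AIME's two phases: (1) learn a world model by
# maximising the ELBO on the agent's own (obs, action) experience, then
# (2) freeze the model and train a policy, used as an inference model over
# the missing actions, to maximise the evidence of observation-only demos.
import torch
import torch.nn as nn
import torch.distributions as D

OBS_DIM, ACT_DIM, LAT_DIM = 24, 6, 32   # assumed, Walker-sized dimensions


def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 128), nn.ELU(), nn.Linear(128, n_out))


class ToyWorldModel(nn.Module):
    """Latent state-space model with prior p(z_t | z_{t-1}, a_{t-1}),
    posterior q(z_t | z_{t-1}, a_{t-1}, o_t) and decoder p(o_t | z_t)."""

    def __init__(self):
        super().__init__()
        self.prior_net = mlp(LAT_DIM + ACT_DIM, 2 * LAT_DIM)
        self.posterior_net = mlp(LAT_DIM + ACT_DIM + OBS_DIM, 2 * LAT_DIM)
        self.decoder = mlp(LAT_DIM, OBS_DIM)

    @staticmethod
    def _gaussian(stats):
        mean, log_std = stats.chunk(2, dim=-1)
        return D.Normal(mean, log_std.exp())

    def elbo(self, obs, actions):
        """ELBO of a sequence: obs [T, B, OBS_DIM], actions [T-1, B, ACT_DIM]."""
        z = torch.zeros(obs.shape[1], LAT_DIM, device=obs.device)
        total = 0.0
        for t in range(1, obs.shape[0]):
            za = torch.cat([z, actions[t - 1]], dim=-1)
            prior = self._gaussian(self.prior_net(za))
            post = self._gaussian(self.posterior_net(torch.cat([za, obs[t]], -1)))
            z = post.rsample()                                 # reparameterised
            recon = D.Normal(self.decoder(z), 1.0).log_prob(obs[t]).sum(-1)
            kl = D.kl_divergence(post, prior).sum(-1)
            total = total + (recon - kl).mean()
        return total


model = ToyWorldModel()
model_opt = torch.optim.Adam(model.parameters(), lr=3e-4)
policy = mlp(OBS_DIM, 2 * ACT_DIM)           # diagonal-Gaussian policy head
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)


def phase1_step(obs, actions):
    """Phase 1: maximise the ELBO of the agent's own experience."""
    loss = -model.elbo(obs, actions)
    model_opt.zero_grad(); loss.backward(); model_opt.step()
    return loss.item()


def phase2_step(demo_obs):
    """Phase 2: zero-shot action inference on observation-only demos."""
    for p in model.parameters():
        p.requires_grad_(False)               # the world model stays frozen
    mean, log_std = policy(demo_obs[:-1]).chunk(2, dim=-1)
    actions = D.Normal(mean, log_std.exp()).rsample()
    loss = -model.elbo(demo_obs, actions)     # maximise demo evidence w.r.t. policy
    policy_opt.zero_grad(); loss.backward(); policy_opt.step()
    return loss.item()
```

In this sketch, phase1_step would be fed (obs, action) batches from the agent's own replay buffer, while phase2_step only ever sees the expert's observation sequences; gradients in phase 2 flow through the sampled actions into the policy, never into the frozen world model.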
Related papers
- Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds [107.62381002403814]
This paper addresses the task of learning an agent model that behaves like humans and can jointly perceive, predict, and act in egocentric worlds.
We propose a joint predictive agent model, named EgoAgent, that simultaneously learns to represent the world, predict future states, and take reasonable actions with a single transformer.
arXiv Detail & Related papers (2025-02-09T11:28:57Z)
- Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination [25.62602420895531]
A world model provides an agent with a representation of its environment, enabling it to predict the causal consequences of its actions.
We introduce a new paradigm for constructing world models that are explicit representations of the real world and its dynamics.
DreMa replicates the observed world and its dynamics, allowing it to imagine novel configurations of objects and predict the future consequences of robot actions.
arXiv Detail & Related papers (2024-12-19T15:38:15Z) - Play with Emotion: Affect-Driven Reinforcement Learning [3.611888922173257]
This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning process.
We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior.
arXiv Detail & Related papers (2022-08-26T12:28:24Z) - Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
arXiv Detail & Related papers (2022-02-02T21:23:19Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Mastering Atari with Discrete World Models [61.7688353335468]
We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model.
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.
arXiv Detail & Related papers (2020-10-05T17:52:14Z) - Learning intuitive physics and one-shot imitation using
state-action-prediction self-organizing maps [0.0]
Humans learn by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks.
We suggest a simple but effective unsupervised model which develops such characteristics.
We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
arXiv Detail & Related papers (2020-07-03T12:29:11Z) - Intrinsic Reward Driven Imitation Learning via Generative Model [48.97800481338626]
Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in a high-dimensional environment.
We propose a novel reward learning module to generate intrinsic reward signals via a generative model.
Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with one-life demonstration.
arXiv Detail & Related papers (2020-06-26T15:39:40Z) - State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations; a minimal code sketch of this recipe appears after the list.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
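For the "State-Only Imitation Learning" entry above, the inverse-dynamics recipe its summary describes can be sketched as follows. This is hypothetical PyTorch code, not the paper's implementation: the names (inv_dyn, clone_from_state_only_demo), the network sizes, and the plain behaviour-cloning step are illustrative assumptions.

```python
# Hypothetical sketch: inverse dynamics model for state-only imitation.
# 1) fit a_t ~ g(s_t, s_{t+1}) on the agent's own transitions,
# 2) use g to label expert state-only demos with inferred actions,
# 3) behaviour-clone a policy pi(a | s) on the labelled demos.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 24, 6                      # assumed dimensions

inv_dyn = nn.Sequential(nn.Linear(2 * STATE_DIM, 128), nn.ELU(),
                        nn.Linear(128, ACT_DIM))
policy = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ELU(),
                       nn.Linear(128, ACT_DIM))
inv_opt = torch.optim.Adam(inv_dyn.parameters(), lr=3e-4)
pol_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)


def fit_inverse_dynamics(states, next_states, actions):
    """Regress the agent's own actions from consecutive state pairs."""
    pred = inv_dyn(torch.cat([states, next_states], dim=-1))
    loss = nn.functional.mse_loss(pred, actions)
    inv_opt.zero_grad(); loss.backward(); inv_opt.step()
    return loss.item()


def clone_from_state_only_demo(demo_states):
    """Label expert state transitions with inferred actions, then clone them."""
    with torch.no_grad():
        pseudo_actions = inv_dyn(torch.cat([demo_states[:-1],
                                            demo_states[1:]], dim=-1))
    loss = nn.functional.mse_loss(policy(demo_states[:-1]), pseudo_actions)
    pol_opt.zero_grad(); loss.backward(); pol_opt.step()
    return loss.item()
```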
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.