Behavioral Cloning via Search in Video PreTraining Latent Space
- URL: http://arxiv.org/abs/2212.13326v2
- Date: Mon, 17 Apr 2023 05:38:15 GMT
- Title: Behavioral Cloning via Search in Video PreTraining Latent Space
- Authors: Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik
- Abstract summary: We formulate our control problem as a search problem over a dataset of experts' demonstrations.
We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model.
The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and of the selected expert trajectory does not diverge.
- Score: 0.13999481573773073
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Our aim is to build autonomous agents that can solve tasks in environments
like Minecraft. To do so, we use an imitation learning-based approach. We
formulate our control problem as a search problem over a dataset of experts'
demonstrations, where the agent copies actions from a similar demonstration
trajectory of image-action pairs. We perform a proximity search over the BASALT
MineRL-dataset in the latent representation of a Video PreTraining model. The
agent copies the actions from the expert trajectory as long as the distance
between the state representations of the agent and of the selected expert
trajectory from the dataset does not diverge. Then the proximity search is
repeated. Our approach effectively recovers meaningful demonstration
trajectories and produces human-like agent behavior in the Minecraft
environment.
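For intuition, the control loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under assumed interfaces: `encode` stands in for the Video PreTraining latent encoder, `dataset_latents` and `dataset_actions` for a pre-embedded expert dataset, and `env` for a gym-style Minecraft environment; none of these names, nor the threshold value, come from the authors' code.

```python
import numpy as np

def nearest_expert_frame(z, dataset_latents):
    # Proximity search: index of the expert frame whose latent is closest to z.
    dists = np.linalg.norm(dataset_latents - z, axis=1)
    return int(np.argmin(dists))

def run_episode(env, encode, dataset_latents, dataset_actions,
                divergence_threshold=8.0, max_steps=1000):
    # gym-style 4-tuple step API assumed; trajectory boundaries are ignored for brevity.
    obs = env.reset()
    idx = nearest_expert_frame(encode(obs), dataset_latents)
    for _ in range(max_steps):
        # Copy the expert's recorded action for the currently followed frame.
        obs, _, done, _ = env.step(dataset_actions[idx])
        if done or idx + 1 >= len(dataset_latents):
            break
        idx += 1  # keep following the selected expert trajectory
        # If agent and expert latents diverge, repeat the proximity search.
        if np.linalg.norm(encode(obs) - dataset_latents[idx]) > divergence_threshold:
            idx = nearest_expert_frame(encode(obs), dataset_latents)
```

The divergence threshold and the exhaustive nearest-neighbour scan are the main tuning points; in practice the scan can be replaced by any approximate nearest-neighbour index over the latent vectors.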
Related papers
- Zero-shot Imitation Policy via Search in Demonstration Dataset [0.16817021284806563]
Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach effectively recovers meaningful demonstrations and produces human-like agent behavior in the Minecraft environment.
arXiv Detail & Related papers (2024-01-29T18:38:29Z)
- SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL).
Our method achieves near-expert performance on various visual control tasks with complex observations, as well as on more challenging tasks whose backgrounds differ from those in the expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z)
- Behavioral Cloning via Search in Embedded Demonstration Dataset [0.15293427903448023]
Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy.
We use a latent space to index a demonstration dataset, instantly access similar, relevant experiences, and copy behavior from these situations.
Our approach effectively recovers meaningful demonstrations and produces human-like agent behavior in the Minecraft environment.
arXiv Detail & Related papers (2023-06-15T12:25:41Z)
- Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm in which autonomous agents acting in a Markov Decision Process are trained from expert observations alone, without access to the expert's actions.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert.
We evaluate our approach on a variety of control tasks showing that we can train effective policies using a limited number of demonstrative trajectories.
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides a model-based reinforcement learning (MBRL) agent with training samples drawn from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solved prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copulas, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
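As a generic illustration of that marginal/copula decomposition (not the paper's actual model), a Gaussian copula over empirical per-agent marginals could be fitted and sampled roughly as below; `actions`, the function names, and the Gaussian choice are all assumptions made for the sketch.

```python
import numpy as np
from scipy.stats import norm, rankdata

def fit_gaussian_copula(actions):
    # `actions`: shape (num_samples, num_agents), one column per agent.
    n = actions.shape[0]
    # Per-agent marginals -> uniform pseudo-observations via ranks.
    u = np.apply_along_axis(rankdata, 0, actions) / (n + 1)
    # Dependence among agents is estimated on the Gaussian scores only.
    return np.corrcoef(norm.ppf(u), rowvar=False)

def sample_joint_actions(actions, corr, num_samples, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = actions.shape[1]
    # Correlated uniforms drawn from the Gaussian copula ...
    u = norm.cdf(rng.multivariate_normal(np.zeros(d), corr, size=num_samples))
    # ... pushed back through each agent's empirical quantile function (marginal).
    return np.column_stack([np.quantile(actions[:, j], u[:, j]) for j in range(d)])
```

The point of the decomposition is that the marginals (here, empirical quantiles) describe each agent's local behavior, while the copula correlation matrix alone carries the coordination structure among agents.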
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- Time-series Imputation of Temporally-occluded Multiagent Trajectories [18.862173210927658]
We study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents.
Our approach, called the Graph Imputer, uses forward- and backward-information in combination with graph networks and variational autoencoders.
We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model for the off-screen player state estimation setting.
arXiv Detail & Related papers (2021-06-08T09:58:43Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning-from-demonstration techniques is that human demonstrations follow a distribution with multiple modes for a single task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions from the MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)