Zero-shot Imitation Policy via Search in Demonstration Dataset
- URL: http://arxiv.org/abs/2401.16398v1
- Date: Mon, 29 Jan 2024 18:38:29 GMT
- Title: Zero-shot Imitation Policy via Search in Demonstration Dataset
- Authors: Federico Malato, Florian Leopold, Andrew Melnik, Ville Hautamäki
- Abstract summary: Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
- Score: 0.16817021284806563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioral cloning uses a dataset of demonstrations to learn a policy. To
overcome computationally expensive training procedures and address the policy
adaptation problem, we propose to use latent spaces of pre-trained foundation
models to index a demonstration dataset, instantly access similar relevant
experiences, and copy behavior from these situations. Actions from a selected
similar situation can be performed by the agent until representations of the
agent's current situation and the selected experience diverge in the latent
space. Thus, we formulate our control problem as a dynamic search problem over
a dataset of experts' demonstrations. We test our approach on BASALT
MineRL-dataset in the latent representation of a Video Pre-Training model. We
compare our model to state-of-the-art, Imitation Learning-based Minecraft
agents. Our approach can effectively recover meaningful demonstrations and show
human-like behavior of an agent in the Minecraft environment in a wide variety
of scenarios. Experimental results reveal that our search-based approach
clearly outperforms learning-based models in both accuracy and perceptual
evaluation.
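A minimal sketch of the control loop the abstract describes, assuming a frozen pre-trained encoder (`encode`), a gym-style environment, and a pre-embedded demonstration dataset; the function names and the divergence threshold are illustrative, not the authors' implementation:
```python
import numpy as np

def build_index(demo_frames, encode):
    """Embed every demonstration frame once so it can be searched later."""
    return np.stack([encode(f) for f in demo_frames])  # (N, d) latent matrix

def nearest_situation(index, z):
    """Find the demonstration frame whose latent is closest to the agent's."""
    return int(np.argmin(np.linalg.norm(index - z, axis=1)))

def run_episode(env, index, demo_actions, encode, threshold=5.0):
    obs, done = env.reset(), False               # gym-style API assumed
    while not done:
        t = nearest_situation(index, encode(obs))  # search step
        # Copy expert actions until the agent's latent drifts away
        # from the selected reference experience.
        while not done and np.linalg.norm(encode(obs) - index[t]) < threshold:
            obs, _, done, _ = env.step(demo_actions[t])
            t = min(t + 1, len(demo_actions) - 1)
```
The outer loop re-selects a reference situation whenever the agent's latent drifts past the threshold, which is what makes this a dynamic search problem over the demonstration dataset rather than a one-shot retrieval.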
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
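A hedged sketch of the general idea (not the paper's Explanatory Performance Estimation implementation): permutation-based Shapley values where a feature's contribution is the performance change from swapping that feature's values to the shifted data; `model`, `X_src`, `X_shift`, and `y` are placeholders:
```python
import numpy as np

def accuracy(model, X, y):
    return (model.predict(X) == y).mean()    # value function: model performance

def shift_shapley(model, X_src, X_shift, y, n_perm=200, seed=0):
    """Attribute the performance change under feature shift to each feature."""
    rng = np.random.default_rng(seed)
    n_feat = X_src.shape[1]
    phi = np.zeros(n_feat)
    for _ in range(n_perm):
        X = X_src.copy()
        prev = accuracy(model, X, y)
        for j in rng.permutation(n_feat):
            X[:, j] = X_shift[:, j]          # move feature j to the shifted data
            cur = accuracy(model, X, y)
            phi[j] += cur - prev             # marginal contribution of feature j
            prev = cur
    return phi / n_perm                      # sums to the total performance change
```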
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Behavioral Cloning via Search in Embedded Demonstration Dataset [0.15293427903448023]
Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy.
We use latent space to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
arXiv Detail & Related papers (2023-06-15T12:25:41Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how the representation-pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
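A minimal sketch of inverse dynamics pretraining under simplifying assumptions (vector states, discrete actions); the encoder trained to predict a_t from (s_t, s_{t+1}) is what gets reused for downstream imitation:
```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=256):
        super().__init__()
        # The encoder is the artifact of interest: its representation is
        # what gets reused for downstream multitask imitation.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Linear(2 * hidden, n_actions)

    def forward(self, s, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        return self.head(torch.cat([z, z_next], dim=-1))  # logits for a_t

def pretrain_step(model, opt, s, a, s_next):
    """One gradient step on the inverse dynamics objective."""
    loss = nn.functional.cross_entropy(model(s, s_next), a)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```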
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Behavioral Cloning via Search in Video PreTraining Latent Space [0.13999481573773073]
We formulate our control problem as a search problem over a dataset of experts' demonstrations.
We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model.
The agent copies the actions from the expert trajectory as long as the state representations of the agent and of the selected expert trajectory do not diverge.
arXiv Detail & Related papers (2022-12-27T00:20:37Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
Behavior cloning of expert demonstrations can learn optimal policies more sample-efficiently than reinforcement learning.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
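A speculative sketch of the backwards-model idea: sample a learned reverse dynamics model backwards from demonstration states, then add the resulting transitions to the cloning dataset so the policy learns to funnel back onto the demonstrations; `backward_model` and the data layout are assumptions:
```python
import numpy as np

def backward_rollout(backward_model, s_demo, horizon, rng):
    """Sample transitions (s_prev, a_prev, s) that funnel into a demo state."""
    transitions, s = [], s_demo
    for _ in range(horizon):
        s_prev, a_prev = backward_model(s, rng)   # a_prev carries s_prev -> s
        transitions.append((s_prev, a_prev, s))
        s = s_prev
    transitions.reverse()
    return transitions

def augment_dataset(demo_states, backward_model, horizon=10, seed=0):
    """Extra (state, action) data for cloning, widening the region of attraction."""
    rng = np.random.default_rng(seed)
    extra = []
    for s in demo_states:
        extra.extend(backward_rollout(backward_model, s, horizon, rng))
    return extra
```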
arXiv Detail & Related papers (2022-10-17T18:02:19Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
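An illustrative sketch of cluster-level pseudo-labelling (not the paper's exact recipe), assuming features from a self-supervised encoder and soft predictions from the source classifier:
```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(features, classifier_probs, n_classes):
    """Give every member of a cluster the label the source classifier
    predicts most often inside that cluster."""
    clusters = KMeans(n_clusters=n_classes, n_init=10).fit_predict(features)
    pseudo = np.empty(len(features), dtype=int)
    for c in range(n_classes):
        members = clusters == c
        # Average soft predictions over the cluster and take the argmax
        # as the shared, cluster-level pseudo-label.
        pseudo[members] = classifier_probs[members].mean(axis=0).argmax()
    return pseudo
```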
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides the MBRL agent with training samples taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Stochastic Action Prediction for Imitation Learning [1.6385815610837169]
Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions.
We demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car.
We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.
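A small sketch of the underlying idea, assuming vector observations and discrete actions: model a full distribution over actions and train with negative log-likelihood, so conflicting expert actions raise entropy instead of being averaged into an invalid action; sizes are illustrative:
```python
import torch
import torch.nn as nn

class StochasticPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, obs):
        # A distribution over actions, not a single point estimate.
        return torch.distributions.Categorical(logits=self.net(obs))

def nll_step(policy, opt, obs, actions):
    """Maximize the likelihood of the (possibly conflicting) expert actions."""
    loss = -policy(obs).log_prob(actions).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```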
arXiv Detail & Related papers (2020-12-26T08:02:33Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
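A minimal sketch of potential-based shaping with a state-and-action-dependent potential, assuming `density.log_prob` is a trained generative model (e.g. a normalizing flow) over expert pairs; all names are illustrative:
```python
def shaped_reward(r, s, a, s_next, a_next, density, gamma=0.99):
    """Shape the environment reward with a learned log-density potential,
    steering exploration toward high-likelihood expert regions first."""
    phi = density.log_prob(s, a)              # potential of the current pair
    phi_next = density.log_prob(s_next, a_next)
    return r + gamma * phi_next - phi
```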
arXiv Detail & Related papers (2020-11-02T20:32:05Z)