Zero-shot Imitation Policy via Search in Demonstration Dataset
- URL: http://arxiv.org/abs/2401.16398v1
- Date: Mon, 29 Jan 2024 18:38:29 GMT
- Title: Zero-shot Imitation Policy via Search in Demonstration Dataset
- Authors: Federico Malato, Florian Leopold, Andrew Melnik, Ville Hautamäki
- Abstract summary: Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
- Score: 0.16817021284806563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioral cloning uses a dataset of demonstrations to learn a policy. To
overcome computationally expensive training procedures and address the policy
adaptation problem, we propose to use latent spaces of pre-trained foundation
models to index a demonstration dataset, instantly access similar relevant
experiences, and copy behavior from these situations. Actions from a selected
similar situation can be performed by the agent until representations of the
agent's current situation and the selected experience diverge in the latent
space. Thus, we formulate our control problem as a dynamic search problem over
a dataset of experts' demonstrations. We test our approach on BASALT
MineRL-dataset in the latent representation of a Video Pre-Training model. We
compare our model to state-of-the-art, Imitation Learning-based Minecraft
agents. Our approach can effectively recover meaningful demonstrations and show
human-like behavior of an agent in the Minecraft environment in a wide variety
of scenarios. Experimental results reveal that our search-based approach
clearly outperforms learning-based models in both accuracy and perceptual
evaluation.
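A minimal sketch of the control loop the abstract describes, assuming a frozen pre-trained encoder (`encode`), a gym-style environment, and a pre-embedded demonstration dataset; the function names and the divergence threshold are illustrative, not the authors' implementation:
```python
import numpy as np

def build_index(demo_frames, encode):
    """Embed every demonstration frame once so it can be searched later."""
    return np.stack([encode(f) for f in demo_frames])  # (N, d) latent matrix

def nearest_situation(index, z):
    """Find the demonstration frame whose latent is closest to the agent's."""
    return int(np.argmin(np.linalg.norm(index - z, axis=1)))

def run_episode(env, index, demo_actions, encode, threshold=5.0):
    obs, done = env.reset(), False               # gym-style API assumed
    while not done:
        t = nearest_situation(index, encode(obs))  # search step
        # Copy expert actions until the agent's latent drifts away
        # from the selected reference experience.
        while not done and np.linalg.norm(encode(obs) - index[t]) < threshold:
            obs, _, done, _ = env.step(demo_actions[t])
            t = min(t + 1, len(demo_actions) - 1)
```
The outer loop re-selects a reference situation whenever the agent's latent drifts past the threshold, which is what makes this a dynamic search problem over the demonstration dataset rather than a one-shot retrieval.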
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
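A hedged sketch of the general idea (not the paper's Explanatory Performance Estimation implementation): permutation-based Shapley values where a feature's contribution is the performance change from swapping that feature's values to the shifted data; `model`, `X_src`, `X_shift`, and `y` are placeholders:
```python
import numpy as np

def accuracy(model, X, y):
    return (model.predict(X) == y).mean()    # value function: model performance

def shift_shapley(model, X_src, X_shift, y, n_perm=200, seed=0):
    """Attribute the performance change under feature shift to each feature."""
    rng = np.random.default_rng(seed)
    n_feat = X_src.shape[1]
    phi = np.zeros(n_feat)
    for _ in range(n_perm):
        X = X_src.copy()
        prev = accuracy(model, X, y)
        for j in rng.permutation(n_feat):
            X[:, j] = X_shift[:, j]          # move feature j to the shifted data
            cur = accuracy(model, X, y)
            phi[j] += cur - prev             # marginal contribution of feature j
            prev = cur
    return phi / n_perm                      # sums to the total performance change
```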
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Behavioral Cloning via Search in Embedded Demonstration Dataset [0.15293427903448023]
Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy.
We use latent space to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
arXiv Detail & Related papers (2023-06-15T12:25:41Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how the representation-pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
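A minimal sketch of inverse dynamics pretraining under simplifying assumptions (vector states, discrete actions); the encoder trained to predict a_t from (s_t, s_{t+1}) is what gets reused for downstream imitation:
```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=256):
        super().__init__()
        # The encoder is the artifact of interest: its representation is
        # what gets reused for downstream multitask imitation.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.head = nn.Linear(2 * hidden, n_actions)

    def forward(self, s, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        return self.head(torch.cat([z, z_next], dim=-1))  # logits for a_t

def pretrain_step(model, opt, s, a, s_next):
    """One gradient step on the inverse dynamics objective."""
    loss = nn.functional.cross_entropy(model(s, s_next), a)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```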
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Behavioral Cloning via Search in Video PreTraining Latent Space [0.13999481573773073]
We formulate our control problem as a search problem over a dataset of experts' demonstrations.
We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model.
The agent copies the actions from the expert trajectory as long as the state representations of the agent and of the selected expert trajectory do not diverge.
arXiv Detail & Related papers (2022-12-27T00:20:37Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
Behavior cloning of expert demonstrations can learn optimal policies more sample-efficiently than reinforcement learning.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
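A speculative sketch of the backwards-model idea: sample a learned reverse dynamics model backwards from demonstration states, then add the resulting transitions to the cloning dataset so the policy learns to funnel back onto the demonstrations; `backward_model` and the data layout are assumptions:
```python
import numpy as np

def backward_rollout(backward_model, s_demo, horizon, rng):
    """Sample transitions (s_prev, a_prev, s) that funnel into a demo state."""
    transitions, s = [], s_demo
    for _ in range(horizon):
        s_prev, a_prev = backward_model(s, rng)   # a_prev carries s_prev -> s
        transitions.append((s_prev, a_prev, s))
        s = s_prev
    transitions.reverse()
    return transitions

def augment_dataset(demo_states, backward_model, horizon=10, seed=0):
    """Extra (state, action) data for cloning, widening the region of attraction."""
    rng = np.random.default_rng(seed)
    extra = []
    for s in demo_states:
        extra.extend(backward_rollout(backward_model, s, horizon, rng))
    return extra
```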
arXiv Detail & Related papers (2022-10-17T18:02:19Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
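An illustrative sketch of cluster-level pseudo-labelling (not the paper's exact recipe), assuming features from a self-supervised encoder and soft predictions from the source classifier:
```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(features, classifier_probs, n_classes):
    """Give every member of a cluster the label the source classifier
    predicts most often inside that cluster."""
    clusters = KMeans(n_clusters=n_classes, n_init=10).fit_predict(features)
    pseudo = np.empty(len(features), dtype=int)
    for c in range(n_classes):
        members = clusters == c
        # Average soft predictions over the cluster and take the argmax
        # as the shared, cluster-level pseudo-label.
        pseudo[members] = classifier_probs[members].mean(axis=0).argmax()
    return pseudo
```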
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides the MBRL agent with training samples taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Stochastic Action Prediction for Imitation Learning [1.6385815610837169]
Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions.
We demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car.
We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.
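A small sketch of the underlying idea, assuming vector observations and discrete actions: model a full distribution over actions and train with negative log-likelihood, so conflicting expert actions raise entropy instead of being averaged into an invalid action; sizes are illustrative:
```python
import torch
import torch.nn as nn

class StochasticPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, obs):
        # A distribution over actions, not a single point estimate.
        return torch.distributions.Categorical(logits=self.net(obs))

def nll_step(policy, opt, obs, actions):
    """Maximize the likelihood of the (possibly conflicting) expert actions."""
    loss = -policy(obs).log_prob(actions).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```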
arXiv Detail & Related papers (2020-12-26T08:02:33Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
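A minimal sketch of potential-based shaping with a state-and-action-dependent potential, assuming `density.log_prob` is a trained generative model (e.g. a normalizing flow) over expert pairs; all names are illustrative:
```python
def shaped_reward(r, s, a, s_next, a_next, density, gamma=0.99):
    """Shape the environment reward with a learned log-density potential,
    steering exploration toward high-likelihood expert regions first."""
    phi = density.log_prob(s, a)              # potential of the current pair
    phi_next = density.log_prob(s_next, a_next)
    return r + gamma * phi_next - phi
```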
arXiv Detail & Related papers (2020-11-02T20:32:05Z)