Behavioral Cloning via Search in Embedded Demonstration Dataset
- URL: http://arxiv.org/abs/2306.09082v1
- Date: Thu, 15 Jun 2023 12:25:41 GMT
- Title: Behavioral Cloning via Search in Embedded Demonstration Dataset
- Authors: Federico Malato, Florian Leopold, Ville Hautamaki, Andrew Melnik
- Abstract summary: Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy.
We use latent space to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
- Score: 0.15293427903448023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioural cloning uses a dataset of demonstrations to learn a behavioural
policy. To overcome various learning and policy adaptation problems, we propose
to use latent space to index a demonstration dataset, instantly access similar
relevant experiences, and copy behavior from these situations. Actions from a
selected similar situation can be performed by the agent until representations
of the agent's current situation and the selected experience diverge in the
latent space. Thus, we formulate our control problem as a search problem over a
dataset of experts' demonstrations. We test our approach on BASALT
MineRL-dataset in the latent representation of a Video PreTraining model. We
compare our model to state-of-the-art Minecraft agents. Our approach can
effectively recover meaningful demonstrations and show human-like behavior of
an agent in the Minecraft environment in a wide variety of scenarios.
Experimental results reveal that the performance of our search-based approach is
comparable to that of trained models, while allowing zero-shot task adaptation by
changing the demonstration examples.
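The control-as-search loop the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `encode` is a toy stand-in for the Video PreTraining encoder (here a fixed random linear projection), and the names `encode` and `DemoSearchPolicy` are hypothetical.

```python
import numpy as np

# Toy stand-in for a pretrained encoder (the paper uses Video PreTraining):
# a fixed random linear projection of the raw observation into an 8-dim latent.
_PROJ = np.random.default_rng(0).standard_normal((8, 4))

def encode(obs):
    return _PROJ @ np.asarray(obs, dtype=float)

class DemoSearchPolicy:
    """Index expert demonstrations in latent space; copy the action of the
    nearest expert situation, and keep following that trajectory until the
    agent's and expert's representations diverge."""

    def __init__(self, demos, threshold=1.0):
        # demos: list of (observations, actions) trajectories of equal length
        self.demos = demos
        self.threshold = threshold
        keys, refs = [], []
        for traj_id, (observations, _) in enumerate(demos):
            for t, obs in enumerate(observations):
                keys.append(encode(obs))
                refs.append((traj_id, t))
        self.keys = np.stack(keys)   # (N, 8) index of all expert states
        self.refs = refs
        self.current = None          # (traj_id, t) currently being followed

    def act(self, obs):
        z = encode(obs)
        if self.current is not None:
            traj_id, t = self.current
            observations, actions = self.demos[traj_id]
            # Still close to the expert's state at this step: keep copying.
            if t < len(actions) and np.linalg.norm(z - encode(observations[t])) < self.threshold:
                self.current = (traj_id, t + 1)
                return actions[t]
        # First step, or diverged: search the whole dataset for the nearest
        # expert situation and resume copying from there.
        nearest = int(np.argmin(np.linalg.norm(self.keys - z, axis=1)))
        traj_id, t = self.refs[nearest]
        self.current = (traj_id, t + 1)
        return self.demos[traj_id][1][t]
```

Swapping in a different demonstration set changes the behavior with no retraining, which is the zero-shot task adaptation the abstract claims.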
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Zero-shot Imitation Policy via Search in Demonstration Dataset [0.16817021284806563]
Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment.
arXiv Detail & Related papers (2024-01-29T18:38:29Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL).
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- In-Context Demonstration Selection with Cross Entropy Difference [95.21947716378641]
Large language models (LLMs) can use in-context demonstrations to improve performance on zero-shot tasks.
We present a cross-entropy difference (CED) method for selecting in-context demonstrations.
arXiv Detail & Related papers (2023-05-24T05:04:00Z)
- Behavioral Cloning via Search in Video PreTraining Latent Space [0.13999481573773073]
We formulate our control problem as a search problem over a dataset of experts' demonstrations.
We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model.
The agent copies the actions from the expert trajectory as long as the state representations of the agent and the selected expert trajectory do not diverge.
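The copy-until-divergence rule this work describes can be illustrated in isolation. This is a hedged sketch, not the authors' code: `copy_until_divergence` is a hypothetical helper, and the latent vectors are assumed to come from the same (here unspecified) encoder.

```python
import numpy as np

def copy_until_divergence(agent_latents, expert_latents, expert_actions, threshold=1.0):
    """Replay expert actions step by step, stopping as soon as the agent's
    latent state drifts too far from the expert's at the same step."""
    executed = []
    for z_agent, z_expert, action in zip(agent_latents, expert_latents, expert_actions):
        if np.linalg.norm(z_agent - z_expert) >= threshold:
            break  # diverged: the caller would search for a new expert segment
        executed.append(action)
    return executed
```

When the loop breaks, control returns to the proximity search, which selects a fresh expert segment to follow.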
arXiv Detail & Related papers (2022-12-27T00:20:37Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual assumptions so that the demonstrator and the imitator need only share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Robust Imitation of a Few Demonstrations with a Backwards Model [3.8530020696501794]
Behavior cloning of expert demonstrations can speed up learning optimal policies in a more sample-efficient way than reinforcement learning.
We tackle this issue by extending the region of attraction around the demonstrations so that the agent can learn how to get back onto the demonstrated trajectories if it veers off-course.
With optimal or near-optimal demonstrations, the learned policy will be both optimal and robust to deviations, with a wider region of attraction.
arXiv Detail & Related papers (2022-10-17T18:02:19Z)
- Robust Maximum Entropy Behavior Cloning [15.713997170792842]
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.
Most existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if there are adversarial demonstrations in the given dataset?
We propose a novel general framework to directly generate a policy from demonstrations that autonomously detects the adversarial demonstrations and excludes them from the dataset.
arXiv Detail & Related papers (2021-01-04T22:08:46Z)
- Stochastic Action Prediction for Imitation Learning [1.6385815610837169]
Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions.
We demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car.
We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.
arXiv Detail & Related papers (2020-12-26T08:02:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.