One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill
- URL: http://arxiv.org/abs/2402.08369v1
- Date: Tue, 13 Feb 2024 11:01:52 GMT
- Title: One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill
- Authors: Sangwoo Shin, Daehee Lee, Minjong Yoo, Woo Kyung Kim, Honguk Woo
- Abstract summary: We present a skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation.
We leverage a vision-language model to learn a semantic skill set from offline video datasets.
We evaluate our framework with various one-shot imitation scenarios for extended multi-stage Meta-world tasks.
- Score: 6.294766893350108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One-shot imitation aims to learn a new task from a single demonstration,
yet it remains challenging to apply to complex tasks with the high domain
diversity inherent in a non-stationary environment. To tackle this problem, we
explore the compositionality of complex tasks, and present a novel skill-based
imitation learning framework enabling one-shot imitation and zero-shot
adaptation; from a single demonstration for a complex unseen task, a semantic
skill sequence is inferred and then each skill in the sequence is converted
into an action sequence optimized for environmental hidden dynamics that can
vary over time. Specifically, we leverage a vision-language model to learn a
semantic skill set from offline video datasets, where each skill is represented
on the vision-language embedding space, and adapt meta-learning with dynamics
inference to enable zero-shot skill adaptation. We evaluate our framework with
various one-shot imitation scenarios for extended multi-stage Meta-world tasks,
showing its superiority in learning complex tasks, generalizing to dynamics
changes, and extending to different demonstration conditions and modalities,
compared to other baselines.
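A minimal sketch of the two-stage pipeline described in the abstract, assuming a frozen vision-language encoder, a pre-learned semantic skill set embedded in the vision-language space, and a meta-learned, dynamics-conditioned skill policy. All class and method names below (OneShotSkillImitator, infer_skill_sequence, act) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class OneShotSkillImitator(nn.Module):
    """Hypothetical sketch: infer a semantic skill sequence from one demonstration,
    then execute each skill with a dynamics-conditioned low-level policy."""

    def __init__(self, vlm_encoder, skill_embeddings, skill_policy, dynamics_encoder):
        super().__init__()
        self.vlm_encoder = vlm_encoder            # frozen vision-language model (CLIP-style)
        self.skill_embeddings = skill_embeddings  # (num_skills, d) semantic skill set
        self.skill_policy = skill_policy          # meta-learned, skill- and dynamics-conditioned
        self.dynamics_encoder = dynamics_encoder  # infers hidden dynamics from recent transitions

    @torch.no_grad()
    def infer_skill_sequence(self, demo_frames):
        """Map a single demonstration video to a sequence of semantic skill indices."""
        clip_embs = self.vlm_encoder(demo_frames)        # (T, d) per-segment embeddings
        sims = clip_embs @ self.skill_embeddings.t()     # (T, num_skills) similarities
        per_step = sims.argmax(dim=-1)                   # nearest skill for each segment
        keep = torch.ones_like(per_step, dtype=torch.bool)
        keep[1:] = per_step[1:] != per_step[:-1]         # collapse consecutive duplicates
        return per_step[keep]

    def act(self, obs, skill_idx, recent_transitions):
        """Zero-shot adaptation: condition the low-level policy on inferred dynamics."""
        z_dyn = self.dynamics_encoder(recent_transitions)  # latent dynamics context
        z_skill = self.skill_embeddings[skill_idx]
        return self.skill_policy(obs, z_skill, z_dyn)
```

At execution time, the skill sequence from infer_skill_sequence would be run in order, calling act at each step with a small window of recent transitions so the policy can track hidden dynamics that may vary over time.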
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy Adaptation [6.876580618014666]
This work explores the zero-shot adaptation capability of semantic skills, i.e., semantically interpretable expert behavior patterns, in cross-domain settings.
We present a semantic skill translator framework SemTra which utilizes a set of multi-modal models to extract skills from snippets.
We evaluate our framework with Meta-World, Franka Kitchen, RLBench, and CARLA environments.
arXiv Detail & Related papers (2024-02-12T05:46:10Z)
- Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z)
- Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors [72.62423312645953]
Humans intuitively solve tasks in versatile ways, varying their behavior in terms of trajectory-based planning and for individual steps.
Current Imitation Learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting.
Instead, we combine a mixture of movement primitives with a distribution matching objective to learn versatile behaviors that match the expert's behavior and versatility.
arXiv Detail & Related papers (2022-10-17T16:42:59Z)
- Towards More Generalizable One-shot Visual Imitation Learning [81.09074706236858]
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- SKID RAW: Skill Discovery from Raw Trajectories [23.871402375721285]
It is desirable to only demonstrate full task executions instead of all individual skills.
We propose a novel approach that simultaneously learns to segment trajectories into recurring patterns.
The approach learns a skill conditioning that can be used to understand possible sequences of skills.
arXiv Detail & Related papers (2021-03-26T17:27:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.