Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments
- URL: http://arxiv.org/abs/2408.01024v2
- Date: Wed, 21 Aug 2024 01:46:36 GMT
- Title: Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments
- Authors: Sangwoo Shin, Seunghyun Kim, Youngsoo Jang, Moontae Lee, Honguk Woo
- Abstract summary: In embodied instruction-following (EIF), using pretrained language models (LMs) as task planners has emerged as a significant branch.
We present a semantic skill grounding framework that leverages the hierarchical nature of semantic skills.
Our experiments in the VirtualHome benchmark show the efficacy of SemGro in 300 cross-domain EIF scenarios.
- Score: 21.7668018144027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In embodied instruction-following (EIF), the integration of pretrained language models (LMs) as task planners emerges as a significant branch, where tasks are planned at the skill level by prompting LMs with pretrained skills and user instructions. However, grounding these pretrained skills in different domains remains challenging due to their intricate entanglement with the domain-specific knowledge. To address this challenge, we present a semantic skill grounding (SemGro) framework that leverages the hierarchical nature of semantic skills. SemGro recognizes the broad spectrum of these skills, ranging from short-horizon low-semantic skills that are universally applicable across domains to long-horizon rich-semantic skills that are highly specialized and tailored for particular domains. The framework employs an iterative skill decomposition approach, starting from the higher levels of semantic skill hierarchy and then moving downwards, so as to ground each planned skill to an executable level within the target domain. To do so, we use the reasoning capabilities of LMs for composing and decomposing semantic skills, as well as their multi-modal extension for assessing the skill feasibility in the target domain. Our experiments in the VirtualHome benchmark show the efficacy of SemGro in 300 cross-domain EIF scenarios.
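The top-down grounding loop described in the abstract can be illustrated with a minimal Python sketch. The sketch makes several assumptions. A plan is a list of skill names. SemGro's LM-based decomposition and its multimodal feasibility assessment are replaced by two hypothetical callables, `decompose` and `is_feasible`, which here are backed by a toy lookup table. The names `ground_plan`, `HIERARCHY`, and `TARGET_SKILLS` are illustrative and do not come from the paper, and skill composition is not modeled.

```python
from collections import deque
from collections.abc import Callable, Iterable

# Hypothetical stand-ins: in SemGro these roles are played by LM reasoning
# (decomposition) and a multimodal LM (feasibility); here they are plain callables.
Decompose = Callable[[str], list[str]]
IsFeasible = Callable[[str], bool]


def ground_plan(plan: Iterable[str], is_feasible: IsFeasible,
                decompose: Decompose, max_decompositions: int = 100) -> list[str]:
    """Ground a skill-level plan top-down into skills executable in the target domain.

    A skill judged feasible is kept as-is; otherwise it is replaced in place by
    its lower-level sub-skills, which are then checked in turn.
    """
    pending = deque(plan)          # skills still to ground, in execution order
    grounded: list[str] = []
    decompositions = 0
    while pending:
        skill = pending.popleft()
        if is_feasible(skill):
            grounded.append(skill)
            continue
        subskills = decompose(skill)
        if not subskills:
            raise ValueError(f"cannot ground skill {skill!r} in the target domain")
        decompositions += 1
        if decompositions > max_decompositions:
            raise RuntimeError("decomposition budget exhausted")
        # Push sub-skills to the front so the original execution order is kept.
        pending.extendleft(reversed(subskills))
    return grounded


# Toy skill hierarchy (hypothetical): long-horizon, rich-semantic skills at the
# top, short-horizon, low-semantic skills at the bottom.
HIERARCHY = {
    "prepare breakfast": ["make coffee", "toast bread"],
    "make coffee": ["walk to kitchen", "switch on coffee maker"],
    "toast bread": ["grab bread", "put bread in toaster", "switch on toaster"],
}
TARGET_SKILLS = {"walk to kitchen", "switch on coffee maker",
                 "grab bread", "put bread in toaster", "switch on toaster"}

if __name__ == "__main__":
    plan = ground_plan(["prepare breakfast"],
                       is_feasible=lambda s: s in TARGET_SKILLS,
                       decompose=lambda s: HIERARCHY.get(s, []))
    print(plan)
```

Running the script prints the five low-level skills in execution order. Each sub-skill list replaces its parent at the front of the queue. As a result, a skill that is already executable at a higher level is kept whole rather than decomposed further. This would happen, for example, if "make coffee" were added to `TARGET_SKILLS`. The decomposition budget guards against cyclic hierarchies.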
Related papers
- Language Guided Skill Discovery [56.84356022198222]
We introduce Language Guided Skill Discovery (LGSD) to maximize semantic diversity between skills.
LGSD takes user prompts as input and outputs a set of semantically distinctive skills.
We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane by simply changing the prompt.
Date: 2024-06-07T04:25:38Z
- More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs [40.54076184225558]
Performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF).
This paper presents a challenge beyond CF for the real-world application of domain-specific LLMs, called General Capabilities Integration (GCI).
The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks.
Date: 2024-05-28T05:00:12Z
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
Date: 2024-03-11T17:49:18Z
- Robust Policy Learning via Offline Skill Diffusion [6.876580618014666]
We present a novel offline skill learning framework, DuSkill.
DuSkill employs a guided Diffusion model to generate versatile skills extended from the limited skills in datasets.
We show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks.
Date: 2024-03-01T02:00:44Z
- SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy Adaptation [6.876580618014666]
This work explores the zero-shot adaptation capability of semantic skills (semantically interpretable behavior patterns of experts) in cross-domain settings.
We present a semantic skill translator framework SemTra which utilizes a set of multi-modal models to extract skills from snippets.
We evaluate our framework with Meta-World, Franka Kitchen, RLBench, and CARLA environments.
Date: 2024-02-12T05:46:10Z
- SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution [75.2573501625811]
Diffusion models have demonstrated strong potential for robotic trajectory planning.
However, generating coherent trajectories from high-level instructions remains challenging.
We propose SkillDiffuser, an end-to-end hierarchical planning framework.
Date: 2023-12-18T18:16:52Z
- Domain Prompt Learning with Quaternion Networks [49.45309818782329]
We propose to leverage domain-specific knowledge from domain-specific foundation models to transfer the robust recognition ability of Vision-Language Models to specialized domains.
We present a hierarchical approach that generates vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features.
Our proposed method achieves new state-of-the-art results in prompt learning.
Date: 2023-12-12T08:49:39Z
- Domain-oriented Language Pre-training with Adaptive Hybrid Masking and Optimal Transport Alignment [43.874781718934486]
We provide a general domain-oriented approach to adapt pre-trained language models for different application domains.
To preserve phrase knowledge effectively, we build a domain phrase pool as an auxiliary training tool.
We introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models.
Date: 2021-12-01T15:47:01Z
- Self-Taught Cross-Domain Few-Shot Learning with Weakly Supervised Object Localization and Task-Decomposition [84.24343796075316]
We propose a task-expansion-decomposition framework for Cross-Domain Few-Shot Learning.
The proposed Self-Taught (ST) approach alleviates the problem of non-target guidance by constructing task-oriented metric spaces.
We conduct experiments in the cross-domain setting with 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX.
Date: 2021-09-03T04:23:07Z
- Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
Date: 2021-07-12T17:57:46Z