SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy
Adaptation
- URL: http://arxiv.org/abs/2402.07418v1
- Date: Mon, 12 Feb 2024 05:46:10 GMT
- Title: SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy
Adaptation
- Authors: Sangwoo Shin, Minjong Yoo, Jeongwoo Lee, Honguk Woo
- Abstract summary: This work explores the zero-shot adaptation capability of semantic skills, semantically interpretable experts' behavior patterns, in cross-domain settings.
We present a semantic skill translator framework SemTra which utilizes a set of multi-modal models to extract skills from snippets.
We evaluate our framework with Meta-World, Franka Kitchen, RLBench, and CARLA environments.
- Score: 6.876580618014666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work explores the zero-shot adaptation capability of semantic skills,
semantically interpretable experts' behavior patterns, in cross-domain
settings, where a user input in interleaved multi-modal snippets can prompt a
new long-horizon task for different domains. In these cross-domain settings, we
present a semantic skill translator framework SemTra which utilizes a set of
multi-modal models to extract skills from the snippets, and leverages the
reasoning capabilities of a pretrained language model to adapt these extracted
skills to the target domain. The framework employs a two-level hierarchy for
adaptation: task adaptation and skill adaptation. During task adaptation,
seq-to-seq translation by the language model transforms the extracted skills
into a semantic skill sequence, which is tailored to fit the cross-domain
contexts. Skill adaptation focuses on optimizing each semantic skill for the
target domain context, through parametric instantiations that are facilitated
by language prompting and contrastive learning-based context inferences. This
hierarchical adaptation empowers the framework to not only infer a complex task
specification in one-shot from the interleaved multi-modal snippets, but also
adapt it to new domains with zero-shot learning abilities. We evaluate our
framework with Meta-World, Franka Kitchen, RLBench, and CARLA environments. The
results clarify the framework's superiority in performing long-horizon tasks
and adapting to different domains, showing its broad applicability in practical
use cases, such as cognitive robots interpreting abstract instructions and
autonomous vehicles operating under varied configurations.
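The two-level hierarchy described above can be sketched as follows. This is a minimal, illustrative sketch only: the function names, skill labels, and parameter values are hypothetical stand-ins, not the authors' API. A dictionary lookup substitutes for the seq-to-seq language-model translation in task adaptation, and a static parameter table substitutes for the contrastive context inference in skill adaptation.

```python
# Hedged sketch of SemTra-style hierarchical adaptation.
# All names and values below are illustrative assumptions.

def task_adaptation(extracted_skills, target_domain):
    """Stand-in for the seq-to-seq LM translation step: maps skills
    extracted from multi-modal snippets to a semantic skill sequence
    tailored to the target domain (here, a toy lookup table)."""
    translation = {
        ("pick", "kitchen"): "grasp-handle",
        ("place", "kitchen"): "close-door",
        ("pick", "tabletop"): "grasp-object",
        ("place", "tabletop"): "put-down",
    }
    return [translation.get((s, target_domain), s) for s in extracted_skills]

def skill_adaptation(semantic_skill, domain_context):
    """Stand-in for parametric instantiation: attaches domain-inferred
    parameters (here, a made-up gripper force) to each semantic skill.
    In SemTra this is driven by language prompting and contrastive
    learning-based context inference."""
    params = {"kitchen": {"force": 0.8}, "tabletop": {"force": 0.3}}
    return (semantic_skill, params.get(domain_context, {}))

# One-shot task specification -> zero-shot domain-adapted skill plan.
skills = task_adaptation(["pick", "place"], "kitchen")
policy = [skill_adaptation(s, "kitchen") for s in skills]
print(policy)
```

The point of the sketch is the control flow, not the lookup tables: task adaptation operates once over the whole skill sequence, while skill adaptation runs per skill against the inferred domain context.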
Related papers
- Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments [21.7668018144027]
In embodied instruction-following (EIF), pretrained language models (LMs) as task planners emerge as a significant branch.
We present a semantic skill grounding framework that leverages the hierarchical nature of semantic skills.
Our experiments in the VirtualHome benchmark show the efficacy of SemGro in 300 cross-domain EIF scenarios.
arXiv Detail & Related papers (2024-08-02T05:50:31Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z) - One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill [6.294766893350108]
We present a skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation.
We leverage a vision-language model to learn a semantic skill set from offline video datasets.
We evaluate our framework with various one-shot imitation scenarios for extended multi-stage Meta-world tasks.
arXiv Detail & Related papers (2024-02-13T11:01:52Z) - Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context
Learning [48.22913073217633]
Large language models (LLMs) have showcased their capability with few-shot inference known as in-context learning.
In this paper, we study the UDA problem under an in-context learning setting to adapt language models from the source domain to the target domain without any target labels.
We devise different prompting and training strategies, accounting for different LM architectures to learn the target distribution via language modeling.
arXiv Detail & Related papers (2023-11-20T06:06:20Z) - Set-based Meta-Interpolation for Few-Task Meta-Learning [79.4236527774689]
We propose a novel domain-agnostic task augmentation method, Meta-Interpolation, to densify the meta-training task distribution.
We empirically validate the efficacy of Meta-Interpolation on eight datasets spanning across various domains.
arXiv Detail & Related papers (2022-05-20T06:53:03Z) - Learning to Relate Depth and Semantics for Unsupervised Domain
Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z) - Self-supervised Augmentation Consistency for Adapting Semantic
Segmentation [56.91850268635183]
We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate.
We employ standard data augmentation techniques (photometric noise, flipping, and scaling) and ensure consistency of the semantic predictions.
We achieve significant improvements of the state-of-the-art segmentation accuracy after adaptation, consistent both across different choices of the backbone architecture and adaptation scenarios.
arXiv Detail & Related papers (2021-04-30T21:32:40Z) - Learning to adapt class-specific features across domains for semantic
segmentation [36.36210909649728]
In this thesis, we present a novel architecture, which learns to adapt features across domains by taking into account per class information.
We adopt the recently introduced StarGAN architecture as image translation backbone, since it is able to perform translations across multiple domains by means of a single generator network.
arXiv Detail & Related papers (2020-01-22T23:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.