Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
- URL: http://arxiv.org/abs/2412.11484v1
- Date: Mon, 16 Dec 2024 06:53:00 GMT
- Title: Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
- Authors: Wonje Choi, Woo Kyung Kim, SeungHyun Kim, Honguk Woo,
- Abstract summary: We present a novel contrastive prompt ensemble (ConPE) framework for embodied reinforcement learning.
We devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations.
In experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks.
- Score: 6.402396836189286
- License:
- Abstract: For embodied reinforcement learning (RL) agents interacting with the environment, it is desirable to have rapid policy adaptation to unseen visual observations, but achieving zero-shot adaptation capability is considered as a challenging problem in the RL context. To address the problem, we present a novel contrastive prompt ensemble (ConPE) framework which utilizes a pretrained vision-language model and a set of visual prompts, thus enabling efficient policy learning and adaptation upon a wide range of environmental and physical changes encountered by embodied agents. Specifically, we devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. Each prompt is contrastively learned in terms of an individual domain factor that significantly affects the agent's egocentric perception and observation. For a given task, the attention-based ensemble and policy are jointly learned so that the resulting state representations not only generalize to various domains but are also optimized for learning the task. Through experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation.
Related papers
- Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning [0.0]
Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning.
In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information.
We propose the Salience-Invariant Consistent Policy Learning algorithm, an efficient framework for zero-shot generalization.
arXiv Detail & Related papers (2025-02-12T12:00:16Z) - Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation [49.43094200366251]
We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition.
Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions.
We find that PALO is able of consistently complete long-horizon, multi-tier tasks in the real world, outperforming state of the art pre-trained generalist policies.
arXiv Detail & Related papers (2024-08-29T03:03:35Z) - External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling [3.536024441537599]
Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments.
We propose an agent influence framework for RL agents to improve the adaptation efficiency of external models in changing environments.
Our results show that our method outperforms the baselines in terms of external model adaptation on metrics that measure both efficiency and performance.
arXiv Detail & Related papers (2024-06-28T23:31:22Z) - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z) - Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models [28.057588125823266]
In this paper, we empirically analyze how each method behaves with respect to transfer difficulty.
We propose an adaptive ensemble method that combines visual prompts and text adapters with pre-trained VLMs.
arXiv Detail & Related papers (2023-11-27T06:37:05Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - Conceptual Reinforcement Learning for Language-Conditioned Tasks [20.300727364957208]
We propose a conceptual reinforcement learning (CRL) framework to learn the concept-like joint representation for language-conditioned policy.
The key insight is that concepts are compact and invariant representations in human instances and in real-world situations.
arXiv Detail & Related papers (2023-03-09T07:01:06Z) - Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech
Translation [75.86581380817464]
A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write.
This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF)
Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances.
arXiv Detail & Related papers (2022-03-22T23:33:18Z) - Weakly Supervised Disentangled Representation for Goal-conditioned
Reinforcement Learning [15.698612710580447]
We propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization.
In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation.
We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv Detail & Related papers (2022-02-28T09:05:14Z) - Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal
Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.