Related papers: Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

URL: http://arxiv.org/abs/2412.11484v1
Date: Mon, 16 Dec 2024 06:53:00 GMT
Title: Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
Authors: Wonje Choi, Woo Kyung Kim, SeungHyun Kim, Honguk Woo
Abstract summary: We present a novel contrastive prompt ensemble (ConPE) framework for embodied reinforcement learning. We devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. In experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks.
Score: 6.402396836189286
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For embodied reinforcement learning (RL) agents interacting with the environment, rapid policy adaptation to unseen visual observations is desirable, but achieving zero-shot adaptation is considered a challenging problem in the RL context. To address the problem, we present a novel contrastive prompt ensemble (ConPE) framework that utilizes a pretrained vision-language model and a set of visual prompts, enabling efficient policy learning and adaptation across a wide range of environmental and physical changes encountered by embodied agents. Specifically, we devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. Each prompt is contrastively learned with respect to an individual domain factor that significantly affects the agent's egocentric perception and observation. For a given task, the attention-based ensemble and policy are jointly learned so that the resulting state representations not only generalize to various domains but are also optimized for learning the task. Through experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation.
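
Illustrative sketch: the abstract describes combining the embeddings produced under several visual prompts into one state representation through a guided attention. The snippet below is a minimal sketch of one way such an ensemble could look, not the authors' implementation. It assumes that a "guide" embedding (for example, the unprompted vision-language embedding) scores each prompted embedding by cosine similarity, and that the softmax of those scores weights the sum. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention_ensemble(guide, prompted, temperature=0.1):
    """Combine prompted embeddings into one state vector.

    Assumption (not from the paper): attention scores are cosine
    similarities between the guide embedding and each prompted embedding.
    """
    scores = [cosine(guide, z) for z in prompted]
    weights = softmax(scores, temperature)
    dim = len(guide)
    state = [sum(w * z[i] for w, z in zip(weights, prompted)) for i in range(dim)]
    return state, weights


# Toy example: three prompted embeddings, one per hypothetical domain factor.
guide = [1.0, 0.0, 0.0]
prompted = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state, weights = guided_attention_ensemble(guide, prompted)
```

In this toy setup the first prompted embedding is closest to the guide, so it receives the largest weight. In the setting the abstract describes, the attention parameters would be learned jointly with the policy instead of being fixed as they are here.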

Related papers

Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities. We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details. We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
Training a Generally Curious Agent [86.84089201249104]
We present PAPRIKA, a fine-tuning approach that enables language models to develop general decision-making capabilities. Experimental results show that models fine-tuned with PAPRIKA can effectively transfer their learned decision-making capabilities to entirely unseen tasks. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems.
arXiv Detail & Related papers (2025-02-24T18:56:58Z)
Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning [12.9372563969007]
Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning. In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information. We propose the Salience-Invariant Consistent Policy Learning algorithm, an efficient framework for zero-shot generalization.
arXiv Detail & Related papers (2025-02-12T12:00:16Z)
Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation [49.43094200366251]
We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition. Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions. We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies.
arXiv Detail & Related papers (2024-08-29T03:03:35Z)
External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling [3.536024441537599]
Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments. We propose an agent influence framework for RL agents to improve the adaptation efficiency of external models in changing environments. Our results show that our method outperforms the baselines in terms of external model adaptation on metrics that measure both efficiency and performance.
arXiv Detail & Related papers (2024-06-28T23:31:22Z)
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies. HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs. We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts. We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
Task-conditioned adaptation of visual features in multi-task policy learning [9.320904829966588]
We introduce task-conditioned adapters that do not require finetuning any pre-trained weights, combined with a single policy trained with behavior cloning. We evaluate the method on a wide variety of tasks from the CortexBench benchmark and show that, compared to existing work, the benchmark can be addressed with a single policy.
arXiv Detail & Related papers (2024-02-12T15:57:31Z)
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models [28.057588125823266]
In this paper, we empirically analyze how each method behaves with respect to transfer difficulty. We propose an adaptive ensemble method that combines visual prompts and text adapters with pre-trained VLMs.
arXiv Detail & Related papers (2023-11-27T06:37:05Z)
Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy. ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables. We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
Conceptual Reinforcement Learning for Language-Conditioned Tasks [20.300727364957208]
We propose a conceptual reinforcement learning (CRL) framework to learn the concept-like joint representation for language-conditioned policy. The key insight is that concepts are compact and invariant representations in human instances and in real-world situations.
arXiv Detail & Related papers (2023-03-09T07:01:06Z)
MaPLe: Multi-modal Prompt Learning [54.96069171726668]
We propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations. Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes.
arXiv Detail & Related papers (2022-10-06T17:59:56Z)
Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning [15.698612710580447]
We propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization. In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation. We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv Detail & Related papers (2022-02-28T09:05:14Z)
Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification. Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.