Policy Adaptation from Foundation Model Feedback
- URL: http://arxiv.org/abs/2212.07398v4
- Date: Tue, 21 Mar 2023 16:16:41 GMT
- Title: Policy Adaptation from Foundation Model Feedback
- Authors: Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang
- Abstract summary: Recent progress on vision-language foundation models has brought significant advances in building general-purpose robots.
By using the pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks.
In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF), which relabels the policy's own rollouts with pre-trained foundation models and fine-tunes the policy on the resulting demonstration-instruction pairs.
We show PAFF improves baselines by a large margin in all cases.
- Score: 31.5870515250885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress on vision-language foundation models has brought significant
advances in building general-purpose robots. By using the pre-trained models
to encode the scene and instructions as inputs for decision making, the
instruction-conditioned policy can generalize across different objects and
tasks. While this is encouraging, the policy still fails in most cases given an
unseen task or environment. In this work, we propose Policy Adaptation from
Foundation model Feedback (PAFF). When deploying the trained policy to a new
task or a new environment, we first let the policy play with randomly generated
instructions to record the demonstrations. While the execution could be wrong,
we can use the pre-trained foundation models to provide feedback to relabel the
demonstrations. This automatically provides new pairs of
demonstration-instruction data for policy fine-tuning. We evaluate our method
on a broad range of experiments with the focus on generalization on unseen
objects, unseen tasks, unseen environments, and sim-to-real transfer. We show
PAFF improves baselines by a large margin in all cases. Our project page is
available at https://geyuying.github.io/PAFF/
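In outline, the procedure described above is a play-relabel-fine-tune loop. The sketch below is a minimal illustration of that loop, assuming hypothetical rollout, relabeling, and fine-tuning interfaces; none of these names come from the paper.
```python
# Minimal sketch of the PAFF "play and relabel" loop described in the abstract.
# All interfaces (instruction_sampler, env.rollout, foundation_model.describe,
# policy.finetune) are hypothetical placeholders, not the authors' code.
def adapt_policy(policy, foundation_model, env, instruction_sampler, num_rollouts=100):
    relabeled_data = []
    for _ in range(num_rollouts):
        # 1. Let the deployed policy "play": execute a randomly generated instruction
        #    in the new task or environment and record the demonstration.
        instruction = instruction_sampler()
        demonstration = env.rollout(policy, instruction)

        # 2. The execution may not match the instruction, so ask the pre-trained
        #    vision-language foundation model what was actually accomplished.
        relabeled_instruction = foundation_model.describe(demonstration)

        # 3. Keep the new demonstration-instruction pair as fine-tuning data.
        relabeled_data.append((demonstration, relabeled_instruction))

    # 4. Fine-tune the instruction-conditioned policy on the relabeled pairs.
    policy.finetune(relabeled_data)
    return policy
```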
Related papers
- Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning using Affordance-Centric Task Frames [15.800100875117312]
Affordances are central to robotic manipulation, where most tasks can be simplified to interactions with task-specific regions on objects.
We propose an affordance-centric policy-learning approach that centres and appropriately orients a task frame on these affordance regions.
We demonstrate that our approach can learn manipulation tasks using behaviour cloning from as few as 10 demonstrations, with generalisation equivalent to an image-based policy trained on 305 demonstrations.
arXiv Detail & Related papers (2024-10-15T23:57:35Z)
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z)
- Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach [1.7205106391379026]
Foundation models are a promising path toward general-purpose and user-friendly robots.
In particular, the lack of modularity between tasks means that when model weights are updated, the behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
arXiv Detail & Related papers (2024-07-10T21:55:44Z)
- Learning Generalizable Manipulation Policies with Object-Centric 3D Representations [65.55352131167213]
GROOT is an imitation learning method for learning robust policies with object-centric and 3D priors.
It builds policies that generalize beyond their initial training conditions for vision-based manipulation.
GROOT's performance excels in generalization over background changes, camera viewpoint shifts, and the presence of new object instances.
arXiv Detail & Related papers (2023-10-22T18:51:45Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Goal-Conditioned Imitation Learning using Score-based Diffusion Policies [3.49482137286472]
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL).
We show how BESO can even be used to learn a goal-independent policy from play-data using classifier-free guidance.
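As a rough illustration of the classifier-free guidance mentioned above, the sketch below shows the standard guided-score combination for a goal-conditioned denoising model; the function and argument names are assumptions for illustration, not the paper's API.
```python
# Minimal sketch of classifier-free guidance for a goal-conditioned score model.
# `score_model` and its signature are hypothetical placeholders, not BESO's API.
def guided_score(score_model, noisy_action, obs, goal, sigma, guidance_weight=2.0):
    # Unconditional prediction: the goal is dropped (e.g. replaced by a null embedding).
    score_uncond = score_model(noisy_action, obs, goal=None, sigma=sigma)
    # Goal-conditioned prediction.
    score_cond = score_model(noisy_action, obs, goal=goal, sigma=sigma)
    # Push the denoising direction toward the goal-conditioned prediction;
    # guidance_weight = 0 recovers a goal-independent policy.
    return score_uncond + guidance_weight * (score_cond - score_uncond)
```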
arXiv Detail & Related papers (2023-04-05T15:52:34Z)
- PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning [5.046831208137847]
We present the PARTNR algorithm that can detect ambiguities in the trained policy by analyzing multiple modalities in the pick and place poses.
PARTNR employs an adaptive, sensitivity-based gating function that decides whether additional user demonstrations are required.
We demonstrate the performance of PARTNR in a table-top pick and place task.
arXiv Detail & Related papers (2022-11-15T17:07:40Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Self-Supervised Policy Adaptation during Deployment [98.25486842109936]
Self-supervision allows the policy to continue training after deployment without using any rewards.
Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom.
Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.
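A minimal sketch of what such reward-free adaptation can look like at deployment time is given below; the inverse-dynamics auxiliary objective and every name in the snippet are illustrative assumptions, not details taken from the paper's abstract.
```python
import torch

# Minimal sketch of self-supervised policy adaptation during deployment.
# The inverse-dynamics objective and all names here are illustrative assumptions.
def adapt_during_deployment(encoder, policy_head, inverse_dynamics_head, env,
                            steps=1000, lr=1e-4):
    # Only the shared encoder is updated; no reward is used at any point.
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    obs = env.reset()  # env is assumed to accept and return torch tensors
    for _ in range(steps):
        with torch.no_grad():
            action = policy_head(encoder(obs))   # act with the frozen policy head
        next_obs, _, done, _ = env.step(action)

        # Self-supervised auxiliary task: predict the executed action from
        # consecutive observation embeddings (inverse dynamics).
        pred_action = inverse_dynamics_head(encoder(obs), encoder(next_obs))
        loss = torch.nn.functional.mse_loss(pred_action, action)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        obs = env.reset() if done else next_obs
    return encoder
```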
arXiv Detail & Related papers (2020-07-08T17:56:27Z)
- Learning visual policies for building 3D shape categories [130.7718618259183]
Previous work in this domain often assembles particular instances of objects from known sets of primitives.
We learn a visual policy to assemble other instances of the same category.
Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
arXiv Detail & Related papers (2020-04-15T17:29:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.