Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach
- URL: http://arxiv.org/abs/2407.08065v1
- Date: Wed, 10 Jul 2024 21:55:44 GMT
- Title: Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach
- Authors: Isaac Sheidlower, Reuben Aronson, Elaine Schaertl Short
- Abstract summary: Foundation models are a promising path toward general-purpose and user-friendly robots.
However, in the prevalent generalist-policy approach, the lack of modularity between tasks means that when model weights are updated, behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
- Score: 1.7205106391379026
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models are a promising path toward general-purpose and user-friendly robots. The prevalent approach involves training a generalist policy that, like a reinforcement learning policy, uses observations to output actions. Although this approach has seen much success, several concerns arise when considering deployment and end-user interaction with these systems. In particular, the lack of modularity between tasks means that when model weights are updated (e.g., when a user provides feedback), the behavior in other, unrelated tasks may be affected. This can negatively impact the system's interpretability and usability. We present an alternative approach to the design of robot foundation models, Diffusion for Policy Parameters (DPP), which generates stand-alone, task-specific policies. Since these policies are detached from the foundation model, they are updated only when a user wants, either through feedback or personalization, allowing them to gain a high degree of familiarity with that policy. We demonstrate a proof-of-concept of DPP in simulation and then discuss its limitations and the future of interpretable foundation models.
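The paper's code is not included here, so the following is a minimal, hypothetical sketch of the DPP idea: a diffusion model that denoises a flattened vector of policy weights conditioned on a task embedding. All names, dimensions, and the training setup are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of Diffusion for Policy Parameters (DPP): a denoiser
# over flattened policy weights, conditioned on a task embedding. All
# dimensions, names, and the training setup are illustrative assumptions.
import torch
import torch.nn as nn

PARAM_DIM, TASK_DIM, T = 4096, 64, 100  # assumed sizes / diffusion steps

class ParamDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PARAM_DIM + TASK_DIM + 1, 512), nn.ReLU(),
            nn.Linear(512, PARAM_DIM))

    def forward(self, noisy_params, task_emb, t):
        # Predict the noise that was added to the flattened policy weights.
        t_feat = t.float().unsqueeze(-1) / T
        return self.net(torch.cat([noisy_params, task_emb, t_feat], dim=-1))

denoiser = ParamDenoiser()
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# One DDPM-style training step on (policy weights, task embedding) pairs.
params, task = torch.randn(8, PARAM_DIM), torch.randn(8, TASK_DIM)
t = torch.randint(0, T, (8,))
noise = torch.randn_like(params)
ab = alphas_bar[t].unsqueeze(-1)
noisy = ab.sqrt() * params + (1 - ab).sqrt() * noise
loss = ((denoiser(noisy, task, t) - noise) ** 2).mean()
loss.backward()  # sampled weights would then be loaded into a stand-alone policy
```

Because each sampled weight vector is a complete, stand-alone policy, a user's feedback can fine-tune that vector in isolation without touching the shared generator, which is the modularity property the abstract argues for.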
Related papers
- Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion [41.52811286996212]
Make-An-Agent is a novel policy parameter generator for behavior-to-policy generation.
We show how it can generate a control policy for an agent using just one demonstration of desired behaviors as a prompt.
We also deploy policies generated by Make-An-Agent onto real-world robots on locomotion tasks.
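A minimal sketch of the behavior-prompting step, assuming a single demonstration is mean-pooled into a fixed-size embedding that conditions the parameter generator; the encoder and all sizes are invented for illustration.

```python
# Illustrative sketch (not the paper's code): pooling a single demonstration
# into a behavior embedding that prompts a policy-parameter generator.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, EMB_DIM = 17, 6, 64  # assumed sizes

class BehaviorEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.step = nn.Linear(OBS_DIM + ACT_DIM, EMB_DIM)

    def forward(self, obs, act):
        # Mean-pool per-step features into one demonstration-level prompt.
        return self.step(torch.cat([obs, act], dim=-1)).mean(dim=0)

demo_obs, demo_act = torch.randn(50, OBS_DIM), torch.randn(50, ACT_DIM)
prompt = BehaviorEncoder()(demo_obs, demo_act)  # conditions the diffusion generator
print(prompt.shape)  # torch.Size([64])
```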
arXiv Detail & Related papers (2024-07-15T17:59:57Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmark.
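For context, a standard doubly robust off-policy estimator combines a fitted Q model with an importance-weighted correction; the one-step form on toy data is sketched below, and the paper's exact DPE estimator may differ.

```python
# A generic doubly robust off-policy value estimate (a standard construction;
# the paper's exact DPE estimator may differ). One-step/bandit form shown.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
s = rng.normal(size=n)                      # logged states (toy)
a = rng.integers(0, 2, size=n)              # logged actions from behavior policy
r = (a == (s > 0)).astype(float)            # rewards
pi_b = np.full(n, 0.5)                      # behavior propensities
pi_e = np.where(a == (s > 0), 0.9, 0.1)     # target policy's prob of logged action
q_hat = np.where(a == (s > 0), 0.8, 0.2)    # fitted Q model (toy)
v_hat = 0.9 * 0.8 + 0.1 * 0.2               # E_{a~pi_e}[q_hat], constant here

rho = pi_e / pi_b                           # importance weights
v_dr = np.mean(v_hat + rho * (r - q_hat))   # model term + weighted correction
print(f"DR estimate of target policy value: {v_dr:.3f}")
```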
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
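A rough, toy rendering of the customization idea: act greedily with respect to the frozen prior policy's log-probability plus a learned residual Q for the add-on reward. This is a loose sketch under invented dynamics, not the paper's exact update.

```python
# Rough sketch of policy customization via a residual Q-function (not the
# paper's exact algorithm): the customized policy trades off the frozen
# prior policy's log-probability against a residual Q for the add-on reward.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, alpha, gamma, lam = 5, 3, 0.1, 0.9, 1.0

log_pi_prior = np.log(rng.dirichlet(np.ones(n_actions), size=n_states))
q_res = np.zeros((n_states, n_actions))      # residual Q for the add-on reward

def act(s):
    # Customized policy: prior behavior plus the learned residual term.
    return int(np.argmax(lam * log_pi_prior[s] + q_res[s]))

for _ in range(2000):                        # toy TD updates on random transitions
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s2 = rng.integers(n_states)
    r_add = float(a == s % n_actions)        # toy add-on customization reward
    target = r_add + gamma * np.max(lam * log_pi_prior[s2] + q_res[s2])
    q_res[s, a] += alpha * (target - (lam * log_pi_prior[s, a] + q_res[s, a]))

print(act(0))
```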
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Goal-Conditioned Imitation Learning using Score-based Diffusion Policies [3.49482137286472]
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL).
We show how BESO can even be used to learn a goal-independent policy from play data using classifier-free guidance.
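Classifier-free guidance in this setting blends a goal-conditioned score prediction with a goal-dropped one; a minimal sketch with an invented network is:

```python
# Minimal sketch of classifier-free guidance for a goal-conditioned
# action-diffusion policy (illustrative; not BESO's actual network).
import torch
import torch.nn as nn

OBS, GOAL, ACT = 10, 3, 4

class ScoreNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS + GOAL + ACT + 1, 128),
                                 nn.ReLU(), nn.Linear(128, ACT))

    def forward(self, obs, goal, noisy_act, sigma):
        return self.net(torch.cat([obs, goal, noisy_act, sigma], dim=-1))

net = ScoreNet()
obs, goal = torch.randn(1, OBS), torch.randn(1, GOAL)
a_t, sigma = torch.randn(1, ACT), torch.ones(1, 1)

# Classifier-free guidance: during training the goal is randomly dropped
# (zeroed); at sampling time the two predictions are blended with weight w.
w = 2.0
eps_cond = net(obs, goal, a_t, sigma)
eps_uncond = net(obs, torch.zeros_like(goal), a_t, sigma)
eps = eps_uncond + w * (eps_cond - eps_uncond)  # guided denoising direction
```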
arXiv Detail & Related papers (2023-04-05T15:52:34Z)
- To the Noise and Back: Diffusion for Shared Autonomy [2.341116149201203]
We present a new approach to shared autonomy that employs a modulation of the forward and reverse diffusion process of diffusion models.
Our framework learns a distribution over a space of desired behaviors.
It then employs a diffusion model to translate the user's actions to a sample from this distribution.
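The mechanism can be sketched in a few lines: diffuse the user's action forward only to an intermediate step K, then run the reverse process with a model trained on desired behaviors, so K trades user fidelity against conformance to the learned distribution. Everything below (network, sizes, the deterministic reverse step) is illustrative.

```python
# Sketch of the diffusion-based shared-autonomy mechanism: partially noise
# the user's action, then denoise it with a model trained only on desired
# behaviors. Network and sizes are illustrative assumptions.
import torch
import torch.nn as nn

ACT, T, K = 4, 50, 20                        # K < T controls user-vs-expert tradeoff
betas = torch.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)
denoiser = nn.Sequential(nn.Linear(ACT + 1, 64), nn.ReLU(), nn.Linear(64, ACT))

user_action = torch.randn(1, ACT)

# Forward: diffuse the user action only up to step K (not all the way).
x = alphas_bar[K].sqrt() * user_action + (1 - alphas_bar[K]).sqrt() * torch.randn(1, ACT)

# Reverse: denoise from K back to 0 with the desired-behavior model.
for t in range(K, 0, -1):
    t_feat = torch.full((1, 1), t / T)
    eps = denoiser(torch.cat([x, t_feat], dim=-1))
    x = (x - betas[t - 1] / (1 - alphas_bar[t - 1]).sqrt() * eps) / alphas[t - 1].sqrt()
print(x)  # blended action: near the user's intent, pulled toward the data manifold
```

A larger K yields actions that conform more to the learned behavior distribution; a smaller K preserves more of the user's raw input.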
arXiv Detail & Related papers (2023-02-23T18:58:36Z)
- Policy Adaptation from Foundation Model Feedback [31.5870515250885]
Recent progress on vision-language foundation models has brought significant advancement to building general-purpose robots.
By using the pre-trained models to encode the scene and instructions as inputs for decision making, the instruction-conditioned policy can generalize across different objects and tasks.
In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF).
We show PAFF improves baselines by a large margin in all cases.
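A hypothetical rendering of the play-and-relabel loop, where `vlm_describe` is a stand-in for the foundation model providing feedback; the environment and policy interfaces are assumed, not taken from the paper.

```python
# Sketch of a "play and relabel" loop for policy adaptation from
# foundation-model feedback. `vlm_describe` stands in for a vision-language
# model and is purely hypothetical here, as are the env/policy interfaces.
from typing import Callable, List, Tuple

def collect_relabeled_data(policy: Callable, env, instructions: List[str],
                           vlm_describe: Callable) -> List[Tuple[str, list]]:
    """Roll out the policy on given instructions, then let a foundation
    model relabel each trajectory with what was actually achieved."""
    dataset = []
    for instr in instructions:
        obs, traj = env.reset(), []
        for _ in range(100):
            act = policy(obs, instr)
            obs, done = env.step(act)
            traj.append((obs, act))
            if done:
                break
        achieved = vlm_describe(traj)      # e.g., "pushed the red block left"
        dataset.append((achieved, traj))   # relabeled demonstration
    return dataset                         # used to fine-tune the policy
```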
arXiv Detail & Related papers (2022-12-14T18:31:47Z)
- PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning [5.046831208137847]
We present the PARTNR algorithm that can detect ambiguities in the trained policy by analyzing multiple modalities in the pick and place poses.
PARTNR employs an adaptive, sensitivity-based, gating function that decides if additional user demonstrations are required.
We demonstrate the performance of PARTNR in a table-top pick and place task.
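A toy version of the gating decision, using prediction spread as a crude multimodality proxy; PARTNR's actual sensitivity analysis is more involved, and the threshold here is an assumption.

```python
# Illustrative gating in the spirit of PARTNR: if sampled place-pose
# predictions look multimodal (high spread), ask the user to disambiguate.
# The threshold and the spread heuristic are assumptions.
import numpy as np

rng = np.random.default_rng(2)

def needs_demonstration(pose_samples: np.ndarray, threshold: float = 0.05) -> bool:
    # pose_samples: (n, 3) candidate place positions drawn from the policy.
    spread = pose_samples.std(axis=0).max()  # crude multimodality proxy
    return spread > threshold

unimodal = rng.normal([0.4, 0.1, 0.2], 0.01, size=(32, 3))
bimodal = np.concatenate([rng.normal([0.4, 0.1, 0.2], 0.01, size=(16, 3)),
                          rng.normal([0.1, 0.5, 0.2], 0.01, size=(16, 3))])
print(needs_demonstration(unimodal))  # False: policy is confident
print(needs_demonstration(bimodal))   # True: ambiguous, request user input
```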
arXiv Detail & Related papers (2022-11-15T17:07:40Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
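The training signal pairs a diffusion behavior-cloning loss with a Q-maximization term on the denoised action; the one-step denoising shortcut below is a simplification of the paper's full reverse-chain sampling, and all sizes are invented.

```python
# Sketch of the Diffusion-QL training signal: a behavior-cloning diffusion
# loss plus a Q-maximization term on the model's denoised action. Shapes
# and the one-step denoising shortcut are simplifications.
import torch
import torch.nn as nn

S, A, T = 11, 3, 5
denoiser = nn.Sequential(nn.Linear(S + A + 1, 64), nn.ReLU(), nn.Linear(64, A))
q_net = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 1))
alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.1, T), dim=0)

s, a = torch.randn(8, S), torch.randn(8, A)             # offline batch
t = torch.randint(0, T, (8, 1))
ab = alphas_bar[t.squeeze(-1)].unsqueeze(-1)
noise = torch.randn_like(a)
a_noisy = ab.sqrt() * a + (1 - ab).sqrt() * noise
eps = denoiser(torch.cat([s, a_noisy, t.float() / T], dim=-1))

bc_loss = ((eps - noise) ** 2).mean()                   # diffusion BC term
a0_hat = (a_noisy - (1 - ab).sqrt() * eps) / ab.sqrt()  # predicted clean action
q_loss = -q_net(torch.cat([s, a0_hat], dim=-1)).mean()  # push actions toward high Q
loss = bc_loss + 0.5 * q_loss                           # 0.5 ~ alpha, assumed
loss.backward()
```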
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
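Distillation itself reduces to supervised regression from states to planner actions; a toy sketch with a placeholder planner follows.

```python
# Sketch of planner amortization: regress a feedforward policy onto the
# actions an MPC planner produced for logged states. The "planner" here
# is a stand-in function; in practice it would run model-based search.
import torch
import torch.nn as nn

S, A = 12, 4
policy = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, A))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def mpc_planner(states):                        # placeholder for real MPC
    return torch.tanh(states[:, :A])            # pretend-optimal actions

for _ in range(200):
    states = torch.randn(64, S)
    with torch.no_grad():
        expert_actions = mpc_planner(states)    # expensive planning, done offline
    loss = ((policy(states) - expert_actions) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# The distilled policy now amortizes planning at a fraction of the cost.
```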
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
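A toy sketch of uncertainty-gated control in this spirit: rely on a model-based controller while the pose estimate is confident, and hand over to the learned policy otherwise. The threshold, interfaces, and controllers are all assumptions.

```python
# Sketch of an uncertainty-gated hybrid controller: use a model-based
# controller while the pose estimate is confident, and fall back to the
# learned policy in uncertain regions. All pieces are toy stand-ins.
import numpy as np

rng = np.random.default_rng(3)

def model_based_ctrl(pose_est, target):
    return 0.5 * (target - pose_est)             # simple proportional controller

def learned_policy(obs):
    return rng.uniform(-0.1, 0.1, size=3)        # stand-in for an RL policy

def act(pose_est, pose_std, obs, target, max_std=0.02):
    if pose_std.max() < max_std:                 # perception is confident
        return model_based_ctrl(pose_est, target)
    return learned_policy(obs)                   # uncertain: use learned skill

target = np.array([0.3, 0.0, 0.1])
print(act(np.array([0.2, 0.0, 0.1]), np.array([0.005, 0.004, 0.006]), None, target))
print(act(np.array([0.2, 0.0, 0.1]), np.array([0.08, 0.05, 0.09]), None, target))
```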
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.