PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning
- URL: http://arxiv.org/abs/2211.08304v1
- Date: Tue, 15 Nov 2022 17:07:40 GMT
- Title: PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning
- Authors: Jelle Luijkx, Zlatan Ajanovic, Laura Ferranti, Jens Kober
- Abstract summary: We present the PARTNR algorithm, which can detect ambiguities in the trained policy by analyzing multiple modalities in the pick-and-place poses.
PARTNR employs an adaptive, sensitivity-based gating function that decides whether additional user demonstrations are required.
We demonstrate the performance of PARTNR in a table-top pick-and-place task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent works show impressive results in mapping language-based human
commands and image scene observations directly to robot-executable policies
(e.g., pick-and-place poses). However, these approaches do not consider the
uncertainty of the trained policy and simply always execute the actions that
the current policy rates as most probable. This makes them vulnerable to
domain shift and inefficient in the number of required demonstrations. We
extend previous works and present the PARTNR algorithm, which can detect
ambiguities in the trained policy by analyzing multiple modalities in the
pick-and-place poses using topological analysis. PARTNR employs an adaptive,
sensitivity-based gating function that decides whether additional user
demonstrations are required. User demonstrations are aggregated into the dataset
and used for subsequent training. In this way, the policy can adapt promptly to
domain shift, and it can minimize the number of demonstrations required for a
well-trained policy. The adaptive threshold makes it possible to reach a
user-acceptable level of ambiguity for executing the policy autonomously and, in
turn, increases the trustworthiness of our system. We demonstrate the
performance of PARTNR in a table-top pick-and-place task.
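The abstract states that PARTNR finds ambiguity by detecting multiple modalities in the pick and place poses "using topological analysis" and gates autonomous execution with an adaptive, sensitivity-based threshold, but it does not spell out either mechanism. The following is a minimal Python sketch under explicit assumptions, not the authors' implementation. It assumes the policy outputs a 2D pick heatmap and counts its modes with 0-dimensional persistent homology of the superlevel sets, a standard topological peak-counting technique swapped in here for illustration. A hypothetical gate then asks the user for a demonstration whenever more than one peak is significant, and it aggregates that demonstration into the dataset. The names `count_modes`, `AmbiguityGate`, `decide`, `min_persistence`, and the threshold update rule are all hypothetical.

```python
import numpy as np


def peak_persistences(heatmap):
    """0-dimensional persistence of the superlevel sets of a 2D heatmap.

    Pixels are added from highest to lowest value (4-connectivity). Each local
    maximum starts a component; when two components meet, the one with the lower
    peak dies (elder rule). Returns (birth, death) pairs with positive persistence;
    the global maximum is paired with the heatmap minimum.
    """
    h, w = heatmap.shape
    flat = heatmap.ravel().astype(float)
    parent = np.full(flat.size, -1)      # -1 marks pixels not yet added
    birth = np.empty(flat.size)          # peak value of each component root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for idx in np.argsort(-flat, kind="stable"):
        parent[idx], birth[idx] = idx, flat[idx]
        r, c = divmod(int(idx), w)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < h and 0 <= nc < w) or parent[nr * w + nc] == -1:
                continue
            ra, rb = find(idx), find(nr * w + nc)
            if ra == rb:
                continue
            if birth[ra] < birth[rb]:    # keep the older (higher) peak as root
                ra, rb = rb, ra
            if birth[rb] > flat[idx]:    # skip zero-persistence merges
                pairs.append((birth[rb], flat[idx]))
            parent[rb] = ra
    pairs.append((flat.max(), flat.min()))  # the global peak never dies
    return pairs


def count_modes(heatmap, min_persistence):
    """Number of peaks whose persistence is at least min_persistence * value range."""
    span = float(heatmap.max() - heatmap.min())
    if span == 0.0:
        return 1
    return int(sum((b - d) >= min_persistence * span for b, d in peak_persistences(heatmap)))


class AmbiguityGate:
    """Hypothetical adaptive gate (not the PARTNR rule): query if >1 significant peak."""

    def __init__(self, min_persistence=0.3, step=0.05, lo=0.05, hi=0.95):
        self.min_persistence = min_persistence
        self.step, self.lo, self.hi = step, lo, hi

    def needs_demo(self, heatmap):
        return count_modes(heatmap, self.min_persistence) > 1

    def update(self, robot_was_right):
        # Assumed rule: after a confirmed success, tolerate more ambiguity (fewer queries);
        # after a mistake, lower the bar so weaker secondary peaks also trigger a query.
        delta = self.step if robot_was_right else -self.step
        self.min_persistence = float(np.clip(self.min_persistence + delta, self.lo, self.hi))


def decide(gate, observation, heatmap, dataset, ask_user):
    """Return a pick pixel; query the user and aggregate the demo when ambiguous."""
    if gate.needs_demo(heatmap):
        demo = ask_user(observation)
        dataset.append((observation, demo))  # retraining on `dataset` is not shown
        return demo
    return tuple(int(i) for i in np.unravel_index(np.argmax(heatmap), heatmap.shape))


# Usage: synthetic heatmaps with one and two Gaussian blobs.
yy, xx = np.mgrid[0:40, 0:40]


def blob(cy, cx, amp=1.0, sigma=4.0):
    return amp * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))


single = blob(20, 20)
double = blob(10, 10) + blob(30, 30, amp=0.8)
gate = AmbiguityGate()
print(count_modes(single, 0.3), count_modes(double, 0.3))  # 1 2
print(decide(gate, "obs", single, [], lambda obs: (0, 0)))  # (20, 20)
```

Measuring persistence relative to the heatmap's value range makes the test insensitive to small grid-level bumps, which die almost as soon as they appear; only well-separated peaks, such as the two blobs above, count as distinct modes. Lowering `min_persistence` makes the gate more cautious, which is one plausible reading of "sensitivity-based", not a confirmed detail of PARTNR.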
Related papers
- Inference-Time Policy Steering through Human Interactions (2024-11-25)
During inference, humans are often removed from the policy execution loop.
We propose an Inference-Time Policy Steering framework that leverages human interactions to bias the generative sampling process.
Our proposed sampling strategy achieves the best trade-off between alignment and distribution shift.
- Active Fine-Tuning of Generalist Policies (2024-10-07)
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
- Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach (2024-07-10)
Foundation models are a promising path toward general-purpose and user-friendly robots.
However, the lack of modularity between tasks means that when model weights are updated, the behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
- Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation (2023-07-12)
Policies often fail due to distribution shift, i.e., changes in the state and reward that occur when a policy is deployed in new environments.
Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation.
We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts.
- "Guess what I'm doing": Extending legibility to sequential decision tasks (2022-09-19)
We investigate the notion of legibility in sequential decision tasks under uncertainty.
Our proposed approach, dubbed PoL-MDP, is able to handle uncertainty while remaining computationally tractable.
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning (2022-05-11)
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies (2021-09-12)
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher-performing agents on the environments we tested.
- Human-in-the-Loop Imitation Learning using Remote Teleoperation (2020-12-12)
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning (2020-05-21)
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization (2020-05-06)
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.