Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
- URL: http://arxiv.org/abs/2311.10678v2
- Date: Thu, 21 Mar 2024 05:47:22 GMT
- Title: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
- Authors: Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh
- Abstract summary: We present Distillation and Retrieval of Online Corrections (DROC)
DROC is a large language model (LLM)-based system that can respond to arbitrary forms of language feedback.
We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections in a knowledge base.
- Score: 45.420679219101245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but they also need to respond to feedback that can range from high-level corrections about human preferences to low-level adjustments to skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs, using only half the total number of corrections needed in the first round and requiring little to no corrections after two iterations. We show further results, videos, prompts and code on https://sites.google.com/stanford.edu/droc.
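The distill-and-retrieve loop described in the abstract can be illustrated with a minimal sketch: corrections are stored alongside a feature vector standing in for DROC's textual/visual embeddings, and past knowledge is retrieved for a new setting by cosine similarity. The class and method names (`CorrectionKB`, `distill`, `retrieve`) are hypothetical illustrations, not DROC's actual API, and the toy 2-D vectors stand in for real embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class CorrectionKB:
    """Toy knowledge base: each entry pairs a distilled correction with an
    embedding vector (a stand-in for DROC's textual/visual features)."""

    def __init__(self):
        self.entries = []  # list of (embedding, distilled_knowledge)

    def distill(self, embedding, knowledge):
        # "Distillation" here is just storage; the real system uses an LLM
        # to extract generalizable knowledge from a correction sequence.
        self.entries.append((embedding, knowledge))

    def retrieve(self, query_embedding, k=1):
        # Rank stored knowledge by similarity to the current context.
        ranked = sorted(self.entries,
                        key=lambda e: cosine(e[0], query_embedding),
                        reverse=True)
        return [knowledge for _, knowledge in ranked[:k]]

kb = CorrectionKB()
kb.distill([1.0, 0.0], "grasp the mug by its handle")
kb.distill([0.0, 1.0], "place fragile objects gently")
print(kb.retrieve([0.9, 0.1]))  # most similar stored correction
```

In the paper's setting the retrieved knowledge conditions the LLM's plan or skill-parameter generation for the new task or object instance.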
Related papers
- RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning [19.023560632891467]
We propose a scalable data generation pipeline that augments expert demonstrations with failure recovery trajectories.
We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework.
Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer on RLBench.
arXiv Detail & Related papers (2024-09-23T02:50:33Z)
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word and a corrector module that applies corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
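The two-stage pipeline this summary describes can be sketched as follows: a tagger labels each ASR hypothesis word with an edit operation, and a corrector module applies the returned tags. This is a toy illustration only; the tag vocabulary (`KEEP`, `REPLACE:<word>`, `DELETE`) and the hard-coded tags are hypothetical, whereas the actual system produces tags with a trained neural sequence tagger.

```python
def corrector(words, tags):
    # Apply per-word edit tags returned by a (here, stand-in) tagger.
    out = []
    for word, tag in zip(words, tags):
        if tag == "KEEP":
            out.append(word)
        elif tag.startswith("REPLACE:"):
            out.append(tag.split(":", 1)[1])
        # "DELETE" drops the word entirely.
    return " ".join(out)

# Hypothesis from an ASR system, with tags a tagger might emit:
hypothesis = ["their", "going", "too", "the", "store", "uh"]
tags = ["REPLACE:they're", "KEEP", "REPLACE:to", "KEEP", "KEEP", "DELETE"]
print(corrector(hypothesis, tags))  # they're going to the store
```

Because the corrector only touches words the tagger flags, this design gives the high-precision control over which errors are corrected that the summary highlights.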
arXiv Detail & Related papers (2024-06-11T09:52:33Z)
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC)
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Constrained Equation Learner Networks for Precision-Preserving Extrapolation of Robotic Skills [6.144680854063937]
This paper presents a novel supervised learning framework that addresses the trajectory adaptation problem in Programming by Demonstrations.
We exploit Equation Learner Networks to learn a set of analytical expressions and use them as basis functions.
Our approach addresses three main difficulties in adapting robotic trajectories: 1) minimizing the distortion of the trajectory for new adaptations; 2) preserving the precision of the adaptations; and 3) dealing with the lack of intuition about the structure of basis functions.
arXiv Detail & Related papers (2023-11-04T18:16:18Z)
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- Reinforced Self-Training (ReST) for Language Modeling [56.75447441157628]
Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences.
We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST)
Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner.
arXiv Detail & Related papers (2023-08-17T14:12:48Z)
- Towards Unbounded Machine Unlearning [13.31957848633701]
We study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for 'forgetting' and associated metrics for forget quality.
For UP, we propose a novel adaptation of a strong Membership Inference Attack for unlearning.
We also propose SCRUB, a novel unlearning algorithm, which is consistently a top performer for forget quality across the different application-dependent metrics for RB, RC, and UP.
arXiv Detail & Related papers (2023-02-20T10:15:36Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR)
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Pre-Training for Query Rewriting in A Spoken Language Understanding System [14.902583546933563]
We first propose a neural-retrieval based approach for query rewriting.
Then, inspired by the wide success of pre-trained contextual language embeddings, we propose a language-modeling (LM) based approach.
arXiv Detail & Related papers (2020-02-13T16:31:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.