Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics
- URL: http://arxiv.org/abs/2104.01021v1
- Date: Fri, 2 Apr 2021 12:42:12 GMT
- Title: Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics
- Authors: Matthew Schmittle, Sanjiban Choudhury, Siddhartha S. Srinivasa
- Abstract summary: A key challenge in Imitation Learning (IL) is that optimal state-action demonstrations are difficult for the teacher to provide.
As an alternative to state-action demonstrations, the teacher can provide corrective feedback such as their preferences or rewards.
We show that our approach can learn quickly from a variety of noisy feedback.
- Score: 24.863665993509997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key challenge in Imitation Learning (IL) is that optimal state-action
demonstrations are difficult for the teacher to provide. For example, in
robotics, providing kinesthetic demonstrations on a robotic manipulator
requires the teacher to control multiple degrees of freedom at once. The
difficulty of requiring optimal state-action demonstrations limits the space of
problems where the teacher can provide quality feedback. As an alternative to
state-action demonstrations, the teacher can provide corrective feedback such
as their preferences or rewards. Prior work has created algorithms designed to
learn from specific types of noisy feedback, but across teachers and tasks
different forms of feedback may be required. Instead we propose that in order
to learn from a diversity of scenarios we need to learn from a variety of
feedback. To learn from a variety of feedback, we rely on the following insight:
the teacher's cost function is latent, and we can model a stream of feedback as
a stream of loss functions. We then use any online learning algorithm to
minimize the sum of these losses. With this insight we can learn from a
diversity of feedback that is weakly correlated with the teacher's true cost
function. We unify prior work into a general corrective feedback meta-algorithm
and show that regardless of feedback we can obtain the same regret bounds. We
demonstrate our approach by learning to perform a household navigation task on
a robotic racecar platform. Our results show that our approach can learn
quickly from a variety of noisy feedback.
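A minimal sketch of the abstract's central idea follows: each piece of teacher feedback, whatever its form, is turned into a loss on latent cost parameters, and an off-the-shelf online learner minimizes the running sum of those losses. This is an illustrative sketch, not the authors' implementation; the linear cost model over trajectory features, the hinge and squared losses, and all function names are assumptions made for the example.

```python
import numpy as np

def preference_grad(theta, feat_preferred, feat_other):
    """Subgradient of a hinge loss encoding that the preferred trajectory should cost less."""
    if theta @ feat_preferred - theta @ feat_other + 1.0 > 0.0:
        return feat_preferred - feat_other
    return np.zeros_like(theta)

def scalar_cost_grad(theta, feat, observed_cost):
    """Gradient of a squared error between the modeled cost and the teacher's scalar label."""
    return 2.0 * (theta @ feat - observed_cost) * feat

def learn_from_feedback(feedback_stream, dim, lr=0.1):
    """Online gradient descent over the stream of per-feedback losses."""
    theta = np.zeros(dim)
    for event in feedback_stream:
        if event["type"] == "preference":
            grad = preference_grad(theta, event["preferred"], event["other"])
        elif event["type"] == "scalar_cost":
            grad = scalar_cost_grad(theta, event["features"], event["cost"])
        else:
            continue  # other feedback types would plug in their own loss here
        theta = theta - lr * grad
    return theta

# Example usage with a mixed feedback stream (synthetic data).
rng = np.random.default_rng(0)
stream = [
    {"type": "preference", "preferred": rng.normal(size=4), "other": rng.normal(size=4)},
    {"type": "scalar_cost", "features": rng.normal(size=4), "cost": 1.5},
]
theta = learn_from_feedback(stream, dim=4)
```

Because the online learner only ever sees losses, mixing preference feedback with scalar corrections (or any other feedback type that can be expressed as a loss) requires no change to the update rule, which is the point of the meta-algorithm.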
Related papers
- CANDERE-COACH: Reinforcement Learning from Noisy Feedback [12.232688822099325]
The CANDERE-COACH algorithm is capable of learning from noisy feedback provided by a nonoptimal teacher.
We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to successfully learn with up to 40% of the teacher feedback being incorrect.
arXiv Detail & Related papers (2024-09-23T20:14:12Z)
- ExpertAF: Expert Actionable Feedback from Video [81.46431188306397]
We introduce a novel method to generate actionable feedback from video of a person doing a physical activity.
Our method takes a video demonstration and its accompanying 3D body pose and generates expert commentary.
Our method is able to reason across multi-modal input combinations to output full-spectrum, actionable coaching.
arXiv Detail & Related papers (2024-08-01T16:13:07Z)
- Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models [2.1485350418225244]
Large language model (LLM)-based methods have shown great promise in feedback generation for programming assignments.
This paper explores using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair.
We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers.
arXiv Detail & Related papers (2024-05-01T03:52:39Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gains.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- Active teacher selection for reinforcement learning from human feedback [14.009227941725783]
Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback.
We propose the Hidden Utility Bandit framework to model differences in teacher rationality, expertise, and costliness.
We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing.
arXiv Detail & Related papers (2023-10-23T18:54:43Z)
- Active Reward Learning from Multiple Teachers [17.10187575303075]
Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.
This human feedback is often a preference comparison, in which the human teacher compares several samples of AI behavior and chooses which they believe best accomplishes the objective.
While reward learning typically assumes that all feedback comes from a single teacher, in practice these systems often query multiple teachers to gather sufficient training data.
arXiv Detail & Related papers (2023-03-02T01:26:53Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)