Generating Language Corrections for Teaching Physical Control Tasks
- URL: http://arxiv.org/abs/2306.07012v1
- Date: Mon, 12 Jun 2023 10:31:16 GMT
- Title: Generating Language Corrections for Teaching Physical Control Tasks
- Authors: Megha Srivastava, Noah Goodman, Dorsa Sadigh
- Abstract summary: CORGI is a model trained to generate language corrections for physical control tasks.
We show that CORGI can (i) generate valid feedback for novel student trajectories, (ii) outperform baselines on domains with novel control dynamics, and (iii) improve student learning in an interactive drawing task.
- Score: 21.186109830294072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI assistance continues to help advance applications in education, from
language learning to intelligent tutoring systems, yet current methods for
providing students feedback are still quite limited. Most automatic feedback
systems either provide binary correctness feedback, which may not help a
student understand how to improve, or require hand-coding feedback templates,
which may not generalize to new domains. This can be particularly challenging
for physical control tasks, where the rich diversity in student behavior and
specialized domains make it challenging to leverage general-purpose assistive
tools for providing feedback. We design and build CORGI, a model trained to
generate language corrections for physical control tasks, such as learning to
ride a bike. CORGI takes in as input a pair of student and expert trajectories,
and then generates natural language corrections to help the student improve. We
collect and train CORGI over data from three diverse physical control tasks
(drawing, steering, and joint movement). Through both automatic and human
evaluations, we show that CORGI can (i) generate valid feedback for novel
student trajectories, (ii) outperform baselines on domains with novel control
dynamics, and (iii) improve student learning in an interactive drawing task.
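
The abstract describes CORGI's input/output contract (a student trajectory plus an expert trajectory in, a natural language correction out) but does not publish an API. The snippet below is only a minimal illustrative sketch of that contract under stated assumptions: the names (Trajectory, CorrectionRequest, generate_correction) and the trivial comparison heuristic are hypothetical stand-ins for the trained model, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A trajectory is a sequence of (time, state) samples; for the drawing task the
# state might be a 2-D pen position, for steering a heading angle, and so on.
Trajectory = List[Tuple[float, List[float]]]

@dataclass
class CorrectionRequest:
    student: Trajectory  # what the learner actually did
    expert: Trajectory   # reference demonstration for the same task
    task: str            # e.g. "drawing", "steering", "joint movement"

def generate_correction(request: CorrectionRequest) -> str:
    """Hypothetical stand-in for the trained CORGI model.

    The real model encodes both trajectories and decodes a natural-language
    correction; this sketch only illustrates the interface with a trivial
    heuristic comparing the mean of the first state dimension.
    """
    def mean_first_dim(traj: Trajectory) -> float:
        return sum(state[0] for _, state in traj) / max(len(traj), 1)

    gap = mean_first_dim(request.student) - mean_first_dim(request.expert)
    direction = "left" if gap > 0 else "right"
    return f"Try shifting your {request.task} motion slightly to the {direction}."

if __name__ == "__main__":
    student = [(0.0, [0.2, 0.0]), (1.0, [0.4, 0.1])]
    expert = [(0.0, [0.0, 0.0]), (1.0, [0.1, 0.1])]
    print(generate_correction(CorrectionRequest(student, expert, task="drawing")))
```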
Related papers
- WIP: A Unit Testing Framework for Self-Guided Personalized Online Robotics Learning [3.613641107321095]
This paper focuses on creating a system for unit testing while integrating it into the course workflow.
In line with the framework's personalized, student-centered approach, this method makes it easier for students to revise and debug their programming work.
Updating the course workflow to include unit tests will strengthen the learning environment and make it more interactive, so that students can learn to program robots in a self-guided fashion.
arXiv Detail & Related papers (2024-05-18T00:56:46Z) - Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO); a minimal sketch of this preference objective appears after this list.
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 on data generated by YODA yields significant performance gains over standard SFT.
arXiv Detail & Related papers (2024-01-28T14:32:15Z) - Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections [45.420679219101245]
We present Distillation and Retrieval of Online Corrections (DROC)
DROC is a large language model (LLM)-based system that can respond to arbitrary forms of language feedback.
We demonstrate that DROC effectively distills the relevant information from sequences of online corrections into a knowledge base.
arXiv Detail & Related papers (2023-11-17T18:00:20Z) - Empowering Private Tutoring by Chaining Large Language Models [87.76985829144834]
This work explores the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs).
The system is organized into three interconnected core processes: interaction, reflection, and reaction.
Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules.
arXiv Detail & Related papers (2023-09-15T02:42:03Z) - "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z) - Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics [24.863665993509997]
A key challenge in Imitation Learning (IL) is that optimal state-action demonstrations are difficult for the teacher to provide.
As an alternative to state-action demonstrations, the teacher can provide corrective feedback such as preferences or rewards.
We show that our approach can learn quickly from a variety of noisy feedback.
arXiv Detail & Related papers (2021-04-02T12:42:12Z)
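
As referenced above, the reinforcement-learning feedback paper trains on GPT-4-annotated preference pairs with direct preference optimization (DPO). The snippet below is a minimal, self-contained sketch of the standard per-pair DPO objective, not the authors' code; the toy log-probabilities and the beta value are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen_policy: float, logp_rejected_policy: float,
             logp_chosen_ref: float, logp_rejected_ref: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair of feedback messages.

    The "chosen" feedback is the one the annotator (GPT-4 in the paper above)
    preferred; log-probabilities are summed token log-probs under the policy
    being fine-tuned and under a frozen reference model.
    """
    margin = beta * ((logp_chosen_policy - logp_chosen_ref)
                     - (logp_rejected_policy - logp_rejected_ref))
    # Negative log-sigmoid of the margin: the loss shrinks as the policy favors
    # the chosen feedback more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

if __name__ == "__main__":
    # Toy numbers only; real values come from scoring full feedback messages.
    print(round(dpo_loss(-12.0, -15.0, -13.0, -14.0), 4))
```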
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.