Closed-loop Teaching via Demonstrations to Improve Policy Transparency
- URL: http://arxiv.org/abs/2406.11850v1
- Date: Mon, 1 Apr 2024 14:59:26 GMT
- Title: Closed-loop Teaching via Demonstrations to Improve Policy Transparency
- Authors: Michael S. Lee, Reid Simmons, Henny Admoni
- Abstract summary: This paper explores augmenting a curriculum with a closed-loop teaching framework inspired by principles from the education literature.
A user study finds that our proposed closed-loop teaching framework reduces the regret in human test responses by 43% over a baseline.
- Score: 2.5515055736875016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Demonstrations are a powerful way of increasing the transparency of AI policies. Though informative demonstrations may be selected a priori through the machine teaching paradigm, student learning may deviate from the preselected curriculum in situ. This paper thus explores augmenting a curriculum with a closed-loop teaching framework inspired by principles from the education literature, such as the zone of proximal development and the testing effect. We utilize tests accordingly to close the loop and maintain a novel particle filter model of human beliefs throughout the learning process, allowing us to provide demonstrations that are targeted to the human's current understanding in real time. A user study finds that our proposed closed-loop teaching framework reduces the regret in human test responses by 43% over a baseline.
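The closed-loop mechanism above is amenable to a compact sketch: maintain particles over hypotheses about the human's current estimate of the agent's reward weights, reweight them after each test response, and choose the next demonstration against the updated belief. The Python sketch below is illustrative only, not the authors' implementation; the unit-norm weight hypotheses, the Boltzmann response likelihood, the resampling rule, and the demonstration-selection criterion are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class BeliefParticleFilter:
    """Particle filter over hypotheses of the human's estimate of the
    agent's reward weights (illustrative; the paper's state space,
    likelihood, and resampling scheme may differ)."""

    def __init__(self, n_particles, dim):
        # Each particle is a unit-norm candidate reward-weight vector.
        self.particles = rng.normal(size=(n_particles, dim))
        self.particles /= np.linalg.norm(self.particles, axis=1, keepdims=True)
        self.weights = np.full(n_particles, 1.0 / n_particles)

    def update(self, option_features, chosen, beta=5.0):
        """Reweight particles by a Boltzmann (noisy-rational) likelihood
        of the human's test answer: P(chosen | w) proportional to
        exp(beta * w . phi_chosen)."""
        logits = beta * (self.particles @ option_features.T)  # (N, n_options)
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        self.weights *= probs[:, chosen]
        self.weights /= self.weights.sum()
        # Resample when the effective sample size drops below half.
        if 1.0 / np.sum(self.weights ** 2) < 0.5 * len(self.weights):
            self._resample()

    def _resample(self):
        idx = rng.choice(len(self.weights), size=len(self.weights), p=self.weights)
        # Jitter resampled particles to avoid degeneracy, then renormalize.
        self.particles = self.particles[idx] + rng.normal(scale=0.05, size=self.particles.shape)
        self.particles /= np.linalg.norm(self.particles, axis=1, keepdims=True)
        self.weights.fill(1.0 / len(self.weights))

def pick_demonstration(pf, demo_constraints):
    """Hypothetical selection rule: choose the demonstration whose implied
    constraint (w . c >= 0) is violated by the largest belief mass, i.e.
    the one targeting the human's biggest current misunderstanding."""
    violation = [np.average(pf.particles @ c < 0, weights=pf.weights)
                 for c in demo_constraints]
    return int(np.argmax(violation))

# Toy usage: a two-option test in a 3-feature domain; the human chose option 0.
pf = BeliefParticleFilter(n_particles=500, dim=3)
pf.update(option_features=np.array([[1.0, 0.0, 0.5], [0.2, 1.0, 0.0]]), chosen=0)
demos = [np.array([0.9, -0.1, 0.2]), np.array([-0.3, 0.8, 0.1])]
print(pick_demonstration(pf, demos))
```

A Boltzmann likelihood is a standard noisy-rationality model of human choices; the paper's actual observation model and teaching objective may differ.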
Related papers
- Text-Aware Diffusion for Policy Learning [8.32790576855495]
We propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning.
We show that TADPoLe is able to learn policies for novel goal-achievement and continuous locomotion behaviors specified by natural language, in both Humanoid and Dog environments.
arXiv Detail & Related papers (2024-07-02T03:08:20Z)
- Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions [8.869100154323643]
We propose a prompt engineering workflow built around a novel object called the "demonstration notebook".
This notebook helps identify the most suitable in-context learning example for a question by gathering and reusing information from the LLM's past interactions.
Our experiments show that this approach outperforms all existing methods for automatic demonstration construction and selection.
arXiv Detail & Related papers (2024-06-16T10:02:20Z)
- Imitation Learning from Purified Demonstrations [47.52316615371601]
We propose to first purify the potential noise in imperfect demonstrations and then perform imitation learning on the purified demonstrations.
We provide theoretical evidence supporting our approach, demonstrating that the distance between the purified and optimal demonstration can be bounded.
arXiv Detail & Related papers (2023-10-11T02:36:52Z)
- Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback [4.174296652683762]
We study the problem of teaching via demonstrations in sequential decision-making tasks.
In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this problem.
arXiv Detail & Related papers (2023-09-16T21:12:04Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial to the final performance of prompt-tuning.
The proposed approach can be (i) plugged into any previous prompt-tuning approach and (ii) extended to classification tasks with a large number of categories.
Experimental results on 16 datasets show that our method, integrated with the previous approaches LM-BFF and P-tuning, yields better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z)
- Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation [71.76008290101214]
DistillFlow is a knowledge distillation approach to learning optical flow.
It achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets.
Our models ranked 1st among all monocular methods on the KITTI 2015 benchmark, and outperform all published methods on the Sintel Final benchmark.
arXiv Detail & Related papers (2021-06-08T09:13:34Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials (see the sketch after this list).
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
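The shaping idea in the last entry admits a similarly brief sketch: a generative model fit to demonstration state-action pairs supplies a potential Phi(s, a), and the shaped reward adds the usual potential-based term r'(s, a) = r(s, a) + gamma * Phi(s', a') - Phi(s, a). Below, a kernel density estimate stands in for the normalizing flows and GANs examined in the paper, and all names and interfaces are assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)

# Fit a density model to demonstration (state, action) pairs; its log-density
# serves as the shaping potential Phi. (A KDE stands in here for the
# normalizing flows / GANs examined in the paper.)
demo_sa_pairs = rng.normal(size=(200, 4))  # toy 2-D states + 2-D actions
kde = KernelDensity(bandwidth=0.5).fit(demo_sa_pairs)

def phi(state, action):
    """Shaping potential Phi(s, a) = log p_demo(s, a)."""
    return kde.score_samples(np.concatenate([state, action])[None])[0]

def shaped_reward(r, s, a, s_next, a_next, gamma=0.99):
    """Potential-based shaping over state-action pairs:
    r'(s, a) = r(s, a) + gamma * Phi(s', a') - Phi(s, a)."""
    return r + gamma * phi(s_next, a_next) - phi(s, a)

# Toy usage: shape a unit reward for one transition.
s, a = rng.normal(size=2), rng.normal(size=2)
s2, a2 = rng.normal(size=2), rng.normal(size=2)
print(shaped_reward(1.0, s, a, s2, a2))
```

With state-only potentials, this kind of shaping provably leaves optimal policies unchanged; state-action potentials require extra care (e.g. look-ahead advice), which this sketch glosses over.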
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.