Using Machine Teaching to Investigate Human Assumptions when Teaching
Reinforcement Learners
- URL: http://arxiv.org/abs/2009.02476v4
- Date: Thu, 29 Jun 2023 04:40:18 GMT
- Title: Using Machine Teaching to Investigate Human Assumptions when Teaching
Reinforcement Learners
- Authors: Yun-Shiuan Chuang, Xuezhou Zhang, Yuzhe Ma, Mark K. Ho, Joseph L.
Austerweil, Xiaojin Zhu
- Abstract summary: We focus on a common reinforcement learning method, Q-learning, and examine what assumptions people have using a behavioral experiment.
We use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states.
Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people.
- Score: 26.006964607579004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Successful teaching requires an assumption of how the learner learns - how
the learner uses experiences from the world to update their internal states. We
investigate what expectations people have about a learner when they teach them
in an online manner using rewards and punishment. We focus on a common
reinforcement learning method, Q-learning, and examine what assumptions people
have using a behavioral experiment. To do so, we first establish a normative
standard, by formulating the problem as a machine teaching optimization
problem. To solve the machine teaching optimization problem, we use a deep
learning approximation method which simulates learners in the environment and
learns to predict how feedback affects the learner's internal states. What do
people assume about a learner's learning and discount rates when they teach
them an idealized exploration-exploitation task? In a behavioral experiment, we
find that people can teach the task to Q-learners in a relatively efficient and
effective manner when the learner uses a small value for its discounting rate
and a large value for its learning rate. However, their teaching is still suboptimal. We
also find that providing people with real-time updates of how possible feedback
would affect the Q-learner's internal states weakly helps them teach. Our
results reveal how people teach using evaluative feedback and provide guidance
for how engineers should design machine agents in a manner that is intuitive
for people.
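To make the abstract's quantities concrete, the following minimal sketch (in Python) shows a tabular Q-learner taught purely through a human's evaluative feedback, with the learning rate and discount rate appearing explicitly and a preview function standing in for the real-time update display mentioned above. The chain task, parameter values, and every name below are illustrative assumptions, not the paper's experiment code.

```python
import numpy as np

# Minimal sketch of the learner being taught: a tabular Q-learner whose only
# reward signal is the human teacher's evaluative feedback (+1 reward, -1
# punishment). The tiny chain task, parameter values, and all names here are
# illustrative assumptions, not the paper's actual experiment code.

N_STATES, N_ACTIONS = 5, 2      # toy chain world; actions: 0 = left, 1 = right
ALPHA = 0.9                     # learning rate (large, matching the regime people taught well)
GAMMA = 0.1                     # discount rate (small, matching that same regime)

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))


def q_update(Q, s, a, s_next, feedback, alpha=ALPHA, gamma=GAMMA):
    """Standard Q-learning update, with the teacher's feedback used as the reward."""
    td_target = feedback + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])


def preview_update(Q, s, a, s_next, feedback, alpha=ALPHA, gamma=GAMMA):
    """The kind of real-time preview shown to some participants: the value
    Q[s, a] would take if this feedback were delivered, without committing it."""
    td_target = feedback + gamma * Q[s_next].max()
    return Q[s, a] + alpha * (td_target - Q[s, a])


def teaching_step(Q, s, teacher_feedback_fn, epsilon=0.1):
    """One interaction: the learner acts epsilon-greedily, the teacher reacts."""
    a = int(rng.integers(N_ACTIONS)) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)  # toy dynamics
    q_update(Q, s, a, s_next, teacher_feedback_fn(s, a))
    return s_next
```

With a learning rate near 1 and a discount rate near 0, each piece of feedback dominates the updated Q-value, so the learner's behavior closely tracks the teacher's most recent signals; this is consistent with the finding that people teach such learners relatively efficiently.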
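The normative standard referred to above treats teaching itself as an optimization problem: choose the feedback sequence that drives a simulated learner to a target policy as quickly as possible. The paper solves this with a deep learning approximation that predicts how feedback changes the learner's internal state; the greedy one-step search below is only a much cruder sketch of the same objective, and the distance measure and all names are assumptions.

```python
import numpy as np

def policy_distance(Q, target_policy):
    """Number of states where the learner's greedy action disagrees with the target."""
    return int(np.sum(Q.argmax(axis=1) != np.asarray(target_policy)))

def greedy_teacher_feedback(Q, s, a, s_next, target_policy,
                            alpha=0.9, gamma=0.1, feedback_set=(+1.0, -1.0)):
    """Pick the feedback whose simulated Q-update leaves the learner's greedy
    policy closest to the target policy (a one-step greedy approximation of
    the machine teaching objective, not the paper's method)."""
    best_fb, best_dist = None, np.inf
    for fb in feedback_set:
        Q_sim = Q.copy()                                # simulate the learner
        td_target = fb + gamma * Q_sim[s_next].max()
        Q_sim[s, a] += alpha * (td_target - Q_sim[s, a])
        dist = policy_distance(Q_sim, target_policy)
        if dist < best_dist:
            best_fb, best_dist = fb, dist
    return best_fb
```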
Related papers
- CANDERE-COACH: Reinforcement Learning from Noisy Feedback [12.232688822099325]
The CANDERE-COACH algorithm is capable of learning from noisy feedback provided by a nonoptimal teacher.
We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to successfully learn with up to 40% of the teacher feedback being incorrect.
arXiv Detail & Related papers (2024-09-23T20:14:12Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA yields a significant performance gain over standard SFT.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- Utility-based Adaptive Teaching Strategies using Bayesian Theory of Mind [7.754711372795438]
We build on cognitive science to design teacher agents that tailor their teaching strategies to the learners.
Our ToM-equipped teachers construct models of learners' internal states from observations.
Experiments in simulated environments demonstrate that learners taught this way are more efficient than those taught in a learner-agnostic way.
arXiv Detail & Related papers (2023-09-29T14:27:53Z)
- Active Reward Learning from Multiple Teachers [17.10187575303075]
Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.
This human feedback is often a preference comparison, in which the human teacher compares several samples of AI behavior and chooses which they believe best accomplishes the objective (a minimal sketch of this setup follows the list below).
While reward learning typically assumes that all feedback comes from a single teacher, in practice these systems often query multiple teachers to gather sufficient training data.
arXiv Detail & Related papers (2023-03-02T01:26:53Z)
- Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that incorporates the teacher's cooperative intention into its likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Interaction-limited Inverse Reinforcement Learning [50.201765937436654]
We present two different training strategies: Curriculum Inverse Reinforcement Learning (CIRL) covering the teacher's perspective, and Self-Paced Inverse Reinforcement Learning (SPIRL) focusing on the learner's perspective.
Using experiments in simulation and with a real robot learning a task from a human demonstrator, we show that our training strategies enable faster training than a random teacher for CIRL and than a batch learner for SPIRL.
arXiv Detail & Related papers (2020-07-01T12:31:52Z)
- Understanding the Power and Limitations of Teaching with Imperfect Knowledge [30.588367257209388]
We study the interaction between a teacher and a student/learner where the teacher selects training examples for the learner to learn a specific task.
Inspired by real-world applications of machine teaching in education, we consider the setting where the teacher's knowledge is limited and noisy.
We show how imperfect knowledge affects the teacher's solution to the corresponding machine teaching problem when constructing optimal teaching sets.
arXiv Detail & Related papers (2020-03-21T17:53:26Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows the benefits of AI explanations as an interface for machine teaching, such as supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks, such as anchoring on the model's judgments and increased cognitive workload.
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
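The "Active Reward Learning from Multiple Teachers" entry above describes the standard preference-comparison setup for reward learning; the sketch below gives a minimal single-teacher, Bradley-Terry-style version of that setup. The paper's multi-teacher querying strategy is not reproduced here, and all names are illustrative.

```python
import numpy as np

def segment_return(reward_fn, trajectory):
    """Sum of predicted rewards over one behavior sample (a list of (state, action))."""
    return sum(reward_fn(s, a) for s, a in trajectory)

def preference_loss(reward_fn, traj_a, traj_b, teacher_prefers_a):
    """Bradley-Terry negative log-likelihood of the teacher's preference choice."""
    ra = segment_return(reward_fn, traj_a)
    rb = segment_return(reward_fn, traj_b)
    p_a = 1.0 / (1.0 + np.exp(rb - ra))      # P(teacher prefers A)
    return -np.log(p_a if teacher_prefers_a else 1.0 - p_a)
```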
This list is automatically generated from the titles and abstracts of the papers on this site.