The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
- URL: http://arxiv.org/abs/2205.07540v1
- Date: Mon, 16 May 2022 09:36:30 GMT
- Title: The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
- Authors: Anaïs Tack and Chris Piech
- Abstract summary: This paper reports on a first attempt at an AI teacher test.
We built a solution around the insight that you can run conversational agents in parallel to human teachers in real-world dialogues.
Our method builds on the reliability of comparative judgments in education and uses a probabilistic model and Bayesian sampling to infer estimates of pedagogical ability.
- Score: 5.424153769988429
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: How can we test whether state-of-the-art generative models, such as Blender
and GPT-3, are good AI teachers, capable of replying to a student in an
educational dialogue? Designing an AI teacher test is challenging: although
evaluation methods are much-needed, there is no off-the-shelf solution to
measuring pedagogical ability. This paper reports on a first attempt at an AI
teacher test. We built a solution around the insight that you can run
conversational agents in parallel to human teachers in real-world dialogues,
simulate how different agents would respond to a student, and compare these
counterpart responses in terms of three abilities: speak like a teacher,
understand a student, help a student. Our method builds on the reliability of
comparative judgments in education and uses a probabilistic model and Bayesian
sampling to infer estimates of pedagogical ability. We find that, even though
conversational agents (Blender in particular) perform well on conversational
uptake, they are quantifiably worse than real teachers on several pedagogical
dimensions, especially with regard to helpfulness (Blender: Δ ability = -0.75; GPT-3: Δ ability = -0.93).
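The abstract describes the method only at a high level: counterpart responses are compared pairwise, and a probabilistic model with Bayesian sampling turns those comparative judgments into ability estimates. The sketch below is a minimal, hedged illustration assuming a Bradley-Terry-style formulation with a standard-normal prior and a random-walk Metropolis sampler; the agent names, toy judgments, prior, and sampler settings are assumptions for illustration, not the authors' exact model.

```python
# Minimal sketch: inferring latent "pedagogical ability" from pairwise
# comparative judgments with a Bradley-Terry-style model and a simple
# Metropolis sampler. Agent names, toy data, prior, and step sizes are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)
agents = ["teacher", "blender", "gpt3"]

# Each judgment: (winner_index, loser_index) for one ability dimension,
# e.g. "helps the student". Toy data for illustration only.
judgments = [(0, 1), (0, 2), (0, 1), (1, 2), (0, 2), (2, 1)]

def log_posterior(theta):
    """Bradley-Terry log-likelihood plus a standard-normal prior."""
    lp = -0.5 * np.sum(theta ** 2)                       # N(0, 1) prior
    for w, l in judgments:
        lp += -np.log1p(np.exp(-(theta[w] - theta[l])))  # log sigmoid of ability gap
    return lp

# Random-walk Metropolis over the ability vector theta.
theta = np.zeros(len(agents))
current_lp = log_posterior(theta)
samples = []
for step in range(20000):
    proposal = theta + rng.normal(scale=0.3, size=theta.shape)
    proposal -= proposal.mean()            # fix location (identifiability)
    prop_lp = log_posterior(proposal)
    if np.log(rng.uniform()) < prop_lp - current_lp:
        theta, current_lp = proposal, prop_lp
    if step > 5000 and step % 10 == 0:     # burn-in, then thin
        samples.append(theta.copy())

post = np.array(samples)
for i, name in enumerate(agents):
    delta = post[:, i] - post[:, 0]        # ability gap vs. the human teacher
    print(f"{name}: Δ ability vs. teacher = {delta.mean():+.2f}")
```

Mean-centring each proposal pins down the latent scale, which mirrors reporting Δ ability relative to the human teacher rather than on an absolute scale.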
Related papers
- Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z)
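As a hedged illustration of the adaptive-teaching idea summarized in the AdapT entry above (this is not the AToM model itself; the toy domain, candidate misconceptions, noise level, and greedy selection heuristic are all invented here), the sketch below jointly maintains a posterior over a student's hypothesized misconception and picks the next example expected to correct future answers.

```python
# Hedged illustration of adaptive teaching; NOT the AdapT/AToM model itself.
# The domain, candidate misconceptions, noise model, and selection heuristic
# are all invented for this sketch.

# Hypothetical task: the student labels numbers as accept/reject; each
# candidate misconception is a rule the student might be following.
MISCONCEPTIONS = {
    "even_numbers": lambda x: x % 2 == 0,
    "multiples_of_3": lambda x: x % 3 == 0,
    "greater_than_10": lambda x: x > 10,
}

def true_rule(x):
    """The concept the teacher actually wants to convey (assumed here)."""
    return x % 2 == 0

def update_posterior(posterior, example, student_answer, noise=0.1):
    """Bayesian update over misconceptions given one observed student answer."""
    unnormalized = {}
    for name, rule in MISCONCEPTIONS.items():
        likelihood = (1 - noise) if rule(example) == student_answer else noise
        unnormalized[name] = posterior[name] * likelihood
    z = sum(unnormalized.values())
    return {name: weight / z for name, weight in unnormalized.items()}

def pick_next_example(posterior, candidates):
    """Greedy heuristic: show the example where the student's predicted answer
    (under the current posterior) disagrees most with the true rule, so future
    answers are most likely to be corrected."""
    def expected_disagreement(x):
        p_accept = sum(p for name, p in posterior.items() if MISCONCEPTIONS[name](x))
        return abs(p_accept - float(true_rule(x)))
    return max(candidates, key=expected_disagreement)

# Usage: start uniform, observe one student answer, update, pick the next example.
posterior = {name: 1 / len(MISCONCEPTIONS) for name in MISCONCEPTIONS}
posterior = update_posterior(posterior, example=9, student_answer=True)
print(pick_next_example(posterior, candidates=range(1, 21)))
```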
- Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization [84.86241161706911]
We show that teacher LLMs can indeed intervene on student reasoning to improve their performance.
We also demonstrate that, in multi-turn interactions, teacher explanations generalize and learning from explained data improves student performance.
We verify that misaligned teachers can lower student performance to random chance by intentionally misleading them.
arXiv Detail & Related papers (2023-06-15T17:27:20Z)
- MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems [74.73881579517055]
We propose a framework to generate such dialogues by pairing human teachers with a Large Language Model prompted to represent common student errors.
We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues.
arXiv Detail & Related papers (2023-05-23T21:44:56Z)
- Opportunities and Challenges in Neural Dialog Tutoring [54.07241332881601]
We rigorously analyze various generative language models on two dialog tutoring datasets for language learning.
We find that although current approaches can model tutoring in constrained learning scenarios, they perform poorly in less constrained scenarios.
Our human quality evaluation shows that both models and ground-truth annotations exhibit low performance in terms of equitable tutoring.
arXiv Detail & Related papers (2023-01-24T11:00:17Z)
- Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
- Improving mathematical questioning in teacher training [1.794107419334178]
High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies.
This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills.
arXiv Detail & Related papers (2021-12-02T05:33:03Z)
- Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions [19.80258498803113]
In education, teachers' uptake of student contributions has been linked to higher student achievement.
We propose a framework for measuring uptake and release a dataset of student-teacher exchanges extracted from US math classroom transcripts, annotated for uptake by experts.
We find that although repetition captures a significant part of uptake, our measure based on pointwise Jensen-Shannon divergence (pJSD) outperforms repetition-based baselines, as it captures a wider range of uptake phenomena such as question answering and reformulation.
arXiv Detail & Related papers (2021-06-07T18:00:06Z)
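To make the quantity behind the uptake measure in the entry above concrete, the sketch below computes a plain Jensen-Shannon divergence between two toy distributions. The paper's pJSD is a pointwise, model-estimated variant, so the conditional-versus-marginal framing used here is only an illustrative assumption.

```python
# Illustrative only: plain Jensen-Shannon divergence between two toy
# distributions. The paper's pJSD is a pointwise, model-based estimate;
# the conditional-vs-marginal framing below is an assumption for the example.
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for dense distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with log base 2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy example: teacher's next-utterance distribution conditioned on the
# student's turn vs. unconditioned. Higher divergence = the reply depends
# more on what the student said, i.e. more "uptake".
p_conditional = [0.6, 0.3, 0.1]
p_marginal = [0.2, 0.3, 0.5]
print(f"JSD = {jsd(p_conditional, p_marginal):.3f}")
```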
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
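As a generic illustration of the multi-task idea in the last entry (not the paper's architecture, which operates on classroom audio recordings), the sketch below shares one text encoder between a question-detection head and a question-type head and sums the two losses, so both tasks shape the shared representation; every layer size and label here is invented for the example.

```python
# Generic multi-task learning sketch (not the paper's model): one shared
# encoder, two classification heads, and a summed loss.
import torch
import torch.nn as nn

class MultiTaskQuestionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=128,
                 num_question_types=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.is_question_head = nn.Linear(hidden_dim, 2)                      # task 1
        self.question_type_head = nn.Linear(hidden_dim, num_question_types)  # task 2

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        _, hidden = self.encoder(embedded)
        features = hidden.squeeze(0)          # shared utterance representation
        return self.is_question_head(features), self.question_type_head(features)

# Joint training step: the two cross-entropy losses share encoder gradients.
model = MultiTaskQuestionModel()
tokens = torch.randint(0, 5000, (8, 20))      # batch of 8 tokenized utterances
is_q_labels = torch.randint(0, 2, (8,))
type_labels = torch.randint(0, 4, (8,))
is_q_logits, type_logits = model(tokens)
loss = nn.functional.cross_entropy(is_q_logits, is_q_labels) \
     + nn.functional.cross_entropy(type_logits, type_labels)
loss.backward()
```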