MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties
Grounded in Math Reasoning Problems
- URL: http://arxiv.org/abs/2305.14536v2
- Date: Mon, 23 Oct 2023 12:00:01 GMT
- Authors: Jakub Macina, Nico Daheim, Sankalan Pal Chowdhury, Tanmay Sinha, Manu
Kapur, Iryna Gurevych, Mrinmaya Sachan
- Abstract summary: We propose a framework to generate such dialogues by pairing human teachers with a Large Language Model prompted to represent common student errors.
We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While automatic dialogue tutors hold great potential in making education
personalized and more accessible, research on such systems has been hampered by
a lack of sufficiently large and high-quality datasets. Collecting such
datasets remains challenging, as recording tutoring sessions raises privacy
concerns and crowdsourcing leads to insufficient data quality. To address this,
we propose a framework to generate such dialogues by pairing human teachers
with a Large Language Model (LLM) prompted to represent common student errors.
We describe how we use this framework to collect MathDial, a dataset of 3k
one-to-one teacher-student tutoring dialogues grounded in multi-step math
reasoning problems. While models like GPT-3 are good problem solvers, they fail
at tutoring because they generate factually incorrect feedback or are prone to
revealing solutions to students too early. To overcome this, we let teachers
provide learning opportunities to students by guiding them using various
scaffolding questions according to a taxonomy of teacher moves. We demonstrate
MathDial and its extensive annotations can be used to finetune models to be
more effective tutors (and not just solvers). We confirm this by automatic and
human evaluation, notably in an interactive setting that measures the trade-off
between student solving success and telling solutions. The dataset is released
publicly.
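The collection framework described above (a human teacher paired with an LLM prompted to act as a student holding a common misconception, with the teacher choosing moves from a taxonomy of scaffolding strategies) can be sketched as a simple turn-taking loop. This is a minimal, hypothetical illustration: the move names, stub functions, and canned responses below are assumptions for demonstration, not the authors' actual pipeline or taxonomy.

```python
# Hypothetical sketch of a MathDial-style collection loop. A stub stands in
# for the LLM student; a lookup table stands in for the human teacher.
# All names and responses are illustrative, not taken from the paper's code.

TEACHER_MOVES = ["focus", "probing", "telling", "generic"]  # simplified taxonomy

def confused_student(problem, state):
    """Stub for the LLM student: repeats a seeded arithmetic error
    until the teacher's scaffolding has addressed it."""
    if state["scaffolded"]:
        return "Oh, I see: 3 * 4 = 12, so the answer is 12."
    return "I think 3 * 4 = 7, so the answer is 7."

def teacher_turn(move):
    """Stand-in for the human teacher picking a move from the taxonomy."""
    responses = {
        "focus": "Which step of your solution are you unsure about?",
        "probing": "Can you check 3 * 4 by adding 3 four times?",
        "telling": "The answer is 12.",  # reveals the solution directly
        "generic": "Good effort, keep going!",
    }
    return responses[move]

def collect_dialogue(problem, moves):
    """Alternate student and teacher turns, logging each utterance
    together with the teacher move that produced it."""
    state = {"scaffolded": False}
    dialogue = []
    for move in moves:
        dialogue.append(("student", confused_student(problem, state)))
        dialogue.append((f"teacher[{move}]", teacher_turn(move)))
        if move == "probing":  # scaffolding resolves the misconception
            state["scaffolded"] = True
    return dialogue

dialogue = collect_dialogue("What is 3 * 4?", ["probing", "generic"])
```

A "telling" move would end the exchange immediately, which mirrors the trade-off the paper evaluates: guiding via scaffolding questions keeps the learning opportunity open, while revealing the solution maximizes short-term solving success.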
Related papers
- Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
However, LLMs struggle to precisely detect students' errors and to tailor their feedback to those errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z)
- Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment [75.59538732476346]
We focus on the problem of generating such gap-focused questions (GFQs) automatically.
We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these criteria.
arXiv Detail & Related papers (2023-07-06T22:21:42Z)
- Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization [84.86241161706911]
We show that teacher LLMs can indeed intervene on student reasoning to improve their performance.
We also demonstrate that in multi-turn interactions, teacher explanations generalize and students learn from the explained data.
We verify that misaligned teachers can lower student performance to random chance by intentionally misleading them.
arXiv Detail & Related papers (2023-06-15T17:27:20Z)
- Opportunities and Challenges in Neural Dialog Tutoring [54.07241332881601]
We rigorously analyze various generative language models on two dialog tutoring datasets for language learning.
We find that although current approaches can model tutoring in constrained learning scenarios, they perform poorly in less constrained scenarios.
Our human quality evaluation shows that both models and ground-truth annotations exhibit low performance in terms of equitable tutoring.
arXiv Detail & Related papers (2023-01-24T11:00:17Z)
- Computationally Identifying Funneling and Focusing Questions in Classroom Discourse [24.279653100481863]
We propose the task of computationally detecting funneling and focusing questions in classroom discourse.
We release an annotated dataset of 2,348 teacher utterances labeled for funneling and focusing questions, or neither.
Our best model, a supervised RoBERTa model fine-tuned on our dataset, has a strong linear correlation of .76 with human expert labels and with positive educational outcomes.
arXiv Detail & Related papers (2022-07-08T01:28:29Z)
- The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues [5.424153769988429]
This paper reports on a first attempt at an AI teacher test.
We built a solution around the insight that conversational agents can be run in parallel with human teachers in real-world dialogues.
Our method builds on the reliability of comparative judgments in education and uses a probabilistic model and Bayesian sampling to infer estimates of pedagogical ability.
arXiv Detail & Related papers (2022-05-16T09:36:30Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.