Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs
- URL: http://arxiv.org/abs/2409.16490v2
- Date: Tue, 10 Dec 2024 21:04:59 GMT
- Title: Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs
- Authors: Alexander Scarlatos, Ryan S. Baker, Andrew Lan
- Abstract summary: We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
- Score: 49.18567856499736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have led to the development of artificial intelligence (AI)-powered tutoring chatbots, showing promise in providing broad access to high-quality personalized education. Existing works have studied how to make LLMs follow tutoring principles, but have not studied broader uses of LLMs for supporting tutoring. Up until now, tracing student knowledge and analyzing misconceptions have been difficult and time-consuming to implement for open-ended dialogue tutoring. In this work, we investigate whether LLMs can be supportive of this task: we first use LLM prompting methods to identify the knowledge components/skills involved in each dialogue turn, i.e., a tutor utterance posing a task or a student utterance that responds to it. We also evaluate whether the student responds correctly to the tutor and verify the LLM's accuracy using human expert annotations. We then apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues. We perform extensive qualitative analyses to highlight the challenges in dialogue KT and outline multiple avenues for future work.
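To make the pipeline concrete, here is a minimal sketch of the two stages the abstract describes: prompting an LLM to label each dialogue turn with knowledge components (KCs) and response correctness, then running a standard KT update over the labels. The prompt wording, the `complete()` helper, and the choice of Bayesian Knowledge Tracing as the downstream KT method are illustrative assumptions, not the authors' implementation (their LLMKT method is itself LLM-based).

```python
import json

def complete(prompt: str) -> str:
    """Placeholder for any chat-completion API; swap in a real client."""
    raise NotImplementedError

def label_turn(tutor_utt: str, student_utt: str) -> dict:
    # Stage 1: ask the LLM which KCs the tutor's task involves and
    # whether the student's response is correct.
    prompt = (
        "Given a tutor turn that poses a task and the student's response,\n"
        "list the knowledge components (skills) involved and judge whether\n"
        "the response is correct.\n"
        f"Tutor: {tutor_utt}\nStudent: {student_utt}\n"
        'Respond as JSON: {"knowledge_components": [...], "correct": true}'
    )
    return json.loads(complete(prompt))

def bkt_update(p_know: float, correct: bool, slip: float = 0.1,
               guess: float = 0.2, learn: float = 0.15) -> float:
    # Stage 2: one Bayesian Knowledge Tracing step for a single KC,
    # updating the mastery estimate after observing a labeled response.
    if correct:
        posterior = p_know * (1 - slip) / (
            p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        posterior = p_know * slip / (
            p_know * slip + (1 - p_know) * (1 - guess))
    return posterior + (1 - posterior) * learn
```

Running `bkt_update` per KC across the labeled turns yields the per-dialogue knowledge estimates that the KT comparison operates on.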
Related papers
- The StudyChat Dataset: Student Dialogues With ChatGPT in an Artificial Intelligence Course [2.1485350418225244]
StudyChat is a publicly available dataset capturing real-world student interactions with an LLM-powered tutor.
We deploy a web application that replicates ChatGPT's core functionalities, and use it to log student interactions with the LLM.
We analyze these interactions, highlight behavioral trends, and examine how specific usage patterns relate to course outcomes.
arXiv Detail & Related papers (2025-03-11T00:17:07Z)
- Position: LLMs Can be Good Tutors in Foreign Language Education [87.88557755407815]
We argue that large language models (LLMs) have the potential to serve as effective tutors in foreign language education (FLE).
Specifically, LLMs can play three critical roles: (1) as data enhancers, improving the creation of learning materials or serving as student simulations; (2) as task predictors, supporting learner assessment or optimizing learning pathways; and (3) as agents, enabling personalized and inclusive education.
arXiv Detail & Related papers (2025-02-08T06:48:49Z)
- Scoring with Large Language Models: A Study on Measuring Empathy of Responses in Dialogues [3.2162648244439684]
We develop a framework for investigating how effective Large Language Models are at measuring and scoring empathy of responses in dialogues.
Our strategy is to approximate the performance of state-of-the-art and fine-tuned LLMs with explicit and explainable features.
Our results show that, using only embeddings, it is possible to achieve performance close to that of generic LLMs.
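As a hedged sketch of that strategy, one can fit a simple, explainable regressor on response embeddings to approximate the scores an LLM would assign. The `embed()` encoder below is a stand-in assumption for any sentence-embedding model.

```python
import numpy as np
from sklearn.linear_model import Ridge

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding per response, shape (n, d)."""
    raise NotImplementedError

def fit_empathy_scorer(responses: list[str], llm_scores: list[float]) -> Ridge:
    # Train an explicit linear model to mimic LLM-assigned empathy scores;
    # predictions on new responses then approximate the LLM's judgments.
    model = Ridge(alpha=1.0)
    model.fit(embed(responses), np.asarray(llm_scores))
    return model
```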
arXiv Detail & Related papers (2024-12-28T20:37:57Z)
- INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models [15.825663946923289]
Large language models (LLMs) excel at answering questions but remain passive learners, absorbing static data without the ability to question and refine knowledge.
This paper explores how LLMs can transition to interactive, question-driven learning through student-teacher dialogues.
arXiv Detail & Related papers (2024-12-16T02:28:53Z)
- Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students [53.20318273452059]
Large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education.
Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs.
We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.
arXiv Detail & Related papers (2024-11-27T19:19:34Z)
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and use offline reinforcement learning (RL) to train an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
- SPL: A Socratic Playground for Learning Powered by Large Language Model [5.383689446227398]
Socratic Playground for Learning (SPL) is a dialogue-based intelligent tutoring system (ITS) powered by the GPT-4 model.
SPL aims to enhance personalized and adaptive learning experiences tailored to individual needs.
arXiv Detail & Related papers (2024-06-20T01:18:52Z)
- Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component of human communication, indicating the speaker's intention and implication beyond the literal text in dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z)
- Automate Knowledge Concept Tagging on Math Questions with LLMs [48.5585921817745]
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications.
Traditionally, these annotations have been conducted manually with help from pedagogical experts.
In this paper, we explore automating the tagging task using Large Language Models (LLMs).
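A minimal zero-shot tagging prompt might look like the sketch below; the taxonomy argument and the output format are assumptions for illustration, not the paper's exact setup.

```python
def build_tagging_prompt(question: str, taxonomy: list[str]) -> str:
    # Constrain the LLM to an existing concept taxonomy so that the
    # predicted tags stay usable by downstream educational applications.
    concepts = "\n".join(f"- {c}" for c in taxonomy)
    return (
        "Tag the math question below with every knowledge concept it\n"
        "assesses, choosing only from this taxonomy:\n"
        f"{concepts}\n\n"
        f"Question: {question}\n"
        "Concepts (comma-separated):"
    )
```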
arXiv Detail & Related papers (2024-03-26T00:09:38Z)
- Crafting a Good Prompt or Providing Exemplary Dialogues? A Study of In-Context Learning for Persona-based Dialogue Generation [15.143135611057309]
We systematically investigate the ICL capabilities of large language models (LLMs) in persona-based dialogue generation.
From experimental results, we draw three conclusions: 1) adjusting prompt instructions is the most direct, effective, and economical way to improve generation quality; 2) randomly retrieving demonstrations (demos) achieves the best results; and 3) even when we destroy the multi-turn associations and single-turn semantics in the demos, increasing the number of demos still improves dialogue performance.
arXiv Detail & Related papers (2024-02-15T14:03:33Z)
- Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking [5.755004576310333]
Large Language Model (LLM) assistants have emerged as potential alternatives to search methods for helping users navigate software.
LLM assistants use vast training data from domain-specific texts, software manuals, and code repositories to mimic human-like interactions.
arXiv Detail & Related papers (2024-02-12T19:49:58Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models [51.75805497456226]
This work focuses on the factual consistency issue with the help of the dialogue summarization task.
Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistencies.
To stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data.
arXiv Detail & Related papers (2023-11-13T09:32:12Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
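A rough sketch of what such a prompt construction could look like (the wording here is an assumption; the paper defines the exact strategy):

```python
def self_explanation_prompt(dialogue: list[str], task: str) -> str:
    # Ask the model to explain every utterance before attempting the task,
    # mirroring the analyze-then-execute structure described above.
    turns = "\n".join(f"Turn {i + 1}: {u}" for i, u in enumerate(dialogue))
    return (
        f"{turns}\n\n"
        "First, explain what each turn above conveys, one line per turn.\n"
        f"Then, using your explanations, complete this task: {task}"
    )
```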
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs).
As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, the current query, and additional context.
The research also analyzes the representations of dialog history that have the optimal usable-information density.
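As a toy illustration of such frugality, a prompt builder can keep only the most recent history that fits a fixed budget, a crude proxy for usable-information density; the character budget and truncation rule are assumptions of this sketch, not the paper's method.

```python
def frugal_history(history: list[str], budget_chars: int = 1000) -> list[str]:
    # Walk the dialog history from newest to oldest, keeping turns until
    # the budget is exhausted, then restore chronological order.
    kept: list[str] = []
    used = 0
    for turn in reversed(history):
        if used + len(turn) > budget_chars:
            break
        kept.append(turn)
        used += len(turn)
    return list(reversed(kept))
```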
arXiv Detail & Related papers (2023-05-24T09:06:49Z)