"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
- URL: http://arxiv.org/abs/2602.18372v1
- Date: Fri, 20 Feb 2026 17:27:41 GMT
- Title: "How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations
- Authors: Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson,
- Abstract summary: This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study, and summative assessed coursework. We analysed 6,113 messages from both learning contexts using 11 different Large Language Models (LLMs) and three human raters. Results show that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment.
- Score: 39.146761527401424
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Providing scaffolding through educational chatbots built on Large Language Models (LLMs) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study, and summative assessed coursework. We analysed 6,113 messages from both learning contexts, using 11 different LLMs and three human raters to classify student questions using four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for classification of student questions. However, we identify clear limitations in both the ability to classify with schemas and the value of doing so: schemas are limited and thus struggle to accommodate the semantic richness of composite prompts, offering only a partial understanding of the wider risks and benefits of chatbot integration. In the future, we recommend an analysis approach that captures the nuanced, multi-turn nature of conversation, for example, by applying methods from conversation analysis in discursive psychology.
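The abstract's reliability claim (11 LLM raters compared against three humans) rests on agreement statistics such as Cohen's kappa, which corrects raw agreement for chance. A minimal sketch in Python; the rater labels and schema categories below are hypothetical illustrations, not the paper's data:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's
    marginal label frequencies. Undefined when p_expected == 1.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two raters over 8 student questions,
# using illustrative schema categories.
rater_1 = ["procedural", "conceptual", "procedural", "procedural",
           "verification", "procedural", "conceptual", "procedural"]
rater_2 = ["procedural", "conceptual", "procedural", "conceptual",
           "verification", "procedural", "procedural", "procedural"]
print(round(cohen_kappa(rater_1, rater_2), 3))  # prints 0.529
```

With several raters (human or LLM), a common design is to compute kappa for every pair, or a multi-rater statistic such as Fleiss' kappa; which the paper used is not stated in this abstract.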
Related papers
- Investigating Student Interaction Patterns with Large Language Model-Powered Course Assistants in Computer Science Courses [4.761218834684297]
Large language models (LLMs) are promising for bridging this gap, but interactions between students and LLMs are rarely overseen by educators. We developed and studied an LLM-powered course assistant deployed across multiple computer science courses to characterize real-world use and understand pedagogical implications.
arXiv Detail & Related papers (2025-09-10T02:21:11Z)
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners. In this work, we introduce a new task paradigm: proactive information gathering. We design a scalable framework that generates partially specified, real-world tasks, masking key information. Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z)
- "Did my figure do justice to the answer?": Towards Multimodal Short Answer Grading with Feedback (MMSAF) [41.09752906121257]
We propose the Multimodal Short Answer grading with Feedback (MMSAF) problem along with a dataset of 2,197 data points. As per our evaluations, existing Multimodal Large Language Models (MLLMs) could predict whether an answer is correct, incorrect or partially correct with an accuracy of 55%. Similarly, they could predict whether the image provided in the student's answer is relevant or not with an accuracy of 75%.
arXiv Detail & Related papers (2024-12-27T17:33:39Z)
- INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models [15.825663946923289]
Large language models (LLMs) excel at answering questions but remain passive learners, absorbing static data without the ability to question and refine knowledge. This paper explores how LLMs can transition to interactive, question-driven learning through student-teacher dialogues.
arXiv Detail & Related papers (2024-12-16T02:28:53Z)
- Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students [53.20318273452059]
Large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education. Despite school restrictions, our survey of over 300 middle and high school students revealed that a remarkable 70% of students have utilized LLMs. We propose a few ideas to address such issues, including subject-specific models, personalized learning, and AI classrooms.
arXiv Detail & Related papers (2024-11-27T19:19:34Z)
- Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z)
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
- Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and Mentoring [0.0]
We propose an approach to utilize chatbots as mediators of the conversation and sources of limited and controlled generation of explanations.
A group chat approach is developed to connect students with human mentors, either on demand or in cases that exceed the chatbot's pre-defined tasks.
arXiv Detail & Related papers (2024-01-16T17:31:35Z)
- Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions [19.365615476223635]
Conversational question-answering systems aim to create interactive search systems that retrieve information by interacting with users.
Existing work uses human annotators to play the roles of the questioner (student) and the answerer (teacher).
We propose a simulation framework that employs zero-shot learner LLMs for simulating teacher-student interactions.
arXiv Detail & Related papers (2023-12-05T17:38:02Z)
- Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models [51.75805497456226]
This work focuses on the factual consistency issue with the help of the dialogue summarization task.
Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistency.
To stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data.
arXiv Detail & Related papers (2023-11-13T09:32:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.