Related papers: IntrEx: A Dataset for Modeling Engagement in Educational Conversations

URL: http://arxiv.org/abs/2509.06652v2
Date: Wed, 17 Sep 2025 12:55:31 GMT
Title: IntrEx: A Dataset for Modeling Engagement in Educational Conversations
Authors: Xingwei Tan, Mahathi Parvatham, Chiara Gambi, Gabriele Pergola
Abstract summary: IntrEx is the first large dataset annotated for interestingness and expected interestingness in teacher-student interactions. We employ a rigorous annotation process with over 100 second-language learners. We investigate whether large language models (LLMs) can predict human interestingness judgments.
Score: 7.526860155587907
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Engagement and motivation are crucial for second-language acquisition, yet maintaining learner interest in educational conversations remains a challenge. While prior research has explored what makes educational texts interesting, still little is known about the linguistic features that drive engagement in conversations. To address this gap, we introduce IntrEx, the first large dataset annotated for interestingness and expected interestingness in teacher-student interactions. Built upon the Teacher-Student Chatroom Corpus (TSCC), IntrEx extends prior work by incorporating sequence-level annotations, allowing for the study of engagement beyond isolated turns to capture how interest evolves over extended dialogues. We employ a rigorous annotation process with over 100 second-language learners, using a comparison-based rating approach inspired by reinforcement learning from human feedback (RLHF) to improve agreement. We investigate whether large language models (LLMs) can predict human interestingness judgments. We find that LLMs (7B/8B parameters) fine-tuned on interestingness ratings outperform larger proprietary models like GPT-4o, demonstrating the potential for specialised datasets to model engagement in educational settings. Finally, we analyze how linguistic and cognitive factors, such as concreteness, comprehensibility (readability), and uptake, influence engagement in educational dialogues.

Related papers

From Words to Wisdom: Discourse Annotation and Baseline Models for Student Dialogue Understanding [5.459797813771498]
This work introduces an annotated educational dialogue dataset of student conversations featuring knowledge construction and task production discourse. We also establish baseline models for automatically predicting these discourse properties for each turn of talk within conversations, using pre-trained large language models GPT-3.5 and Llama-3.1. Experimental results indicate that these state-of-the-art models perform suboptimally on this task, indicating the potential for future research.
arXiv Detail & Related papers (2025-11-25T17:46:00Z)
One-Topic-Doesn't-Fit-All: Transcreating Reading Comprehension Test for Personalized Learning [39.357397697061664]
We propose a novel approach to generating personalized English reading comprehension tests tailored to students' interests. We generate new passages and multiple-choice reading comprehension questions that are linguistically similar to the original passages but semantically aligned with individual learners' interests. Our results show students learning with personalized reading passages demonstrate improved comprehension and motivation retention compared to those learning with non-personalized materials.
arXiv Detail & Related papers (2025-11-12T09:17:25Z)
Once Upon a Time: Interactive Learning for Storytelling with Small Language Models [1.8012666291588018]
We investigate whether language models can be trained with less data by learning from high-level, cognitively inspired feedback. We train a student model to generate stories, which a teacher model rates on readability, narrative coherence, and creativity. We find that the high-level feedback is highly data efficient: With just 1 M words of input in interactive learning, storytelling skills can improve as much as with 410 M words of next-word prediction.
arXiv Detail & Related papers (2025-09-19T07:45:34Z)
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation. We deem that retrieval-augmented language models have the inherent capabilities of supplying responses according to both contextual and parametric knowledge. Inspired by aligning language models with human preference, we take the first step towards aligning retrieval-augmented language models to a state where they respond relying merely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z)
Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue. This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis. We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z)
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs). We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena. As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
Are Human Conversations Special? A Large Language Model Perspective [8.623471682333964]
This study analyzes changes in the attention mechanisms of large language models (LLMs) when used to understand natural conversations between humans (human-human). Our findings reveal that while language models exhibit domain-specific attention behaviors, there is a significant gap in their ability to specialize in human conversations.
arXiv Detail & Related papers (2024-03-08T04:44:25Z)
Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation [22.38338205905379]
We leverage reinforcement learning algorithms to overcome the above challenges by introducing a novel reward function. Our reward function combines an accuracy metric and a faithfulness metric to provide a balanced quality judgment of generated responses.
arXiv Detail & Related papers (2023-11-02T02:42:41Z)
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z)
Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations [2.0653090022137697]
We explore the use of pre-trained speech representations as a form of transfer learning towards the AlloSat corpus. Our experiments confirm the large gain in performance obtained with the use of pre-trained features. Surprisingly, we found that the linguistic content is clearly the major contributor for the prediction of satisfaction.
arXiv Detail & Related papers (2023-10-06T10:22:51Z)
Spoken Language Intelligence of Large Language Models for Language Learning [3.1964044595140217]
We focus on evaluating the efficacy of large language models (LLMs) in the realm of education. We introduce a new multiple-choice question dataset to evaluate the effectiveness of LLMs in the aforementioned scenarios. We also investigate the influence of various prompting techniques such as zero- and few-shot methods. We find that models of different sizes have good understanding of concepts in phonetics, phonology, and second language acquisition, but show limitations in reasoning for real-world problems.
arXiv Detail & Related papers (2023-08-28T12:47:41Z)
Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues [8.652711997920463]
Existing works have improved the performance of dialogue systems by intentionally learning interlocutor personas with sophisticated network structures. This study proposes a method of learning to memorize entailment and discourse relations for persona-consistent dialogue tasks.
arXiv Detail & Related papers (2023-01-12T08:37:00Z)
Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features. To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives. Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.