Reflecting in the Reflection: Integrating a Socratic Questioning Framework into Automated AI-Based Question Generation
- URL: http://arxiv.org/abs/2601.14798v1
- Date: Wed, 21 Jan 2026 09:23:11 GMT
- Title: Reflecting in the Reflection: Integrating a Socratic Questioning Framework into Automated AI-Based Question Generation
- Authors: Ondřej Holub, Essi Ryymin, Rodrigo Alves
- Abstract summary: This paper introduces a reflection-in-reflection framework for automated generation of reflection questions with large language models (LLMs). Our approach coordinates two role-specialized agents, a Student-Teacher and a Teacher-Educator, that engage in a Socratic multi-turn dialogue to iteratively refine a single question. We show that our two-agent protocol produces questions that are judged substantially more relevant, deeper, and better overall than a one-shot baseline.
- Score: 10.05797775116765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing good reflection questions is pedagogically important but time-consuming and unevenly supported across teachers. This paper introduces a reflection-in-reflection framework for automated generation of reflection questions with large language models (LLMs). Our approach coordinates two role-specialized agents, a Student-Teacher and a Teacher-Educator, that engage in a Socratic multi-turn dialogue to iteratively refine a single question given a teacher-specified topic, key concepts, student level, and optional instructional materials. The Student-Teacher proposes candidate questions with brief rationales, while the Teacher-Educator evaluates them along clarity, depth, relevance, engagement, and conceptual interconnections, responding only with targeted coaching questions or a fixed signal to stop the dialogue. We evaluate the framework in an authentic lower-secondary ICT setting on the topic, using GPT-4o-mini as the backbone model and a stronger GPT-4-class LLM as an external evaluator in pairwise comparisons of clarity, relevance, depth, and overall quality. First, we study how interaction design and context (dynamic vs. fixed iteration counts; presence or absence of student level and materials) affect question quality. Dynamic stopping combined with contextual information consistently outperforms fixed 5- or 10-step refinement, with very long dialogues prone to drift or over-complication. Second, we show that our two-agent protocol produces questions that are judged substantially more relevant, deeper, and better overall than a one-shot baseline using the same backbone model.
Related papers
- EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus
We present EduDial, a comprehensive multi-turn teacher-student dialogue dataset. EduDial covers 345 core knowledge points and consists of 34,250 dialogue sessions generated through interactions between teacher and student agents.
arXiv Detail & Related papers (2025-10-14T18:18:43Z)
- How Real Is AI Tutoring? Comparing Simulated and Human Dialogues in One-on-One Instruction
This study systematically investigates the structural and behavioral differences between AI-simulated and authentic human tutoring dialogues. Results show that human dialogues are significantly superior to their AI counterparts in utterance length, as well as in questioning (I-Q) and general feedback (F-F) behaviors.
arXiv Detail & Related papers (2025-09-02T03:18:39Z)
- Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
This research proposes to leverage unsupervised learning methods in NLP, primarily focusing on the English language. A customized model will offer efficient solutions for educators, instructors, and individuals engaged in text-based evaluations.
arXiv Detail & Related papers (2025-08-26T23:36:13Z)
- SimInstruct: A Responsible Tool for Collecting Scaffolding Dialogues Between Experts and LLM-Simulated Novices
SimInstruct is a scalable, expert-in-the-loop tool for collecting scaffolding dialogues. Using teaching development coaching as an example domain, SimInstruct simulates novice instructors via LLMs. Our results reveal that persona traits, such as extroversion and introversion, meaningfully influence how experts engage.
arXiv Detail & Related papers (2025-08-06T13:16:10Z)
- "Did my figure do justice to the answer?": Towards Multimodal Short Answer Grading with Feedback (MMSAF)
We propose the Multimodal Short Answer grading with Feedback (MMSAF) problem along with a dataset of 2,197 data points. As per our evaluations, existing Multimodal Large Language Models (MLLMs) could predict whether an answer is correct, incorrect, or partially correct with an accuracy of 55%. Similarly, they could predict whether the image provided in the student's answer is relevant or not with an accuracy of 75%.
arXiv Detail & Related papers (2024-12-27T17:33:39Z)
- "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays
This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback.
CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation.
A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback.
arXiv Detail & Related papers (2024-09-11T17:59:01Z)
- Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning
Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer and grade, and are a reliable form of assessment.
To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers.
We propose a simple, in-context learning-based solution for automated distractor and corresponding feedback message generation.
arXiv Detail & Related papers (2023-08-07T01:03:04Z)
- Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment
We focus on the problem of generating such gap-focused questions (GFQs) automatically.
We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these.
arXiv Detail & Related papers (2023-07-06T22:21:42Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.