EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
- URL: http://arxiv.org/abs/2509.24120v1
- Date: Sun, 28 Sep 2025 23:33:58 GMT
- Title: EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
- Authors: Sourjyadip Ray, Shubham Sharma, Somak Aditya, Pawan Goyal,
- Abstract summary: This paper explores using Multimodal Large Language Models (MLLMs) to automatically respond to student questions from online lectures.<n>We introduce the EduVidQA dataset with 5252 question-answer pairs (both synthetic and real-world) from 296 computer science videos covering diverse topics and difficulty levels.
- Score: 8.393656792613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As digital platforms redefine educational paradigms, ensuring interactivity remains vital for effective learning. This paper explores using Multimodal Large Language Models (MLLMs) to automatically respond to student questions from online lectures - a novel question answering task of real world significance. We introduce the EduVidQA Dataset with 5252 question-answer pairs (both synthetic and real-world) from 296 computer science videos covering diverse topics and difficulty levels. To understand the needs of the dataset and task evaluation, we empirically study the qualitative preferences of students, which we provide as an important contribution to this line of work. Our benchmarking experiments consist of 6 state-of-the-art MLLMs, through which we study the effectiveness of our synthetic data for finetuning, as well as showing the challenging nature of the task. We evaluate the models using both text-based and qualitative metrics, thus showing a nuanced perspective of the models' performance, which is paramount to future work. This work not only sets a benchmark for this important problem, but also opens exciting avenues for future research in the field of Natural Language Processing for Education.
Related papers
- Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos [6.689443785478135]
We investigate the capabilities of vision-language models for generating learning-oriented questions for educational videos.<n>Our findings delineate the capabilities of current vision-language models, highlighting the need for fine-tuning and addressing challenges in question diversity and relevance.
arXiv Detail & Related papers (2025-05-03T11:37:31Z) - What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations [47.79536652721794]
This paper introduces VISTA, a dataset specifically designed for video-to-text summarization in scientific domains.<n>We benchmark the performance of state-of-the-art large models and apply a plan-based framework to better capture the structured nature of abstracts.
arXiv Detail & Related papers (2025-02-12T10:36:55Z) - Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications.<n>The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard.<n>We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z) - INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models [15.825663946923289]
Large language models (LLMs) excel at answering questions but remain passive learners-absorbing static data without the ability to question and refine knowledge.<n>This paper explores how LLMs can transition to interactive, question-driven learning through student-teacher dialogues.
arXiv Detail & Related papers (2024-12-16T02:28:53Z) - AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? [65.92331309449015]
We introduce AutoBench-V, an automated framework for serving evaluation on demand, i.e., benchmarking LVLMs based on specific aspects of model capability.<n>Through an extensive evaluation of nine popular LVLMs across five demanded user inputs, the framework shows effectiveness and reliability.
arXiv Detail & Related papers (2024-10-28T17:55:08Z) - SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.18330795060871]
SPIQA is a dataset specifically designed to interpret complex figures and tables within the context of scientific research articles.<n>We employ automatic and manual curation to create the dataset.<n> SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits.
arXiv Detail & Related papers (2024-07-12T16:37:59Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.<n>We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.<n>Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - Student Data Paradox and Curious Case of Single Student-Tutor Model: Regressive Side Effects of Training LLMs for Personalized Learning [25.90420385230675]
The pursuit of personalized education has led to the integration of Large Language Models (LLMs) in developing intelligent tutoring systems.
Our research uncovers a fundamental challenge in this approach: the Student Data Paradox''
This paradox emerges when LLMs, trained on student data to understand learner behavior, inadvertently compromise their own factual knowledge and reasoning abilities.
arXiv Detail & Related papers (2024-04-23T15:57:55Z) - MathVista: Evaluating Mathematical Reasoning of Foundation Models in
Visual Contexts [170.01089233942594]
MathVista is a benchmark designed to combine challenges from diverse mathematical and visual tasks.
The best-performing GPT-4V model achieves an overall accuracy of 49.9%, substantially outperforming Bard, the second-best performer, by 15.1%.
GPT-4V still falls short of human performance by 10.4%, as it often struggles to understand complex figures and perform rigorous reasoning.
arXiv Detail & Related papers (2023-10-03T17:57:24Z) - LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset,
Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.
Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs.
We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z) - Task Formulation Matters When Learning Continually: A Case Study in
Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z) - Multimodal Lecture Presentations Dataset: Understanding Multimodality in
Educational Slides [57.86931911522967]
We test the capabilities of machine learning models in multimodal understanding of educational content.
Our dataset contains aligned slides and spoken language, for 180+ hours of video and 9000+ slides, with 10 lecturers from various subjects.
We introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches.
arXiv Detail & Related papers (2022-08-17T05:30:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.