Transforming Higher Education with AI-Powered Video Lectures
- URL: http://arxiv.org/abs/2511.20660v1
- Date: Thu, 30 Oct 2025 23:33:10 GMT
- Title: Transforming Higher Education with AI-Powered Video Lectures
- Authors: Dengsheng Zhang
- Abstract summary: The integration of artificial intelligence (AI) into video lecture production has the potential to transform higher education. This paper investigates a semi-automated workflow that combines Google Gemini for script generation, Amazon Polly for voice synthesis, and Microsoft PowerPoint for video assembly.
- Score: 0.2538209532048866
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The integration of artificial intelligence (AI) into video lecture production has the potential to transform higher education by streamlining content creation and enhancing accessibility. This paper investigates a semi-automated workflow that combines Google Gemini for script generation, Amazon Polly for voice synthesis, and Microsoft PowerPoint for video assembly. Unlike fully automated text-to-video platforms, this hybrid approach preserves pedagogical intent while ensuring script-to-slide synchronization, narrative coherence, and customization. Case studies demonstrate the effectiveness of Gemini in generating accurate and context-sensitive scripts for visually rich academic presentations, while Polly provides natural-sounding narration with controllable pacing. A two-course pilot study was conducted to evaluate AI-generated instructional videos (AIIVs) against human instructional videos (HIVs). Both qualitative and quantitative results indicate that AIIVs are comparable to HIVs in terms of learning outcomes, with students reporting high levels of clarity, coherence, and usability. However, limitations remain, particularly regarding audio quality and the absence of human-like avatars. The findings suggest that AI-assisted video production can reduce instructor workload, improve scalability, and deliver effective learning resources, while future improvements in synthetic voices and avatars may further enhance learner engagement.
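As a rough illustration of the script-to-narration stage of the workflow the abstract describes, the sketch below chains Google Gemini and Amazon Polly in Python. The paper does not publish code, so this is a minimal sketch under assumptions: the model name, voice ID, prompt wording, and file names are illustrative, not the authors' implementation; the PowerPoint assembly step is only noted in comments.

```python
# Minimal sketch of the Gemini -> Polly stage of the workflow; assumes the
# google-generativeai and boto3 SDKs and valid credentials. All names below
# (model, voice, prompt, paths) are illustrative assumptions.
import boto3
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")      # assumption: key supplied by caller
model = genai.GenerativeModel("gemini-1.5-pro")     # illustrative model choice


def generate_slide_script(slide_text: str) -> str:
    """Ask Gemini for a short narration script for one slide."""
    prompt = (
        "Write a clear, 60-90 second lecture narration for the following "
        f"slide content, keeping the slide's order of ideas:\n\n{slide_text}"
    )
    return model.generate_content(prompt).text


def synthesize_narration(script: str, out_path: str) -> None:
    """Render the script to MP3 with Amazon Polly's neural engine."""
    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=script,
        OutputFormat="mp3",
        VoiceId="Joanna",   # illustrative voice; pacing can be tuned with SSML input
        Engine="neural",
    )
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())


if __name__ == "__main__":
    script = generate_slide_script("Slide 3: Gradient descent - intuition and update rule")
    synthesize_narration(script, "slide_03.mp3")
    # The resulting MP3s can then be attached to the matching PowerPoint
    # slides (e.g., Insert > Audio) and the deck exported as video, which is
    # the "video assembly" step the abstract assigns to PowerPoint.
```

Keeping narration generation per slide, rather than per deck, is what makes the script-to-slide synchronization the abstract emphasizes straightforward to maintain.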
Related papers
- CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos [73.51386721543135]
We propose Contrastive Latent Action Pretraining (CLAP), a framework that aligns the visual latent space from videos with a proprioceptive latent space from robot trajectories. CLAP maps video transitions onto a quantized, physically executable codebook. We introduce a dual-formulation VLA framework offering both CLAP-NTP, an autoregressive model excelling at instruction following and object generalization, and CLAP-RF, a Rectified Flow-based policy designed for high-frequency, precise manipulation.
arXiv Detail & Related papers (2026-01-07T16:26:33Z)
- ALIVE: An Avatar-Lecture Interactive Video Engine with Content-Aware Retrieval for Real-Time Interaction [5.691710068675227]
ALIVE is an Avatar-Lecture Interactive Video Engine that transforms passive lecture viewing into a dynamic, real-time learning experience. ALIVE employs lightweight embedding models, FAISS-based retrieval, and segmented avatar synthesis with progressive preloading. We demonstrate the system on a complete medical imaging course, evaluate its retrieval accuracy, latency characteristics, and user experience, and show that ALIVE provides accurate, content-aware, and engaging real-time support.
arXiv Detail & Related papers (2025-12-24T00:33:59Z)
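The ALIVE entry above mentions FAISS-based retrieval; the generic sketch below shows what nearest-neighbor retrieval over lecture-segment embeddings looks like with FAISS. It is not code from the ALIVE paper: the embedding dimension, the placeholder vectors, and the choice of cosine similarity are illustrative assumptions.

```python
# Generic sketch of FAISS-based retrieval over lecture-segment embeddings,
# in the spirit of (but not taken from) the ALIVE paper. Requires faiss-cpu
# and numpy; all dimensions and data below are placeholders.
import faiss
import numpy as np

d = 384                       # typical output size of a lightweight text embedder
rng = np.random.default_rng(0)

# Placeholder embeddings; a real system would embed lecture transcript
# segments with a sentence-embedding model.
segment_embeddings = rng.standard_normal((1000, d)).astype("float32")
faiss.normalize_L2(segment_embeddings)   # unit vectors: inner product == cosine

index = faiss.IndexFlatIP(d)             # exact inner-product (cosine) search
index.add(segment_embeddings)

# An embedded student question, also a placeholder here.
query = rng.standard_normal((1, d)).astype("float32")
faiss.normalize_L2(query)

scores, ids = index.search(query, 5)     # top-5 most relevant lecture segments
print(ids[0], scores[0])
```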
- Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning [66.51617619673587]
We present Skyra, a multimodal large language model (MLLM) that identifies human-perceivable visual artifacts in AI-generated videos. To support this objective, we construct ViF-CoT-4K for Supervised Fine-Tuning (SFT), the first large-scale AI-generated video dataset with fine-grained human annotations. We then develop a two-stage training strategy that systematically enhances our model's spatio-temporal artifact perception, explanation capability, and detection accuracy.
arXiv Detail & Related papers (2025-12-17T18:48:26Z)
- PVChat: Personalized Video Chat with One-Shot Learning [13.731097809455154]
PVChat is a one-shot learning framework that enables subject-aware question answering from a single video for each subject. Our approach optimizes a Mixture-of-Heads (MoH) enhanced ViLLM on a synthetically augmented video-QA dataset. We evaluate PVChat on diverse datasets covering medical scenarios, TV series, anime, and real-world footage.
arXiv Detail & Related papers (2025-03-21T11:50:06Z)
- Adult learners recall and recognition performance and affective feedback when learning from an AI-generated synthetic video [1.7742433461734404]
The study recruited 500 participants to investigate adult learners' recall and recognition performance, as well as their affective feedback, on the AI-generated synthetic video. The results indicated no statistically significant difference amongst conditions on recall and recognition performance. However, adult learners preferred to learn from the video formats rather than text materials.
arXiv Detail & Related papers (2024-11-28T21:40:28Z)
- MindSpeech: Continuous Imagined Speech Decoding using High-Density fNIRS and Prompt Tuning for Advanced Human-AI Interaction [0.0]
This paper reports a novel method for human-AI interaction by developing a direct brain-AI interface.
We discuss a novel AI model, called MindSpeech, which enables open-vocabulary, continuous decoding for imagined speech.
We demonstrate significant improvements in key metrics, such as BLEU-1 and BERT P scores, for three out of four participants.
arXiv Detail & Related papers (2024-07-25T16:39:21Z)
- Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling [79.49128866877922]
Video-Teller is a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment.
Video-Teller boosts the training efficiency by utilizing frozen pretrained vision and language modules.
It capitalizes on the robust linguistic capabilities of large language models, enabling the generation of both concise and elaborate video descriptions.
arXiv Detail & Related papers (2023-10-08T03:35:27Z)
- Generative AI for learning: Investigating the potential of synthetic learning videos [0.6628807224384127]
This research paper explores the utility of using AI-generated synthetic video to create viable educational content for online educational settings.
We examined the impact of using AI-generated synthetic video in an online learning platform on both learners' content acquisition and learning experience.
arXiv Detail & Related papers (2023-04-07T12:57:42Z) - Multimodal Lecture Presentations Dataset: Understanding Multimodality in
Educational Slides [57.86931911522967]
We test the capabilities of machine learning models in multimodal understanding of educational content.
Our dataset contains aligned slides and spoken language, for 180+ hours of video and 9000+ slides, with 10 lecturers from various subjects.
We introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches.
arXiv Detail & Related papers (2022-08-17T05:30:18Z) - Watch and Learn: Mapping Language and Noisy Real-world Videos with
Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.