CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
- URL: http://arxiv.org/abs/2506.20600v1
- Date: Wed, 25 Jun 2025 16:39:05 GMT
- Title: CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
- Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam
- Abstract summary: CogGen is a learner-centered AI architecture that transforms programming videos into interactive, adaptive learning experiences. This work advances AI-powered tutoring by bridging structured student modeling with interactive AI conversations.
- Score: 1.6961276655027102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce CogGen, a learner-centered AI architecture that transforms programming videos into interactive, adaptive learning experiences by integrating student modeling with generative AI tutoring based on the Cognitive Apprenticeship framework. The architecture consists of three components: (1) video segmentation by learning goals, (2) a conversational tutoring engine applying Cognitive Apprenticeship strategies, and (3) a student model using Bayesian Knowledge Tracing to adapt instruction. Our technical evaluation demonstrates effective video segmentation accuracy and strong pedagogical alignment across knowledge, method, action, and interaction layers. Ablation studies confirm the necessity of each component in generating effective guidance. This work advances AI-powered tutoring by bridging structured student modeling with interactive AI conversations, offering a scalable approach to enhancing video-based programming education.
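The abstract names Bayesian Knowledge Tracing (BKT) as the student model that adapts instruction. As a rough illustration only, the sketch below shows the standard textbook BKT update for a single skill; it is not taken from the CogGen paper. The class name `BKTParams`, the function names, and all parameter values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Illustrative BKT parameters (assumed values, not from the paper)."""
    p_init: float = 0.2    # prior probability the skill is already known
    p_learn: float = 0.15  # probability of learning the skill after a step
    p_slip: float = 0.1    # probability of an error despite knowing the skill
    p_guess: float = 0.2   # probability of a correct answer without the skill


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Return the updated mastery estimate after one observed response."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        total = evidence + (1 - p_known) * params.p_guess
    else:
        evidence = p_known * params.p_slip
        total = evidence + (1 - p_known) * (1 - params.p_guess)
    posterior = evidence / total  # Bayes' rule given the observation
    # Account for the chance that the learner acquired the skill on this step.
    return posterior + (1 - posterior) * params.p_learn


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next response is correct under the model."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


if __name__ == "__main__":
    params = BKTParams()
    mastery = params.p_init
    for observed in [True, False, True, True]:
        mastery = bkt_update(mastery, observed, params)
        print(f"correct={observed} mastery={mastery:.3f}")
```

A tutoring engine could, for instance, compare the mastery estimate against a threshold to decide how much scaffolding to offer; how CogGen actually uses the estimate is described in the paper, not here.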
Related papers
- PedaCo-Gen: Scaffolding Pedagogical Agency in Human-AI Collaborative Video Authoring [28.634225905526677]
This study introduces PedaCo-Gen, a collaborative video generating system for authoring instructional videos based on Mayer's Cognitive Theory of Multimedia Learning (CTML).
Moving away from traditional "one-shot" generation, PedaCo-Gen introduces an Intermediate Representation phase, enabling educators to interactively review and refine video blueprints, comprising scripts and visual descriptions, with an AI reviewer.
Our study with 23 education experts demonstrates that PedaCo-Gen significantly enhances video quality across various topics and CTML principles compared to baselines.
arXiv Detail & Related papers (2026-02-23T09:12:13Z)
- Open TutorAI: An Open-source Platform for Personalized and Immersive Learning with Generative AI [1.440818306216858]
This paper presents Open TutorAI, an open-source educational platform based on LLMs and generative technologies.
The system integrates natural language processing with customizable 3D avatars to enable multimodal learner interaction.
It includes tools for organizing content, providing embedded feedback, and offering dedicated interfaces for learners, educators, and parents.
arXiv Detail & Related papers (2026-02-06T20:24:33Z)
- Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval [53.54695034420311]
In practice, videos are typically untrimmed in long durations with much more complicated background content.
We propose a novel framework that distills generalization knowledge from a powerful large-scale vision-language pre-trained model.
Experiment results demonstrate that our proposed model achieves state-of-the-art performance on TVR, ActivityNet, and Charades-STA datasets.
arXiv Detail & Related papers (2025-10-14T08:38:20Z)
- Code2Video: A Code-centric Paradigm for Educational Video Generation [60.03043132859077]
We propose Code2Video, a code-centric agent framework for generating educational videos via Python code.
The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows; (ii) Coder, which converts structured instructions into executable Python code while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLM) with visual anchor prompts to refine spatial layout and ensure clarity.
Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving 40% improvement over direct code
arXiv Detail & Related papers (2025-10-01T17:56:48Z)
- Designing LMS and Instructional Strategies for Integrating Generative-Conversational AI [0.0]
This study introduces a structured framework for designing an AI-powered Learning Management System.
It integrates generative and conversational AI to support adaptive, interactive, and learner-centered instruction.
arXiv Detail & Related papers (2025-08-31T06:01:50Z)
- Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach [0.0]
This paper introduces a novel Intelligent Teaching Assistant for quantum computing education.
The system combines a knowledge-graph-augmented architecture with two specialized Large Language Model (LLM) agents.
arXiv Detail & Related papers (2025-04-24T21:53:34Z)
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos [119.35107657321902]
This work explores whether a deep generative model can learn complex knowledge solely from visual input.
We develop VideoWorld, an auto-regressive video generation model trained on unlabeled video data, and test its knowledge acquisition abilities in video-based Go and robotic control tasks.
arXiv Detail & Related papers (2025-01-16T18:59:10Z)
- Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs [1.6961276655027102]
Our work transforms programming videos into one-on-one tutoring experiences using the cognitive apprenticeship framework.
Tutorly, developed as a JupyterLab plugin, allows learners to set personalized learning goals.
arXiv Detail & Related papers (2024-05-21T17:17:34Z)
- How to Build an Adaptive AI Tutor for Any Course Using Knowledge Graph-Enhanced Retrieval-Augmented Generation (KG-RAG) [5.305156933641317]
Large Language Models (LLMs) in Intelligent Tutoring Systems (ITS) present transformative opportunities for personalized education.
Current implementations face two critical challenges: maintaining factual accuracy and delivering coherent, context-aware instruction.
This paper introduces Knowledge Graph-enhanced Retrieval-Augmented Generation (RAG), a novel framework that integrates structured knowledge representation with context-aware retrieval.
arXiv Detail & Related papers (2023-11-29T15:02:46Z)
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z)
- Knowledge-enhanced Agents for Interactive Text Games [16.055119735473017]
We propose a knowledge-injection framework for improved functional grounding of agents in text-based games.
We consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment.
Our framework supports two representative model classes: reinforcement learning agents and language model agents.
arXiv Detail & Related papers (2023-05-08T23:31:39Z)
- MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks [59.09343552273045]
We propose a decoder-only model for multimodal tasks, which is surprisingly effective in jointly learning of these disparate vision-language tasks.
We demonstrate that joint learning of these diverse objectives is simple, effective, and maximizes the weight-sharing of the model across these tasks.
Our model achieves the state of the art on image-text and text-image retrieval, video question answering and open-vocabulary detection tasks, outperforming much larger and more extensively trained foundational models.
arXiv Detail & Related papers (2023-03-29T16:42:30Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning [92.48299156867664]
We propose a complete video captioning system including both a novel model and an effective training strategy.
Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation.
Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of the successful external language model (ELM) to integrate the abundant linguistic knowledge into the caption model.
arXiv Detail & Related papers (2020-02-26T15:34:52Z)
- Interactive Summarizing -- Automatic Slide Localization Technology as Generative Learning Tool [10.81386784858998]
Video summarization is an effective technology applied to enhance learners' summarizing experience in a video lecture.
An interactive summarizing model is designed to explain how learners are engaged in the video lecture learning process, supported by a convolutional neural network.
arXiv Detail & Related papers (2020-02-25T22:22:49Z)
- Knowledge Integration Networks for Action Recognition [58.548331848942865]
We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition.
We propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information.
The proposed KINet achieves the state-of-the-art performance on a large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%.
arXiv Detail & Related papers (2020-02-18T10:20:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.