CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
- URL: http://arxiv.org/abs/2506.20600v1
- Date: Wed, 25 Jun 2025 16:39:05 GMT
- Title: CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video
- Authors: Wengxi Li, Roy Pea, Nick Haber, Hari Subramonyam
- Abstract summary: CogGen is a learner-centered AI architecture that transforms programming videos into interactive, adaptive learning experiences. This work advances AI-powered tutoring by bridging structured student modeling with interactive AI conversations.
- Score: 1.6961276655027102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce CogGen, a learner-centered AI architecture that transforms programming videos into interactive, adaptive learning experiences by integrating student modeling with generative AI tutoring based on the Cognitive Apprenticeship framework. The architecture consists of three components: (1) video segmentation by learning goals, (2) a conversational tutoring engine applying Cognitive Apprenticeship strategies, and (3) a student model using Bayesian Knowledge Tracing to adapt instruction. Our technical evaluation demonstrates effective video segmentation accuracy and strong pedagogical alignment across knowledge, method, action, and interaction layers. Ablation studies confirm the necessity of each component in generating effective guidance. This work advances AI-powered tutoring by bridging structured student modeling with interactive AI conversations, offering a scalable approach to enhancing video-based programming education.
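The student model described in the abstract adapts instruction with Bayesian Knowledge Tracing (BKT). As a rough illustration of the standard BKT update, not the paper's actual implementation, the sketch below tracks per-skill mastery; the class name, parameter values, and mastery threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BKTSkill:
    """Bayesian Knowledge Tracing state for one skill (e.g., one video segment's learning goal)."""
    p_know: float = 0.2    # P(L0): prior probability the learner already knows the skill
    p_learn: float = 0.15  # P(T): probability of learning the skill at each practice opportunity
    p_slip: float = 0.1    # P(S): probability of answering wrong despite knowing the skill
    p_guess: float = 0.25  # P(G): probability of answering right without knowing the skill

    def update(self, correct: bool) -> float:
        """Bayes-update P(known) from one observed response, then apply the learning transition."""
        if correct:
            evidence = self.p_know * (1 - self.p_slip)
            posterior = evidence / (evidence + (1 - self.p_know) * self.p_guess)
        else:
            evidence = self.p_know * self.p_slip
            posterior = evidence / (evidence + (1 - self.p_know) * (1 - self.p_guess))
        self.p_know = posterior + (1 - posterior) * self.p_learn
        return self.p_know

# A tutor could use the running estimate to adapt its strategy, e.g. fade
# scaffolding once estimated mastery crosses a (hypothetical) threshold.
skill = BKTSkill()
for answer in [True, False, True, True]:
    mastery = skill.update(answer)
print(f"estimated mastery: {mastery:.2f}")  # e.g., switch from modeling to coaching above 0.95
```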
Related papers
- Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach [0.0]
This paper introduces a novel Intelligent Teaching Assistant for quantum computing education. The system combines a knowledge-graph-augmented architecture with two specialized Large Language Model (LLM) agents.
arXiv Detail & Related papers (2025-04-24T21:53:34Z)
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos [119.35107657321902]
This work explores whether a deep generative model can learn complex knowledge solely from visual input. We develop VideoWorld, an auto-regressive video generation model trained on unlabeled video data, and test its knowledge acquisition abilities in video-based Go and robotic control tasks.
arXiv Detail & Related papers (2025-01-16T18:59:10Z)
- Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs [1.6961276655027102]
Our work transforms programming videos into one-on-one tutoring experiences using the cognitive apprenticeship framework.
Tutorly, developed as a JupyterLab extension, allows learners to set personalized learning goals.
arXiv Detail & Related papers (2024-05-21T17:17:34Z)
- How to Build an Adaptive AI Tutor for Any Course Using Knowledge Graph-Enhanced Retrieval-Augmented Generation (KG-RAG) [5.305156933641317]
Integrating Large Language Models (LLMs) into Intelligent Tutoring Systems (ITS) presents transformative opportunities for personalized education. Current implementations face two critical challenges: maintaining factual accuracy and delivering coherent, context-aware instruction. This paper introduces Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG), a novel framework that integrates structured knowledge representation with context-aware retrieval.
arXiv Detail & Related papers (2023-11-29T15:02:46Z)
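The KG-RAG entry above pairs structured knowledge with retrieval before generation. A minimal sketch of that general pattern, assuming a toy triple store and naive word-overlap retrieval (the paper's actual graph schema, retriever, and prompt format are not given here):

```python
# Toy triple store standing in for a course knowledge graph (hypothetical facts).
COURSE_KG = [
    ("for loop", "iterates_over", "a sequence"),
    ("while loop", "repeats_until", "its condition is false"),
    ("list comprehension", "builds", "a new list from an iterable"),
]

def retrieve_facts(question: str, kg: list[tuple[str, str, str]], top_k: int = 2):
    """Naive retrieval: rank triples by word overlap with the question."""
    q_words = set(question.lower().split())
    def overlap(triple: tuple[str, str, str]) -> int:
        return len(q_words & set(" ".join(triple).lower().split()))
    return sorted(kg, key=overlap, reverse=True)[:top_k]

def build_prompt(question: str, kg) -> str:
    """Ground the tutor's LLM prompt in retrieved facts to curb hallucination."""
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in retrieve_facts(question, kg))
    return f"Answer using only these course facts:\n{facts}\n\nStudent question: {question}"

print(build_prompt("How does a for loop work?", COURSE_KG))
```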
- InstructVid2Vid: Controllable Video Editing with Natural Language Instructions [97.17047888215284]
InstructVid2Vid is an end-to-end diffusion-based methodology for video editing guided by human language instructions.
Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion.
arXiv Detail & Related papers (2023-05-21T03:28:13Z)
- Knowledge-enhanced Agents for Interactive Text Games [16.055119735473017]
We propose a knowledge-injection framework for improved functional grounding of agents in text-based games.
We consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment.
Our framework supports two representative model classes: reinforcement learning agents and language model agents.
arXiv Detail & Related papers (2023-05-08T23:31:39Z)
- MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks [59.09343552273045]
We propose a decoder-only model for multimodal tasks that is surprisingly effective at jointly learning disparate vision-language tasks.
We demonstrate that joint learning of these diverse objectives is simple and effective, and maximizes weight sharing across the tasks.
Our model achieves the state of the art on image-text and text-image retrieval, video question answering and open-vocabulary detection tasks, outperforming much larger and more extensively trained foundational models.
arXiv Detail & Related papers (2023-03-29T16:42:30Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning [92.48299156867664]
We propose a complete video captioning system including both a novel model and an effective training strategy.
Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation.
Meanwhile, we design a teacher-recommended learning (TRL) method to make full use of the successful external language model (ELM) to integrate the abundant linguistic knowledge into the caption model.
arXiv Detail & Related papers (2020-02-26T15:34:52Z)
- Interactive Summarizing -- Automatic Slide Localization Technology as Generative Learning Tool [10.81386784858998]
Video summarization is an effective technology for enhancing learners' summarizing experience with video lectures.
An interactive summarizing model is designed to explain how learners engage in the video lecture learning process, supported by a convolutional neural network.
arXiv Detail & Related papers (2020-02-25T22:22:49Z)
- Knowledge Integration Networks for Action Recognition [58.548331848942865]
We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition.
We propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information.
The proposed KINet achieves the state-of-the-art performance on a large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%.
arXiv Detail & Related papers (2020-02-18T10:20:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.