Proactive Agentic Whiteboards: Enhancing Diagrammatic Learning
- URL: http://arxiv.org/abs/2512.01234v2
- Date: Tue, 02 Dec 2025 03:13:40 GMT
- Title: Proactive Agentic Whiteboards: Enhancing Diagrammatic Learning
- Authors: Suveen Ellawela, Sashenka Gamage, Dinithi Dissanayake
- Abstract summary: We introduce DrawDash, an AI-powered whiteboard assistant that proactively completes and refines educational diagrams through multimodal understanding. We demonstrate DrawDash across four diverse teaching scenarios, spanning topics from computer science and web development to biology. This work represents an early exploration into reducing instructors' cognitive load and improving diagram-based pedagogy through real-time, speech-driven visual assistance, and concludes with a discussion of current limitations and directions for formal classroom evaluation.
- Score: 1.338174941551702
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Educators frequently rely on diagrams to explain complex concepts during lectures, yet creating clear and complete visual representations in real time while simultaneously speaking can be cognitively demanding. Incomplete or unclear diagrams may hinder student comprehension, as learners must mentally reconstruct missing information while following the verbal explanation. Inspired by advances in code completion tools, we introduce DrawDash, an AI-powered whiteboard assistant that proactively completes and refines educational diagrams through multimodal understanding. DrawDash adopts a TAB-completion interaction model: it listens to spoken explanations, detects intent, and dynamically suggests refinements that can be accepted with a single keystroke. We demonstrate DrawDash across four diverse teaching scenarios, spanning topics from computer science and web development to biology. This work represents an early exploration into reducing instructors' cognitive load and improving diagram-based pedagogy through real-time, speech-driven visual assistance, and concludes with a discussion of current limitations and directions for formal classroom evaluation.
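The paper describes, but does not publish, its implementation, so the following is a minimal sketch of the TAB-completion interaction model the abstract lays out: listen to speech, detect intent against the current canvas, surface a suggested refinement, and commit it with a single keystroke. Every name here (`Canvas`, `detect_intent`, `propose_refinement`) is a hypothetical placeholder, and the rule-based intent detector stands in for the paper's multimodal understanding.

```python
# Hypothetical sketch of a TAB-completion whiteboard loop; all names are
# illustrative placeholders, not the authors' API.
from dataclasses import dataclass, field

@dataclass
class Canvas:
    shapes: list[str] = field(default_factory=list)  # committed diagram elements
    suggestion: list[str] | None = None              # pending "ghost" refinement

def detect_intent(utterance: str) -> str | None:
    """Toy stand-in for the paper's speech-driven multimodal intent detection."""
    return "add_arrow" if "arrow" in utterance else None

def propose_refinement(intent: str, shapes: list[str]) -> list[str]:
    """Toy stand-in for suggestion generation: map an intent to new elements."""
    return [f"{intent}:auto_{len(shapes)}"]

def step(canvas: Canvas, utterance: str, key: str | None) -> None:
    intent = detect_intent(utterance)
    if intent is not None:
        canvas.suggestion = propose_refinement(intent, canvas.shapes)
    if key == "TAB" and canvas.suggestion:           # accept with one keystroke
        canvas.shapes.extend(canvas.suggestion)
        canvas.suggestion = None
    elif key == "ESC":                               # dismiss, keep drawing manually
        canvas.suggestion = None

canvas = Canvas(shapes=["box:client", "box:server"])
step(canvas, "and an arrow shows the request reaching the server", key="TAB")
print(canvas.shapes)  # ['box:client', 'box:server', 'add_arrow:auto_2']
```

As in code-completion UIs, the suggestion stays uncommitted until accepted, so the instructor's manual drawing flow is never blocked.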
Related papers
- Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding [44.53837800796001]
We introduce Chimera, a test suite comprising 7,500 high-quality diagrams sourced from Wikipedia. Each diagram is annotated with its symbolic content, represented by semantic triples, along with multi-level questions. We use Chimera to measure the presence of three types of shortcuts in visual question answering. A minimal sketch of this annotation structure follows this entry.
arXiv Detail & Related papers (2025-09-26T14:55:04Z)
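The Chimera summary implies a per-diagram record pairing semantic triples with level-tagged questions. Below is a minimal sketch of such a record, plus a text-only probe of the kind a shortcut analysis might run; the schema and field names are assumptions inferred from the summary, not the released format.

```python
# Assumed Chimera-style record: symbolic content as triples + multi-level
# questions. The schema is inferred from the summary, not the actual dataset.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str

@dataclass
class DiagramRecord:
    image_path: str
    triples: list[Triple]             # symbolic content of the diagram
    questions: dict[str, list[str]]   # question level -> questions

record = DiagramRecord(
    image_path="diagrams/water_cycle.png",
    triples=[Triple("evaporation", "precedes", "condensation")],
    questions={"entity": ["What process precedes condensation?"]},
)

def text_only_probe(rec: DiagramRecord, question: str) -> str | None:
    """Shortcut check: can the question be answered from the triples alone,
    without ever looking at the image?"""
    for t in rec.triples:
        if t.relation in question and t.obj in question:
            return t.subject
    return None

print(text_only_probe(record, record.questions["entity"][0]))  # evaporation
```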
- LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction [0.0]
LEARN is a layout-aware diffusion framework designed to generate pedagogically aligned illustrations for STEM education. It is the first generative approach to unify layout-based storytelling, semantic structure learning, and cognitive scaffolding. The code and dataset will be released to facilitate future research and practical deployment.
arXiv Detail & Related papers (2025-08-15T01:49:58Z)
- Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation [19.4261670152456]
We introduce the multimodal solution explanation task, designed to evaluate whether models can identify visual keypoints, such as auxiliary lines, points, and angles, and generate explanations that incorporate these key elements essential for understanding. Our empirical results show that most generalist open-source models, and even math-specialist models, struggle with the multimodal solution explanation task; only recent large-scale open-source and closed-source models fare better. This highlights a significant gap in current LLMs' ability to reason and explain with visual grounding in educational contexts.
arXiv Detail & Related papers (2025-04-04T06:03:13Z)
- Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning [58.73625654718187]
Generalized zero-shot learning aims to recognize both seen and unseen classes with the help of semantic information that is shared among different classes. Existing approaches fine-tune the visual backbone on seen-class data to obtain semantic-related visual features. This paper proposes a novel visual and semantic prompt collaboration framework, which utilizes prompt tuning techniques for efficient feature adaptation.
arXiv Detail & Related papers (2025-03-29T10:17:57Z)
- Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving [25.22658210339668]
This paper introduces Interactive Sketchpad, a tutoring system that combines language-based explanations with interactive visualizations to enhance learning. User studies conducted on math problems such as geometry and calculus demonstrate that Interactive Sketchpad leads to improved task comprehension, problem-solving accuracy, and engagement levels.
arXiv Detail & Related papers (2025-02-12T00:59:25Z)
- Do Vision-Language Models Really Understand Visual Language? [43.893398898373995]
Diagrams are a typical example of a visual language, depicting complex concepts and their relationships in the form of an image. Recent studies suggest that Large Vision-Language Models (LVLMs) can even tackle complex reasoning tasks involving diagrams. This paper develops a comprehensive test suite to evaluate the diagram comprehension capability of LVLMs.
arXiv Detail & Related papers (2024-09-30T19:45:11Z)
- Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning [114.59476118365266]
We propose AENet, which endows semantic information into the visual prompt to distill a semantic-enhanced prompt for visual representation enrichment. AENet comprises two key steps: 1) exploring concept-harmonized tokens for the visual and attribute modalities, grounded on the modal-sharing token that represents consistent visual-semantic concepts; and 2) yielding the semantic-enhanced prompt via the visual residual refinement unit with attribute consistency supervision. A generic sketch of this prompt-enrichment idea follows this entry.
arXiv Detail & Related papers (2024-06-05T07:59:48Z)
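Both prompt-learning entries above hinge on injecting class semantics into learnable visual prompts before a frozen backbone consumes them. The PyTorch sketch below is a generic stand-in for that idea, assuming attribute vectors as the semantic source; it is not AENet's architecture and omits its residual refinement and consistency supervision.

```python
# Generic semantic-enhanced visual prompt: learnable prompt tokens shifted by
# a projection of class-attribute vectors, then prepended to patch tokens.
# A simplified stand-in, not either paper's model.
import torch
import torch.nn as nn

class SemanticVisualPrompt(nn.Module):
    def __init__(self, n_prompts: int, dim: int, attr_dim: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.attr_proj = nn.Linear(attr_dim, dim)  # attributes -> token space

    def forward(self, patch_tokens: torch.Tensor, attrs: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim); attrs: (B, attr_dim)
        sem = self.attr_proj(attrs).unsqueeze(1)          # (B, 1, dim)
        prompts = self.prompts.unsqueeze(0) + sem         # semantic-enhanced prompts
        return torch.cat([prompts, patch_tokens], dim=1)  # prepend to visual tokens

tokens = torch.randn(2, 196, 768)   # e.g. ViT-B/16 patch tokens
attrs = torch.randn(2, 85)          # e.g. AwA2-style attribute vectors
out = SemanticVisualPrompt(n_prompts=4, dim=768, attr_dim=85)(tokens, attrs)
print(out.shape)                    # torch.Size([2, 200, 768])
```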
- XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization [4.634780391920529]
We propose a novel explainable prompt learning framework that leverages medical knowledge by aligning the semantics of images, learnable prompts, and clinical concept-driven prompts.
Our framework addresses the lack of valuable concept annotations by eliciting knowledge from large language models.
Our method simultaneously achieves superior diagnostic performance, flexibility, and interpretability, shedding light on the effectiveness of foundation models in facilitating XAI.
arXiv Detail & Related papers (2024-03-14T14:02:01Z)
- A Message Passing Perspective on Learning Dynamics of Contrastive Learning [60.217972614379065]
We show that if we cast a contrastive objective equivalently into the feature space, then its learning dynamics admits an interpretable form.
This perspective also establishes an intriguing connection between contrastive learning and Message Passing Graph Neural Networks (MP-GNNs); a toy illustration of this view follows this entry.
arXiv Detail & Related papers (2023-03-08T08:27:31Z)
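The message-passing reading above can be made concrete with the standard InfoNCE gradient: each anchor feature is pulled toward its positive and pushed away from a softmax-weighted average of the remaining features, i.e. one aggregation step over a similarity graph. The NumPy sketch below illustrates that update direction; it is a simplification of the paper's analysis, not a reproduction of it.

```python
# One step of contrastive feature dynamics, viewed as message passing:
# attract the positive, repel a softmax-weighted neighborhood average.
import numpy as np

def infonce_update(z: np.ndarray, pos: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """z: (n, d) L2-normalized features; pos[i] = index of i's positive."""
    sims = z @ z.T / tau
    np.fill_diagonal(sims, -np.inf)                    # exclude self-similarity
    w = np.exp(sims - sims.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax weights over others
    attract = z[pos]                                   # message from the positive
    repel = w @ z                                      # aggregated neighbor message
    return attract - repel                             # update direction per anchor

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
pos = np.array([1, 0, 3, 2, 5, 4])                     # pairs (0,1), (2,3), (4,5)
print(infonce_update(z, pos).shape)                    # (6, 8)
```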
- Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides [57.86931911522967]
We test the capabilities of machine learning models in multimodal understanding of educational content.
Our dataset contains aligned slides and spoken language, covering 180+ hours of video and 9,000+ slides from 10 lecturers across various subjects.
We introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches; a generic sketch of such a loss follows this entry.
arXiv Detail & Related papers (2022-08-17T05:30:18Z)
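The PolyViLT entry cites a multi-instance learning loss for aligning slides with spoken language. A common shape for such a loss (MIL-NCE style: a bag of candidate positives aggregated inside the log) is sketched below; this is an assumption about the general recipe, not the paper's exact objective.

```python
# Assumed MIL-NCE-style loss: each slide has a *bag* of candidate speech
# segments, any of which may be the true match.
import torch

def mil_contrastive_loss(slide_emb, speech_emb, bag_mask, tau: float = 0.07):
    """slide_emb: (B, d); speech_emb: (M, d);
    bag_mask: (B, M) bool, True where segment m is in slide b's bag."""
    sims = slide_emb @ speech_emb.T / tau                       # (B, M) logits
    neg_inf = sims.new_full(sims.shape, float("-inf"))
    pos = torch.where(bag_mask, sims, neg_inf)                  # keep bag only
    # logsumexp over each bag acts as a soft max over candidate positives.
    return (torch.logsumexp(sims, dim=1) - torch.logsumexp(pos, dim=1)).mean()

slides = torch.nn.functional.normalize(torch.randn(4, 256), dim=1)
speech = torch.nn.functional.normalize(torch.randn(12, 256), dim=1)
bags = torch.zeros(4, 12, dtype=torch.bool)
for b in range(4):
    bags[b, 3 * b : 3 * b + 3] = True     # three candidate segments per slide
print(mil_contrastive_loss(slides, speech, bags))   # scalar loss
```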
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs into commonsense reasoning; a simplified illustration of the serialization step follows this entry.
arXiv Detail & Related papers (2021-12-16T03:16:30Z)
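One lightweight way to incorporate a visual scene graph, in the spirit of the SGEITL summary, is to serialize its triples into tokens that a multimodal Transformer can attend to alongside the image and question. The sketch below shows only that serialization step; the `[SG]` format is an illustrative assumption, not the paper's input scheme.

```python
# Illustrative serialization of scene-graph triples into a text-side input.
from dataclasses import dataclass

@dataclass(frozen=True)
class SGTriple:
    subj: str
    pred: str
    obj: str

def serialize_scene_graph(triples: list[SGTriple]) -> str:
    """Flatten triples into a token sequence appended to the question."""
    return " [SG] " + " ; ".join(f"{t.subj} {t.pred} {t.obj}" for t in triples)

triples = [SGTriple("man", "holding", "umbrella"),
           SGTriple("umbrella", "above", "dog")]
question = "Why might the dog stay dry?"
print(question + serialize_scene_graph(triples))
# Why might the dog stay dry? [SG] man holding umbrella ; umbrella above dog
```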
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.