SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
- URL: http://arxiv.org/abs/2507.22904v1
- Date: Sun, 29 Jun 2025 11:35:10 GMT
- Title: SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
- Authors: Ehsan Latif, Zirak Khan, Xiaoming Zhai
- Abstract summary: SketchMind is a multi-agent framework for evaluating and improving student-drawn scientific sketches. It comprises modular agents responsible for rubric parsing, sketch perception, cognitive alignment, and iterative feedback with sketch modification. Experts noted the system's potential to meaningfully support conceptual growth through guided revision.
- Score: 1.1172147007388977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific sketches (e.g., models) offer a powerful lens into students' conceptual understanding, yet AI-powered automated assessment of such free-form, visually diverse artifacts remains a critical challenge. Existing solutions often treat sketch evaluation as an image classification task or rely on monolithic vision-language models, which lack interpretability, pedagogical alignment, and adaptability across cognitive levels. To address these limitations, we present SketchMind, a cognitively grounded, multi-agent framework for evaluating and improving student-drawn scientific sketches. SketchMind comprises modular agents responsible for rubric parsing, sketch perception, cognitive alignment, and iterative feedback with sketch modification, enabling personalized and transparent evaluation. We evaluate SketchMind on a curated dataset of 3,575 student-generated sketches across six science assessment items, each targeting a different highest-order Bloom's level, that require students to draw models to explain phenomena. Compared to baseline GPT-4o performance without SRG (average accuracy: 55.6%), SRG integration achieves 77.1% average accuracy (+21.4% average absolute gain). We also show that multi-agent orchestration with SRG enhances SketchMind's performance; for example, GPT-4.1 gains an average 8.9% increase in sketch prediction accuracy, outperforming single-agent pipelines across all items. Human evaluators rated the feedback and co-created sketches generated by SketchMind with GPT-4.1 at an average of 4.1 out of 5, significantly higher than those of baseline models (e.g., 2.3 for GPT-4o). Experts noted the system's potential to meaningfully support conceptual growth through guided revision. Our code and (pending approval) dataset will be released to support reproducibility and future research in AI-driven education.
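To make the agent pipeline concrete, below is a minimal, hypothetical sketch of such a multi-agent evaluation loop. The agent boundaries follow the abstract (rubric parsing, sketch perception, cognitive alignment, iterative feedback), but the function names, prompts, and the `call_llm` placeholder are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Placeholder for a vision-language model call (e.g., GPT-4o or GPT-4.1)."""
    return f"[model response to: {prompt[:40]}...]"

@dataclass
class Assessment:
    rubric_criteria: str
    perception: str
    judgment: str
    feedback: str

def parse_rubric(rubric_text: str) -> str:
    # Agent 1: turn a free-text rubric into structured scoring criteria.
    return call_llm(f"Extract scoring criteria from this rubric:\n{rubric_text}")

def perceive_sketch(sketch_caption: str) -> str:
    # Agent 2: identify the objects and relations depicted in the student sketch.
    return call_llm(f"Describe the elements and relations in this sketch:\n{sketch_caption}")

def align_cognition(perception: str, criteria: str) -> str:
    # Agent 3: map the perceived elements onto the rubric and a Bloom's level.
    return call_llm(f"Given criteria [{criteria}], judge this sketch:\n{perception}")

def evaluate(sketch_caption: str, rubric_text: str, rounds: int = 2) -> Assessment:
    criteria = parse_rubric(rubric_text)
    perception = perceive_sketch(sketch_caption)
    judgment = align_cognition(perception, criteria)
    feedback = ""
    for _ in range(rounds):
        # Agent 4: iterative feedback; the real system also proposes sketch revisions.
        feedback = call_llm(f"Suggest one revision given this judgment:\n{judgment}")
    return Assessment(criteria, perception, judgment, feedback)

print(evaluate("arrows showing heat flow from flame to beaker",
               "The model must show energy transfer between objects"))
```

In the actual system the perception agent would consume the sketch image rather than a text caption, and the feedback agent would also return a modified sketch; this stub only shows how the agents hand results to one another.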
Related papers
- Annotation-Free Human Sketch Quality Assessment [56.71509868378274]
This paper studies sketch quality assessment for the first time -- letting you find badly drawn sketches. The key discovery lies in exploiting the magnitude ($L_2$ norm) of a sketch feature as a quantitative quality metric. We show how such a quality assessment capability can, for the first time, enable three practical sketch applications.
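As a rough illustration of that idea, the snippet below treats the $L_2$ norm of a sketch embedding as a scalar quality score; the embeddings and the encoder they stand in for are assumptions, not the paper's learned model.

```python
import numpy as np

def quality_score(sketch_feature: np.ndarray) -> float:
    """Use feature magnitude (the L2 norm) as a quantitative quality proxy."""
    return float(np.linalg.norm(sketch_feature))

# Hypothetical embeddings standing in for features from a quality-aware encoder,
# where better-drawn sketches are assumed to map to larger-magnitude features.
well_drawn = np.random.randn(512) * 1.5
badly_drawn = np.random.randn(512) * 0.5
print(quality_score(well_drawn) > quality_score(badly_drawn))
```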
arXiv Detail & Related papers (2025-07-28T06:18:51Z) - Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z) - A Unified Agentic Framework for Evaluating Conditional Image Generation [66.25099219134441]
Conditional image generation has gained significant attention for its ability to personalize content. This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks.
arXiv Detail & Related papers (2025-04-09T17:04:14Z) - Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing [84.16442052968615]
We introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE). RISEBench focuses on four key reasoning categories: Temporal, Causal, Spatial, and Logical Reasoning. We conduct experiments evaluating nine prominent visual editing models, comprising both open-source and proprietary models.
arXiv Detail & Related papers (2025-04-03T17:59:56Z) - From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education [24.970741456147447]
Large Language Models (LLMs) have demonstrated impressive mathematical reasoning capabilities, achieving near-perfect performance on benchmarks like GSM8K. However, their application in personalized education remains limited due to an overemphasis on correctness over error diagnosis and feedback generation. First, we introduce MathCCS, a benchmark designed for systematic error analysis and tailored feedback. Second, we develop a sequential error analysis framework that leverages historical data to track trends and improve diagnostic precision. Third, we propose a multi-agent collaborative framework that combines a Time Series Agent for historical analysis and an MLLM Agent for real-
arXiv Detail & Related papers (2025-02-19T14:57:51Z) - SketchRef: a Multi-Task Evaluation Benchmark for Sketch Synthesis [6.832790933688975]
SketchRef is the first comprehensive multi-task evaluation benchmark for sketch synthesis. Tasks are divided into five sub-tasks across four domains: animals, common things, human body, and faces. We validate our approach by collecting 7,920 responses from art enthusiasts.
arXiv Detail & Related papers (2024-08-16T09:32:26Z) - Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach [12.042768320132694]
We propose a novel approach integrating multi-teacher knowledge distillation with a unified skeleton representation.
Our networks are jointly trained on the COCO and MPII datasets, containing 17 and 16 keypoints, respectively.
Our joint models achieved an average accuracy of 70.89 and 76.40, compared to 53.79 and 55.78 when trained on a single dataset and evaluated on both.
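A rough sketch of the distillation idea, under assumed shapes: each dataset-specific teacher supervises only the channels of the unified skeleton it actually defines. The unified keypoint count, the index maps, and the tiny stand-in networks below are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

UNIFIED_KPTS = 21                        # hypothetical size of the merged keypoint set
COCO_TO_UNIFIED = list(range(17))        # hypothetical index maps into the unified skeleton
MPII_TO_UNIFIED = list(range(4, 20))

student = nn.Conv2d(3, UNIFIED_KPTS, 1)  # tiny stand-ins for real pose networks
teacher_coco = nn.Conv2d(3, 17, 1)
teacher_mpii = nn.Conv2d(3, 16, 1)

def distill_loss(img: torch.Tensor) -> torch.Tensor:
    pred = student(img)                   # (B, 21, H, W) unified heatmaps
    with torch.no_grad():
        t_coco = teacher_coco(img)        # (B, 17, H, W)
        t_mpii = teacher_mpii(img)        # (B, 16, H, W)
    # Each teacher supervises only the unified channels it knows about.
    loss = nn.functional.mse_loss(pred[:, COCO_TO_UNIFIED], t_coco)
    loss = loss + nn.functional.mse_loss(pred[:, MPII_TO_UNIFIED], t_mpii)
    return loss

print(distill_loss(torch.randn(2, 3, 64, 48)))
```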
arXiv Detail & Related papers (2024-05-30T14:14:39Z) - Overcoming Pitfalls in Graph Contrastive Learning Evaluation: Toward Comprehensive Benchmarks [60.82579717007963]
We introduce an enhanced evaluation framework designed to more accurately gauge the effectiveness, consistency, and overall capability of Graph Contrastive Learning (GCL) methods.
arXiv Detail & Related papers (2024-02-24T01:47:56Z) - Gemini Pro Defeated by GPT-4V: Evidence from Education [1.0226894006814744]
GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and Quadratic Weighted Kappa.
Findings suggest GPT-4V's superior capability in handling complex educational tasks.
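For reference, both agreement metrics mentioned above are straightforward to compute with scikit-learn; the scores below are hypothetical placeholders purely to show the calls.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

human = [0, 1, 2, 2, 1, 0, 2, 1]   # hypothetical expert scores on a 0-2 rubric
model = [0, 1, 2, 1, 1, 0, 2, 2]   # hypothetical model-assigned scores

print("accuracy:", accuracy_score(human, model))
print("QWK:", cohen_kappa_score(human, model, weights="quadratic"))
```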
arXiv Detail & Related papers (2023-12-27T02:56:41Z) - NERIF: GPT-4V for Automatic Scoring of Drawn Models [0.6278186810520364]
Recently released GPT-4V provides a unique opportunity to advance scientific modeling practices.
We developed a method employing instructional notes and rubrics to prompt GPT-4V to score students' drawn models.
GPT-4V scores were compared with human experts' scores to calculate scoring accuracy.
arXiv Detail & Related papers (2023-11-21T20:52:04Z) - Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models [81.20804369985376]
We conduct a large-scale subjective experiment collecting a large volume of real human feedback on low-level vision.
The constructed **Q-Pathway** dataset includes 58K detailed human feedback entries on 18,973 images.
We design a GPT-assisted conversion to process this feedback into 200K instruction-response pairs in diverse formats.
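A minimal sketch of such a feedback-to-instruction conversion step, assuming a generic `ask_gpt` helper and a made-up prompt template; the actual Q-Instruct pipeline and output formats differ.

```python
def ask_gpt(prompt: str) -> str:
    """Placeholder for a GPT API call; substitute a real client in practice."""
    return '{"instruction": "How sharp is this photo?", "response": "It looks noticeably blurry."}'

def feedback_to_pair(human_feedback: str) -> str:
    # Turn one free-form low-level-vision comment into an instruction-response pair.
    prompt = ("Rewrite the following image-quality feedback as a JSON object "
              f'with "instruction" and "response" fields:\n{human_feedback}')
    return ask_gpt(prompt)

print(feedback_to_pair("The photo is noisy and slightly out of focus."))
```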
arXiv Detail & Related papers (2023-11-12T09:10:51Z)