PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback
- URL: http://arxiv.org/abs/2502.18425v1
- Date: Tue, 25 Feb 2025 18:20:20 GMT
- Title: PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback
- Authors: Nils Wandel, David Stotko, Alexander Schier, Reinhard Klein
- Abstract summary: PyEvalAI scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
- Score: 43.56788158589046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grading student assignments in STEM courses is a laborious and repetitive task for tutors, often requiring a week to assess an entire class. For students, this delay in feedback prevents them from iterating on incorrect solutions, hampers learning, and increases stress when exercise scores determine admission to the final exam. Recent advances in AI-assisted education, such as automated grading and tutoring systems, aim to address these challenges by providing immediate feedback and reducing grading workload. However, existing solutions often fall short due to privacy concerns, reliance on proprietary closed-source models, lack of support for combining Markdown, LaTeX and Python code, or excluding course tutors from the grading process. To overcome these limitations, we introduce PyEvalAI, an AI-assisted evaluation system, which automatically scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. Our approach is free, open-source, and ensures tutors maintain full control over the grading process. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
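The two-stage design the abstract describes lends itself to a compact illustration. The following is a minimal sketch, not PyEvalAI's actual code: the numerics exercise, the helper names, and the locally hosted OpenAI-compatible endpoint are all assumptions chosen to show how unit-test scores and local-LLM feedback could be combined.
```python
# A minimal sketch (not PyEvalAI's code): combine unit-test scoring with
# qualitative feedback from a locally hosted model. The exercise, helper
# names, and endpoint below are illustrative assumptions.
import json
import unittest
import urllib.error
import urllib.request


def run_unit_tests(student_namespace: dict) -> tuple[int, int]:
    """Score the objective part of the exercise with ordinary unit tests."""

    class Checks(unittest.TestCase):
        def test_trapezoid_on_constant(self):
            # Hypothetical exercise: integrating f(x)=1 over [0, 1] gives 1.0.
            trapezoid = student_namespace["trapezoid"]
            self.assertAlmostEqual(trapezoid(lambda x: 1.0, 0.0, 1.0, 10), 1.0)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Checks)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    passed = result.testsRun - len(result.failures) - len(result.errors)
    return passed, result.testsRun


def local_llm_feedback(code: str,
                       url: str = "http://localhost:8080/v1/chat/completions"):
    """Ask a locally hosted model for feedback (endpoint format is assumed)."""
    payload = {
        "model": "local-model",
        "messages": [
            {"role": "system",
             "content": "You are a numerics tutor. Give brief, concrete feedback."},
            {"role": "user", "content": f"Review this student solution:\n{code}"},
        ],
    }
    request = urllib.request.Request(url, json.dumps(payload).encode(),
                                     {"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)["choices"][0]["message"]["content"]
    except urllib.error.URLError:
        return "(no local model server running)"


# Usage: execute the student's code cell, then combine both signals.
student_code = """
def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)
"""
namespace: dict = {}
exec(student_code, namespace)  # notebook code is untrusted; sandbox in practice
passed, total = run_unit_tests(namespace)
print(f"unit tests passed: {passed}/{total}")
print(local_llm_feedback(student_code))
```
Keeping the unit-test verdict separate from the model's free-text feedback is what lets tutors retain final control over the numeric score.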
Related papers
- The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own [1.2891210250935148]
This paper presents a comprehensive investigation into the capability of Large Language Models (LLMs) to successfully complete a control systems course.
We assess LLM performance using ChatGPT under a "minimal effort" protocol that simulates realistic student usage patterns.
Our analysis provides quantitative insights into AI's strengths and limitations in handling mathematical formulations, coding challenges, and theoretical concepts in control systems engineering.
arXiv Detail & Related papers (2025-02-23T18:47:14Z)
- Chatbots in the Classroom: Testing the Fobizz Tool for Automatic Grading of Homework [0.0]
This study examines the AI-powered grading tool "AI Grading Assistant" by the German company Fobizz. The tool's numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated. The study critiques the broader trend of adopting AI as a quick fix for systemic problems in education.
arXiv Detail & Related papers (2024-12-09T16:50:02Z)
- Beyond Scores: A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback [3.2734777984053887]
We propose a modular retrieval-augmented generation (RAG) based ASAS-F system that scores answers and generates feedback in strict zero-shot and few-shot learning scenarios.
Results show an improvement in scoring accuracy of 9% on unseen questions compared to fine-tuning, offering a scalable and cost-effective solution.
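A hedged sketch of the retrieval-augmented scoring loop such a system implies: retrieve the most similar already-scored answers and use them as few-shot exemplars in the scoring prompt. The toy token-count similarity and all names below are illustrative assumptions, not the authors' ASAS-F pipeline.
```python
# Illustrative RAG-style scoring loop: retrieve scored exemplars, then build a
# few-shot prompt for whichever model produces the score and feedback.
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(answer: str, scored_bank: list[tuple[str, int]], k: int = 3):
    """Return the k most similar (answer, score) exemplars from the bank."""
    query = Counter(answer.lower().split())
    return sorted(scored_bank,
                  key=lambda ex: cosine(query, Counter(ex[0].lower().split())),
                  reverse=True)[:k]


def build_prompt(question: str, answer: str, exemplars) -> str:
    shots = "\n".join(f"Answer: {a}\nScore: {s}" for a, s in exemplars)
    return f"Question: {question}\n{shots}\nAnswer: {answer}\nScore and feedback:"


bank = [("Osmosis moves water across a semipermeable membrane", 2),
        ("Water flows downhill", 0)]
print(build_prompt("Define osmosis.",
                   "Water crosses a semipermeable membrane",
                   retrieve("Water crosses a semipermeable membrane", bank)))
```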
arXiv Detail & Related papers (2024-09-30T07:48:55Z)
- Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [176.39275404745098]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions. GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z)
- WIP: A Unit Testing Framework for Self-Guided Personalized Online Robotics Learning [3.613641107321095]
This paper focuses on creating a system for unit testing and integrating it into the course workflow.
In line with the framework's personalized, student-centered approach, this method makes it easier for students to revise and debug their programming work.
Updating the course workflow to include unit tests will strengthen the learning environment and make it more interactive, so that students can learn to program robots in a self-guided fashion.
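As an illustration of what such course-integrated unit tests might look like, the sketch below checks an invented student helper for a differential-drive robot; the function and its expected behavior are assumptions for this example, not part of the paper's framework.
```python
# Illustrative only: a course-style unit test in the spirit of the framework
# above. The differential-drive helper and its expected behavior are invented.
import math
import unittest


def wheel_speeds(v: float, omega: float, track: float = 0.2):
    """Student-style helper: map linear/angular velocity to (left, right) wheel speeds."""
    return v - omega * track / 2, v + omega * track / 2


class TestWheelSpeeds(unittest.TestCase):
    def test_straight_line(self):
        # Driving straight: both wheels must match.
        left, right = wheel_speeds(1.0, 0.0)
        self.assertAlmostEqual(left, right)

    def test_turn_in_place(self):
        # Pure rotation: wheel speeds are opposite and symmetric.
        left, right = wheel_speeds(0.0, math.pi)
        self.assertAlmostEqual(left, -right)


if __name__ == "__main__":
    unittest.main()
```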
arXiv Detail & Related papers (2024-05-18T00:56:46Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [46.667783153759636]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
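The DPO objective mentioned here reduces, in its usual formulation, to a simple expression over log-probabilities. A minimal sketch follows; the numbers are stand-ins for log-probabilities that would in practice come from the trained policy and a frozen reference model.
```python
# Sketch of the standard DPO loss: reward the policy for widening its
# log-probability margin on the preferred feedback relative to a frozen
# reference model. Inputs are stand-in floats, not real model outputs.
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))"""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


print(dpo_loss(-5.0, -9.0, -6.0, -7.0))  # policy widens the preferred margin -> ~0.55
print(dpo_loss(-9.0, -5.0, -6.0, -7.0))  # policy prefers the rejected feedback -> ~0.97
```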
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- Continuous Examination by Automatic Quiz Assessment Using Spiral Codes and Image Processing [69.35569554213679]
Paper quizzes are affordable and within reach of classroom education on campus, but correcting them is a considerable obstacle.
We suggest mitigating the issue with a novel image processing technique.
arXiv Detail & Related papers (2022-01-26T22:58:15Z)
- Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019 [112.36155380260655]
This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series.
Results show that DL methods dominated, though popular Neural Architecture Search (NAS) was impractical.
A high-level modular organization emerged, featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator".
arXiv Detail & Related papers (2022-01-11T06:21:18Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few examples annotated by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
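A minimal sketch of the few-shot classification idea, not the ProtoTransformer itself: embed each feedback class's few support examples, average them into a prototype, and label new submissions by the nearest prototype. The token-count "embedding" below is a toy stand-in for the paper's learned encoder.
```python
# Prototypical few-shot classification sketch. The embedding is a toy
# stand-in; real systems use a learned encoder over student code.
from collections import Counter


def embed(code: str) -> Counter:
    return Counter(code.split())  # toy embedding: token counts


def sq_dist(a: Counter, b: Counter) -> float:
    return sum((a[k] - b[k]) ** 2 for k in set(a) | set(b))


def prototype(examples: list[str]) -> Counter:
    total = Counter()
    for example in examples:
        total.update(embed(example))
    return Counter({k: v / len(examples) for k, v in total.items()})


def classify(submission: str, support: dict[str, list[str]]) -> str:
    protos = {label: prototype(exs) for label, exs in support.items()}
    return min(protos, key=lambda label: sq_dist(embed(submission), protos[label]))


support = {
    "off_by_one": ["for i in range(n - 1): total += x[i]"],
    "correct": ["for i in range(n): total += x[i]"],
}
print(classify("for i in range(n): s += a[i]", support))  # -> "correct"
```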
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate current state-of-the-art AES models using a model-adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the question do not decrease the scores the models produce.
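The overstability probe described here is easy to express schematically: replace part of an essay with off-topic filler and measure how little the model's score moves. In the sketch below, `score_essay` is a placeholder for whichever AES model is being audited, and the essay, filler, and dummy length-based scorer are illustrative assumptions.
```python
# Schematic overstability test: a robust scorer should show a clear score gap
# after heavy off-topic edits; a gap near zero signals overstability.
import random


def perturb(essay: str, filler: str, fraction: float = 0.25) -> str:
    """Replace roughly `fraction` of the essay's words with off-topic filler."""
    words = essay.split()
    n = max(1, int(len(words) * fraction))
    noise = (filler.split() * n)[:n]
    for i in random.sample(range(len(words)), n):
        words[i] = noise.pop()
    return " ".join(words)


def overstability_gap(score_essay, essay: str, filler: str) -> float:
    return abs(score_essay(essay) - score_essay(perturb(essay, filler)))


essay = "The water cycle moves water between the oceans the atmosphere and land " * 10
filler = "quantum blockchain synergy paradigm"


def length_only_model(e: str) -> float:  # trivially overstable dummy scorer
    return min(10.0, len(e.split()) / 20)


print(overstability_gap(length_only_model, essay, filler))  # 0.0 -> flagged
```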
arXiv Detail & Related papers (2020-07-14T03:49:43Z)