Evaluation of Systems Programming Exercises through Tailored Static Analysis
- URL: http://arxiv.org/abs/2410.17260v3
- Date: Wed, 06 Nov 2024 08:56:06 GMT
- Title: Evaluation of Systems Programming Exercises through Tailored Static Analysis
- Authors: Roberto Natella
- Abstract summary: In large programming classes, it takes a significant effort from teachers to evaluate exercises and provide detailed feedback.
In systems programming, test cases are not sufficient to assess exercises, since concurrency and resource management bugs are difficult to reproduce.
This paper presents an experience report on static analysis for the automatic evaluation of systems programming exercises.
- Score: 4.335676282295717
- Abstract: In large programming classes, it takes a significant effort from teachers to evaluate exercises and provide detailed feedback. In systems programming, test cases are not sufficient to assess exercises, since concurrency and resource management bugs are difficult to reproduce. This paper presents an experience report on static analysis for the automatic evaluation of systems programming exercises. We design systems programming assignments with static analysis rules that are tailored for each assignment, to provide detailed and accurate feedback. Our evaluation shows that static analysis can identify a significant number of erroneous submissions missed by test cases.
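The abstract does not reproduce the paper's rules. As a minimal sketch of the idea of an assignment-tailored check, assuming a C assignment and a toy line-counting heuristic (the paper's actual rules and tooling may well differ), a per-assignment script could flag a pthread mutex that is locked more often than it is unlocked:

```python
# Hypothetical sketch of an assignment-tailored static check: flag C
# submissions where a pthread mutex is locked more often than unlocked.
# This is a toy counting heuristic, not the rules used in the paper.
import re
import sys

LOCK = re.compile(r"\bpthread_mutex_lock\s*\(\s*&?(\w+)")
UNLOCK = re.compile(r"\bpthread_mutex_unlock\s*\(\s*&?(\w+)")

def check_lock_balance(source: str) -> list[str]:
    """Report mutexes with more lock calls than unlock calls."""
    locks = LOCK.findall(source)
    unlocks = UNLOCK.findall(source)
    findings = []
    for mutex in set(locks):
        if locks.count(mutex) > unlocks.count(mutex):
            findings.append(f"mutex '{mutex}': possible missing unlock")
    return findings

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for finding in check_lock_balance(f.read()):
            print(finding)
```

Because such a check is written against one specific assignment, it can name the exact resource a student mishandled, which is the kind of detailed feedback that test cases alone cannot provide.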
Related papers
- Easing Maintenance of Academic Static Analyzers [0.0]
Mopsa is a static analysis platform that aims at being sound.
This article documents the tools and techniques we have come up with to simplify the maintenance of Mopsa since 2017.
arXiv Detail & Related papers (2024-07-17T11:29:21Z)
- LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small amount of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
- Evaluating Mathematical Reasoning Beyond Accuracy [50.09931172314218]
We introduce ReasonEval, a new methodology for evaluating the quality of reasoning steps.
We show that ReasonEval consistently outperforms baseline methods in the meta-evaluation datasets.
We observe that ReasonEval can play a significant role in data selection.
arXiv Detail & Related papers (2024-04-08T17:18:04Z)
- Understanding and Detecting Annotation-Induced Faults of Static Analyzers [4.824956210843882]
This paper presents the first comprehensive study of annotation-induced faults (AIF).
We analyzed 246 issues in six popular open-source static analyzers (i.e., PMD, SpotBugs, CheckStyle, Infer, SonarQube, and Soot).
arXiv Detail & Related papers (2024-02-22T08:09:01Z)
- Computer Aided Design and Grading for an Electronic Functional Programming Exam [0.0]
We introduce an algorithm for checking Proof Puzzles, based on finding correct sequences of proof lines, that improves fairness compared to an existing edit-distance-based algorithm.
A higher-level language and an open-source tool for specifying regular expressions make creating complex regular expressions less error-prone.
We evaluate the resulting e-exam by analyzing the degree of automation in the grading process, asking students for their opinion, and critically reviewing our own experiences.
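The abstract does not show the grading mechanism itself. As a hedged sketch of how a regular-expression exercise might be auto-graded (the function name and scoring scheme below are invented for illustration), a student's pattern can be scored against instructor-chosen accept/reject examples:

```python
# Hypothetical sketch of auto-grading a regular-expression exercise:
# the student's pattern is scored against instructor-chosen positive
# and negative examples. The paper's tool compiles a higher-level
# specification language; this sketch shows only the grading step.
import re

def grade_regex(pattern: str, accept: list[str], reject: list[str]) -> float:
    """Fraction of examples the student's pattern classifies correctly."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # an invalid pattern earns no credit
    hits = sum(bool(compiled.fullmatch(s)) for s in accept)
    hits += sum(not compiled.fullmatch(s) for s in reject)
    return hits / (len(accept) + len(reject))

# Example task: match binary strings with an even number of zeros.
print(grade_regex(r"(1*01*0)*1*", ["11", "0101", ""], ["0", "10"]))  # 1.0
```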
arXiv Detail & Related papers (2023-08-14T07:08:09Z)
- Analyzing Dataset Annotation Quality Management in the Wild [63.07224587146207]
Even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts.
While practices and guidelines regarding dataset creation projects exist, large-scale analysis has yet to be performed on how quality management is conducted.
arXiv Detail & Related papers (2023-07-16T21:22:40Z)
- Automated Grading and Feedback Tools for Programming Education: A Systematic Review [7.776434991976473]
Most papers assess the correctness of assignments in object-oriented languages.
Few tools assess the maintainability, readability or documentation of the source code.
Most tools offered fully automated assessment to allow for near-instantaneous feedback.
arXiv Detail & Related papers (2023-06-20T17:54:50Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
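As an illustration of the AST-based approach (a simplified stand-in under a naive scoping model, not the paper's framework), Python's ast module can parse a completion and report names that are loaded but never bound:

```python
# Minimal sketch of AST-based static checking: parse a model-generated
# completion and report undefined names, a kind of error that a
# test-free static pass can catch.
import ast
import builtins

def undefined_names(code: str) -> set[str]:
    """Names that are loaded but never bound, imported, or built in."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return {f"<syntax error: {e.msg}>"}
    bound, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            (loaded if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.alias):
            bound.add((node.asname or node.name).split(".")[0])
    return loaded - bound - set(dir(builtins))

print(undefined_names("def f(x):\n    return x + helper(y)"))
# -> {'helper', 'y'} under this simplified scoping model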
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- On the Use of Static Analysis to Engage Students with Software Quality Improvement: An Experience with PMD [12.961585735468313]
We reflect on our experience teaching the use of static analysis and evaluate its effectiveness in helping students improve software quality.
This paper discusses the results of a classroom experiment conducted over three academic semesters, involving 65 submissions that carried out code review against 690 PMD rules.
arXiv Detail & Related papers (2023-02-11T00:21:04Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to provide feedback on student code for a new programming question from just a few instructor-written examples.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier-1 university.
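As a rough sketch of the few-shot classification idea (random embeddings stand in for a real code encoder, and this is not the ProtoTransformer architecture), each feedback class gets a prototype, the mean embedding of a few instructor-labeled examples, and a new submission is assigned to the nearest prototype:

```python
# Illustrative prototypical-network-style sketch: feedback classes are
# represented by mean embeddings of a few labeled examples; new student
# code is labeled by its nearest prototype. Embeddings here are random
# stand-ins for a real code encoder.
import numpy as np

def prototypes(support: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Mean embedding per feedback class from a few labeled examples."""
    return {label: emb.mean(axis=0) for label, emb in support.items()}

def classify(query: np.ndarray, protos: dict[str, np.ndarray]) -> str:
    """Assign the label of the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda label: np.linalg.norm(query - protos[label]))

rng = np.random.default_rng(0)
support = {  # three labeled example embeddings per feedback class
    "off-by-one loop bound": rng.normal(0.0, 1.0, (3, 16)),
    "missing base case": rng.normal(3.0, 1.0, (3, 16)),
}
query = rng.normal(3.0, 1.0, 16)  # embedding of a new submission
print(classify(query, prototypes(support)))  # -> "missing base case"
```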
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community for more careful consideration of how they automatically evaluate their models.
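As a small illustration of the concern (the example sentences are invented, and the paper covers more than this one metric), n-gram metrics such as BLEU can rank a fluent but wrong hypothesis above a correct paraphrase:

```python
# BLEU scores n-gram overlap, so a hypothesis that copies the reference's
# surface form but changes its meaning can outscore a correct paraphrase.
# Requires nltk (pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]
paraphrase = ["a", "cat", "was", "sitting", "on", "a", "mat"]  # right meaning
wrong = ["the", "cat", "sat", "on", "the", "hat"]              # wrong meaning

smooth = SmoothingFunction().method1
print(sentence_bleu(reference, paraphrase, smoothing_function=smooth))
print(sentence_bleu(reference, wrong, smoothing_function=smooth))
# The wrong-but-overlapping hypothesis scores far higher than the paraphrase.
```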
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.