Computer Aided Design and Grading for an Electronic Functional
Programming Exam
- URL: http://arxiv.org/abs/2308.07938v1
- Date: Mon, 14 Aug 2023 07:08:09 GMT
- Title: Computer Aided Design and Grading for an Electronic Functional
Programming Exam
- Authors: Ole Lübke (TUHH), Konrad Fuger (TUHH), Fin Hendrik Bahnsen
(UK-Essen), Katrin Billerbeck (TUHH), Sibylle Schupp (TUHH)
- Abstract summary: We introduce an algorithm to check Proof Puzzles based on finding correct sequences of proof lines; it improves fairness compared to an existing, edit-distance-based algorithm.
A higher-level language and open-source tool to specify regular expressions make creating complex regular expressions less error-prone.
We evaluate the resulting e-exam by analyzing the degree of automation in the grading process, asking students for their opinion, and critically reviewing our own experiences.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Electronic exams (e-exams) have the potential to substantially reduce the
effort required for conducting an exam through automation. Yet, care must be
taken to sacrifice neither task complexity nor constructive alignment nor
grading fairness in favor of automation. To advance automation in the design
and fair grading of (functional programming) e-exams, we introduce the
following: a novel algorithm to check Proof Puzzles based on finding correct
sequences of proof lines, which improves fairness compared to an existing,
edit-distance-based algorithm; an open-source static analysis tool to check
source code for task-relevant features by traversing the abstract syntax tree;
and a higher-level language and open-source tool to specify regular expressions
that make creating complex regular expressions less error-prone. Our findings are
embedded in a complete experience report on transforming a paper exam to an
e-exam. We evaluated the resulting e-exam by analyzing the degree of automation
in the grading process, asking students for their opinion, and critically
reviewing our own experiences. Almost all tasks can be graded automatically at
least in part (correct solutions can almost always be detected as such); the
students agree that an e-exam is a fitting examination format for the course
but are split on how well they can express their thoughts compared to a paper
exam; and examiners enjoy a more time-efficient grading process, while the
point distribution in the exam results was almost exactly the same as in a
paper exam.
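
The Proof Puzzle checker itself is not reproduced in the abstract. The following minimal sketch (all names hypothetical, assuming each puzzle line declares the lines it depends on) illustrates the underlying idea of crediting correctly sequenced proof lines rather than scoring via a global edit distance:

```haskell
import qualified Data.Set as Set

-- A puzzle line: an identifier plus the identifiers of the lines it
-- logically depends on (its premises).
data ProofLine = ProofLine
  { lineId   :: Int
  , premises :: [Int]
  }

-- A line is correctly placed if all of its premises occur earlier in
-- the student's sequence.
correctlyPlaced :: Set.Set Int -> ProofLine -> Bool
correctlyPlaced seen l = all (`Set.member` seen) (premises l)

-- Count the lines whose premises are already established when the
-- line appears; every correctly sequenced line earns credit on its own.
scoreSequence :: [ProofLine] -> Int
scoreSequence = go Set.empty
  where
    go _    []       = 0
    go seen (l:rest) =
      let seen' = Set.insert (lineId l) seen
      in (if correctlyPlaced seen l then 1 else 0) + go seen' rest

-- A submission is fully correct when every line is correctly placed.
fullyCorrect :: [ProofLine] -> Bool
fullyCorrect ls = scoreSequence ls == length ls
```

Under this scheme a single misplaced line costs at most one point, whereas an edit-distance score can penalize the same mistake several times if it shifts many positions, which hints at why sequence-based checking can be the fairer criterion.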
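Similarly, the static analysis tool is described only at the level of "traversing the abstract syntax tree." A sketch of that principle on a hypothetical toy AST (not the tool's real data types) might look as follows:

```haskell
-- A toy AST for a tiny functional language; the paper's tool works on
-- the full AST of the exam's programming language.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  | If  Expr Expr Expr

-- Collect every name referenced anywhere in the tree.
usedNames :: Expr -> [String]
usedNames (Var x)    = [x]
usedNames (App f a)  = usedNames f ++ usedNames a
usedNames (Lam _ b)  = usedNames b
usedNames (If c t e) = concatMap usedNames [c, t, e]

-- A task-relevant feature check, e.g. "the solution must use foldr".
usesFunction :: String -> Expr -> Bool
usesFunction name expr = name `elem` usedNames expr

-- Example: \xs -> foldr max 0 xs uses foldr, so the feature is present.
example :: Expr
example = Lam "xs" (App (App (App (Var "foldr") (Var "max"))
                             (Var "0"))
                        (Var "xs"))
-- usesFunction "foldr" example == True
```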
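For the regular-expression tool, the abstract does not show the higher-level language. One common way such a layer reduces errors is through composable, auto-escaping combinators that compile to an ordinary regex string; the sketch below is a hypothetical illustration of that approach, not the paper's actual syntax:

```haskell
import Data.List (intercalate)

-- A tiny combinator layer that compiles to an ordinary regex string.
-- Literals are escaped automatically, so metacharacters in student
-- answers can never break the generated pattern.
newtype Regex = Regex { render :: String }

lit :: String -> Regex
lit = Regex . concatMap escape
  where
    escape c
      | c `elem` ".^$*+?()[]{}|\\" = ['\\', c]
      | otherwise                  = [c]

cat :: [Regex] -> Regex                 -- match the parts in sequence
cat = Regex . concatMap (group . render)

alt :: [Regex] -> Regex                 -- match any one alternative
alt = Regex . group . intercalate "|" . map render

many0 :: Regex -> Regex                 -- zero or more repetitions
many0 r = Regex (group (render r) ++ "*")

group :: String -> String
group s = "(?:" ++ s ++ ")"

-- Example: match "map f xs" or "fmap f xs" with flexible spacing.
spaces :: Regex
spaces = many0 (lit " ")

answerPattern :: Regex
answerPattern =
  cat [alt [lit "fmap", lit "map"], spaces, lit "f", spaces, lit "xs"]
-- render answerPattern yields a plain regex string for any engine
-- that supports non-capturing groups.
```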
Related papers
- Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting [6.938766764201549]
This paper introduces an automated approach to develop test cases by exploiting the power of large language models and statistical techniques.
We analyze the behavioral test profiles across four different classification algorithms and discuss the limitations and strengths of those models.
arXiv Detail & Related papers (2024-07-31T21:12:21Z) - LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z) - Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation [9.390902237835457]
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG).
Evaluation is performed by scoring the RAG on an automatically-generated synthetic exam composed of multiple choice questions.
arXiv Detail & Related papers (2024-05-22T13:14:11Z) - SimGrade: Using Code Similarity Measures for More Accurate Human Grading [5.797317782326566]
We show that inaccurate and inconsistent grading of free-response programming problems is widespread in CS1 courses.
We propose several algorithms for (1) assigning student submissions to graders, and (2) ordering submissions to maximize the probability that a grader has previously seen a similar solution.
arXiv Detail & Related papers (2024-02-19T23:06:23Z) - Reinforcement Learning Guided Multi-Objective Exam Paper Generation [21.945655389912112]
We propose a reinforcement learning guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimizes three exam domain-specific objectives: difficulty degree, distribution of exam scores, and skill coverage.
We show that MOEPG is feasible in addressing the multiple dilemmas of the exam paper generation scenario.
arXiv Detail & Related papers (2023-03-02T07:55:52Z) - Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
arXiv Detail & Related papers (2022-06-21T18:16:31Z) - AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Active Learning from Crowd in Document Screening [76.9545252341746]
We focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently.
We propose a screening-specific, multi-label active learning sampling technique: objective-aware sampling.
We demonstrate that objective-aware sampling significantly outperforms the state of the art active learning sampling strategies.
arXiv Detail & Related papers (2020-11-11T16:17:28Z) - Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is how to automate the most elaborate part of the process.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z) - Automated Content Grading Using Machine Learning [0.0]
This research project is a primitive experiment in automating the grading of theoretical answers that students write in exams in technical courses.
We show how the algorithmic approach in machine learning can be used to automatically examine and grade theoretical content in exam answer papers.
arXiv Detail & Related papers (2020-04-08T23:46:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.