Code Generation Based Grading: Evaluating an Auto-grading Mechanism for
"Explain-in-Plain-English" Questions
- URL: http://arxiv.org/abs/2311.14903v1
- Date: Sat, 25 Nov 2023 02:45:00 GMT
- Title: Code Generation Based Grading: Evaluating an Auto-grading Mechanism for
"Explain-in-Plain-English" Questions
- Authors: David H. Smith IV and Craig Zilles
- Abstract summary: "Code Generation Based Grading" (CGBG) achieves moderate agreement with human graders; the primary area of disagreement is its leniency toward low-level, line-by-line descriptions of code.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comprehending and elucidating the purpose of code is often cited as a key learning objective in introductory programming courses. To address this objective, "Explain-in-Plain-English" (EiPE) questions, in which students are shown a segment of code and asked to provide an abstract description of the code's purpose, have been adopted. However, because EiPE questions require a natural language response, they typically must be graded manually, which is time-consuming
for course staff and delays feedback for students. With the advent of large
language models (LLMs) capable of generating code, responses to EiPE questions
can be used to generate code segments, the correctness of which can then be
easily verified using test cases. We refer to this approach as "Code Generation
Based Grading" (CGBG) and in this paper we explore its agreement with human
graders using EiPE responses from past exams in an introductory programming
course taught in Python. Overall, we find that CGBG achieves moderate agreement with human graders, with the primary area of disagreement being its leniency toward low-level, line-by-line descriptions of code.
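To make the mechanism concrete, here is a minimal sketch of the CGBG loop under stated assumptions: generate_code() stands in for a call to any code-capable LLM, and the function name and test cases are invented for illustration, not taken from the paper.

```python
# Minimal sketch of Code Generation Based Grading (CGBG), not the
# authors' implementation. generate_code() is a placeholder for a call
# to any code-capable LLM; the test cases and name are invented.

def generate_code(student_explanation: str) -> str:
    """Ask an LLM to write Python implementing the described behavior."""
    raise NotImplementedError("call a code-capable LLM here")

def grade_response(student_explanation: str, func_name: str, tests) -> bool:
    """Mark the response correct iff the generated code passes all tests."""
    source = generate_code(student_explanation)
    namespace = {}
    try:
        exec(source, namespace)        # define the generated function
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:                  # unparseable code, missing name, crash
        return False

# Hypothetical EiPE item: students describe code that sums a list.
tests = [(([1, 2, 3],), 6), (([],), 0)]
# grade_response("It adds up all the numbers in the list.", "total", tests)
```

The key design point is that grading reduces to running ordinary unit tests: correctness can be checked automatically even though the student's answer is natural language.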
Related papers
- Explain in Plain Language Questions with Indic Languages: Drawbacks, Affordances, and Opportunities
We evaluate the efficacy of a recently introduced approach called Code Generation Based Grading (CGBG) in enabling language-agnostic "Explain in Plain Language" activities.
arXiv Detail & Related papers (2024-09-30T13:56:29Z)
- Mining patterns in syntax trees to automate code reviews of student solutions for programming exercises
We introduce ECHO, a machine learning method to automate the reuse of feedback in educational code reviews.
Based on annotations from both automated linting tools and human reviewers, we show that ECHO can accurately and quickly predict appropriate feedback annotations.
arXiv Detail & Related papers (2024-04-26T14:03:19Z)
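A rough sketch of the idea, assuming feedback reuse works by matching mined syntax-tree patterns against new submissions; the pattern, message, and helper names below are invented for illustration and are not ECHO's actual code.

```python
# Sketch of reusing review feedback via syntax-tree patterns, in the
# spirit of ECHO (not the authors' code). In a real system the patterns
# would be mined from annotated submissions; this one is hand-written.
import ast

# Map a simple AST shape to a previously written review comment.
PATTERNS = {
    "range(len(...)) loop": lambda node: (
        isinstance(node, ast.Call)
        and getattr(node.func, "id", "") == "range"
        and len(node.args) == 1
        and isinstance(node.args[0], ast.Call)
        and getattr(node.args[0].func, "id", "") == "len"
    ),
}
FEEDBACK = {"range(len(...)) loop": "Iterate over the list directly."}

def suggest_feedback(source: str):
    """Return stored comments whose tree pattern occurs in the code."""
    tree = ast.parse(source)
    return [FEEDBACK[name]
            for node in ast.walk(tree)
            for name, matches in PATTERNS.items()
            if matches(node)]

# suggest_feedback("for i in range(len(xs)):\n    print(xs[i])")
```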
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate over strong baselines.
The logical comment decoding strategy is also notably more robust than Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
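One way to picture "comments as logic pivots" at decoding time is to instruct the model to state each logical step as a comment before the corresponding code. The sketch below is an assumption-laden illustration; the prompt wording and the complete() placeholder are not from the paper.

```python
# Sketch of comment-first decoding in the spirit of MANGO: steer the
# model to reason in comments before emitting code. The instruction
# text and complete() placeholder are assumptions, not MANGO itself.

COMMENT_PIVOT_INSTRUCTION = (
    "Before each block of code, write a '#' comment stating the logical "
    "step it performs, then write the code for that step."
)

def build_prompt(task_description: str) -> str:
    """Prefix the task with an instruction that makes comments the
    intermediate reasoning step."""
    return f"{COMMENT_PIVOT_INSTRUCTION}\n\nTask: {task_description}\n"

def complete(prompt: str) -> str:
    """Placeholder for any code LLM's completion call."""
    raise NotImplementedError

# code = complete(build_prompt("Return the n-th Fibonacci number."))
```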
- A Knowledge-Injected Curriculum Pretraining Framework for Question Answering
We propose a general Knowledge-Injected Curriculum Pretraining framework (KICP) to achieve comprehensive knowledge graph (KG) learning and exploitation for knowledge-based question answering tasks.
The KI module first injects knowledge into the language model (LM) by generating a KG-centered pretraining corpus, and generalizes the process into three key steps.
The KA module learns knowledge from the generated corpus with an adapter-equipped LM while preserving its original natural language understanding ability.
The CR module follows human reasoning patterns to construct three corpora of increasing reasoning difficulty, and further trains the LM from easy to hard in a curriculum manner.
arXiv Detail & Related papers (2024-03-11T03:42:03Z)
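The curriculum idea in the CR module can be sketched as training over corpora sorted by reasoning difficulty; train_one_pass() and the corpus names below are placeholders, not KICP's implementation.

```python
# Sketch of an easy-to-hard curriculum schedule like KICP's CR module:
# pretrain over corpora in order of increasing reasoning difficulty.
# train_one_pass() and the corpora are placeholders, not KICP's code.

def train_one_pass(model, corpus):
    """Placeholder for one pretraining pass over a corpus."""
    raise NotImplementedError

def curriculum_pretrain(model, corpora):
    """corpora: list of (difficulty, corpus) pairs; train easy to hard."""
    for _, corpus in sorted(corpora, key=lambda pair: pair[0]):
        model = train_one_pass(model, corpus)
    return model

# curriculum_pretrain(model, [(1, facts), (2, one_hop), (3, multi_hop)])
```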
- Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills
We propose using an LLM to generate code based on students' responses to EiPE questions.
We report student success in creating effective prompts for solving EiPE questions.
arXiv Detail & Related papers (2024-03-10T00:23:08Z)
- Exploring the Potential of Large Language Models to Generate Formative Programming Feedback
We explore the potential of large language models (LLMs) for computing educators and learners.
To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT.
Results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors.
However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.
arXiv Detail & Related papers (2023-08-31T15:22:11Z)
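A minimal sketch of this kind of feedback generation, assuming one prompt per submission; the prompt text and complete() placeholder are illustrative, not the study's setup (which used ChatGPT on CS1 programming sequences).

```python
# Sketch of prompting an LLM for formative feedback on a student
# program. The prompt wording and complete() are assumptions, not the
# study's artifacts.

FEEDBACK_PROMPT = (
    "You are a CS1 tutor. Explain what is wrong with the following "
    "Python program and give a hint, but do not write the fixed code.\n"
    "Program:\n{code}\nReported error:\n{error}"
)

def complete(prompt: str) -> str:
    """Placeholder for a chat-completion call to the LLM."""
    raise NotImplementedError

def formative_feedback(student_code: str, error_message: str) -> str:
    return complete(FEEDBACK_PROMPT.format(code=student_code,
                                           error=error_message))
```

As the paper cautions, such feedback can mislead novices, so an instructor should review or frame it before students rely on it.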
- Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code
We analyzed the effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments.
These findings can be leveraged by educators to adapt their instructional practices and assessments in programming courses.
arXiv Detail & Related papers (2023-03-09T16:52:12Z)
- Python Code Generation by Asking Clarification Questions
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code, together with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
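The clarification-question idea can be sketched with a simple underspecification check; the keyword heuristic and names below are invented for illustration, whereas the dataset's questions were generated synthetically from description/code pairs.

```python
# Sketch of asking clarification questions when a natural language
# description omits details the code depends on, in the spirit of
# CodeClarQA. The keyword heuristic is invented, not the paper's method.

def missing_details(description: str, required_details: list[str]) -> list[str]:
    """Return required details the description never mentions."""
    lowered = description.lower()
    return [d for d in required_details if d.lower() not in lowered]

def clarification_questions(description: str, required_details: list[str]):
    return [f"What should happen with respect to '{d}'?"
            for d in missing_details(description, required_details)]

# clarification_questions(
#     "Sort the list of records by date.",
#     ["tie-breaking", "ascending or descending", "missing dates"],
# )
```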
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
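Few-shot feedback classification in the prototypical style that ProtoTransformer builds on can be sketched as nearest-mean classification in an embedding space; embed() is a placeholder for the paper's code encoder, and the data layout is assumed.

```python
# Prototypical-network style few-shot classification: each feedback
# label's prototype is the mean embedding of its few labeled examples,
# and a new submission gets the nearest prototype's label. A generic
# sketch, not ProtoTransformer's implementation.
import numpy as np

def embed(code: str) -> np.ndarray:
    """Placeholder: map a code submission to a feature vector."""
    raise NotImplementedError

def prototypes(support: dict[str, list[str]]) -> dict[str, np.ndarray]:
    """support: feedback label -> a few example submissions."""
    return {label: np.mean([embed(c) for c in codes], axis=0)
            for label, codes in support.items()}

def classify(code: str, protos: dict[str, np.ndarray]) -> str:
    z = embed(code)
    return min(protos, key=lambda label: np.linalg.norm(z - protos[label]))
```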
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
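The alternating scheme in the title can be sketched as a loop that updates one model while holding the other fixed; both update functions below are placeholders for the paper's learners, not its actual training code.

```python
# Sketch of alternating training: per round, update the retriever with
# the programmer fixed, then the programmer with the retriever fixed,
# using only weak (answer-level) supervision.

def update_retriever(retriever, programmer, qa_pairs):
    raise NotImplementedError  # reward retrievals that let the programmer answer

def update_programmer(programmer, retriever, qa_pairs):
    raise NotImplementedError  # train on programs found with current retrievals

def alternate_train(retriever, programmer, qa_pairs, rounds: int = 10):
    for _ in range(rounds):
        retriever = update_retriever(retriever, programmer, qa_pairs)
        programmer = update_programmer(programmer, retriever, qa_pairs)
    return retriever, programmer
```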
- Semantic Graphs for Generating Deep Questions
We propose a novel framework which first constructs a semantic-level graph for the input document and then encodes the semantic graph by introducing an attention-based GGNN (Att-GGNN).
On the HotpotQA deep-question centric dataset, our model greatly improves performance on questions requiring reasoning over multiple facts, leading to state-of-the-art performance.
arXiv Detail & Related papers (2020-04-27T10:52:52Z)
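A generic gated-graph propagation step of the kind a GGNN applies to such a semantic graph can be sketched in a few lines; this is a simplified GRU-style gate, not the paper's Att-GGNN, which additionally uses attention.

```python
# One simplified gated-graph propagation step: nodes aggregate neighbor
# messages, then a GRU-style gate mixes them into the node states.
# A generic sketch, not the Att-GGNN from the paper.
import numpy as np

def ggnn_step(H, A, W, U):
    """H: (n, d) node states; A: (n, n) adjacency; W, U: (d, d) weights."""
    M = A @ H @ W                              # aggregate neighbor messages
    z = 1.0 / (1.0 + np.exp(-(M + H @ U)))     # update gate (sigmoid)
    H_candidate = np.tanh(M)                   # candidate states
    return (1.0 - z) * H + z * H_candidate     # gated mix

n, d = 5, 8
H = np.random.randn(n, d)
A = (np.random.rand(n, n) > 0.5).astype(float)  # toy semantic graph
W, U = np.random.randn(d, d), np.random.randn(d, d)
H = ggnn_step(H, A, W, U)
```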
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.