Counting the Trees in the Forest: Evaluating Prompt Segmentation for Classifying Code Comprehension Level
- URL: http://arxiv.org/abs/2503.12216v1
- Date: Sat, 15 Mar 2025 17:57:38 GMT
- Title: Counting the Trees in the Forest: Evaluating Prompt Segmentation for Classifying Code Comprehension Level
- Authors: David H. Smith IV, Max Fowler, Paul Denny, Craig Zilles
- Abstract summary: This paper introduces a novel method for automatically assessing the comprehension level of responses to ``Explain in Plain English'' questions. Using a Large Language Model (LLM) to segment both the student's description and the code, we aim to determine whether the student describes each line individually (many segments) or the code as a whole (fewer segments).
- Score: 2.250363093539224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reading and understanding code are fundamental skills for novice programmers, and especially important with the growing prevalence of AI-generated code and the need to evaluate its accuracy and reliability. ``Explain in Plain English'' questions are a widely used approach for assessing code comprehension, but providing automated feedback, particularly on comprehension levels, is a challenging task. This paper introduces a novel method for automatically assessing the comprehension level of responses to ``Explain in Plain English'' questions. Central to this is the ability to distinguish between two response types: multi-structural, where students describe the code line-by-line, and relational, where they explain the code's overall purpose. Using a Large Language Model (LLM) to segment both the student's description and the code, we aim to determine whether the student describes each line individually (many segments) or the code as a whole (fewer segments). We evaluate this approach's effectiveness by comparing segmentation results with human classifications, achieving substantial agreement. We conclude with how this approach, which we release as an open source Python package, could be used as a formative feedback mechanism.
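The abstract mentions an open-source Python package but does not name it or its API here, so the following is only a minimal sketch of the segment-counting idea under stated assumptions: `segment_with_llm` stands in for the LLM segmentation prompt, and the 0.5 ratio threshold is illustrative rather than a calibrated value.

```python
def segment_with_llm(text: str, is_code: bool = False) -> list[str]:
    # Placeholder for the paper's LLM segmentation step: a real
    # implementation would prompt a model to split `text` into semantic
    # segments. As a runnable stand-in, split code on lines and prose
    # on sentence boundaries.
    if is_code:
        return [line for line in text.splitlines() if line.strip()]
    return [s.strip() for s in text.split(".") if s.strip()]


def classify_response(description: str, code: str, threshold: float = 0.5) -> str:
    # Intuition from the abstract: multi-structural (line-by-line) answers
    # yield roughly one description segment per code segment; relational
    # (holistic) answers yield far fewer. The 0.5 threshold is illustrative.
    ratio = len(segment_with_llm(description)) / max(
        1, len(segment_with_llm(code, is_code=True)))
    return "multistructural" if ratio >= threshold else "relational"


code = "total = 0\nfor x in xs:\n    total += x\nprint(total / len(xs))"
print(classify_response("It computes the average of the list.", code))  # relational
```

With this toy stand-in, a one-sentence description of a four-line snippet yields a segment ratio of 0.25 and is classified as relational.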
Related papers
- ReDefining Code Comprehension: Function Naming as a Mechanism for Evaluating Code Comprehension [2.250363093539224]
"Explain in Plain English" (EiPE) questions are widely used to assess code comprehension skills.
Recent approaches like Code Generation Based Grading (CGBG) leverage large language models to generate code.
We propose a modified approach where students generate function names, emphasizing the function's purpose over implementation details.
arXiv Detail & Related papers (2025-03-15T17:22:14Z)
- Towards Identifying Code Proficiency through the Analysis of Python Textbooks [7.381102801726683]
The aim is to gauge the level of proficiency a developer must have to understand a piece of source code.
Prior attempts, which relied heavily on expert opinions and developer surveys, have led to considerable discrepancies.
This paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks.
arXiv Detail & Related papers (2024-08-05T06:37:10Z)
- Code Generation Based Grading: Evaluating an Auto-grading Mechanism for "Explain-in-Plain-English" Questions [0.0]
"Code Generation Based Grading" (CGBG) achieves moderate agreement with human graders with respect to low-level and line-by-line descriptions of code; a sketch of the generate-and-test loop follows this entry.
arXiv Detail & Related papers (2023-11-25T02:45:00Z)
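To make the CGBG mechanism above concrete, here is a hedged sketch of its generate-and-test loop: `generate_code` stands in for an LLM completion call (hard-coded here so the sketch runs), and grading asks whether the generated function passes instructor-written test cases.

```python
def generate_code(description: str) -> str:
    # Placeholder for an LLM completion call such as "Write a Python
    # function implementing: {description}". Hard-coded so the sketch runs.
    return "def candidate(xs):\n    return sum(xs) / len(xs)"


def grade(description: str, tests: list) -> bool:
    namespace = {}
    exec(generate_code(description), namespace)  # defines `candidate`
    candidate = namespace["candidate"]
    # The grading signal: does code generated from the student's
    # explanation behave like the reference solution on instructor tests?
    return all(abs(candidate(args) - expected) < 1e-9
               for args, expected in tests)


print(grade("compute the average of a list",
            [([1, 2, 3], 2.0), ([4.0], 4.0)]))  # True
```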
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We introduce Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
We exploit external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code along with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt [71.77504700496004]
Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts.
To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts.
However, how and what prompts can improve inference performance remains unclear.
arXiv Detail & Related papers (2022-05-23T07:51:15Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university; a prototype-classification sketch follows this entry.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
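A hedged sketch of the few-shot idea above, with plain prototype-based classification standing in for the paper's meta-learner: average a few instructor-labelled embeddings per feedback class, then label a new submission by its nearest prototype. The feedback labels and random embeddings are invented for illustration.

```python
import numpy as np


def prototypes(support: dict) -> dict:
    # One prototype per feedback label: the mean of its few support
    # embeddings (the "just a few examples" provided by instructors).
    return {label: np.mean(np.stack(vecs), axis=0)
            for label, vecs in support.items()}


def predict(embedding: np.ndarray, protos: dict) -> str:
    # Label a new student solution by the nearest prototype (L2 distance).
    return min(protos,
               key=lambda lbl: float(np.linalg.norm(embedding - protos[lbl])))


rng = np.random.default_rng(1)
support = {"off_by_one": [rng.normal(size=8) for _ in range(3)],
           "wrong_base_case": [rng.normal(size=8) for _ in range(3)]}
print(predict(rng.normal(size=8), prototypes(support)))
```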
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three); a pooling-based sketch of this hierarchy follows this entry.
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
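A sketch of the three-level hierarchy only, under a strong simplification: mean pooling stands in for HabNet's bi-directional self-attention encoders, and level-one sentence embeddings are assumed to be given.

```python
import numpy as np


def encode_level(children: list) -> np.ndarray:
    # Placeholder encoder: pool child representations into one vector
    # (HabNet would use a bi-directional self-attention encoder here).
    return np.mean(np.stack(children), axis=0)


def encode_paper(reviews: list) -> np.ndarray:
    # Level two: sentence vectors -> one vector per review.
    review_vecs = [encode_level(sents) for sents in reviews]
    # Level three: review vectors -> one vector for the whole paper,
    # which a final head would map to a rating / acceptance decision.
    return encode_level(review_vecs)


rng = np.random.default_rng(0)
# Two reviews with 2 and 3 (level-one) sentence embeddings of size 4.
paper = [[rng.normal(size=4) for _ in range(2)],
         [rng.normal(size=4) for _ in range(3)]]
print(encode_paper(paper).shape)  # (4,)
```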
- Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning [91.58529629419135]
We consider how to characterise visual groupings discovered automatically by deep neural networks.
We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings.
arXiv Detail & Related papers (2020-10-27T18:41:49Z)
- Word Embedding-based Text Processing for Comprehensive Summarization and Distinct Information Extraction [1.552282932199974]
We propose two automated text processing frameworks specifically designed to analyze online reviews.
The first framework summarizes the reviews dataset by extracting essential sentences.
The second framework is based on a question-answering neural network model trained to extract answers to multiple different questions.
arXiv Detail & Related papers (2020-04-21T02:43:31Z)
- Key Phrase Classification in Complex Assignments [5.067828201066184]
We show that the task of classifying key phrases is ambiguous even at a human level, producing a Cohen's kappa of 0.77 on a new dataset.
Both pretrained language models and simple TF-IDF SVM classifiers produce similar results, with the former scoring on average 0.6 F1 higher than the latter; a scikit-learn baseline sketch follows this entry.
arXiv Detail & Related papers (2020-03-16T04:25:37Z)
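For reference, a minimal scikit-learn rendering of the TF-IDF SVM baseline named above, with Cohen's kappa (the agreement statistic the entry cites) computed on an invented toy dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented four-example dataset, purely for demonstration.
texts = ["gradient descent converges", "the weather was nice",
         "backpropagation updates weights", "we went for a walk"]
labels = ["key", "other", "key", "other"]

# TF-IDF features feeding a linear SVM, as in the entry's baseline.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
preds = clf.predict(texts)

# Cohen's kappa: chance-corrected agreement, the statistic behind the
# entry's reported 0.77 human-level agreement.
print(cohen_kappa_score(labels, preds))
```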