Conceptual Mutation Testing for Student Programming Misconceptions
- URL: http://arxiv.org/abs/2401.00021v1
- Date: Thu, 28 Dec 2023 11:43:04 GMT
- Title: Conceptual Mutation Testing for Student Programming Misconceptions
- Authors: Siddhartha Prasad (Brown University, USA), Ben Greenman (Brown
University, USA), Tim Nelson (Brown University, USA), Shriram Krishnamurthi
(Brown University, USA)
- Abstract summary: Students often misunderstand programming problem descriptions.
This can lead them to solve the wrong problem, which creates frustration.
Students can be made to better understand the problem by writing examples before they start programming.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Students often misunderstand programming problem descriptions. This
can lead them to solve the wrong problem, which creates frustration, obstructs
learning, and imperils grades. Researchers have found that students can be made
to better understand the problem by writing examples before they start
programming. These examples are checked against correct and wrong
implementations -- analogous to mutation testing -- provided by course staff.
Doing so results in better student understanding of the problem as well as
better test suites to accompany the program, both of which are desirable
educational outcomes.
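A minimal sketch of this checking step, assuming examples are plain input/expected-output pairs and that course staff supply one correct implementation plus a set of deliberately wrong ones; the function names, data layout, and toy problem below are hypothetical illustrations, not the paper's actual system:

```python
# Hypothetical sketch: validate student-written examples against a correct
# implementation and staff-authored wrong implementations. An example that
# fails the correct implementation signals a misconception; an example that
# "catches" a wrong implementation earns credit, as in mutation testing.

def check_examples(examples, correct, wrongs):
    """examples: list of (input, expected_output) pairs."""
    misconceptions = [(i, out) for (i, out) in examples if correct(i) != out]
    caught = {
        name: any(wrong(i) != out for (i, out) in examples)
        for name, wrong in wrongs.items()
    }
    return misconceptions, caught

# Toy problem: sum only the positive numbers in a list.
correct_impl = lambda xs: sum(x for x in xs if x > 0)
wrong_impls = {
    "sums_everything": lambda xs: sum(xs),               # ignores the filter
    "counts_positives": lambda xs: sum(1 for x in xs if x > 0),
}
student_examples = [([1, -2, 3], 4), ([-1, -2], -3)]     # second example reveals a misconception
print(check_examples(correct=correct_impl, wrongs=wrong_impls, examples=student_examples))
```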
Inquiry: Producing mutant implementations requires care. If there are too
many, or they are too obscure, students will end up spending a lot of time on
an unproductive task and also become frustrated. Instead, we want a small
number of mutants that each correspond to common problem misconceptions. This
paper presents a workflow with partial automation to produce mutants of this
form which, notably, are not those produced by mutation-testing tools.
Approach: We comb through student tests that fail a correct implementation.
The student misconceptions are embedded in these failures. We then use methods
to semantically cluster these failures. These clusters are then translated into
conceptual mutants. These can then be run against student data to determine
whether they are better than prior methods. Some of these processes also
enjoy automation.
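One way to picture the clustering step (a hedged sketch under assumed representations, not the paper's actual pipeline): group each failing example by which hypothesized misconception reproduces the student's expected output, then promote well-populated groups to conceptual mutants. The candidate misconceptions and toy data below are invented for illustration:

```python
# Hypothetical sketch of semantically clustering failing student examples.
# Each candidate "misconception" is a function; a failing example joins the
# cluster of the first misconception that reproduces the student's expected
# output. Populated clusters become candidate conceptual mutants.
from collections import defaultdict

def cluster_failures(failing_examples, misconception_candidates):
    clusters = defaultdict(list)
    for inp, expected in failing_examples:
        for name, hypothesis in misconception_candidates.items():
            if hypothesis(inp) == expected:
                clusters[name].append((inp, expected))
                break
        else:
            clusters["unexplained"].append((inp, expected))
    return clusters

# Continuing the toy "sum the positives" problem from the earlier sketch:
misconceptions = {
    "sums_everything": lambda xs: sum(xs),
    "counts_positives": lambda xs: sum(1 for x in xs if x > 0),
}
failures = [([-1, -2], -3), ([1, -2, 3], 2), ([5, -5], 1)]
print(dict(cluster_failures(failures, misconceptions)))
```

Each cluster that explains many failures then yields a conceptual mutant: the hypothesized wrong behavior is packaged as a wrong implementation for students to catch with their examples.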
Knowledge: We find that student misconceptions illustrated by failing tests
can be operationalized by the above process. The resulting mutants identify
student misconceptions much better than those produced by prior methods.
Grounding: Our findings are grounded in a manual analysis of student examples
and a quantitative evaluation of both our clustering techniques and our process
for making conceptual mutants. The clustering evaluation compares against a
ground truth using standard cluster-correspondence measures, while the mutant
evaluation examines how conceptual mutants perform against student data.
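As an illustration of a cluster-correspondence measure (the abstract does not name the specific measure, so the Adjusted Rand Index and the labelings below are assumptions for the sake of example), the comparison against ground truth can look like this:

```python
# Illustrative only: comparing an automatic clustering of failing examples
# against a hand-labeled ground truth with the Adjusted Rand Index.
# Labels are per-example cluster ids; the score is 1.0 for identical
# partitions and near 0.0 for chance-level agreement.
from sklearn.metrics import adjusted_rand_score

ground_truth = [0, 0, 1, 1, 2, 2]   # hand-labeled misconception per failing example
predicted    = [0, 0, 1, 2, 2, 2]   # clusters produced by the automatic pipeline
print(adjusted_rand_score(ground_truth, predicted))  # ~0.44 for this toy labeling
```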
Importance: Our work contributes a workflow, with some automation, to reduce
the cost and increase the effectiveness of generating conceptually interesting
mutants. Such mutants can both improve learning outcomes and reduce student
frustration, leading to better educational outcomes. In the process, we also
identify a variation of mutation testing not commonly discussed in the software
literature.
Related papers
- Assessing Changes in Thinking about Troubleshooting in Physical Computing: A Clinical Interview Protocol with Failure Artifacts Scenarios [0.0]
The purpose of this paper is to examine how a clinical interview protocol with failure artifact scenarios can capture changes in high school students' explanations of troubleshooting processes in physical computing activities.
We developed and piloted a "failure artifact scenarios" clinical interview protocol. Youth were presented with buggy physical computing projects over video calls and asked for suggestions on how to fix them without having access to the actual project or its code.
arXiv Detail & Related papers (2024-12-04T19:48:56Z) - BugSpotter: Automated Generation of Code Debugging Exercises [22.204802715829615]
This paper introduces BugSpotter, a tool to generate buggy code from a problem description and verify the synthesized bugs via a test suite.
Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification.
arXiv Detail & Related papers (2024-11-21T16:56:33Z) - Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
LLMs struggle to precisely detect students' errors and tailor their feedback to these errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z) - An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Less than 10% of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z) - CUCL: Codebook for Unsupervised Continual Learning [129.91731617718781]
The focus of this study is on Unsupervised Continual Learning (UCL), as it presents an alternative to Supervised Continual Learning.
We propose a method named Codebook for Unsupervised Continual Learning (CUCL) which promotes the model to learn discriminative features to complete the class boundary.
Our method significantly boosts the performance of supervised and unsupervised methods.
arXiv Detail & Related papers (2023-11-25T03:08:50Z) - Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z) - Identifying Different Student Clusters in Functional Programming
Assignments: From Quick Learners to Struggling Students [2.0386745041807033]
We analyze student assignment submission data collected from a functional programming course taught at McGill University.
This allows us to identify four clusters of students: "Quick-learning", "Hardworking", "Satisficing", and "Struggling".
We then analyze how work habits, working duration, the range of errors, and the ability to fix errors impact different clusters of students.
arXiv Detail & Related papers (2023-01-06T17:15:58Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - DisCor: Corrective Feedback in Reinforcement Learning via Distribution
Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z) - DeepMutation: A Neural Mutation Tool [26.482720255691646]
DeepMutation is a tool wrapping our deep learning model into a fully automated tool chain.
It can generate, inject, and test mutants learned from real faults.
arXiv Detail & Related papers (2020-02-12T01:57:41Z)