Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation
- URL: http://arxiv.org/abs/2410.09318v2
- Date: Tue, 15 Oct 2024 05:48:40 GMT
- Title: Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation
- Authors: Saiful Islam Salim, Rubin Yuchan Yang, Alexander Cooper, Suryashree Ray, Saumya Debray, Sazzadur Rahaman
- Abstract summary: Large language model (LLM)-based programming assistants can help improve the productivity of professional software developers, but can also facilitate cheating in introductory computer programming courses.
This paper investigates the baseline performance of 5 widely used LLMs on a collection of introductory programming problems, examines adversarial perturbations to degrade their performance, and describes the results of a user study aimed at understanding the efficacy of such perturbations in hindering actual code generation for introductory programming assignments.
- Score: 42.49889252988544
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While Large language model (LLM)-based programming assistants such as CoPilot and ChatGPT can help improve the productivity of professional software developers, they can also facilitate cheating in introductory computer programming courses. Assuming instructors have limited control over the industrial-strength models, this paper investigates the baseline performance of 5 widely used LLMs on a collection of introductory programming problems, examines adversarial perturbations to degrade their performance, and describes the results of a user study aimed at understanding the efficacy of such perturbations in hindering actual code generation for introductory programming assignments. The user study suggests that i) combined, the perturbations reduced the average correctness score by 77%, and ii) the size of the drop in correctness depended on how detectable the perturbations were.
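To make the idea concrete: one simple character-level perturbation in this spirit is inserting zero-width Unicode characters into a problem statement, leaving the text visually unchanged for students while altering the token stream an LLM receives. The sketch below is a hypothetical illustration of that general technique, not the authors' implementation; the function name and the specific perturbation are assumptions.

```python
import random

ZERO_WIDTH_SPACE = "\u200b"  # renders as nothing for human readers

def perturb_problem_statement(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Insert zero-width spaces after letters/digits at the given rate.

    The rendered statement looks unchanged to a student, but the extra
    code points change the token sequence an LLM sees.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalnum() and rng.random() < rate:
            out.append(ZERO_WIDTH_SPACE)
    return "".join(out)

prompt = "Write a function that returns the sum of the even numbers in a list."
print(perturb_problem_statement(prompt))  # looks identical when rendered
```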
Related papers
- Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning [13.082135438792475]
Chain of Self-Correction (CoSC) embeds self-correction as an inherent ability in Large Language Models (LLMs).
CoSC operates through a sequence of self-correction stages: in each stage, the LLMs generate a program to address a given problem, execute it using program-based tools to obtain an output, and subsequently verify that output.
In the first phase, the LLMs are trained with a relatively small volume of seeding data generated from GPT-4, establishing an initial CoSC capability.
In the second phase, the CoSC capability is further enhanced by training on a larger volume of self-generated data.
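A minimal sketch of the generate-execute-verify loop described in this abstract; `generate_program`, `run_program`, and `verify_output` are hypothetical stand-ins for an LLM call, a program-execution tool, and a checker, not components published with the paper.

```python
from typing import Callable

def chain_of_self_correction(
    problem: str,
    generate_program: Callable[[str, str], str],  # (problem, feedback) -> code
    run_program: Callable[[str], str],            # code -> program output
    verify_output: Callable[[str, str], bool],    # (problem, output) -> ok?
    max_stages: int = 4,
) -> str:
    """Iterate generate -> execute -> verify until the output is accepted."""
    code, feedback = "", ""
    for _ in range(max_stages):
        code = generate_program(problem, feedback)  # stage: generate
        output = run_program(code)                  # stage: execute
        if verify_output(problem, output):          # stage: verify
            return code
        feedback = f"Output {output!r} failed verification; revise the program."
    return code  # best effort after max_stages
```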
arXiv Detail & Related papers (2024-10-14T17:16:44Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
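As a toy sketch of the cross-checking idea (not the paper's method in detail): execute the Program-of-Thought solution and accept only when its result agrees with the Chain-of-Thought answer. The convention that the program binds its result to a variable named `answer` is an assumption made for illustration.

```python
def collaborative_verify(question: str, cot_answer: str, pot_program: str) -> bool:
    """Accept only when the CoT answer matches the executed PoT result."""
    namespace: dict = {}
    exec(pot_program, namespace)               # run the Program-of-Thought
    pot_answer = str(namespace.get("answer"))  # assumed result variable
    return cot_answer.strip() == pot_answer.strip()

# Toy usage: both reasoning paths agree on 7 + 5.
print(collaborative_verify("What is 7 + 5?", cot_answer="12",
                           pot_program="answer = 7 + 5"))  # True
```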
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection [9.652886240532741]
This paper thoroughly analyses large language models' capabilities in detecting vulnerabilities within source code.
We evaluate the performance of six open-source models that are specifically trained for vulnerability detection against six general-purpose LLMs.
arXiv Detail & Related papers (2024-08-29T10:00:57Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
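The paper's critic is a learned model; as a loose illustration of executable quality control over question-code pairs, the toy filter below keeps a pair only when its code runs and reproduces a reference answer. All names here are hypothetical.

```python
def code_critic(question: str, code: str, reference_answer: str) -> bool:
    """Keep a question-code pair only if the code runs and its `answer`
    variable matches the reference; a crude stand-in for a learned
    critic, which would score pairs rather than exact-match them."""
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception:
        return False
    return str(namespace.get("answer")) == reference_answer

# Quality control over self-generated training data (illustrative only).
candidates = [
    ("Compute 3 * 4.", "answer = 3 * 4", "12"),
    ("Compute 3 * 4.", "answer = 3 + 4", "12"),  # wrong code, filtered out
]
kept = [c for c in candidates if code_critic(*c)]
print(len(kept))  # 1
```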
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement [93.38736019287224]
"LLMs-as-Instructors" framework autonomously enhances the training of smaller target models.
Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model.
Within this framework, we implement two strategies: "Learning from Error," which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors.
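A schematic sketch of the two strategies, with `instructor_analyze` as a hypothetical stand-in for the instructor LLM; this only shows which responses each strategy feeds to the instructor, not the actual error analysis.

```python
from typing import Callable, Iterable

def build_instructor_data(
    responses: Iterable[tuple[str, str, bool]],     # (prompt, answer, correct?)
    instructor_analyze: Callable[[str, str], str],  # hypothetical instructor LLM
    contrastive: bool = False,
) -> list[str]:
    """'Learning from Error' analyzes only incorrect responses; the
    'by Contrast' variant also passes correct ones to the instructor
    so failures can be contrasted with successes."""
    data = []
    for prompt, answer, correct in responses:
        if correct and not contrastive:
            continue  # plain strategy skips correct responses
        data.append(instructor_analyze(prompt, answer))
    return data
```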
arXiv Detail & Related papers (2024-06-29T17:16:04Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts [1.7095867620640115]
A key aspect of programming education is understanding and dealing with error messages.
However, 'logical errors', in which the program operates against the programmer's intentions, produce no error messages from the compiler.
We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts.
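A rough sketch of how error-type relationships might be embedded into a Chain-of-Thought prompt; the catalogue entries and prompt wording below are invented for illustration, not taken from the paper.

```python
# Hypothetical catalogue of logical error types and how they relate;
# the paper derives such relations, these entries are illustrative.
ERROR_RELATIONS = {
    "off-by-one": "often co-occurs with incorrect loop bounds",
    "wrong operator": "can masquerade as an off-by-one error",
    "condition always true": "a special case of a wrong operator",
}

def build_cot_prompt(code: str) -> str:
    """Embed error-type relationships into a Chain-of-Thought prompt."""
    relations = "\n".join(f"- {k}: {v}" for k, v in ERROR_RELATIONS.items())
    return (
        "Known logical error types and their relationships:\n"
        f"{relations}\n\n"
        "Reason step by step about which error type, if any, the code "
        "below contains, using the relationships above.\n\n"
        f"```python\n{code}\n```"
    )

print(build_cot_prompt("for i in range(1, len(xs)): total += xs[i]"))
```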
arXiv Detail & Related papers (2024-04-30T08:03:22Z) - Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction [62.409807640887834]
Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences.
LLMs' performance as correctors on CGEC remains unsatisfactory due to the challenging nature of the task.
We rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored.
arXiv Detail & Related papers (2024-02-18T01:40:34Z) - Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing.
The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z) - How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging [28.321080454393687]
Large Language Models (LLMs) now excel at generative skills and can create content at remarkable speed.
We introduce Hypo, a novel system that facilitates deliberate practice on debugging, in which human novices play the role of Teaching Assistants and help LLM-powered teachable agents code.
arXiv Detail & Related papers (2023-10-08T21:39:47Z)