Assessing the Prevalence of AI-assisted Cheating in Programming Courses: A Pilot Study
- URL: http://arxiv.org/abs/2507.06438v1
- Date: Tue, 08 Jul 2025 22:40:44 GMT
- Title: Assessing the Prevalence of AI-assisted Cheating in Programming Courses: A Pilot Study
- Authors: Kaléu Delphino
- Abstract summary: Tools that can generate computer code in response to inputs written in natural language pose an existential threat to Computer Science education. We conducted a pilot study in a large Computer Science class to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tools that can generate computer code in response to inputs written in natural language, such as ChatGPT, pose an existential threat to Computer Science education in its current form, since students can now use these tools to solve assignments without much effort. While that risk has already been recognized by scholars, the proportion of the student body that is engaging in this new kind of plagiarism is still an open problem. We conducted a pilot study in a large CS class (n=120) to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews. More than 25% of the survey respondents admitted to committing AI plagiarism. Conversely, only one student agreed to be interviewed. Given the high levels of misconduct acknowledgment, we conclude that surveys are an effective method for studies on the matter, while interviews should be avoided or designed in a way that encourages participation.
Related papers
- Ensuring Computer Science Learning in the AI Era: Open Generative AI Policies and Assignment-Driven Written Quizzes [0.0]
This paper presents an assessment model that permits the use of generative AI for take-home programming assignments. To promote authentic learning, in-class, closed-book assessments are weighted more heavily than the assignments themselves. Statistical analyses revealed no meaningful linear correlation between GenAI usage levels and assessment outcomes.
arXiv Detail & Related papers (2026-01-16T17:02:44Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - Identity Theft in AI Conference Peer Review [50.18240135317708]
We discuss newly uncovered cases of identity theft in the scientific peer-review process within artificial intelligence (AI) research. We detail how dishonest researchers exploit the peer-review system by creating fraudulent reviewer profiles to manipulate paper evaluations.
arXiv Detail & Related papers (2025-08-06T02:36:52Z) - The Failure of Plagiarism Detection in Competitive Programming [0.0]
Plagiarism in programming courses remains a persistent challenge. This paper examines why traditional code plagiarism detection methods frequently fail in competitive programming contexts. We find that widely-used automated similarity checkers can be thwarted by simple code transformations or novel AI-generated code.
arXiv Detail & Related papers (2025-05-13T05:43:49Z) - PyEvalAI: AI-assisted evaluation of Jupyter Notebooks for immediate personalized feedback [43.56788158589046]
PyEvalAI scores Jupyter notebooks using a combination of unit tests and a locally hosted language model to preserve privacy. A case study demonstrates its effectiveness in improving feedback speed and grading efficiency for exercises in a university-level course on numerics.
arXiv Detail & Related papers (2025-02-25T18:20:20Z) - Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [176.39275404745098]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions. GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z) - Influence of Personality Traits on Plagiarism Through Collusion in Programming Assignments [0.0]
We study how the Big Five personality traits affect the propensity for plagiarism in two take-home programming assignments.
Our results show that the extraversion trait of the Big Five personality exhibits a positive association, and the conscientiousness trait exhibits a negative association with plagiarism tendencies.
arXiv Detail & Related papers (2024-06-29T10:26:48Z) - PaperCard for Reporting Machine Assistance in Academic Writing [48.33722012818687]
ChatGPT, a question-answering system released by OpenAI in November 2022, has demonstrated a range of capabilities that could be utilised in producing academic papers.
This raises critical questions surrounding the concept of authorship in academia.
We propose a framework we name "PaperCard", a documentation for human authors to transparently declare the use of AI in their writing process.
arXiv Detail & Related papers (2023-10-07T14:28:04Z) - A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z) - Perception, performance, and detectability of conversational artificial intelligence across 32 university courses [15.642614735026106]
We compare the performance of ChatGPT against students on 32 university-level courses.
We find that ChatGPT's performance is comparable, if not superior, to that of students in many courses.
We find an emerging consensus among students to use the tool, and among educators to treat this as plagiarism.
arXiv Detail & Related papers (2023-05-07T10:37:51Z) - ChatGPT: The End of Online Exam Integrity? [0.0]
This study evaluated the ability of ChatGPT, a recently developed artificial intelligence (AI) agent, to perform high-level cognitive tasks.
It raises concerns about the potential use of ChatGPT as a tool for academic misconduct in online exams.
arXiv Detail & Related papers (2022-12-19T08:15:16Z) - Giving Feedback on Interactive Student Programs with Meta-Exploration [74.5597783609281]
Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science.
Standard approaches require instructors to manually grade student-implemented interactive programs.
Online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs.
arXiv Detail & Related papers (2022-11-16T10:00:23Z) - Neural Language Models are Effective Plagiarists [38.85940137464184]
We find that a student using GPT-J can complete introductory level programming assignments without triggering suspicion from MOSS.
GPT-J was not trained on the problems in question and is not provided with any examples to work from.
We conclude that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code.
arXiv Detail & Related papers (2022-01-19T04:00:46Z) - Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.