Related papers: A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating

A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating

URL: http://arxiv.org/abs/2402.14881v1
Date: Wed, 21 Feb 2024 23:51:06 GMT
Title: A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating
Authors: Shanker Ram and Chen Qian
Abstract summary: ChatGPT can answer text prompts fairly accurately, even performing very well on postgraduate-level questions. Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating.
Score: 14.113742357609285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: ChatGPT is a chatbot that can answer text prompts fairly accurately, even performing very well on postgraduate-level questions. Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating because students may directly use answers provided by tools like ChatGPT. In this paper, we try to provide an answer to an important question: how well ChatGPT can answer test questions and how we can detect whether the questions of a test can be answered correctly by ChatGPT. We generated ChatGPT's responses to the MedMCQA dataset, which contains over 10,000 medical school entrance exam questions. We analyzed the responses and uncovered certain types of questions ChatGPT answers more inaccurately than others. In addition, we have created a basic natural language processing model to single out the most vulnerable questions to ChatGPT in a collection of questions or a sample exam. Our tool can be used by test-makers to avoid ChatGPT-vulnerable test questions.

Related papers

Can ChatGPT Pass a Theory of Computing Course? [0.22940141855172028]
We evaluate ChatGPT's ability to pass our own ToC course's exams. We create a database of sample ToC questions and responses to accommodate other ToC offerings' choices for topics and structure.
arXiv Detail & Related papers (2024-07-10T15:34:06Z)
ChatGPT Incorrectness Detection in Software Reviews [0.38233569758620056]
We developed a tool called CID (ChatGPT Incorrectness Detector) to automatically test and detect the incorrectness in ChatGPT responses. In a benchmark study of library selection, we show that CID can detect incorrect responses from ChatGPT with an F1-score of 0.74 - 0.75.
arXiv Detail & Related papers (2024-03-25T00:50:27Z)
Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv Detail & Related papers (2023-10-20T00:37:28Z)
Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions [7.065853028825656]
We conducted the first in-depth analysis of ChatGPT answers to programming questions on Stack Overflow. We examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers. Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose.
arXiv Detail & Related papers (2023-08-04T13:23:20Z)
Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard [68.8204255655161]
We use 30 questions that are clear, without any ambiguities, fully described with plain text only, and have a unique, well defined correct answer. The answers are recorded and discussed, highlighting their strengths and weaknesses. It was found that ChatGPT-4 outperforms ChatGPT-3.5 in both sets of questions.
arXiv Detail & Related papers (2023-05-30T11:18:05Z)
To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection. We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains. Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models [49.52083248451775]
Large language models (LLMs) have made significant progress in NLP. We specifically focus on ChatGPT, a widely used and easily accessible LLM. We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities.
arXiv Detail & Related papers (2023-03-29T03:05:43Z)
Analyzing ChatGPT's Aptitude in an Introductory Computer Engineering Course [6.531546527140474]
ChatGPT is a tool that is able to generate plausible and human-sounding text answers to various questions. This work assesses ChatGPT's aptitude in answering quizzes, homework, exam, and laboratory questions in an introductory computer engineering course.
arXiv Detail & Related papers (2023-03-13T16:22:43Z)
Let's have a chat! A Conversation with ChatGPT: Technology, Applications, and Limitations [0.0]
Chat Generative Pre-trained Transformer, better known as ChatGPT, can generate human-like sentences and write coherent essays. Potential applications of ChatGPT in various domains, including healthcare, education, and research, are highlighted. Despite promising results, there are several privacy and ethical concerns surrounding ChatGPT.
arXiv Detail & Related papers (2023-02-27T14:26:29Z)
Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot. Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community. It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.