Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases
- URL: http://arxiv.org/abs/2307.07411v1
- Date: Mon, 10 Jul 2023 12:18:34 GMT
- Title: Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases
- Authors: Michael Sheinman Orenstrakh, Oscar Karnalim, Carlos Anibal Suarez, Michael Liut
- Abstract summary: Large Language Models (LLMs) pose a serious threat to academic integrity in education. Modern detectors still need improvement before they can offer a foolproof solution to help maintain academic integrity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With their recent improvements and wide availability, Large Language Models (LLMs) pose a serious threat to academic integrity in education. Modern LLM-generated text detectors attempt to combat the problem by offering educators services to assess whether some text is LLM-generated. In this work, we collected 124 submissions from computer science students before the creation of ChatGPT. We then generated 40 ChatGPT submissions. We used this data to evaluate eight publicly available LLM-generated text detectors through the measures of accuracy, false positives, and resilience. The purpose of this work is to inform the community of which LLM-generated text detectors work and which do not, but also to provide insights for educators to better maintain academic integrity in their courses. Our results find that CopyLeaks is the most accurate LLM-generated text detector, GPTKit is the best LLM-generated text detector for reducing false positives, and GLTR is the most resilient LLM-generated text detector. We also express concern over the 52 false positives (out of 114 human-written submissions) produced by GPTZero. Finally, we note that all LLM-generated text detectors are less accurate with code, with languages other than English, and after the use of paraphrasing tools (like QuillBot). Modern detectors still need improvement before they can offer a foolproof solution to help maintain academic integrity. Further, their usability can be improved by facilitating smooth API integration, providing clear documentation of their features, improving the understandability of their model(s), and supporting more commonly used languages.
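To make the three evaluation measures concrete, here is a minimal sketch (not the paper's code) of how accuracy, false positives, and resilience could be computed over labeled submissions; the `Submission` record, the `detector` callable, and the reading of resilience as detection accuracy on paraphrased LLM text are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    text: str
    is_llm_generated: bool        # ground-truth label
    is_paraphrased: bool = False  # LLM text passed through a tool like QuillBot

def evaluate(detector, submissions):
    """detector: callable mapping text -> bool (True = flagged as LLM-generated)."""
    preds = [(s, detector(s.text)) for s in submissions]
    accuracy = sum(p == s.is_llm_generated for s, p in preds) / len(preds)
    # False positives: human-written work wrongly flagged (e.g., GPTZero's 52 of 114).
    false_positives = sum(p and not s.is_llm_generated for s, p in preds)
    # Resilience (one reading): how often paraphrased LLM text is still caught.
    para = [p for s, p in preds if s.is_llm_generated and s.is_paraphrased]
    resilience = sum(para) / len(para) if para else None
    return {"accuracy": accuracy, "false_positives": false_positives,
            "resilience": resilience}
```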
Related papers
- Learning to Rewrite: Generalized LLM-Generated Text Detection
Large language models (LLMs) can be abused at scale to create non-factual content and spread disinformation.
We propose training an LLM to rewrite input text, producing minimal edits for LLM-generated content and more edits for human-written text.
Our work suggests that LLMs can effectively detect machine-generated text if they are trained properly.
arXiv Detail & Related papers (2024-08-08T05:53:39Z)
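The entry above describes detection via rewriting: a model trained this way barely edits LLM-generated text and edits human text heavily, so the amount of change between input and rewrite acts as a detection score. A hedged sketch of that idea, with a hypothetical `rewrite` callable and an illustrative threshold:

```python
import difflib

def edit_ratio(original: str, rewritten: str) -> float:
    """Normalized dissimilarity in [0, 1]; 0.0 means the rewrite changed nothing."""
    return 1.0 - difflib.SequenceMatcher(None, original, rewritten).ratio()

def looks_llm_generated(text: str, rewrite, threshold: float = 0.2) -> bool:
    # rewrite: hypothetical callable wrapping the trained rewriting model.
    # Few edits (low ratio) suggest the input was already LLM-generated.
    return edit_ratio(text, rewrite(text)) < threshold
```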
- CUDRT: Benchmarking the Detection Models of Human vs. Large Language Models Generated Texts
Large language models (LLMs) have greatly enhanced text generation across industries.
Their human-like outputs make distinguishing between human and AI authorship challenging.
Current benchmarks mainly rely on static datasets, limiting their effectiveness in assessing model-based detectors.
arXiv Detail & Related papers (2024-06-13T12:43:40Z)
- ReMoDetect: Reward Models Recognize Aligned LLM's Generations
Aligned large language models (LLMs) are trained to generate texts that humans prefer.
In this paper, we identify common characteristics shared by the generations of these aligned models.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z)
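Per the title of the ReMoDetect entry above, the detection signal is that aligned LLMs are optimized to score highly under reward models. A speculative sketch of reward-score thresholding; the model name and threshold are illustrative assumptions, not the paper's setup:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# An open reward model used purely for illustration; ReMoDetect trains its own.
MODEL = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
reward_model = AutoModelForSequenceClassification.from_pretrained(MODEL)

@torch.no_grad()
def reward_score(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    return reward_model(**inputs).logits[0].item()

def looks_aligned_llm(text: str, threshold: float = 0.0) -> bool:
    # Assumed decision rule: higher predicted reward => more likely aligned-LLM output.
    return reward_score(text) > threshold
```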
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
A score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text.
We propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs.
The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
arXiv Detail & Related papers (2024-01-22T16:09:47Z)
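A hedged sketch of the Binoculars-style contrast between two closely related LMs described above: the observer's log-perplexity of the text is divided by a cross-perplexity term comparing the two models' next-token distributions, and low scores suggest machine-generated text. The gpt2/gpt2-medium pair (chosen here only because they share a tokenizer) stands in for the paper's models:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Two closely related LMs sharing a tokenizer; stand-ins for the paper's pair.
tok = AutoTokenizer.from_pretrained("gpt2")
observer = AutoModelForCausalLM.from_pretrained("gpt2")
performer = AutoModelForCausalLM.from_pretrained("gpt2-medium")

@torch.no_grad()
def binoculars_style_score(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    obs_logits = observer(ids).logits[:, :-1]   # next-token logits per position
    perf_logits = performer(ids).logits[:, :-1]
    targets = ids[:, 1:]
    # Log-perplexity of the text under the observer model.
    log_ppl = F.cross_entropy(obs_logits.transpose(1, 2), targets).item()
    # Cross-perplexity: observer's log-probs scored under the performer's distribution.
    x_ppl = (F.softmax(perf_logits, dim=-1)
             * -F.log_softmax(obs_logits, dim=-1)).sum(-1).mean().item()
    return log_ppl / x_ppl  # lower => more likely machine-generated
```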
- LLatrieval: LLM-Verified Retrieval for Verifiable Generation
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z)
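The LLatrieval entry above describes an iterative verify-and-retrieve loop. A minimal sketch, with all helpers (`retrieve`, `llm_supports`, `llm_refine_query`) as hypothetical placeholders:

```python
def llatrieval(question: str, retrieve, llm_supports, llm_refine_query,
               max_rounds: int = 3):
    """All helpers are hypothetical placeholders:
    retrieve(query) -> docs; llm_supports(question, docs) -> bool;
    llm_refine_query(question, docs) -> str."""
    query = question
    docs = retrieve(query)
    for _ in range(max_rounds):
        if llm_supports(question, docs):          # LLM verifies sufficiency
            break
        query = llm_refine_query(question, docs)  # LLM proposes a better query
        docs = retrieve(query)
    return docs  # support documents for verifiable generation
```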
- A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions
There is an imperative need to develop detectors that can detect LLM-generated text.
This is crucial to mitigate potential misuse of LLMs and to safeguard realms like artistic expression and social networks from the harmful influence of LLM-generated content.
Detection techniques have seen notable advances recently, propelled by innovations in watermarking, statistics-based detectors, neural-based detectors, and human-assisted methods.
arXiv Detail & Related papers (2023-10-23T09:01:13Z)
- Red Teaming Language Model Detectors with Language Models
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
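The first attack strategy in the Red Teaming entry above, context-aware synonym replacement, can be sketched as follows; WordNet-based substitution is an illustrative stand-in for the paper's replacement procedure:

```python
import random
from nltk.corpus import wordnet  # requires nltk and nltk.download("wordnet")

def synonym_attack(text: str, replace_prob: float = 0.3, seed: int = 0) -> str:
    """Randomly swap words for WordNet synonyms to perturb a detector's input."""
    rng = random.Random(seed)
    words = text.split()
    for i, word in enumerate(words):
        if rng.random() > replace_prob:
            continue
        synonyms = {lemma.name().replace("_", " ")
                    for syn in wordnet.synsets(word)
                    for lemma in syn.lemmas()} - {word}
        if synonyms:
            words[i] = rng.choice(sorted(synonyms))
    return " ".join(words)
```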
- LLMDet: A Third Party Large Language Models Generated Text Detection Tool
Text generated by large language models (LLMs) is remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)