The Impact of Prompts on Zero-Shot Detection of AI-Generated Text
- URL: http://arxiv.org/abs/2403.20127v1
- Date: Fri, 29 Mar 2024 11:33:34 GMT
- Title: The Impact of Prompts on Zero-Shot Detection of AI-Generated Text
- Authors: Kaito Taguchi, Yujie Gu, Kouichi Sakurai,
- Abstract summary: In chat-based applications, users commonly input prompts and utilize the AI-generated texts.
We introduce an evaluative framework to empirically analyze the impact of prompts on the detection accuracy of AI-generated text.
- Score: 4.337364406035291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there have been significant advancements in the development of Large Language Models (LLMs). While their practical applications are now widespread, their potential for misuse, such as generating fake news and committing plagiarism, has posed significant concerns. To address this issue, detectors have been developed to evaluate whether a given text is human-generated or AI-generated. Among others, zero-shot detectors stand out as effective approaches that do not require additional training data and are often likelihood-based. In chat-based applications, users commonly input prompts and utilize the AI-generated texts. However, zero-shot detectors typically analyze these texts in isolation, neglecting the impact of the original prompts. It is conceivable that this approach may lead to a discrepancy in likelihood assessments between the text generation phase and the detection phase. So far, there remains an unverified gap concerning how the presence or absence of prompts impacts detection accuracy for zero-shot detectors. In this paper, we introduce an evaluative framework to empirically analyze the impact of prompts on the detection accuracy of AI-generated text. We assess various zero-shot detectors using both white-box detection, which leverages the prompt, and black-box detection, which operates without prompt information. Our experiments reveal the significant influence of prompts on detection accuracy. Remarkably, compared with black-box detection without prompts, the white-box methods using prompts demonstrate an increase in AUC of at least $0.1$ across all zero-shot detectors tested. Code is available: \url{https://github.com/kaito25atugich/Detector}.
Related papers
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions.
We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process.
We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z) - Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings [14.150011713654331]
This work proposes a novel textual adversarial attack on the detection models such as Fast-DetectGPT.
The method employs embedding models for data perturbation, aiming at reconstructing the AI generated texts to reduce the likelihood of detection of the true origin of the texts.
arXiv Detail & Related papers (2025-01-31T10:06:27Z) - A Practical Examination of AI-Generated Text Detectors for Large Language Models [25.919278893876193]
Machine-generated content detectors claim to identify such text under various conditions and from any language model.
This paper critically evaluates these claims by assessing several popular detectors on a range of domains, datasets, and models that these detectors have not previously encountered.
arXiv Detail & Related papers (2024-12-06T15:56:11Z) - Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors [24.954755569786396]
AI-text detection has emerged to distinguish between human and machine-generated content.
Recent research indicates that these detection systems often lack robustness and struggle to effectively differentiate perturbed texts.
Our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current detectors.
arXiv Detail & Related papers (2024-06-13T08:37:01Z) - Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple but effective black-box zero-shot detection approach.
It is predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Our method achieves an average AUROC of 98.7% and shows strong robustness against paraphrase and adversarial perturbation attacks.
arXiv Detail & Related papers (2024-05-07T12:57:01Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability
Curvature [143.5381108333212]
We show that text sampled from an large language model tends to occupy negative curvature regions of the model's log probability function.
We then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.