Evade ChatGPT Detectors via A Single Space
- URL: http://arxiv.org/abs/2307.02599v2
- Date: Fri, 13 Oct 2023 17:01:11 GMT
- Title: Evade ChatGPT Detectors via A Single Space
- Authors: Shuyang Cai and Wanyun Cui
- Abstract summary: Existing detectors are built upon the assumption that there are distributional gaps between human-generated and AI-generated text.
We find that detectors do not effectively discriminate the semantic and stylistic gaps between human-generated and AI-generated text.
We propose the SpaceInfi strategy to evade detection.
- Score: 17.07852413707166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ChatGPT brings revolutionary social value but also raises concerns about the
misuse of AI-generated text. Consequently, an important question is how to
detect whether texts are generated by ChatGPT or by a human. Existing detectors
are built upon the assumption that there are distributional gaps between
human-generated and AI-generated text. These gaps are typically identified
using statistical information or classifiers. Our research challenges the
distributional gap assumption in detectors. We find that detectors do not
effectively discriminate the semantic and stylistic gaps between
human-generated and AI-generated text. Instead, the "subtle differences", such
as an extra space, become crucial for detection. Based on this discovery, we
propose the SpaceInfi strategy to evade detection. Experiments demonstrate the
effectiveness of this strategy across multiple benchmarks and detectors. We
also provide a theoretical explanation for why SpaceInfi is successful in
evading perplexity-based detection. In addition, we empirically show that a phenomenon
called token mutation causes the evasion for language model-based detectors.
Our findings offer new insights and challenges for understanding and
constructing more applicable ChatGPT detectors.
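The core idea in the abstract, a single extra space that shifts a detector's decision, can be illustrated with a minimal sketch. The abstract does not say where the space goes, so the helper below assumes it is inserted before a randomly chosen comma; `insert_single_space` and `perplexity` are hypothetical names introduced here, not the authors' code. The second helper is only the standard perplexity definition, included to show the quantity a perplexity-based detector would threshold; it does not reproduce the paper's theoretical analysis.

```python
import math
import random
from typing import Optional, Sequence


def insert_single_space(text: str, rng: Optional[random.Random] = None) -> str:
    """Return `text` with one extra space inserted before a random comma.

    Assumption: the abstract only mentions "an extra space"; placing it
    before a comma is an illustrative choice made for this sketch.
    """
    rng = rng or random.Random()
    comma_positions = [i for i, ch in enumerate(text) if ch == ","]
    if not comma_positions:
        # Nothing to perturb under this assumption; leave the text unchanged.
        return text
    pos = rng.choice(comma_positions)
    return text[:pos] + " " + text[pos:]


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean token log-probability.

    The log-probabilities would come from whatever language model the
    detector uses; obtaining them is outside the scope of this sketch.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # With a single comma the result is deterministic: "Hello , world"
    print(insert_single_space("Hello, world"))
```

To try the idea against a real detector, one would score a text before and after calling `insert_single_space` and compare the two outputs. Which detector to use, and how it exposes token probabilities, is left open here.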
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors [24.954755569786396]
AI-text detection has emerged to distinguish between human and machine-generated content.
Recent research indicates that these detection systems often lack robustness and struggle to effectively differentiate perturbed texts.
Our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current detectors.
arXiv Detail & Related papers (2024-06-13T08:37:01Z)
- The Impact of Prompts on Zero-Shot Detection of AI-Generated Text [4.337364406035291]
In chat-based applications, users commonly input prompts and utilize the AI-generated texts.
We introduce an evaluative framework to empirically analyze the impact of prompts on the detection accuracy of AI-generated text.
arXiv Detail & Related papers (2024-03-29T11:33:34Z)
- Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
- DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions [13.077729125193434]
Existing detectors are built on the assumption that there is a distribution gap between human-generated and AI-generated texts.
We find that large language models such as ChatGPT exhibit strong self-consistency in text generation and continuation.
We propose a new method for AI-generated text detection based on self-consistency with masked predictions.
arXiv Detail & Related papers (2023-10-23T01:23:10Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.