Can AI-Generated Text be Reliably Detected?
- URL: http://arxiv.org/abs/2303.11156v3
- Date: Mon, 19 Feb 2024 16:34:24 GMT
- Title: Can AI-Generated Text be Reliably Detected?
- Authors: Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao
Wang and Soheil Feizi
- Abstract summary: Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, the generation of fake news, and spamming.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
- Score: 54.670136179857344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The unregulated use of LLMs can potentially lead to malicious
consequences such as plagiarism, the generation of fake news, and spamming. Therefore, reliable
detection of AI-generated text can be critical to ensure the responsible use of
LLMs. Recent works attempt to tackle this problem either using certain model
signatures present in the generated text outputs or by applying watermarking
techniques that imprint specific patterns onto them. In this paper, we show
that these detectors are not reliable in practical scenarios. In particular, we
develop a recursive paraphrasing attack to apply on AI text, which can break a
whole range of detectors, including the ones using the watermarking schemes as
well as neural network-based detectors, zero-shot classifiers, and
retrieval-based detectors. Our experiments use passages of around 300 tokens,
showing that the detectors remain vulnerable even on relatively long passages.
We also observe that our recursive paraphrasing only
degrades text quality slightly, measured via human studies, and metrics such as
perplexity scores and accuracy on text benchmarks. Additionally, we show that
even LLMs protected by watermarking schemes can be vulnerable to spoofing
attacks that aim to mislead detectors into classifying human-written text as
AI-generated, potentially causing reputational damage to the developers. In
particular, we show that an adversary can infer hidden AI text signatures of
the LLM outputs without having white-box access to the detection method.
Finally, we provide a theoretical connection between the AUROC of the best
possible detector and the Total Variation distance between human and AI text
distributions that can be used to study the fundamental hardness of the
reliable detection problem for advanced language models. Our code is publicly
available at https://github.com/vinusankars/Reliability-of-AI-text-detectors.
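The theoretical result mentioned at the end of the abstract can be stated compactly. In the paper's notation, for machine-text and human-text distributions $\mathcal{M}$ and $\mathcal{H}$, the AUROC of any detector $D$ is bounded by their total variation distance (a sketch of the bound, not a full statement of the theorem's conditions):

```latex
\mathrm{AUROC}(D) \;\le\; \frac{1}{2}
  + \mathrm{TV}(\mathcal{M}, \mathcal{H})
  - \frac{\mathrm{TV}(\mathcal{M}, \mathcal{H})^{2}}{2}
```

As $\mathrm{TV}(\mathcal{M}, \mathcal{H}) \to 0$, i.e. as AI text becomes statistically indistinguishable from human text, the bound approaches $1/2$, the AUROC of a random classifier.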
Related papers
- Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings [14.150011713654331]
This work proposes a novel textual adversarial attack on the detection models such as Fast-DetectGPT.
The method employs embedding models for data perturbation, aiming to perturb AI-generated texts so that their true origin is less likely to be detected.
arXiv Detail & Related papers (2025-01-31T10:06:27Z) - DAMAGE: Detecting Adversarially Modified AI Generated Text [0.13108652488669736]
We show that many existing AI detectors fail to detect humanized text.
We demonstrate a robust model that can detect humanized AI text while maintaining a low false positive rate.
arXiv Detail & Related papers (2025-01-06T23:43:49Z) - MOSAIC: Multiple Observers Spotting AI Content, a Robust Approach to Machine-Generated Text Detection [35.67613230687864]
Large Language Models (LLMs) are trained at scale and endowed with powerful text-generating abilities.
Various proposals have been made to automatically discriminate artificially generated from human-written texts.
We derive a new, theoretically grounded approach to combine their respective strengths.
Our experiments, using a variety of generator LLMs, suggest that our method effectively leads to robust detection performances.
arXiv Detail & Related papers (2024-09-11T20:55:12Z) - SilverSpeak: Evading AI-Generated Text Detectors using Homoglyphs [0.0]
Our findings demonstrate that homoglyph-based attacks can effectively circumvent state-of-the-art AI-generated text detectors.
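The idea behind a homoglyph attack is simple enough to sketch. The snippet below is a minimal illustration in the spirit of SilverSpeak, not the paper's implementation; the substitution table is a small illustrative subset of the Unicode confusables real attacks draw on.

```python
# Map a few Latin letters to visually identical Cyrillic homoglyphs.
# A human reader sees unchanged text, but the byte/token stream a
# detector operates on is completely different.
HOMOGLYPHS = {
    "a": "\u0430",  # Latin 'a' -> Cyrillic 'а'
    "e": "\u0435",  # Latin 'e' -> Cyrillic 'е'
    "o": "\u043e",  # Latin 'o' -> Cyrillic 'о'
}

def homoglyph_attack(text: str) -> str:
    """Replace selected characters with Unicode confusables."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "a language model wrote this passage"
attacked = homoglyph_attack(original)
```

Because the attack preserves the rendered appearance and length of the text, it survives casual human review while breaking detectors that tokenize the raw characters.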
arXiv Detail & Related papers (2024-06-17T06:07:32Z) - The Impact of Prompts on Zero-Shot Detection of AI-Generated Text [4.337364406035291]
In chat-based applications, users typically supply prompts and then use the AI-generated responses.
We introduce an evaluative framework to empirically analyze the impact of prompts on the detection accuracy of AI-generated text.
arXiv Detail & Related papers (2024-03-29T11:33:34Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT-3.5 text-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically similar generations and must be maintained by the language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
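The retrieval defense described in the last entry can be sketched as follows. This is a toy illustration, not the paper's system: the class name, method names, and threshold are ours, and real deployments match against semantic embeddings rather than the character-level similarity used here for a self-contained example.

```python
from difflib import SequenceMatcher

class RetrievalDetector:
    """Toy retrieval-based detector: the API provider stores every
    generation and flags candidate text that closely matches any
    stored output, so paraphrases of a generation are still caught."""

    def __init__(self, threshold: float = 0.7):
        self.corpus: list[str] = []   # all texts the model has generated
        self.threshold = threshold

    def record(self, generation: str) -> None:
        """Store a new model output at generation time."""
        self.corpus.append(generation)

    def is_ai_generated(self, text: str) -> bool:
        """Flag text whose similarity to any stored output exceeds
        the threshold (real systems use embedding search here)."""
        return any(
            SequenceMatcher(None, text, stored).ratio() >= self.threshold
            for stored in self.corpus
        )

detector = RetrievalDetector()
detector.record("The mitochondria is the powerhouse of the cell.")
# A light paraphrase still retrieves the stored generation.
paraphrase = "The mitochondria is the powerhouse of a cell."
```

The key design point is that the defense lives on the provider's side: it needs no access to the detector's internals or to watermark keys, only a log of past generations, which is why the paper frames it as something the API provider must maintain.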
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.