Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense
- URL: http://arxiv.org/abs/2303.13408v2
- Date: Wed, 18 Oct 2023 02:29:55 GMT
- Title: Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense
- Authors: Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit
Iyyer
- Abstract summary: We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
- Score: 56.077252790310176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise in malicious usage of large language models, such as fake content
creation and academic plagiarism, has motivated the development of approaches
that identify AI-generated text, including those based on watermarking or
outlier detection. However, the robustness of these detection algorithms to
paraphrases of AI-generated text remains unclear. To stress test these
detectors, we build an 11B-parameter paraphrase generation model (DIPPER) that
can paraphrase paragraphs, condition on surrounding context, and control
lexical diversity and content reordering. Using DIPPER to paraphrase text
generated by three large language models (including GPT3.5-davinci-003)
successfully evades several detectors, including watermarking, GPTZero,
DetectGPT, and OpenAI's text classifier. For example, DIPPER drops detection
accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of
1%), without appreciably modifying the input semantics.
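These numbers follow a fixed false-positive-rate protocol: a detection threshold is first calibrated so that only 1% of human-written texts are flagged, and detection accuracy on AI-generated (or paraphrased) text is then measured at that threshold. A minimal sketch of this protocol is shown below; the NumPy-generated scores and the simple thresholding are illustrative assumptions, not the paper's released evaluation code.

```python
import numpy as np

def detection_accuracy_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Calibrate a threshold on human-written texts, then measure recall on AI texts.

    Scores are assumed to be "higher means more likely AI-generated". The threshold
    is the (1 - target_fpr) quantile of the human scores, so only ~target_fpr of
    human-written sequences are falsely flagged.
    """
    threshold = np.quantile(human_scores, 1.0 - target_fpr)
    return float(np.mean(ai_scores > threshold))

# Hypothetical detector scores (illustrative only, not results from the paper).
rng = np.random.default_rng(0)
human_scores = rng.normal(0.0, 1.0, size=5000)        # detector scores on human text
ai_scores = rng.normal(4.0, 1.0, size=5000)           # scores on AI-generated text
paraphrased_scores = rng.normal(0.3, 1.0, size=5000)  # scores after a paraphrase attack

print(detection_accuracy_at_fpr(human_scores, ai_scores))          # high before the attack
print(detection_accuracy_at_fpr(human_scores, paraphrased_scores)) # drops sharply after it
```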
To increase the robustness of AI-generated text detection to paraphrase
attacks, we introduce a simple defense that relies on retrieving
semantically-similar generations and must be maintained by a language model API
provider. Given a candidate text, our algorithm searches a database of
sequences previously generated by the API, looking for sequences that match the
candidate text within a certain threshold. We empirically verify our defense
using a database of 15M generations from a fine-tuned T5-XXL model and find
that it can detect 80% to 97% of paraphrased generations across different
settings while only classifying 1% of human-written sequences as AI-generated.
We open-source our models, code and data.
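A minimal sketch of the retrieval defense described above, assuming the API provider stores all of its prior generations: a candidate text is flagged as AI-generated if it is sufficiently similar to some stored generation. TF-IDF cosine similarity stands in for the paper's retriever here, and the corpus and threshold are illustrative assumptions rather than the released implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class RetrievalDetector:
    """Flag a candidate text if it closely matches any previously generated sequence.

    TF-IDF + cosine similarity is a simple stand-in for the paper's retriever over
    15M generations; the threshold below is illustrative, not the paper's value.
    """

    def __init__(self, api_generations, threshold=0.75):
        self.vectorizer = TfidfVectorizer(ngram_range=(1, 2)).fit(api_generations)
        self.index = self.vectorizer.transform(api_generations)  # database of prior API outputs
        self.threshold = threshold

    def is_ai_generated(self, candidate):
        # Compare the candidate against every stored generation and take the best match.
        sims = cosine_similarity(self.vectorizer.transform([candidate]), self.index)
        return float(sims.max()) >= self.threshold

# Usage: in practice 'api_generations' would be every sequence the API has ever produced.
detector = RetrievalDetector(api_generations=[
    "The retrieval defense stores every sequence generated by the API.",
    "Paraphrasing changes surface form but largely preserves content.",
])
print(detector.is_ai_generated("The retrieval defense keeps every sequence the API generated."))
```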
Related papers
- Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings [14.150011713654331] (2025-01-31T10:06:27Z)
  This work proposes a novel textual adversarial attack on detection models such as Fast-DetectGPT. The method uses embedding models to perturb AI-generated texts, reconstructing them so that detectors are less likely to identify their true origin.
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579] (2024-10-04T18:42:09Z)
  Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors for distinguishing machine-generated from human-written text.
- ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination [1.8418334324753884] (2024-09-22T01:13:22Z)
  This paper introduces back-translation as a novel technique for evading detection. We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text, and we evaluate the technique on nine AI detectors, including six open-source and three proprietary systems.
- SilverSpeak: Evading AI-Generated Text Detectors using Homoglyphs [0.0] (2024-06-17T06:07:32Z)
  Our findings demonstrate that homoglyph-based attacks can effectively circumvent state-of-the-art AI-generated text detectors.
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564] (2024-05-21T11:22:27Z)
  We propose a novel framework, paraphrased text span detection (PTD), which aims to identify paraphrased text spans within a text. We construct a dedicated dataset, PASTED, for this task.
- DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions [13.077729125193434] (2023-10-23T01:23:10Z)
  Existing detectors are built on the assumption that there is a distribution gap between human-generated and AI-generated texts. We find that large language models such as ChatGPT exhibit strong self-consistency in text generation and continuation, and we propose a new method for detecting AI-generated text based on self-consistency with masked predictions.
- Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975] (2023-07-25T20:24:22Z)
  We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context. Experimental results demonstrate that the proposed method yields substantial improvements across various text generation models.
- Smaller Language Models are Better Black-box Machine-Generated Text Detectors [56.36291277897995] (2023-05-17T00:09:08Z)
  Small and partially trained models are better universal text detectors, and whether the detector and generator were trained on the same data is not critically important to detection success. For instance, OPT-125M has an AUC of 0.81 in detecting ChatGPT generations, whereas GPT-J-6B, a larger model from the GPT family, has an AUC of 0.45 (see the likelihood-scoring sketch after this list).
- Can AI-Generated Text be Reliably Detected? [50.95804851595018] (2023-03-17T17:53:19Z)
  Large Language Models (LLMs) perform impressively well in various applications, but the potential for misuse in activities such as plagiarism, fake-news generation, and spamming has raised concerns about their responsible use. We stress-test the robustness of AI text detectors in the presence of an attacker.
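The "Smaller Language Models are Better Black-box Machine-Generated Text Detectors" entry above describes likelihood-based detection, in which a (possibly small) language model scores how probable a candidate text is under its own distribution. A minimal sketch of that idea using Hugging Face transformers is below; the choice of GPT-2 as the scorer and of mean token log-likelihood as the score are illustrative assumptions, not that paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works as the scorer; gpt2 is used here purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def mean_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the scoring model.

    Machine-generated text tends to receive higher (less negative) values than
    human-written text, so thresholding this score gives a simple detector.
    """
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()  # loss is the mean negative log-likelihood per token

score = mean_log_likelihood("This is a candidate passage whose origin we want to classify.")
print(score)  # compare against a threshold calibrated on known human-written text
```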