Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense
- URL: http://arxiv.org/abs/2303.13408v2
- Date: Wed, 18 Oct 2023 02:29:55 GMT
- Title: Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense
- Authors: Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit
Iyyer
- Abstract summary: We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
- Score: 56.077252790310176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise in malicious usage of large language models, such as fake content
creation and academic plagiarism, has motivated the development of approaches
that identify AI-generated text, including those based on watermarking or
outlier detection. However, the robustness of these detection algorithms to
paraphrases of AI-generated text remains unclear. To stress test these
detectors, we build an 11B-parameter paraphrase generation model (DIPPER) that
can paraphrase paragraphs, condition on surrounding context, and control
lexical diversity and content reordering. Using DIPPER to paraphrase text
generated by three large language models (including GPT3.5-davinci-003)
successfully evades several detectors, including watermarking, GPTZero,
DetectGPT, and OpenAI's text classifier. For example, DIPPER drops detection
accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of
1%), without appreciably modifying the input semantics.
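These numbers follow a fixed false-positive-rate protocol: a detection threshold is first calibrated so that only 1% of human-written texts are flagged, and detection accuracy on AI-generated (or paraphrased) text is then measured at that threshold. A minimal sketch of this protocol is shown below; the NumPy-generated scores and the simple thresholding are illustrative assumptions, not the paper's released evaluation code.

```python
import numpy as np

def detection_accuracy_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Calibrate a threshold on human-written texts, then measure recall on AI texts.

    Scores are assumed to be "higher means more likely AI-generated". The threshold
    is the (1 - target_fpr) quantile of the human scores, so only ~target_fpr of
    human-written sequences are falsely flagged.
    """
    threshold = np.quantile(human_scores, 1.0 - target_fpr)
    return float(np.mean(ai_scores > threshold))

# Hypothetical detector scores (illustrative only, not results from the paper).
rng = np.random.default_rng(0)
human_scores = rng.normal(0.0, 1.0, size=5000)        # detector scores on human text
ai_scores = rng.normal(4.0, 1.0, size=5000)           # scores on AI-generated text
paraphrased_scores = rng.normal(0.3, 1.0, size=5000)  # scores after a paraphrase attack

print(detection_accuracy_at_fpr(human_scores, ai_scores))          # high before the attack
print(detection_accuracy_at_fpr(human_scores, paraphrased_scores)) # drops sharply after it
```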
To increase the robustness of AI-generated text detection to paraphrase
attacks, we introduce a simple defense that relies on retrieving
semantically-similar generations and must be maintained by a language model API
provider. Given a candidate text, our algorithm searches a database of
sequences previously generated by the API, looking for sequences that match the
candidate text within a certain threshold. We empirically verify our defense
using a database of 15M generations from a fine-tuned T5-XXL model and find
that it can detect 80% to 97% of paraphrased generations across different
settings while only classifying 1% of human-written sequences as AI-generated.
We open-source our models, code and data.
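A minimal sketch of the retrieval defense described above, assuming the API provider stores all of its prior generations: a candidate text is flagged as AI-generated if it is sufficiently similar to some stored generation. TF-IDF cosine similarity stands in for the paper's retriever here, and the corpus and threshold are illustrative assumptions rather than the released implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class RetrievalDetector:
    """Flag a candidate text if it closely matches any previously generated sequence.

    TF-IDF + cosine similarity is a simple stand-in for the paper's retriever over
    15M generations; the threshold below is illustrative, not the paper's value.
    """

    def __init__(self, api_generations, threshold=0.75):
        self.vectorizer = TfidfVectorizer(ngram_range=(1, 2)).fit(api_generations)
        self.index = self.vectorizer.transform(api_generations)  # database of prior API outputs
        self.threshold = threshold

    def is_ai_generated(self, candidate):
        # Compare the candidate against every stored generation and take the best match.
        sims = cosine_similarity(self.vectorizer.transform([candidate]), self.index)
        return float(sims.max()) >= self.threshold

# Usage: in practice 'api_generations' would be every sequence the API has ever produced.
detector = RetrievalDetector(api_generations=[
    "The retrieval defense stores every sequence generated by the API.",
    "Paraphrasing changes surface form but largely preserves content.",
])
print(detector.is_ai_generated("The retrieval defense keeps every sequence the API generated."))
```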
Related papers
- Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings [14.150011713654331] (2025-01-31T10:06:27Z)
  This work proposes a novel textual adversarial attack on detection models such as Fast-DetectGPT. The method uses embedding models to perturb AI-generated texts, reconstructing them so that detectors are less likely to identify their true origin.
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579] (2024-10-04T18:42:09Z)
  Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors for distinguishing machine-generated from human-written text.
- ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination [1.8418334324753884] (2024-09-22T01:13:22Z)
  This paper introduces back-translation as a novel technique for evading detection. We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text, and we evaluate the technique on nine AI detectors, including six open-source and three proprietary systems.
- SilverSpeak: Evading AI-Generated Text Detectors using Homoglyphs [0.0] (2024-06-17T06:07:32Z)
  Our findings demonstrate that homoglyph-based attacks can effectively circumvent state-of-the-art AI-generated text detectors.
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564] (2024-05-21T11:22:27Z)
  We propose a novel framework, paraphrased text span detection (PTD), which aims to identify paraphrased text spans within a text. We construct a dedicated dataset, PASTED, for this task.
- DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions [13.077729125193434] (2023-10-23T01:23:10Z)
  Existing detectors are built on the assumption that there is a distribution gap between human-generated and AI-generated texts. We find that large language models such as ChatGPT exhibit strong self-consistency in text generation and continuation, and we propose a new method for detecting AI-generated text based on self-consistency with masked predictions.
- Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975] (2023-07-25T20:24:22Z)
  We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context. Experimental results demonstrate that the proposed method yields substantial improvements across various text generation models.
- Smaller Language Models are Better Black-box Machine-Generated Text Detectors [56.36291277897995] (2023-05-17T00:09:08Z)
  Small and partially trained models are better universal text detectors, and whether the detector and generator were trained on the same data is not critically important to detection success. For instance, OPT-125M has an AUC of 0.81 in detecting ChatGPT generations, whereas GPT-J-6B, a larger model from the GPT family, has an AUC of 0.45 (see the likelihood-scoring sketch after this list).
- Can AI-Generated Text be Reliably Detected? [50.95804851595018] (2023-03-17T17:53:19Z)
  Large Language Models (LLMs) perform impressively well in various applications, but the potential for misuse in activities such as plagiarism, fake-news generation, and spamming has raised concerns about their responsible use. We stress-test the robustness of AI text detectors in the presence of an attacker.
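The "Smaller Language Models are Better Black-box Machine-Generated Text Detectors" entry above describes likelihood-based detection, in which a (possibly small) language model scores how probable a candidate text is under its own distribution. A minimal sketch of that idea using Hugging Face transformers is below; the choice of GPT-2 as the scorer and of mean token log-likelihood as the score are illustrative assumptions, not that paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works as the scorer; gpt2 is used here purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def mean_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the scoring model.

    Machine-generated text tends to receive higher (less negative) values than
    human-written text, so thresholding this score gives a simple detector.
    """
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()  # loss is the mean negative log-likelihood per token

score = mean_log_likelihood("This is a candidate passage whose origin we want to classify.")
print(score)  # compare against a threshold calibrated on known human-written text
```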