DAMAGE: Detecting Adversarially Modified AI Generated Text
- URL: http://arxiv.org/abs/2501.03437v1
- Date: Mon, 06 Jan 2025 23:43:49 GMT
- Title: DAMAGE: Detecting Adversarially Modified AI Generated Text
- Authors: Elyas Masrour, Bradley Emi, Max Spero,
- Abstract summary: We show that many existing AI detectors fail to detect humanized text.
We demonstrate a robust model that can detect humanized AI text while maintaining a low false positive rate.
- Score: 0.13108652488669736
- License:
- Abstract: AI humanizers are a new class of online software tools meant to paraphrase and rewrite AI-generated text in a way that allows them to evade AI detection software. We study 19 AI humanizer and paraphrasing tools and qualitatively assess their effects and faithfulness in preserving the meaning of the original text. We show that many existing AI detectors fail to detect humanized text. Finally, we demonstrate a robust model that can detect humanized AI text while maintaining a low false positive rate using a data-centric augmentation approach. We attack our own detector, training our own fine-tuned model optimized against our detector's predictions, and show that our detector's cross-humanizer generalization is sufficient to remain robust to this attack.
Related papers
- People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text [37.36534911201806]
We hire annotators to read 300 non-fiction English articles and label them as either human-written or AI-generated.
Experiments show that annotators who frequently use LLMs for writing tasks excel at detecting AI-generated text.
We release our annotated dataset and code to spur future research into both human and automated detection of AI-generated text.
arXiv Detail & Related papers (2025-01-26T19:31:34Z) - ESPERANTO: Evaluating Synthesized Phrases to Enhance Robustness in AI Detection for Text Origination [1.8418334324753884]
This paper introduces back-translation as a novel technique for evading detection.
We present a model that combines these back-translated texts to produce a manipulated version of the original AI-generated text.
We evaluate this technique on nine AI detectors, including six open-source and three proprietary systems.
arXiv Detail & Related papers (2024-09-22T01:13:22Z) - Are AI-Generated Text Detectors Robust to Adversarial Perturbations? [9.001160538237372]
Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations.
This paper investigates the robustness of existing AIGT detection methods and introduces a novel detector, the Siamese Calibrated Reconstruction Network (SCRN)
The SCRN employs a reconstruction network to add and remove noise from text, extracting a semantic representation that is robust to local perturbations.
arXiv Detail & Related papers (2024-06-03T10:21:48Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Adversarial Watermarking Transformer: Towards Tracing Text Provenance
with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.