Attacking Neural Text Detectors
- URL: http://arxiv.org/abs/2002.11768v4
- Date: Wed, 19 Jan 2022 09:28:08 GMT
- Title: Attacking Neural Text Detectors
- Authors: Max Wolff, Stuart Wolff
- Abstract summary: This paper presents two classes of black-box attacks on neural text detectors.
The homoglyph and misspelling attacks decrease a popular neural text detector's recall on neural text from 97.44% to 0.26% and 22.68%, respectively.
Results also indicate that the attacks are transferable to other neural text detectors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning based language models have recently made significant
progress, which introduces a danger that they will be used to spread misinformation. To combat this
potential danger, several methods have been proposed for detecting text written
by these language models. This paper presents two classes of black-box attacks
on these detectors, one which randomly replaces characters with homoglyphs, and
the other a simple scheme to purposefully misspell words. The homoglyph and
misspelling attacks decrease a popular neural text detector's recall on neural
text from 97.44% to 0.26% and 22.68%, respectively. Results also indicate that
the attacks are transferable to other neural text detectors.
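The abstract describes the two attack classes only at a high level. Below is a minimal Python sketch of what such character- and word-level perturbations might look like; the homoglyph table, misspelling list, and replacement rate are illustrative assumptions, not the paper's actual substitution sets or parameters.

```python
import random

# A few Latin characters mapped to visually similar Unicode homoglyphs
# (Cyrillic look-alikes). Illustrative only; not the paper's actual table.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
}

# A few common misspellings used for word substitution; also illustrative.
MISSPELLINGS = {
    "the": "teh",
    "their": "thier",
    "receive": "recieve",
    "definitely": "definately",
}


def homoglyph_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly replace a fraction of eligible characters with homoglyphs."""
    rng = random.Random(seed)
    chars = list(text)
    candidates = [i for i, ch in enumerate(chars) if ch in HOMOGLYPHS]
    for i in rng.sample(candidates, k=int(rate * len(candidates))):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)


def misspelling_attack(text: str) -> str:
    """Swap selected words for common misspellings (case-insensitive match)."""
    return " ".join(MISSPELLINGS.get(w.lower(), w) for w in text.split())


if __name__ == "__main__":
    sample = "The police officer definitely wanted to receive their report."
    print(homoglyph_attack(sample, rate=0.5))
    print(misspelling_attack(sample))
```

Because homoglyphs leave the text visually unchanged while altering its underlying characters, even a modest replacement rate can shift a detector's token statistics substantially, which is consistent with the large recall drop reported above.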
Related papers
- Adversarial Attacks on AI-Generated Text Detection Models: A Token Probability-Based Approach Using Embeddings [14.150011713654331]
This work proposes a novel textual adversarial attack on detection models such as Fast-DetectGPT.
The method uses embedding models to perturb the data, aiming to reconstruct AI-generated texts so that their true origin is less likely to be detected.
arXiv Detail & Related papers (2025-01-31T10:06:27Z)
- Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model [0.0]
We propose TSTricker, a multi-granularity Tibetan textual adversarial attack method based on masked language models.
Results show that TSTricker reduces the accuracy of the classification models by more than 28.70% and changes their predictions for more than 90.60% of the samples.
arXiv Detail & Related papers (2024-12-03T10:03:52Z)
- Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z)
- Efficient Black-Box Adversarial Attacks on Neural Text Detectors [1.223779595809275]
We investigate three simple strategies for altering texts generated by GPT-3.5 in ways that are inconspicuous or unnoticeable to humans but cause neural text detectors to misclassify them.
The results show that parameter tweaking and character-level mutations in particular are effective strategies.
arXiv Detail & Related papers (2023-11-03T12:29:32Z)
- Smaller Language Models are Better Black-box Machine-Generated Text Detectors [56.36291277897995]
Small and partially-trained models are better universal text detectors.
We find that whether the detector and generator were trained on the same data is not critically important to the detection success.
For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45 (a likelihood-scoring sketch appears after this list).
arXiv Detail & Related papers (2023-05-17T00:09:08Z)
- Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Mutation-Based Adversarial Attacks on Neural Text Detectors [1.5101132008238316]
We propose character- and word-based mutation operators for generating adversarial samples that attack state-of-the-art neural text detectors.
In such attacks, the attacker has access to the original text and creates mutated instances based on it.
arXiv Detail & Related papers (2023-02-11T22:08:32Z)
- Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning [50.67997309717586]
We propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation.
This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation.
We present two methods for crafting poisoned examples and show that a tiny handful of instances, amounting to just 0.02% of the training set, is sufficient to carry out a successful attack.
arXiv Detail & Related papers (2021-07-12T08:07:09Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face presentation attack detection (fPAD) task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
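As referenced in the "Smaller Language Models are Better Black-box Machine-Generated Text Detectors" entry above, small models such as OPT-125M can serve as detectors and be evaluated with AUC. The following is a rough sketch of one common setup, scoring each text by its average token log-likelihood under a small language model; the scoring rule and toy evaluation data are assumptions for illustration, not that paper's exact pipeline.

```python
# Rough sketch: score texts by average token log-likelihood under a small
# causal LM and evaluate detection with ROC AUC. Model choice follows the
# summary above (OPT-125M); everything else is an illustrative assumption.
import torch
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def avg_log_likelihood(text: str) -> float:
    """Mean per-token log-likelihood of `text` under the detector model."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()  # loss is the mean negative log-likelihood


# Toy evaluation set: label 1 = machine-generated, 0 = human-written.
texts = [
    "The results demonstrate that the proposed method achieves strong performance.",
    "honestly i just scribbled this down on the train, sorry for typos!!",
]
labels = [1, 0]

scores = [avg_log_likelihood(t) for t in texts]  # higher = more "machine-like"
print("ROC AUC:", roc_auc_score(labels, scores))
```

A homoglyph or misspelling attack of the kind sketched earlier degrades exactly this kind of likelihood-based score, since the perturbed characters map to unfamiliar tokens and lower the text's likelihood under the detector model.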
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.