Adversarial Robustness of Neural-Statistical Features in Detection of
Generative Transformers
- URL: http://arxiv.org/abs/2203.07983v1
- Date: Wed, 2 Mar 2022 16:46:39 GMT
- Title: Adversarial Robustness of Neural-Statistical Features in Detection of
Generative Transformers
- Authors: Evan Crothers, Nathalie Japkowicz, Herna Viktor, Paula Branco
- Abstract summary: We evaluate neural and non-neural approaches on their ability to detect computer-generated text.
We find that while statistical features underperform neural features, statistical features provide additional adversarial robustness.
We pioneer the usage of $\Delta$MAUVE as a proxy measure for human judgement of adversarial text quality.
- Score: 6.209131728799896
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The detection of computer-generated text is an area of rapidly increasing
significance as nascent generative models allow for efficient creation of
compelling human-like text, which may be abused for the purposes of spam,
disinformation, phishing, or online influence campaigns. Past work has studied
detection of current state-of-the-art models, but despite a developing threat
landscape, there has been minimal analysis of the robustness of detection
methods to adversarial attacks. To this end, we evaluate neural and non-neural
approaches on their ability to detect computer-generated text, their robustness
against text adversarial attacks, and the impact that successful adversarial
attacks have on human judgement of text quality. We find that while statistical
features underperform neural features, statistical features provide additional
adversarial robustness that can be leveraged in ensemble detection models. In
the process, we find that previously effective complex phrasal features for
detection of computer-generated text hold little predictive power against
contemporary generative models, and identify promising statistical features to
use instead. Finally, we pioneer the usage of $\Delta$MAUVE as a proxy measure
for human judgement of adversarial text quality.
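As a rough illustration of how a $\Delta$MAUVE-style proxy can be computed, the sketch below uses the open-source `mauve-text` package and reads $\Delta$MAUVE as the drop in MAUVE score (distributional similarity to human text) caused by an adversarial perturbation; the paper's exact definition and evaluation setup are not reproduced here.
```python
# Sketch of a Delta-MAUVE-style quality proxy.
# Assumes the open-source `mauve-text` package (pip install mauve-text);
# the precise Delta-MAUVE definition used in the paper is an assumption here,
# read as: MAUVE(human, generated) - MAUVE(human, adversarially perturbed).
import mauve

def delta_mauve(human_texts, generated_texts, adversarial_texts):
    """Return the change in MAUVE score after adversarial perturbation."""
    before = mauve.compute_mauve(p_text=human_texts, q_text=generated_texts,
                                 verbose=False).mauve
    after = mauve.compute_mauve(p_text=human_texts, q_text=adversarial_texts,
                                verbose=False).mauve
    return before - after  # larger value = larger degradation in text quality
```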
Related papers
- Suspiciousness of Adversarial Texts to Human [3.312665722657581]
This study delves into the concept of human suspiciousness, a quality distinct from the traditional focus on imperceptibility found in image-based adversarial examples.
We gather and publish a novel dataset of Likert-scale human evaluations on the suspiciousness of adversarial sentences.
We develop a regression-based model to quantify suspiciousness and establish a baseline for future research in reducing the suspiciousness in adversarial text generation.
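A minimal sketch of the kind of regression baseline such a study might use (purely illustrative: the TF-IDF features and ridge regression below are assumptions, not the cited paper's model):
```python
# Illustrative only: fit a regressor from adversarial sentences to
# Likert-scale suspiciousness ratings. Feature choice (TF-IDF) and model
# (ridge regression) are assumptions, not the cited paper's approach.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

def fit_suspiciousness_model(sentences, likert_scores):
    """Return a pipeline mapping a sentence to a predicted suspiciousness score."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
    model.fit(sentences, likert_scores)
    return model

# Usage: fit_suspiciousness_model(adv_sentences, ratings).predict(["new adversarial text"])
```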
arXiv Detail & Related papers (2024-10-06T06:57:22Z)
- Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack [24.954755569786396]
We propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection.
We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness.
The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content.
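To make the black-box setting concrete, here is a minimal sketch of a query-based evasion loop; `detector` and `perturb_once` are hypothetical stand-ins, and this is not the cited paper's framework.
```python
# Minimal black-box evasion loop: repeatedly apply small perturbations and
# keep the first variant the detector labels as human-written.
# `detector` and `perturb_once` are hypothetical stand-ins.
from typing import Callable

def evade(text: str,
          detector: Callable[[str], float],    # returns P(machine-generated)
          perturb_once: Callable[[str], str],  # applies one small perturbation
          threshold: float = 0.5,
          max_queries: int = 100) -> str:
    """Return a perturbed text scored below the detector threshold,
    or the last attempt if the query budget is exhausted."""
    candidate = text
    for _ in range(max_queries):
        if detector(candidate) < threshold:
            return candidate  # detector now judges it human-written
        candidate = perturb_once(candidate)
    return candidate
```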
arXiv Detail & Related papers (2024-04-02T12:49:22Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks [4.297786261992324]
Adversarial attacks are malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions.
We surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods.
Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved.
arXiv Detail & Related papers (2023-05-24T21:52:13Z)
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for machine-generated text (MGT) detection against powerful large language models (LLMs).
We show that a larger number of words generally leads to better performance, and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
- Mutation-Based Adversarial Attacks on Neural Text Detectors [1.5101132008238316]
We propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art natural text detectors.
In such attacks, attackers have access to the original text and create mutation instances based on this original text.
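A minimal sketch of a character-level mutation operator in this spirit (illustrative only; the cited paper's actual operators are not reproduced here):
```python
# Illustrative character-level mutation operator: random swaps, insertions,
# and deletions applied to the original text. Standard library only.
import random

def mutate_chars(text: str, n_mutations: int = 1, seed=None) -> str:
    """Apply random character swaps/insertions/deletions to `text`."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_mutations):
        if not chars:
            break
        op = rng.choice(("swap", "insert", "delete"))
        i = rng.randrange(len(chars))
        if op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "insert":
            chars.insert(i, rng.choice("abcdefghijklmnopqrstuvwxyz"))
        elif op == "delete":
            del chars[i]
    return "".join(chars)

# Example: mutate_chars("machine-generated text", n_mutations=2)
```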
arXiv Detail & Related papers (2023-02-11T22:08:32Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results on an image classification case study demonstrate the effectiveness of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Evaluating Deception Detection Model Robustness To Linguistic Variation [10.131671217810581]
We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection.
We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance.
We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
arXiv Detail & Related papers (2021-04-23T17:25:38Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to user and entity behaviour analytics (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate such attacks by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.