Evaluating Deception Detection Model Robustness To Linguistic Variation
- URL: http://arxiv.org/abs/2104.11729v1
- Date: Fri, 23 Apr 2021 17:25:38 GMT
- Title: Evaluating Deception Detection Model Robustness To Linguistic Variation
- Authors: Maria Glenski, Ellyn Ayton, Robin Cosbey, Dustin Arendt, and Svitlana Volkova
- Abstract summary: We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection.
We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance.
We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
- Score: 10.131671217810581
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the increasing use of machine-learning driven algorithmic judgements, it
is critical to develop models that are robust to evolving or manipulated
inputs. We propose an extensive analysis of model robustness against linguistic
variation in the setting of deceptive news detection, an important task in the
context of misinformation spread online. We consider two prediction tasks and
compare three state-of-the-art embeddings to highlight consistent trends in
model performance, high confidence misclassifications, and high impact
failures. By measuring the effectiveness of adversarial defense strategies and
evaluating model susceptibility to adversarial attacks using character- and
word-perturbed text, we find that character or mixed ensemble models are the
most effective defenses and that character perturbation-based attack tactics
are more successful.
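The abstract distinguishes character-perturbed from word-perturbed text but does not spell out how either is generated. As a rough illustration only, and not the authors' procedure, the sketch below shows one plausible form of each: swapping adjacent inner letters of a word, and replacing a word from a small substitution table. The function names, the `rate` and `seed` parameters, and the `synonyms` table are all hypothetical.

```python
import random


def perturb_characters(text, rate=0.1, seed=0):
    """Hypothetical character-level perturbation: for a word longer than
    three characters, swap two adjacent inner letters with probability
    `rate`, leaving the first and last characters in place."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # inner position only
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


def perturb_words(text, synonyms, rate=0.1, seed=0):
    """Hypothetical word-level perturbation: replace a word found in the
    `synonyms` table (an assumed dict of str -> list of str) with one of
    its listed alternatives, with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if word.lower() in synonyms and rng.random() < rate:
            word = rng.choice(synonyms[word.lower()])
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    sample = "deceptive headlines spread quickly online"
    print(perturb_characters(sample, rate=1.0))
    print(perturb_words(sample, {"deceptive": ["misleading"]}, rate=1.0))
```

A robustness check in this spirit would compare a classifier's predictions on the original and perturbed versions of the same text; the classifier itself is outside the scope of this sketch.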
Related papers
- Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment [0.23020018305241333]
This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts.
The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies.
arXiv Detail & Related papers (2024-07-01T20:25:20Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets [46.19529338280716]
Language models, characterized by their black-box nature, often hallucinate and display sensitivity to input perturbations.
We introduce a methodology designed to examine how input perturbations affect language models across various scales.
We present three distinct fine-tuning strategies to address robustness against multiple perturbations.
arXiv Detail & Related papers (2023-11-15T02:59:10Z)
- Machine Translation Models Stand Strong in the Face of Adversarial Attacks [2.6862667248315386]
Our research focuses on the impact of adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models.
We introduce algorithms that incorporate basic text perturbations and more advanced strategies, such as the gradient-based attack.
arXiv Detail & Related papers (2023-09-10T11:22:59Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Interpretations Cannot Be Trusted: Stealthy and Effective Adversarial Perturbations against Interpretable Deep Learning [16.13790238416691]
This work introduces two attacks, AdvEdge and AdvEdge$+$, that deceive both the target deep learning model and the coupled interpretation model.
Our analysis shows the effectiveness of our attacks in terms of deceiving the deep learning models and their interpreters.
arXiv Detail & Related papers (2022-11-29T04:45:10Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks [9.36331571226256]
We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level.
We develop a model to predict model errors during adversarial attacks.
arXiv Detail & Related papers (2020-05-01T03:05:43Z)