From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
- URL: http://arxiv.org/abs/2010.05648v2
- Date: Wed, 28 Oct 2020 12:53:05 GMT
- Title: From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
- Authors: Steffen Eger and Yannik Benz
- Abstract summary: We propose the first large-scale catalogue and benchmark of low-level adversarial attacks.
We show that RoBERTa, NLP's current workhorse, fails on our attacks.
Our dataset provides a benchmark for testing robustness of future more human-like NLP models.
- Score: 23.381986209234157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks are label-preserving modifications to inputs of machine
learning classifiers designed to fool machines but not humans. Natural Language
Processing (NLP) has mostly focused on high-level attack scenarios such as
paraphrasing input texts. We argue that these are less realistic in typical
application scenarios such as in social media, and instead focus on low-level
attacks on the character-level. Guided by human cognitive abilities and human
robustness, we propose the first large-scale catalogue and benchmark of
low-level adversarial attacks, which we dub Zéroe, encompassing nine
different attack modes including visual and phonetic adversaries. We show that
RoBERTa, NLP's current workhorse, fails on our attacks. Our dataset provides a
benchmark for testing robustness of future more human-like NLP models.
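The abstract describes but does not show what a low-level, character-level attack looks like in practice. Below is a minimal illustrative sketch in Python of two such perturbations (keyboard-neighbor typos and adjacent-character swaps); the perturbation rules, names, and probabilities are assumptions chosen for illustration, not the nine Zéroe attack modes from the paper.

```python
import random

# Illustrative only: two simple character-level perturbations in the spirit of
# low-level adversarial attacks (not the actual Zéroe attack modes).
KEYBOARD_NEIGHBORS = {  # assumed QWERTY adjacency, heavily abbreviated
    "a": "qs", "e": "wr", "i": "uo", "o": "ip", "s": "ad", "t": "ry",
}

def typo_attack(word: str, rng: random.Random) -> str:
    """Replace one character with an adjacent key, if possible."""
    candidates = [i for i, c in enumerate(word) if c.lower() in KEYBOARD_NEIGHBORS]
    if not candidates:
        return word
    i = rng.choice(candidates)
    repl = rng.choice(KEYBOARD_NEIGHBORS[word[i].lower()])
    return word[:i] + repl + word[i + 1:]

def swap_attack(word: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters, keeping the first and last letters fixed."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_sentence(text: str, p: float = 0.3, seed: int = 0) -> str:
    """Perturb each word with probability p using a randomly chosen low-level attack."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < p:
            attack = rng.choice([typo_attack, swap_attack])
            word = attack(word, rng)
        out.append(word)
    return " ".join(out)

print(perturb_sentence("adversarial attacks are label preserving modifications"))
```

Perturbed text of this kind typically remains readable to humans, which is the property the benchmark exploits when testing models such as RoBERTa.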
Related papers
- Revisiting Character-level Adversarial Attacks for Language Models [53.446619686108754]
We introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR).
Our method successfully targets both small (BERT) and large (Llama 2) models.
arXiv Detail & Related papers (2024-05-07T14:23:22Z) - Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks [62.34019142949628]
Typographic Attacks, which involve pasting misleading text onto an image, were noted to harm the performance of Vision-Language Models like CLIP.
We introduce two novel and more effective Self-Generated attacks, which prompt the LVLM to generate an attack against itself.
Using our benchmark, we uncover that Self-Generated attacks pose a significant threat, reducing LVLMs' classification performance by up to 33%.
arXiv Detail & Related papers (2024-02-01T14:41:20Z) - Fooling the Textual Fooler via Randomizing Latent Representations [13.77424820701913]
Adversarial word-level perturbations are well-studied and effective attack strategies.
We propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example.
We empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks.
arXiv Detail & Related papers (2023-10-02T06:57:25Z) - Two-in-One: A Model Hijacking Attack Against Text Generation Models [19.826236952700256]
We propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones.
Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.
arXiv Detail & Related papers (2023-05-12T12:13:27Z) - MultiRobustBench: Benchmarking Robustness Against Multiple Attacks [86.70417016955459]
We present the first unified framework for considering multiple attacks against machine learning (ML) models.
Our framework is able to model different levels of a learner's knowledge about the test-time adversary.
We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types.
arXiv Detail & Related papers (2023-02-21T20:26:39Z) - Adversarial Text Normalization [2.9434930072968584]
The Adversarial Text Normalizer restores baseline performance on attacked content with low computational overhead.
We find that text normalization provides a task-agnostic defense against character-level attacks; a minimal normalization sketch follows this list of related papers.
arXiv Detail & Related papers (2022-06-08T19:44:03Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind their benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively
Inspired Orthographic Adversarial Attacks [10.290050493635343]
Adversarial attacks expose important blind spots of deep learning systems.
Character-level attacks typically insert typos into the input stream.
We show that an untrained iterative approach can perform on par with human crowd-workers supervised via 3-shot learning.
arXiv Detail & Related papers (2021-06-02T20:21:03Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning-based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z) - Natural Backdoor Attack on Text Data [15.35163515187413]
In this paper, we propose backdoor attacks on NLP models.
We exploit various attack strategies to generate triggers on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.
The results show excellent performance, with a 100% backdoor attack success rate at a cost of only 0.83% on the text classification task.
arXiv Detail & Related papers (2020-06-29T16:40:14Z)
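As referenced in the Adversarial Text Normalization entry above, character-level defenses often reduce to normalizing suspicious characters before classification. The sketch below is a minimal, assumed illustration of such a task-agnostic normalization step (Unicode NFKC folding plus a small homoglyph map); the table and function are hypothetical and not the Adversarial Text Normalizer from that paper.

```python
import unicodedata

# Illustrative homoglyph table; a real normalizer would use a much larger map.
HOMOGLYPHS = {
    "0": "o", "1": "l", "3": "e", "@": "a", "$": "s",
    "а": "a",  # Cyrillic 'a' (U+0430) -> Latin 'a'
    "е": "e",  # Cyrillic 'e' (U+0435) -> Latin 'e'
}

def normalize_text(text: str) -> str:
    """Map visually confusable characters back to plain ASCII before classification."""
    # NFKC folds many compatibility characters (e.g., fullwidth letters) to ASCII.
    text = unicodedata.normalize("NFKC", text)
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

print(normalize_text("аdv3rsari@l t3xt"))  # -> "adversarial text"
```

A production defense would use a far larger confusable-character table, but the principle is the same: map attacked text back toward its canonical form before the model sees it.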
This list is automatically generated from the titles and abstracts of the papers in this site.