DiffuseDef: Improved Robustness to Adversarial Attacks
- URL: http://arxiv.org/abs/2407.00248v1
- Date: Fri, 28 Jun 2024 22:36:17 GMT
- Title: DiffuseDef: Improved Robustness to Adversarial Attacks
- Authors: Zhenhao Li, Marek Rei, Lucia Specia
- Abstract summary: Adversarial attacks pose a critical challenge to systems built using pretrained language models.
We propose DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier.
During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation.
- Score: 38.34642687239535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models have significantly advanced performance across various natural language processing tasks. However, adversarial attacks continue to pose a critical challenge to systems built using these models, as they can be fooled by carefully crafted adversarial texts. Inspired by the ability of diffusion models to predict and reduce noise in computer vision, we propose a novel and flexible adversarial defense method for language classification tasks, DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier. During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation. By integrating adversarial training, denoising, and ensembling techniques, we show that DiffuseDef improves over existing adversarial defense methods and achieves state-of-the-art performance against common adversarial attacks.
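Since the abstract fully specifies the inference-time recipe, a short sketch may help make it concrete. The following is a minimal PyTorch-style sketch; the module interfaces, step count, noise scale, and mean-pooled ensemble are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of the DiffuseDef inference pipeline described in the
# abstract: noise the encoder's hidden state, denoise it iteratively with
# the trained diffusion layer, and ensemble the results before classifying.
# The module interfaces, step count, noise scale, and mean-pooled ensemble
# are illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn

@torch.no_grad()
def diffusedef_inference(encoder: nn.Module,
                         denoiser: nn.Module,    # the diffusion layer
                         classifier: nn.Module,
                         input_ids: torch.Tensor,
                         num_samples: int = 5,   # ensemble size (assumed)
                         num_steps: int = 10,    # denoising steps (assumed)
                         noise_scale: float = 0.1) -> torch.Tensor:
    h = encoder(input_ids)  # hidden state, possibly adversarially perturbed
    denoised = []
    for _ in range(num_samples):
        # Combine the hidden state with freshly sampled Gaussian noise.
        z = h + noise_scale * torch.randn_like(h)
        # Iteratively denoise; the diffusion layer sees the current timestep.
        for t in reversed(range(num_steps)):
            z = denoiser(z, torch.tensor([t]))
        denoised.append(z)
    # Ensemble the denoised states into one robust text representation.
    h_robust = torch.stack(denoised).mean(dim=0)
    return classifier(h_robust)
```

Ensembling several independently noised-and-denoised copies is what makes the final representation robust: a single adversarial perturbation would have to survive every denoising trajectory to dominate the averaged state.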
Related papers
- MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification [7.136205674624813]
In computer vision settings, the noising and de-noising process has proven useful for purifying input images.
Some initial work has explored the use of random noising and de-noising to mitigate adversarial attacks in an NLP setting.
We extend input text purification methods inspired by diffusion processes; a minimal sketch of this noise-and-denoise recipe appears after this list.
Our novel method, MaskPure, matches or exceeds the robustness of other contemporary defenses.
arXiv Detail & Related papers (2024-06-18T21:27:13Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework where adapting input sequences with limited data yields significant adversarial robustness improvements.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Instruct2Attack: Language-Guided Semantic Adversarial Attacks [76.83548867066561]
Instruct2Attack (I2A) is a language-guided semantic attack that generates meaningful perturbations according to free-form language instructions.
We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction.
We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses.
arXiv Detail & Related papers (2023-11-27T05:35:49Z)
- Language Guided Adversarial Purification [3.9931474959554496]
Adversarial purification using generative models demonstrates strong adversarial defense performance.
We introduce a new framework, Language Guided Adversarial Purification (LGAP), which utilizes pre-trained diffusion models and caption generators.
arXiv Detail & Related papers (2023-09-19T06:17:18Z)
- DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks [34.86098237949214]
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models.
This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks.
DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework.
arXiv Detail & Related papers (2023-06-15T13:33:27Z)
- Text Adversarial Purification as Defense against Adversarial Attacks [46.80714732957078]
Adversarial purification is a successful defense mechanism against adversarial attacks.
We introduce a novel adversarial purification method that focuses on defending against textual adversarial attacks.
We test our proposed adversarial purification method on several strong adversarial attack methods including TextFooler and BERT-Attack.
arXiv Detail & Related papers (2022-03-27T04:41:55Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
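Several of the defenses listed above (e.g., MaskPure and Text Adversarial Purification) share a common mask-and-regenerate purification recipe in the text domain; the sketch referenced in the MaskPure entry follows. It assumes a Hugging Face fill-mask pipeline, and the mask rate, sample count, majority vote, and `classify` callable are illustrative assumptions rather than any single paper's implementation.

```python
# Minimal sketch of the mask-and-regenerate purification recipe that
# MaskPure and Text Adversarial Purification build on: randomly mask input
# tokens, regenerate them with a masked language model, and aggregate
# predictions over several purified copies. The mask rate, sample count,
# majority vote, and the `classify` callable are illustrative assumptions.
import random
from collections import Counter
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def purify(text: str, mask_rate: float = 0.15) -> str:
    tokens = text.split()
    for i in range(len(tokens)):
        if random.random() < mask_rate:
            masked = tokens.copy()
            masked[i] = fill_mask.tokenizer.mask_token
            # Take the MLM's top suggestion for the masked position.
            tokens[i] = fill_mask(" ".join(masked))[0]["token_str"]
    return " ".join(tokens)

def robust_predict(classify, text: str, num_samples: int = 8):
    # Majority vote over independently purified copies of the input,
    # so isolated adversarial token substitutions tend to be washed out.
    votes = Counter(classify(purify(text)) for _ in range(num_samples))
    return votes.most_common(1)[0][0]
```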
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.