MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification
- URL: http://arxiv.org/abs/2406.13066v1
- Date: Tue, 18 Jun 2024 21:27:13 GMT
- Title: MaskPure: Improving Defense Against Text Adversaries with Stochastic Purification
- Authors: Harrison Gietz, Jugal Kalita
- Abstract summary: In computer vision settings, the noising and de-noising process has proven useful for purifying input images.
Some initial work has explored the use of random noising and de-noising to mitigate adversarial attacks in an NLP setting.
We extend upon methods of input text purification that are inspired by diffusion processes.
Our novel method, MaskPure, exceeds or matches robustness compared to other contemporary defenses.
- Score: 7.136205674624813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The improvement of language model robustness, including successful defense against adversarial attacks, remains an open problem. In computer vision settings, the stochastic noising and de-noising process provided by diffusion models has proven useful for purifying input images, thus improving model robustness against adversarial attacks. Similarly, some initial work has explored the use of random noising and de-noising to mitigate adversarial attacks in an NLP setting, but improving the quality and efficiency of these methods is necessary for them to remain competitive. We extend upon methods of input text purification that are inspired by diffusion processes, which randomly mask and refill portions of the input text before classification. Our novel method, MaskPure, exceeds or matches robustness compared to other contemporary defenses, while also requiring no adversarial classifier training and without assuming knowledge of the attack type. In addition, we show that MaskPure is provably certifiably robust. To our knowledge, MaskPure is the first stochastic-purification method with demonstrated success against both character-level and word-level attacks, indicating the generalizable and promising nature of stochastic denoising defenses. In summary: the MaskPure algorithm bridges literature on the current strongest certifiable and empirical adversarial defense methods, showing that both theoretical and practical robustness can be obtained together. Code is available on GitHub at https://github.com/hubarruby/MaskPure.
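To make the mask-and-refill procedure concrete, the following is a minimal sketch, not the authors' released implementation: a random fraction of tokens is masked, an off-the-shelf masked language model refills the masks, and a majority vote is taken over several independently purified copies before classification. The mask rate, vote count, and Hugging Face model names are illustrative assumptions.
```python
import random
from collections import Counter

import torch
from transformers import (AutoModelForMaskedLM, AutoModelForSequenceClassification,
                          AutoTokenizer)

MLM_NAME = "bert-base-uncased"                   # masked LM used for refilling
CLF_NAME = "textattack/bert-base-uncased-imdb"   # any fine-tuned text classifier

mlm_tok = AutoTokenizer.from_pretrained(MLM_NAME)
mlm = AutoModelForMaskedLM.from_pretrained(MLM_NAME).eval()
clf_tok = AutoTokenizer.from_pretrained(CLF_NAME)
clf = AutoModelForSequenceClassification.from_pretrained(CLF_NAME).eval()


@torch.no_grad()
def mask_and_refill(text: str, mask_rate: float = 0.3) -> str:
    """Randomly mask a fraction of tokens and refill them with the masked LM."""
    ids = mlm_tok(text, return_tensors="pt", truncation=True)["input_ids"][0]
    positions = list(range(1, len(ids) - 1))                    # skip [CLS]/[SEP]
    chosen = random.sample(positions, max(1, int(mask_rate * len(positions))))
    noisy = ids.clone()
    noisy[chosen] = mlm_tok.mask_token_id                       # "noising" step
    refill = mlm(noisy.unsqueeze(0)).logits[0].argmax(dim=-1)
    noisy[chosen] = refill[chosen]                              # "de-noising" step
    return mlm_tok.decode(noisy[1:-1], skip_special_tokens=True)


@torch.no_grad()
def purify_and_classify(text: str, n_votes: int = 8) -> int:
    """Classify several purified copies and return the majority label."""
    votes = []
    for _ in range(n_votes):
        enc = clf_tok(mask_and_refill(text), return_tensors="pt", truncation=True)
        votes.append(int(clf(**enc).logits.argmax(dim=-1)))
    return Counter(votes).most_common(1)[0][0]


print(purify_and_classify("a possibly adversarially perturbed movie review ..."))
```
Because each purification is stochastic, repeated runs on the same input can differ; averaging over copies with a majority vote is what smooths out that randomness.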
Related papers
- DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing [93.45507533317405]
DiffusionGuard is a robust and effective defense method against unauthorized edits by diffusion-based image editing models.
We introduce a novel objective that generates adversarial noise targeting the early stage of the diffusion process.
We also introduce a mask-augmentation technique to enhance robustness against various masks during test time.
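As a very rough illustration of that recipe, the sketch below runs PGD-style optimization that maximizes a denoiser's error at an early (low-noise) timestep while randomly masking the input each step as a stand-in for mask augmentation. The tiny convolutional denoiser is only a placeholder so the loop runs end to end; it is not the diffusion pipeline targeted in the paper, and all hyperparameters are illustrative.
```python
import torch
import torch.nn as nn

# Placeholder denoiser so the loop runs end to end; the paper targets a real
# diffusion-based image editing model instead.
denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))


def random_mask(shape, keep: float = 0.5) -> torch.Tensor:
    """Random binary mask, standing in for the varied editing masks seen at test time."""
    return (torch.rand(shape) < keep).float()


def protect(image: torch.Tensor, steps: int = 20, eps: float = 4 / 255,
            alpha: float = 1 / 255, t_early: float = 0.1) -> torch.Tensor:
    """Return a bounded perturbation that maximizes denoising error at an early step."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        mask = random_mask(image.shape)                          # mask augmentation
        noisy_input = (image + delta) * mask
        eps_true = torch.randn_like(image)
        x_t = (1 - t_early) * noisy_input + t_early * eps_true   # crude early-step noising
        loss = ((denoiser(x_t) - eps_true) ** 2).mean()          # denoiser's error
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                   # ascend: disrupt editing
            delta.clamp_(-eps, eps)
            delta.grad = None
    return delta.detach()


image = torch.rand(1, 3, 64, 64)
protective_noise = protect(image)
```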
arXiv Detail & Related papers (2024-10-08T05:19:19Z) - DiffuseDef: Improved Robustness to Adversarial Attacks [38.34642687239535]
Adversarial attacks pose a critical challenge to systems built using pretrained language models.
We propose DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier.
During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation.
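A minimal sketch of that inference recipe is below: the encoder's hidden state is perturbed with sampled noise, denoised for a few steps, and the results are averaged. The denoising network here is an untrained stand-in, and the dimensions, step counts, and noise scale are illustrative assumptions rather than the paper's configuration.
```python
import torch
import torch.nn as nn

HIDDEN = 768
# Untrained stand-in for the trained diffusion/denoising layer in the paper.
denoiser = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.GELU(),
                         nn.Linear(HIDDEN, HIDDEN))


@torch.no_grad()
def diffuse_denoise_ensemble(hidden: torch.Tensor, n_samples: int = 5,
                             n_steps: int = 4,
                             noise_scale: float = 0.1) -> torch.Tensor:
    """Noise the hidden state, denoise it iteratively, and average over samples."""
    outputs = []
    for _ in range(n_samples):
        x = hidden + noise_scale * torch.randn_like(hidden)  # combine with sampled noise
        for _ in range(n_steps):                              # iterative denoising
            x = x - denoiser(x)                               # subtract predicted noise
        outputs.append(x)
    return torch.stack(outputs).mean(dim=0)                   # ensemble the copies


encoder_hidden = torch.randn(1, 16, HIDDEN)     # e.g. a pretrained encoder's output
robust_hidden = diffuse_denoise_ensemble(encoder_hidden)      # fed to the classifier
```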
arXiv Detail & Related papers (2024-06-28T22:36:17Z) - Improving Adversarial Robustness via Decoupled Visual Representation Masking [65.73203518658224]
In this paper, we highlight two novel properties of robust features from the feature distribution perspective.
We find that state-of-the-art defense methods aim to address both of these issues.
Specifically, we propose a simple but effective defense based on decoupled visual representation masking.
arXiv Detail & Related papers (2024-06-16T13:29:41Z) - Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility.
Specifically, we propose a novel adversarial semantic mask generative model, which can constrain generated perturbations in local semantic regions for good stealthiness.
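As a toy illustration of constraining perturbations to semantic regions, the sketch below runs PGD but zeroes the perturbation outside a binary semantic mask; the stand-in face classifier and random mask are assumptions, and the paper itself uses a generative model rather than plain PGD.
```python
import torch
import torch.nn as nn

# Stand-in face classifier; the paper attacks face forgery detection models.
face_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
loss_fn = nn.CrossEntropyLoss()


def semantic_mask_attack(image: torch.Tensor, semantic_mask: torch.Tensor,
                         label: torch.Tensor, steps: int = 10,
                         eps: float = 8 / 255, alpha: float = 2 / 255) -> torch.Tensor:
    """PGD whose perturbation is zeroed outside the semantic mask for stealthiness."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(face_model(image + delta * semantic_mask), label)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()     # maximize loss on the true label
            delta.clamp_(-eps, eps)
            delta *= semantic_mask                 # keep changes inside semantic regions
            delta.grad = None
    return (image + delta * semantic_mask).detach()


img = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()    # e.g. a parsed facial region
adv = semantic_mask_attack(img, mask, label=torch.tensor([1]))
```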
arXiv Detail & Related papers (2024-06-16T10:38:11Z) - Adversarial Text Purification: A Large Language Model Approach for Defense [25.041109219049442]
Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks.
We propose a novel adversarial text purification that harnesses the generative capabilities of Large Language Models.
Our proposed method demonstrates remarkable performance over various classifiers, improving their accuracy under the attack by over 65% on average.
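A minimal sketch of LLM-based purification, under the assumption that a generic instruction-tuned model (FLAN-T5 here) rewrites the possibly perturbed input before it reaches the classifier; the prompt wording and model choices are illustrative, not the paper's setup.
```python
from transformers import pipeline

# FLAN-T5 as the purifier and a default sentiment model as the downstream
# classifier; both choices are illustrative.
purifier = pipeline("text2text-generation", model="google/flan-t5-base")
classifier = pipeline("sentiment-analysis")


def purify_then_classify(text: str):
    """Ask the language model to rewrite the input, then classify the rewrite."""
    prompt = ("Rewrite the following sentence, fixing any misspelled or "
              "out-of-place words while keeping its meaning: " + text)
    cleaned = purifier(prompt, max_new_tokens=128)[0]["generated_text"]
    return cleaned, classifier(cleaned)[0]


print(purify_then_classify("this flim was absolutelyy terrrible"))
```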
arXiv Detail & Related papers (2024-02-05T02:36:41Z) - Language Guided Adversarial Purification [3.9931474959554496]
Adversarial purification using generative models demonstrates strong adversarial defense performance.
The paper introduces a new framework, Language Guided Adversarial Purification (LGAP), which utilizes pre-trained diffusion models and caption generators.
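A rough sketch of caption-guided purification in this spirit: caption the (possibly attacked) image with BLIP, then run a low-strength image-to-image diffusion pass conditioned on that caption. The checkpoint names and the strength value are assumptions rather than the paper's configuration.
```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor
from diffusers import StableDiffusionImg2ImgPipeline

caption_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
caption_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")
# Checkpoint name is an assumption; any image-to-image diffusion pipeline fits here.
sd = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5")


def caption_guided_purify(image: Image.Image) -> Image.Image:
    """Caption the input, then lightly re-generate it conditioned on that caption."""
    inputs = caption_proc(images=image, return_tensors="pt")
    caption = caption_proc.decode(caption_model.generate(**inputs)[0],
                                  skip_special_tokens=True)
    # Low strength keeps the image content while washing out small perturbations.
    return sd(prompt=caption, image=image, strength=0.3).images[0]


purified = caption_guided_purify(Image.open("input.png").convert("RGB"))
```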
arXiv Detail & Related papers (2023-09-19T06:17:18Z) - Text Adversarial Purification as Defense against Adversarial Attacks [46.80714732957078]
Adversarial purification is a successful defense mechanism against adversarial attacks.
We introduce a novel adversarial purification method that focuses on defending against textual adversarial attacks.
We test our proposed adversarial purification method on several strong adversarial attack methods including Textfooler and BERT-Attack.
arXiv Detail & Related papers (2022-03-27T04:41:55Z) - Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models [82.03536496686763]
The vulnerability of deep networks to adversarial attacks is a central problem for deep learning from the perspective of both cognition and security.
We focus on defending naturally-trained classifiers using Markov Chain Monte Carlo (MCMC) sampling with an Energy-Based Model (EBM) for adversarial purification.
Our contributions are 1) an improved method for training EBMs with realistic long-run MCMC samples, 2) an Expectation-Over-Transformation (EOT) defense that resolves theoretical ambiguities for stochastic defenses, and 3) state-of-the-art adversarial defense for naturally-trained classifiers and competitive defense for adversarially-trained classifiers.
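A small sketch of the EOT idea, with a Gaussian-noise purifier standing in for the paper's long-run MCMC sampler: the defended prediction is the average of the classifier's outputs over many independent stochastic purifications.
```python
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in


def stochastic_purify(x: torch.Tensor) -> torch.Tensor:
    """Gaussian-noise placeholder for the paper's long-run MCMC purification."""
    return x + 0.1 * torch.randn_like(x)


@torch.no_grad()
def eot_predict(x: torch.Tensor, n_samples: int = 32) -> torch.Tensor:
    """Average class probabilities over repeated stochastic purifications."""
    probs = torch.stack([classifier(stochastic_purify(x)).softmax(dim=-1)
                         for _ in range(n_samples)])
    return probs.mean(dim=0)          # the expectation over transformations


x = torch.rand(1, 3, 32, 32)
print(eot_predict(x).argmax(dim=-1))
```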
arXiv Detail & Related papers (2020-05-27T17:53:36Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
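A compressed sketch of the substitution idea behind BERT-Attack: mask one word at a time, let a masked language model propose replacements, and keep a substitution that flips the victim classifier. The word-importance ranking and sub-word handling from the paper are omitted, and the model choices are illustrative.
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
victim = pipeline("sentiment-analysis")          # any target classifier


@torch.no_grad()
def attack(sentence: str, top_k: int = 8) -> str:
    """Try masked-LM-proposed word substitutions until the victim's label flips."""
    original_label = victim(sentence)[0]["label"]
    words = sentence.split()
    for i in range(len(words)):
        masked = " ".join(words[:i] + [mlm_tok.mask_token] + words[i + 1:])
        enc = mlm_tok(masked, return_tensors="pt")
        pos = (enc["input_ids"][0] == mlm_tok.mask_token_id).nonzero()[0, 0]
        candidates = mlm(**enc).logits[0, pos].topk(top_k).indices
        for cand_id in candidates:
            cand = mlm_tok.decode([int(cand_id)]).strip()
            trial = " ".join(words[:i] + [cand] + words[i + 1:])
            if victim(trial)[0]["label"] != original_label:
                return trial                     # successful adversarial sample
    return sentence                              # attack failed


print(attack("the movie was surprisingly good"))
```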
arXiv Detail & Related papers (2020-04-21T13:30:02Z)