ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation
- URL: http://arxiv.org/abs/2210.12396v1
- Date: Sat, 22 Oct 2022 09:11:12 GMT
- Title: ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation
- Authors: Fan Yin, Yao Li, Cho-Jui Hsieh, Kai-Wei Chang
- Abstract summary: Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
- Score: 125.52743832477404
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial Examples Detection (AED) is a crucial defense technique against
adversarial attacks and has drawn increasing attention from the Natural
Language Processing (NLP) community. Despite the surge of new AED methods, our
studies show that existing methods heavily rely on a shortcut to achieve good
performance. In other words, current search-based adversarial attacks in NLP
stop once model predictions change, and thus most adversarial examples
generated by those attacks are located near model decision boundaries. To
surpass this shortcut and fairly evaluate AED methods, we propose to test AED
methods with \textbf{F}ar \textbf{B}oundary (\textbf{FB}) adversarial examples.
Existing methods show worse than random guess performance under this scenario.
To overcome this limitation, we propose a new technique, \textbf{ADDMU},
\textbf{a}dversary \textbf{d}etection with \textbf{d}ata and \textbf{m}odel
\textbf{u}ncertainty, which combines two types of uncertainty estimation for
both regular and FB adversarial example detection. Our new method outperforms
previous methods by 3.6 and 6.0 \emph{AUC} points under each scenario. Finally,
our analysis shows that the two types of uncertainty provided by \textbf{ADDMU}
can be leveraged to characterize adversarial examples and identify the ones
that contribute most to the model's robustness in adversarial training.
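The abstract describes ADDMU as combining data uncertainty and model uncertainty into a single detection signal. Below is a minimal sketch of that general recipe, assuming a classifier that maps continuous input representations to logits; the MC-dropout and input-noise estimators and the simple additive combination are illustrative stand-ins rather than ADDMU's exact estimators or aggregation.

```python
# Illustrative sketch only: score an input as adversarial when both its model
# uncertainty (variance under stochastic forward passes) and its data uncertainty
# (variance under small input perturbations) are high. The estimators and the
# additive combination are assumptions, not ADDMU's exact procedure.
import torch
import torch.nn.functional as F


def model_uncertainty(model: torch.nn.Module, x: torch.Tensor, n_passes: int = 8) -> torch.Tensor:
    """MC-dropout proxy: variance of the top-class probability over stochastic passes."""
    model.train()  # keep dropout layers active
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    model.eval()
    return probs.max(dim=-1).values.var(dim=0)  # one score per example


def data_uncertainty(model: torch.nn.Module, x: torch.Tensor,
                     noise_std: float = 0.01, n_samples: int = 8) -> torch.Tensor:
    """Input-perturbation proxy: variance of the top-class probability under Gaussian noise."""
    model.eval()
    with torch.no_grad():
        probs = torch.stack([
            F.softmax(model(x + noise_std * torch.randn_like(x)), dim=-1)
            for _ in range(n_samples)
        ])
    return probs.max(dim=-1).values.var(dim=0)


def adversarial_score(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Higher score = more suspicious; rank or threshold against clean-data scores (e.g. for AUC)."""
    return model_uncertainty(model, x) + data_uncertainty(model, x)
```

In practice the two scores would be calibrated against held-out clean examples before being combined, since their scales differ.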
Related papers
- Proximal Causal Inference With Text Data [5.796482272333648]
We propose a new causal inference method that uses two instances of pre-treatment text data, infers two proxies using two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula.
We evaluate our method in synthetic and semi-synthetic settings with real-world clinical notes from MIMIC-III and open large language models for zero-shot prediction.
arXiv Detail & Related papers (2024-01-12T16:51:02Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures
and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level; an illustrative perplexity-based sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - A Unified Wasserstein Distributional Robustness Framework for
Adversarial Training [24.411703133156394]
This paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods; the generic Wasserstein DRO objective underlying this connection is written out after this list.
We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework.
This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms.
arXiv Detail & Related papers (2022-02-27T19:40:29Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency, is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Random Projections for Adversarial Attack Detection [8.684378639046644]
Adversarial attack detection remains a fundamentally challenging problem from two perspectives.
We present a technique that makes use of special properties of random projections, whereby we can characterize the behavior of clean and adversarial examples; a simplified projection-statistics sketch appears after this list.
Performance evaluation demonstrates that our technique outperforms ($>0.92$ AUC) competing state-of-the-art (SOTA) attack strategies.
arXiv Detail & Related papers (2020-12-11T15:02:28Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as text) are more challenging than those for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
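For the token-level adversarial prompt detection entry above, the sketch below scores every token of a prompt by its surprisal (negative log-likelihood) under an off-the-shelf causal language model and flags unusually surprising tokens. The choice of GPT-2, the nats scale, and the fixed threshold are illustrative assumptions and may differ from the listed paper's actual scoring and use of contextual information.

```python
# Illustrative token-level detector: tokens with unusually high surprisal under a
# generic language model are flagged as potentially adversarial. The model (gpt2)
# and the threshold are assumptions for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def token_surprisal(prompt: str):
    """Return (token, negative log-likelihood in nats) pairs for all but the first token."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position t of the logits predicts token t+1, so shift by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    nll = -log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())[1:]
    return list(zip(tokens, nll.tolist()))


def flag_suspicious_tokens(prompt: str, threshold: float = 10.0):
    """Tokens whose surprisal exceeds the (assumed) threshold are returned as suspicious."""
    return [tok for tok, s in token_surprisal(prompt) if s > threshold]
```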
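The unified Wasserstein distributional robustness entry above builds on the standard Wasserstein DRO objective; writing it out makes the claimed connection concrete (this is the textbook formulation, not necessarily the paper's specific cost or risk functions):
\[
\min_{\theta}\; \sup_{Q:\, W_c(Q, P) \le \epsilon}\; \mathbb{E}_{(x,y)\sim Q}\big[\ell(f_{\theta}(x), y)\big],
\]
where $P$ is the training distribution, $W_c$ is the Wasserstein distance under a ground cost $c$, and $\epsilon$ is the perturbation budget. Standard adversarial training corresponds to the pointwise special case
\[
\min_{\theta}\; \mathbb{E}_{(x,y)\sim P}\Big[\max_{\|\delta\| \le \epsilon} \ell\big(f_{\theta}(x+\delta), y\big)\Big],
\]
which is consistent with the entry's claim that standard AT methods arise as special cases within the framework.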
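For the random projections entry above, one simple way to realize the stated idea of characterizing clean versus adversarial behavior through projections is sketched below: project feature vectors onto a handful of random directions, fit per-direction statistics on clean data, and score test inputs by how far their projections deviate. The feature source, the number of directions, and the z-score aggregation are assumptions for illustration, not the paper's construction.

```python
# Illustrative sketch only: random-projection statistics fitted on clean features are
# used to score how anomalous (potentially adversarial) new feature vectors look.
import numpy as np

rng = np.random.default_rng(0)


def fit_projection_stats(clean_features: np.ndarray, n_directions: int = 32):
    """clean_features: (n_samples, dim). Returns unit random directions and per-direction mean/std."""
    dim = clean_features.shape[1]
    directions = rng.standard_normal((dim, n_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    projections = clean_features @ directions            # (n_samples, n_directions)
    return directions, projections.mean(axis=0), projections.std(axis=0) + 1e-8


def anomaly_score(features: np.ndarray, directions, mean, std) -> np.ndarray:
    """Mean absolute z-score of the projections; larger values suggest adversarial inputs."""
    z = np.abs((features @ directions - mean) / std)
    return z.mean(axis=1)
```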