Towards Efficient and Domain-Agnostic Evasion Attack with
High-dimensional Categorical Inputs
- URL: http://arxiv.org/abs/2212.06836v1
- Date: Tue, 13 Dec 2022 18:45:00 GMT
- Title: Towards Efficient and Domain-Agnostic Evasion Attack with
High-dimensional Categorical Inputs
- Authors: Hongyan Bao, Yufei Han, Yujun Zhou, Xin Gao, Xiangliang Zhang
- Abstract summary: Our work targets searching for a feasible adversarial perturbation to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting.
Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in a multi-armed bandit program.
Our work further hints at the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
- Score: 33.36532022853583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our work targets searching for a feasible adversarial perturbation
to attack a classifier with high-dimensional categorical inputs in a
domain-agnostic setting. This is intrinsically an NP-hard knapsack problem
whose exploration space grows explosively as the feature dimension increases.
Without the help of domain knowledge, solving this problem with heuristic
methods, such as Branch-and-Bound, suffers from exponential complexity and can
yield arbitrarily bad attack results. We address the challenge through the
lens of multi-armed bandit based combinatorial search. Our proposed method,
namely FEAT, treats modifying each categorical feature as pulling an arm in a
multi-armed bandit program. Our objective is to achieve a highly efficient and
effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper
Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding
the regret gap of FEAT guarantees its practical attack performance. In the
empirical analysis, we compare FEAT with other state-of-the-art
domain-agnostic attack methods on various real-world categorical data sets
from different applications. Substantial experimental observations confirm the
expected efficiency and attack effectiveness of FEAT across these application
scenarios. Our work further hints at the applicability of FEAT for assessing
the adversarial vulnerability of classification systems with high-dimensional
categorical inputs.
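To make the bandit formulation concrete, the sketch below is a minimal, assumed illustration rather than the authors' implementation: each categorical feature is an arm, a plain UCB1 score decides which feature to perturb next, and the reward is the observed increase in the target classifier's loss. The loss_fn and candidate_values interfaces are placeholders.

    import numpy as np

    def ucb_categorical_attack(x, loss_fn, candidate_values, budget=20, c=1.0, rng=None):
        """Bandit-style search over categorical features (illustrative sketch only).

        x               : 1-D array of categorical feature values (as integer codes)
        loss_fn         : callable mapping a feature vector to the classifier's loss
        candidate_values: list of allowed alternative values for each feature
        budget          : number of arm pulls (perturbations tried)
        c               : UCB exploration coefficient
        """
        rng = np.random.default_rng(0) if rng is None else rng
        d = len(x)
        pulls = np.zeros(d)        # how often each feature (arm) has been tried
        rewards = np.zeros(d)      # cumulative loss increase observed per arm
        x_adv = np.array(x).copy()
        best_loss = loss_fn(x_adv)

        for t in range(1, budget + 1):
            # UCB1 score: empirical mean reward plus an exploration bonus;
            # arms never pulled get an infinite score so they are tried first.
            means = np.where(pulls > 0, rewards / np.maximum(pulls, 1), np.inf)
            ucb = means + c * np.sqrt(np.log(t + 1) / np.maximum(pulls, 1))
            arm = int(np.argmax(ucb))

            # Pull the arm: try a random alternative value for that feature.
            trial = x_adv.copy()
            trial[arm] = rng.choice(candidate_values[arm])
            trial_loss = loss_fn(trial)
            gain = trial_loss - best_loss

            pulls[arm] += 1
            rewards[arm] += max(gain, 0.0)
            if gain > 0:           # keep the perturbation only if it raises the loss
                x_adv, best_loss = trial, trial_loss
        return x_adv

FEAT itself enhances the UCB exploration with Orthogonal Matching Pursuit and comes with a regret bound; neither is reproduced in this sketch.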
Related papers
- Corpus Poisoning via Approximate Greedy Gradient Descent [48.5847914481222]
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages.
We show that our method achieves a high attack success rate on several datasets and using several retrievers, and can generalize to unseen queries and new domains.
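The HotFlip-style criterion that such greedy gradient attacks build on fits in a few lines; the snippet below only illustrates that first-order scoring of token swaps and is not the Approximate Greedy Gradient Descent procedure itself. The embedding matrix and gradient are placeholders the caller must supply.

    import numpy as np

    def hotflip_candidates(embedding_matrix, token_grad, current_token_id, top_k=5):
        """First-order estimate of the attack-loss change from swapping one token.

        embedding_matrix: (vocab_size, dim) token embedding table
        token_grad      : (dim,) gradient of the attack loss w.r.t. the token's embedding
        Returns the top_k replacement token ids ranked by estimated loss increase.
        """
        e_old = embedding_matrix[current_token_id]
        # delta_loss[v] ~ (e_v - e_old) . grad for every vocabulary entry v
        scores = (embedding_matrix - e_old) @ token_grad
        scores[current_token_id] = -np.inf   # never "swap" a token for itself
        return np.argsort(scores)[::-1][:top_k]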
arXiv Detail & Related papers (2024-06-07T17:02:35Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
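As a rough, assumed sketch of training against continuous attacks (not the authors' C-AdvUL code): perturb the token embeddings with a few signed-gradient steps instead of searching over discrete tokens, then combine the resulting adversarial loss with a loss on clean utility data. The model(embeddings, labels) -> loss interface and the loss weighting are placeholders.

    import torch

    def continuous_adv_training_step(model, embeds, labels, clean_embeds, clean_labels,
                                     eps=0.05, alpha=0.01, steps=3, utility_weight=1.0):
        """One step that mixes an embedding-space attack with a clean-data (utility) loss."""
        delta = torch.zeros_like(embeds, requires_grad=True)
        for _ in range(steps):                    # signed-gradient ascent on the embeddings
            loss = model(embeds + delta, labels)  # assumed: model returns a scalar loss
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += alpha * grad.sign()
                delta.clamp_(-eps, eps)           # keep the perturbation norm-bounded
        adv_loss = model(embeds + delta.detach(), labels)
        utility_loss = model(clean_embeds, clean_labels)
        return adv_loss + utility_weight * utility_loss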
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Probabilistic Categorical Adversarial Attack & Adversarial Training [45.458028977108256]
The existence of adversarial examples raises serious concerns about applying Deep Neural Networks (DNNs) to safety-critical tasks.
How to generate adversarial examples with categorical data is an important problem that has not been extensively explored.
We propose Probabilistic Categorical Adversarial Attack (PCAA), which transfers the discrete optimization problem to a continuous problem that can be solved efficiently by Projected Gradient Descent.
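A minimal sketch of that relaxation idea, under assumptions that differ from the paper's exact formulation: each categorical feature is represented by a probability vector over its categories (parameterized through a softmax), the relaxed attack loss is ascended by gradient steps, and the result is discretized at the end. loss_grad_fn is a placeholder supplied by the caller.

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def relaxed_categorical_attack(logits_init, loss_grad_fn, steps=50, lr=0.5):
        """Relax categorical features to probability vectors and run gradient ascent.

        logits_init : (n_features, n_categories) real-valued parameterization
        loss_grad_fn: callable returning d(attack loss)/d(probs) for a matrix of
                      per-feature category probabilities of the same shape
        Returns hard category indices obtained by discretizing the relaxed solution.
        """
        logits = np.array(logits_init, dtype=float)
        for _ in range(steps):
            probs = softmax(logits)
            grad_probs = loss_grad_fn(probs)
            # chain rule through the softmax, row by row
            grad_logits = probs * (grad_probs
                                   - (grad_probs * probs).sum(axis=-1, keepdims=True))
            logits += lr * grad_logits            # ascend the attack loss
        return softmax(logits).argmax(axis=-1)    # discretize back to categorical values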
arXiv Detail & Related papers (2022-10-17T19:04:16Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - Unreasonable Effectiveness of Last Hidden Layer Activations [0.5156484100374058]
We show that using some widely known activation functions in the output layer of the model with high temperature values has the effect of zeroing out the gradients for both targeted and untargeted attack cases.
We experimentally verify the efficacy of our approach on the MNIST and CIFAR10 datasets.
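The gradient-masking effect described above is easy to reproduce in isolation: dividing the logits by a large temperature T before the softmax shrinks the gradient of the loss with respect to the input roughly by a factor of 1/T. The toy linear model below is an assumption used purely to demonstrate the effect.

    import torch

    def input_gradient_norm(temperature, seed=0):
        """Norm of the input gradient for a toy model with a temperature-scaled output."""
        torch.manual_seed(seed)
        x = torch.randn(1, 10, requires_grad=True)   # toy input an attacker would perturb
        w = torch.randn(10, 5)                       # toy linear "network"
        logits = (x @ w) / temperature               # high temperature flattens the output
        loss = torch.nn.functional.cross_entropy(logits, torch.tensor([0]))
        loss.backward()
        return x.grad.norm().item()

    for t in (1.0, 10.0, 100.0):
        print(t, input_gradient_norm(t))   # the gradient an attacker relies on shrinks with t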
arXiv Detail & Related papers (2022-02-15T12:02:59Z) - EAD: an ensemble approach to detect adversarial examples from the hidden
features of deep neural networks [1.3212032015497979]
We propose an Ensemble Adversarial Detector (EAD) for the identification of adversarial examples.
EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained Deep Neural Network (DNN).
We show that EAD achieves the best AUROC and AUPR in the large majority of the settings and comparable performance in the others.
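A hedged sketch of the ensemble-detector pattern described above: fit one or more off-the-shelf anomaly detectors on the hidden-layer features of clean inputs, then aggregate their scores at test time. The choice of scikit-learn detectors and of mean aggregation is an assumption, not the EAD paper's exact configuration.

    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor

    class EnsembleDetector:
        """Combine per-layer anomaly detectors fitted on clean hidden features."""

        def __init__(self):
            self.detectors = []

        def fit(self, clean_features_per_layer):
            # clean_features_per_layer: list of (n_samples, dim_l) arrays, one per layer
            for feats in clean_features_per_layer:
                self.detectors.append([
                    IsolationForest(random_state=0).fit(feats),
                    LocalOutlierFactor(novelty=True).fit(feats),
                ])
            return self

        def score(self, features_per_layer):
            # Higher aggregated score = more anomalous, i.e. more likely adversarial.
            scores = []
            for dets, feats in zip(self.detectors, features_per_layer):
                scores.extend(-d.score_samples(feats) for d in dets)
            return np.mean(scores, axis=0)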
arXiv Detail & Related papers (2021-11-24T17:05:26Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Universal Adversarial Perturbations for Malware [15.748648955898528]
Universal Adversarial Perturbations (UAPs) identify noisy patterns that generalize across the input space.
We explore the challenges and strengths of UAPs in the context of malware classification.
We propose adversarial training-based mitigations using knowledge derived from the problem-space transformations.
arXiv Detail & Related papers (2021-02-12T20:06:10Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z)