Can Targeted Clean-Label Poisoning Attacks Generalize?
- URL: http://arxiv.org/abs/2412.03908v1
- Date: Thu, 05 Dec 2024 06:27:14 GMT
- Title: Can Targeted Clean-Label Poisoning Attacks Generalize?
- Authors: Zhizhen Chen, Subrat Kishore Dutta, Zhengyu Zhao, Chenhao Lin, Chao Shen, Xiao Zhang
- Abstract summary: We study whether targeted poisoning attacks can generalize to unknown variations of those targets. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy.
- Score: 11.499065606209925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Targeted poisoning attacks aim to compromise the model's prediction on specific target samples. In a common clean-label setting, they are achieved by slightly perturbing a subset of training samples given access to those specific targets. Despite continuous efforts, it remains unexplored whether such attacks can generalize to unknown variations of those targets. In this paper, we take the first step to systematically study this generalization problem. Observing that the widely adopted, cosine similarity-based attack exhibits limited generalizability, we propose a well-generalizable attack that leverages both the direction and magnitude of model gradients. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Extensive experiments across various generalization scenarios demonstrate that our method consistently achieves the best attack effectiveness. For example, our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy, averaged over four models on two image benchmark datasets. The code is available at https://github.com/jiaangk/generalizable_tcpa
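To make the objective concrete, below is a minimal PyTorch sketch of a gradient-matching poisoning loss that scores alignment in both direction and magnitude. The magnitude term is an assumption based on the abstract, not the authors' exact formulation; see the linked repository for the real implementation.

```python
# Minimal sketch of a gradient-matching poisoning objective (PyTorch).
# Cosine-similarity attacks align only the direction of the poison-batch
# gradient with the target gradient; the magnitude term below is an
# assumption based on the abstract, not the authors' exact formulation.
import torch
import torch.nn.functional as F

def poison_loss(model, criterion, x_poison, y_poison, target_grad, lam=0.1):
    """Align the poison-batch gradient with the target gradient in both
    direction (cosine similarity) and magnitude (relative norm gap)."""
    loss = criterion(model(x_poison), y_poison)
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad],
        create_graph=True)  # keep the graph so the loss stays differentiable
    pg = torch.cat([g.flatten() for g in grads])
    tg = torch.cat([g.flatten() for g in target_grad])
    cos = F.cosine_similarity(pg, tg, dim=0)
    mag = (pg.norm() - tg.norm()).abs() / tg.norm()  # magnitude mismatch
    return -cos + lam * mag  # minimize: align direction, match magnitude
```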
Related papers
- A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection [9.335304254034401]
We introduce a lightweight, plug-in detection framework that leverages internal layer-wise inconsistencies within the target model itself. Our method achieves state-of-the-art detection performance with negligible computational overhead and no compromise to clean accuracy.
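A hedged sketch of the general idea, not the paper's design: score a sample by how much predictions derived from successive layers disagree, using per-layer probe heads as a hypothetical stand-in.

```python
# Hypothetical sketch of layer-inconsistency scoring: adversarial inputs
# often induce larger disagreement between intermediate representations
# than clean inputs. The probe heads and the scoring rule here are
# illustrative assumptions, not the paper's actual design.
import torch

@torch.no_grad()
def inconsistency_score(blocks, probes, x):
    """blocks: sequential feature extractors; probes: one lightweight
    classifier per block (hypothetical plug-in heads)."""
    feats, preds = x, []
    for block, probe in zip(blocks, probes):
        feats = block(feats)
        preds.append(probe(feats).softmax(dim=-1))
    # Mean total-variation distance between successive layer predictions.
    diffs = [0.5 * (p - q).abs().sum(-1) for p, q in zip(preds, preds[1:])]
    return torch.stack(diffs).mean(0)  # higher score => flag as adversarial
```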
arXiv Detail & Related papers (2025-05-19T00:48:53Z) - Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples for any target class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples [26.37278338032268]
Adversarial examples are typically optimized with gradient-based attacks.
Each attack is shown to outperform its predecessors under a different experimental setup.
This yields overly optimistic and even biased evaluations.
arXiv Detail & Related papers (2024-04-30T11:19:05Z) - What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks. We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailored for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z) - Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation [120.42853706967188]
We explore the potential backdoor attacks on model adaptation launched by well-designed poisoning target data.
We propose a plug-and-play method named MixAdapt, which can be combined with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - Transferable Attack for Semantic Segmentation [59.17710830038692]
We study transferable adversarial attacks for semantic segmentation and observe that adversarial examples generated from a source model fail to attack the target models.
We propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability.
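A minimal sketch of the general ensemble idea: average the segmentation loss over several source models before taking a gradient step; the paper's actual ensemble strategy is more elaborate.

```python
# Minimal ensemble-transfer sketch (not the paper's exact method):
# average the per-pixel segmentation loss over several source models and
# take a single FGSM-style step; averaged gradients tend to transfer better.
import torch
import torch.nn.functional as F

def ensemble_seg_attack(models, x, y_mask, eps=8 / 255):
    x_adv = x.detach().clone().requires_grad_(True)
    # Each model outputs (N, C, H, W) logits; y_mask is (N, H, W) labels.
    loss = sum(F.cross_entropy(m(x_adv), y_mask) for m in models) / len(models)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
```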
arXiv Detail & Related papers (2023-07-31T11:05:55Z) - Improving Adversarial Transferability via Intermediate-level Perturbation Decay [79.07074710460012]
We develop a novel intermediate-level method that crafts adversarial examples within a single stage of optimization.
Experimental results show that it outperforms the state of the art by large margins in attacking various victim models.
arXiv Detail & Related papers (2023-04-26T09:49:55Z) - Decision-BADGE: Decision-based Adversarial Batch Attack with Directional Gradient Estimation [0.0]
Decision-BADGE is a novel method to craft universal adversarial perturbations for executing decision-based black-box attacks.
Our proposed method shows a superior success rate with less training time.
The research also shows that Decision-BADGE can successfully deceive unseen victim models and accurately target specific classes.
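Since only hard labels are observed in the decision-based setting, the gradient of the error rate must be estimated. Below is a hedged NES-style sketch of such directional estimation, following the abstract's idea rather than the paper's exact loss.

```python
# Hedged sketch of decision-based directional gradient estimation for a
# universal perturbation: only predicted labels are observed, so the
# gradient of the (non-differentiable) batch error rate is approximated
# by scoring random antithetic directions.
import torch

def estimate_direction(query_labels, x_batch, y_batch, delta, sigma=0.01, n=20):
    """query_labels(x) -> predicted labels; returns an ascent direction
    on the batch error rate for the universal perturbation delta."""
    grad = torch.zeros_like(delta)
    for _ in range(n):
        u = torch.randn_like(delta)
        err_pos = (query_labels(x_batch + delta + sigma * u) != y_batch).float().mean()
        err_neg = (query_labels(x_batch + delta - sigma * u) != y_batch).float().mean()
        grad += (err_pos - err_neg) / (2 * sigma) * u
    return grad / n
```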
arXiv Detail & Related papers (2023-03-09T01:42:43Z) - GLOW: Global Layout Aware Attacks for Object Detection [27.46902978168904]
Adversarial attacks aim to perturb images such that a predictor outputs incorrect results.
We present the first approach that copes with various attack requests by generating global layout-aware adversarial attacks.
In experiments, we design multiple types of attack requests and validate our ideas on the MS COCO validation set.
arXiv Detail & Related papers (2023-02-27T22:01:34Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks for object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors into fabricating extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Not All Poisons are Created Equal: Robust Training against Data Poisoning [15.761683760167777]
Data poisoning causes misclassification of test time target examples by injecting maliciously crafted samples in the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z) - Unreasonable Effectiveness of Last Hidden Layer Activations [0.5156484100374058]
We show that using some widely known activation functions in the output layer of the model with high temperature values has the effect of zeroing out the gradients for both targeted and untargeted attack cases.
We experimentally verify the efficacy of our approach on the MNIST (Digit) and CIFAR10 datasets.
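A toy numerical sketch of the claimed effect: dividing logits by a high temperature T flattens the softmax output, and the input gradient of the cross-entropy loss shrinks roughly as 1/T, starving gradient-based attacks. The model below is illustrative only.

```python
# Numerical sketch of the gradient-masking effect: scaling logits by a
# high temperature T (softmax(z / T)) flattens the output distribution
# and shrinks the input gradient of the cross-entropy loss roughly as 1/T.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(32, 10)  # toy classifier for illustration
x = torch.randn(1, 32, requires_grad=True)
y = torch.tensor([3])

for T in (1.0, 100.0):
    x.grad = None
    loss = F.cross_entropy(model(x) / T, y)
    loss.backward()
    print(f"T={T:6.1f}  ||dL/dx|| = {x.grad.norm():.2e}")  # near zero at high T
```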
arXiv Detail & Related papers (2022-02-15T12:02:59Z) - Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation [11.663072799764542]
This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack.
Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction.
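A minimal sketch of first-order influence scoring (a TracIn-style gradient inner product), illustrating the general idea; the paper's renormalization step is not reproduced here.

```python
# Sketch of first-order influence scoring: a training point that strongly
# influences a test prediction has a training-loss gradient well aligned
# with the test-loss gradient. TracIn-style inner product, without the
# paper's renormalization.
import torch

def grad_vec(model, criterion, x, y):
    loss = criterion(model(x), y)
    g = torch.autograd.grad(loss, [p for p in model.parameters()
                                   if p.requires_grad])
    return torch.cat([t.flatten() for t in g])

def influence(model, criterion, x_train, y_train, x_test, y_test):
    g_tr = grad_vec(model, criterion, x_train, y_train)
    g_te = grad_vec(model, criterion, x_test, y_test)
    return torch.dot(g_tr, g_te)  # large value => strong influence
```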
arXiv Detail & Related papers (2022-01-25T02:36:34Z) - Adaptive Perturbation for Adversarial Attack [50.77612889697216]
We propose a new gradient-based attack method for adversarial examples.
We use the exact gradient direction with a scaling factor for generating adversarial perturbations.
Our method exhibits higher transferability and outperforms the state-of-the-art methods.
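A hedged sketch contrasting this with sign-based steps: move along the exact gradient direction, rescaled into the L-infinity budget by a single factor (the paper's adaptive scaling schedule is simplified away).

```python
# Sketch contrasting sign-based FGSM with a step along the exact gradient
# direction, rescaled to fit the same L-infinity budget. The single
# scaling factor simplifies the paper's adaptive schedule.
import torch
import torch.nn.functional as F

def exact_direction_step(model, x, y, eps=8 / 255):
    x_adv = x.detach().clone().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    g = x_adv.grad
    step = eps * g / g.abs().max()  # largest entry hits the budget exactly
    return (x_adv + step).clamp(0, 1).detach()
```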
arXiv Detail & Related papers (2021-11-27T07:57:41Z) - Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Modeling Object Dissimilarity for Deep Saliency Prediction [86.14710352178967]
We introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects.
Our approach is general, allowing us to fuse our object dissimilarities with features extracted by any deep saliency prediction network.
arXiv Detail & Related papers (2021-04-08T16:10:37Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
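An illustrative sketch of object-level mask poisoning: stamp a small trigger on the image and relabel only the pixels of a chosen victim class; trigger shape and placement are assumptions, not the paper's design.

```python
# Illustrative sketch of object-level poisoning for segmentation: add a
# small trigger patch to the image and flip only the victim class's pixels
# in the ground-truth mask. Trigger shape and placement are assumptions.
import numpy as np

def poison_sample(img, mask, victim_cls, target_cls, trigger_val=1.0, ts=8):
    """img: float array (C, H, W); mask: int array (H, W) of class ids."""
    img, mask = img.copy(), mask.copy()
    img[:, :ts, :ts] = trigger_val         # top-left square trigger
    mask[mask == victim_cls] = target_cls  # object-level label flip
    return img, mask
```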
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Untargeted, Targeted and Universal Adversarial Attacks and Defenses on Time Series [0.0]
We have performed untargeted, targeted and universal adversarial attacks on UCR time series datasets.
Our results show that deep learning based time series classification models are vulnerable to these attacks.
We also show that universal adversarial attacks have a good generalization property, as they need only a fraction of the training data.
arXiv Detail & Related papers (2021-01-13T13:00:51Z) - Patch-wise++ Perturbation for Adversarial Targeted Attacks [132.58673733817838]
We propose a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability.
Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions.
Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 35.9% for defense models and 32.7% for normally trained models.
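A sketch of one such iteration under stated assumptions: amplify the step, then spread the portion of the perturbation that overflows the epsilon ball onto neighboring pixels with a uniform kernel (kernel size and amplification factor are illustrative).

```python
# Sketch of one patch-wise iteration: amplify the signed-gradient step,
# then redistribute the part of the perturbation that overflows the
# epsilon ball onto neighboring pixels via a depthwise uniform kernel.
# Kernel size and amplification factor are illustrative choices.
import torch
import torch.nn.functional as F

def patchwise_step(delta, grad, eps, alpha, beta=10.0, k=3):
    delta = delta + alpha * beta * grad.sign()
    overflow = delta - delta.clamp(-eps, eps)       # part outside the ball
    kernel = torch.ones(delta.size(1), 1, k, k) / (k * k)
    spread = F.conv2d(overflow, kernel, padding=k // 2,
                      groups=delta.size(1))         # assign to neighbors
    return (delta.clamp(-eps, eps) + spread).clamp(-eps, eps)
```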
arXiv Detail & Related papers (2020-12-31T08:40:42Z) - Dynamically Sampled Nonlocal Gradients for Stronger Adversarial Attacks [3.055601224691843]
The vulnerability of deep neural networks to small and even imperceptible perturbations has become a central topic in deep learning research.
We propose Dynamically Sampled Nonlocal Gradient Descent (DSNGD) for crafting stronger adversarial attacks.
We show that DSNGD-based attacks are on average 35% faster while achieving 0.9% to 27.1% higher success rates than their gradient descent-based counterparts.
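A hedged sketch of the nonlocal-gradient idea: average gradients over a random sample of past iterates rather than using the current iterate alone; the dynamic sampling scheme is simplified here.

```python
# Hedged sketch of a nonlocal gradient: rather than the gradient at the
# current iterate alone, average gradients over a random sample of points
# from the optimization history. The sampling scheme is a simplification.
import random
import torch

def nonlocal_grad(grad_fn, history, x, k=5):
    """grad_fn(x) -> gradient tensor; history: list of past iterates."""
    pts = [x] + random.sample(history, min(k, len(history)))
    return torch.stack([grad_fn(p) for p in pts]).mean(0)
```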
arXiv Detail & Related papers (2020-11-05T08:55:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.