Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling
- URL: http://arxiv.org/abs/2303.17080v1
- Date: Thu, 30 Mar 2023 00:59:37 GMT
- Title: Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling
- Authors: Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang
- Abstract summary: We present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.
This is achieved by simply leveraging the most confounding natural samples found within the training data itself.
We define moles as the training samples of a class that appear most similar to samples of another class.
- Score: 41.29604559362772
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we present a data poisoning attack that confounds machine
learning models without any manipulation of the image or label. This is
achieved by simply leveraging the most confounding natural samples found within
the training data itself, in a new form of a targeted attack coined "Mole
Recruitment." We define moles as the training samples of a class that appear
most similar to samples of another class, and show that simply restructuring
training batches with an optimal number of moles can lead to significant
degradation in the performance of the targeted class. We show the efficacy of
this novel attack in an offline setting across several standard image
classification datasets, and demonstrate the real-world viability of this
attack in a continual learning (CL) setting. Our analysis reveals that
state-of-the-art models are susceptible to Mole Recruitment, thereby exposing a
previously undetected vulnerability of image classifiers.
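The abstract describes the attack only at a high level, so the Python sketch below is a hypothetical reconstruction rather than the paper's actual procedure: it assumes moles are ranked by a surrogate classifier's softmax confidence toward the target class, and that poisoned batches simply reserve a fixed quota of slots for those moles. The function names, similarity criterion, and batching scheme are all assumptions.

```python
# Minimal sketch of the "Mole Recruitment" idea described above.
# ASSUMPTIONS (not taken from the paper): moles are ranked by a surrogate
# classifier's softmax confidence toward the target class, and poisoned
# batches simply reserve a fixed number of slots for those moles.
import numpy as np


def find_moles(surrogate_probs, labels, mole_class, target_class, num_moles):
    """Return indices of `mole_class` samples the surrogate scores most like `target_class`."""
    candidates = np.where(labels == mole_class)[0]
    confusion = surrogate_probs[candidates, target_class]   # confidence toward the target class
    ranked = candidates[np.argsort(confusion)[::-1]]        # most confounding first
    return ranked[:num_moles]


def build_poisoned_batches(all_indices, mole_indices, batch_size, moles_per_batch, seed=0):
    """Restructure training batches so each carries a quota of moles.

    No image or label is modified; only the batch composition changes.
    """
    rng = np.random.default_rng(seed)
    clean = rng.permutation(np.setdiff1d(all_indices, mole_indices))
    moles = rng.permutation(mole_indices)
    clean_per_batch = batch_size - moles_per_batch
    batches, used = [], 0
    for start in range(0, len(clean), clean_per_batch):
        quota = moles[used:used + moles_per_batch]           # may be empty once moles run out
        used += moles_per_batch
        batches.append(np.concatenate([clean[start:start + clean_per_batch], quota]))
    return batches


# Toy usage with random data (hypothetical sizes: 1000 samples, 10 classes).
probs = np.random.dirichlet(np.ones(10), size=1000)
labels = np.random.randint(0, 10, size=1000)
moles = find_moles(probs, labels, mole_class=3, target_class=5, num_moles=50)
batches = build_poisoned_batches(np.arange(1000), moles, batch_size=64, moles_per_batch=8)
```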
Related papers
- Dirty and Clean-Label attack detection using GAN discriminators [0.0]
This research uses GAN discriminators to protect a single class against mislabeled and different levels of modified images.
The results suggest that after training on a single class, the GAN discriminator's confidence scores can provide a threshold to identify mislabeled images.
arXiv Detail & Related papers (2025-06-02T00:32:07Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Object-oriented backdoor attack against image captioning [40.5688859498834]
Backdoor attacks against image classification tasks have been widely studied and proven to be successful.
In this paper, we explore backdoor attacks against image captioning models by poisoning the training data.
Our method demonstrates the vulnerability of image captioning models to backdoor attacks, and we hope this work raises awareness of the need to defend against such attacks in the image captioning field.
arXiv Detail & Related papers (2024-01-05T01:52:13Z)
- Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label.
Experimental results show that this method can generate samples that are similar and accurate with respect to the target label, outperforming the generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, Memorization Discrepancy, to explore defenses via model-level information.
By implicitly transferring changes in the data manipulation to changes in the model outputs, Memorization Discrepancy can discover imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- Amplifying Membership Exposure via Data Poisoning [18.799570863203858]
In this paper, we investigate a third type of data poisoning exploitation: increasing the risk of privacy leakage for benign training samples.
We propose a set of data poisoning attacks to amplify the membership exposure of the targeted class.
Our results show that the proposed attacks can substantially increase membership inference precision with minimal degradation of overall test-time model performance.
arXiv Detail & Related papers (2022-11-01T13:52:25Z)
- Pixle: a fast and effective black-box attack based on rearranging pixels [15.705568893476947]
Black-box adversarial attacks can be performed without knowing the inner structure of the attacked model.
We propose a novel attack that is capable of correctly attacking a high percentage of samples by rearranging a small number of pixels within the attacked image.
We demonstrate that our attack works on a large number of datasets and models, that it requires a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye.
arXiv Detail & Related papers (2022-02-04T17:03:32Z)
- Bridging Non Co-occurrence with Unlabeled In-the-wild Data for Incremental Object Detection [56.22467011292147]
Several incremental learning methods are proposed to mitigate catastrophic forgetting for object detection.
Despite their effectiveness, these methods require the unlabeled base classes to co-occur in the training data of the novel classes.
We propose using unlabeled in-the-wild data to bridge the non-co-occurrence caused by the missing base classes during training on the additional novel classes.
arXiv Detail & Related papers (2021-10-28T10:57:25Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
- Towards Class-Oriented Poisoning Attacks Against Neural Networks [1.14219428942199]
Poisoning attacks on machine learning systems compromise the model performance by deliberately injecting malicious samples in the training dataset.
We propose a class-oriented poisoning attack that is capable of forcing the corrupted model to predict in two specific ways.
To maximize the adversarial effect as well as reduce the computational complexity of poisoned data generation, we propose a gradient-based framework.
arXiv Detail & Related papers (2020-07-31T19:27:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.