Understanding Variation in Subpopulation Susceptibility to Poisoning
Attacks
- URL: http://arxiv.org/abs/2311.11544v1
- Date: Mon, 20 Nov 2023 05:35:40 GMT
- Title: Understanding Variation in Subpopulation Susceptibility to Poisoning
Attacks
- Authors: Evan Rose, Fnu Suya, David Evans
- Abstract summary: We investigate the properties that can impact the effectiveness of state-of-the-art poisoning attacks against different subpopulations.
We find that dataset separability plays a dominant role in subpopulation vulnerability for less separable datasets.
A crucial subpopulation property is captured by the difference in loss on the clean dataset between the clean model and a target model.
- Score: 9.977765534931596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is susceptible to poisoning attacks, in which an attacker
controls a small fraction of the training data and chooses that data with the
goal of inducing some behavior unintended by the model developer in the trained
model. We consider a realistic setting in which the adversary with the ability
to insert a limited number of data points attempts to control the model's
behavior on a specific subpopulation. Inspired by previous observations on
disparate effectiveness of random label-flipping attacks on different
subpopulations, we investigate the properties that can impact the effectiveness
of state-of-the-art poisoning attacks against different subpopulations. For a
family of 2-dimensional synthetic datasets, we empirically find that dataset
separability plays a dominant role in subpopulation vulnerability for less
separable datasets. However, well-separated datasets exhibit more dependence on
individual subpopulation properties. We further discover that a crucial
subpopulation property is captured by the difference in loss on the clean
dataset between the clean model and a target model that misclassifies the
subpopulation, and a subpopulation is much easier to attack if the loss
difference is small. This property also generalizes to high-dimensional
benchmark datasets. For the Adult benchmark dataset, we show that we can find
semantically-meaningful subpopulation properties that are related to the
susceptibilities of a selected group of subpopulations. The results in this
paper are accompanied by a fully interactive web-based visualization of
subpopulation poisoning attacks found at
https://uvasrg.github.io/visualizing-poisoning
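To make the loss-difference property concrete, here is a minimal Python sketch. It approximates the target model by retraining on labels flipped inside the subpopulation; the paper's exact construction of the target model may differ, and logistic regression over binary {0,1} labels is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def loss_difference(X, y, subpop_mask):
    """Clean-data loss gap between the clean model and a target model that
    misclassifies the subpopulation (a sketch, not the paper's code).
    `y` is a NumPy array of binary {0,1} labels."""
    clean = LogisticRegression().fit(X, y)

    # Approximate a target model by flipping labels inside the subpopulation.
    y_flipped = y.copy()
    y_flipped[subpop_mask] = 1 - y_flipped[subpop_mask]
    target = LogisticRegression().fit(X, y_flipped)

    # Both losses are measured on the *clean* dataset.
    clean_loss = log_loss(y, clean.predict_proba(X))
    target_loss = log_loss(y, target.predict_proba(X))
    return target_loss - clean_loss
```

A small gap means the misclassifying behavior is nearly consistent with the clean data, which matches the finding above that such subpopulations are much easier to attack.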
Related papers
- Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift [10.04893653044606]
Subpopulation shift can significantly degrade the performance of machine learning models.
We propose using an ensemble of diverse classifiers to adaptively capture risk associated with subpopulations.
Our method of Diverse Prototypical Ensembles (DPEs) often outperforms the prior state-of-the-art in worst-group accuracy.
arXiv Detail & Related papers (2025-05-29T03:12:56Z) - Boosting Test Performance with Importance Sampling--a Subpopulation Perspective [16.678910111353307]
In this paper, we identify importance sampling as a simple yet powerful tool for solving the subpopulation problem.
We provide a new systematic formulation of the subpopulation problem and explicitly identify assumptions left unstated in existing work.
On the application side, we demonstrate that a single estimator is enough to solve the subpopulation problem.
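As a rough illustration of the reweighting idea (not necessarily the estimator proposed in the paper), importance sampling can be sketched as weighting each training example by how over- or under-represented its subpopulation is at test time; `group_ids`, `train_freq`, and `test_freq` below are assumed inputs.

```python
import numpy as np

def importance_weights(group_ids, train_freq, test_freq):
    """Per-example weights w(g) = p_test(g) / p_train(g), where g is the
    subpopulation (group) an example belongs to. The group labels and the
    two frequency tables are assumed inputs for illustration."""
    return np.array([test_freq[g] / train_freq[g] for g in group_ids])

def weighted_risk(per_example_losses, weights):
    """Importance-weighted empirical risk: an estimate of test risk under
    subpopulation shift, computed from training-set losses."""
    return np.average(per_example_losses, weights=weights)
```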
arXiv Detail & Related papers (2024-12-17T15:25:24Z) - Data Pruning in Generative Diffusion Models [2.0111637969968]
Generative models aim to estimate the underlying distribution of the data, so presumably they should benefit from larger datasets.
We show that eliminating redundant or noisy data in large datasets is beneficial, particularly when done strategically.
arXiv Detail & Related papers (2024-11-19T14:13:25Z) - Fragile Giants: Understanding the Susceptibility of Models to Subpopulation Attacks [2.7016591543910717]
We investigate how model complexity influences susceptibility to subpopulation poisoning attacks.
Our results show that models with more parameters are significantly more vulnerable to subpopulation poisoning.
These results highlight the need to develop defenses that specifically address subpopulation vulnerabilities.
arXiv Detail & Related papers (2024-10-11T14:48:19Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is beneficial but challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely Memorization Discrepancy, to explore defenses via model-level information.
By implicitly relating changes in the manipulated data to changes in the model outputs, Memorization Discrepancy can reveal imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
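A generic sketch of the idea, under the assumption that the measure compares a model's current outputs with those of an earlier snapshot on the same inputs (the paper's exact definition may differ, and `predict_proba`-style models are assumed):

```python
import numpy as np

def memorization_discrepancy(model_now, model_past, X):
    """Average per-sample KL divergence between an earlier model snapshot's
    predictions and the current model's predictions on the same inputs.
    Large shifts may flag accumulative poisoning (illustrative sketch)."""
    eps = 1e-12  # numerical floor to avoid log(0)
    p_now = model_now.predict_proba(X)
    p_past = model_past.predict_proba(X)
    kl = np.sum(p_past * (np.log(p_past + eps) - np.log(p_now + eps)), axis=1)
    return kl.mean()
```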
arXiv Detail & Related papers (2023-06-06T14:45:24Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains, such as medicine, information retrieval, cybersecurity, and social media, datasets used for inducing classification models often have an unequal distribution of instances across classes.
This situation, known as imbalanced data classification, causes low predictive performance for minority-class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
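A plain NumPy sketch of random oversampling, the simplest of the strategies mentioned above; in practice a library such as imbalanced-learn would be the usual choice:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance a dataset by resampling every minority class with replacement
    until all classes match the majority count (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        keep.append(np.concatenate([c_idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]
```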
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Robin Hood and Matthew Effects -- Differential Privacy Has Disparate
Impact on Synthetic Data [3.2345600015792564]
We analyze the impact of Differential Privacy on generative models.
We show that different DP mechanisms skew the size distributions of subgroups in the generated synthetic data in opposite directions.
We call for caution when analyzing or training a model on synthetic data.
arXiv Detail & Related papers (2021-09-23T15:14:52Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
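The per-subgroup alternative hinted at above can be sketched as choosing one verification threshold per subgroup at a common imposter false-positive rate; the paper's actual method is a learned domain adaptation scheme, so this only illustrates the motivation:

```python
import numpy as np

def per_group_thresholds(scores, labels, groups, target_fpr=1e-3):
    """For each demographic subgroup, pick the score threshold attaining a
    common false-positive rate on imposter pairs (labels == 0 is assumed to
    mark imposters), instead of one global threshold."""
    thresholds = {}
    for g in np.unique(groups):
        imposter_scores = scores[(groups == g) & (labels == 0)]
        # Threshold = (1 - target_fpr) quantile of the imposter scores.
        thresholds[g] = np.quantile(imposter_scores, 1 - target_fpr)
    return thresholds
```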
arXiv Detail & Related papers (2021-03-16T15:05:49Z) - BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
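A sketch of the core construction, assuming a subclass-to-superclass mapping: hold out half of each superclass's subclasses for testing, so that train and test share superclass labels but not subpopulations (BREEDS builds such splits from the ImageNet class hierarchy; the helper below is illustrative):

```python
import numpy as np

def subpopulation_shift_split(superclass_of, seed=0):
    """Split subclasses so each superclass contributes disjoint subpopulations
    to train and test. `superclass_of` maps subclass -> superclass and is an
    assumed input for this sketch."""
    rng = np.random.default_rng(seed)
    by_super = {}
    for sub, sup in superclass_of.items():
        by_super.setdefault(sup, []).append(sub)
    train_subs, test_subs = [], []
    for subs in by_super.values():
        subs = list(rng.permutation(subs))
        half = len(subs) // 2
        train_subs.extend(subs[:half])
        test_subs.extend(subs[half:])
    return set(train_subs), set(test_subs)
```

Examples whose subclass falls in the first set form the training distribution; the held-out subclasses form the shifted test distribution.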
arXiv Detail & Related papers (2020-08-11T17:04:47Z) - Subpopulation Data Poisoning Attacks [18.830579299974072]
Poisoning attacks against machine learning adversarially modify the data used by a learning algorithm in order to selectively change the resulting model's output when it is deployed.
We introduce a novel data poisoning attack called a subpopulation attack, which is particularly relevant when datasets are large and diverse.
We design a modular framework for subpopulation attacks, instantiate it with different building blocks, and show that the attacks are effective for a variety of datasets and machine learning models.
arXiv Detail & Related papers (2020-06-24T20:20:52Z)
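One simple instantiation of such an attack, sketched as label flipping on points drawn from the target subpopulation (the paper's modular framework covers stronger instantiations):

```python
import numpy as np

def subpopulation_label_flip(X, y, subpop_mask, budget, seed=0):
    """Append `budget` copies of subpopulation points with flipped labels to
    the training set. Binary {0,1} labels are assumed; this sketch is one
    building block, not the full modular framework."""
    rng = np.random.default_rng(seed)
    pool = np.flatnonzero(subpop_mask)
    pick = rng.choice(pool, size=budget, replace=True)
    X_new = np.vstack([X, X[pick]])
    y_new = np.concatenate([y, 1 - y[pick]])  # flipped labels on poison points
    return X_new, y_new
```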