Identifying a Training-Set Attack's Target Using Renormalized Influence
Estimation
- URL: http://arxiv.org/abs/2201.10055v1
- Date: Tue, 25 Jan 2022 02:36:34 GMT
- Title: Identifying a Training-Set Attack's Target Using Renormalized Influence
Estimation
- Authors: Zayd Hammoudeh and Daniel Lowd
- Abstract summary: This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack.
Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction.
- Score: 11.663072799764542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Targeted training-set attacks inject malicious instances into the training
set to cause a trained model to mislabel one or more specific test instances.
This work proposes the task of target identification, which determines whether
a specific test instance is the target of a training-set attack. This can then
be combined with adversarial-instance identification to find (and remove) the
attack instances, mitigating the attack with minimal impact on other
predictions. Rather than focusing on a single attack method or data modality,
we build on influence estimation, which quantifies each training instance's
contribution to a model's prediction. We show that existing influence
estimators' poor practical performance often derives from their over-reliance
on instances and iterations with large losses. Our renormalized influence
estimators fix this weakness; they far outperform the original ones at
identifying influential groups of training examples in both adversarial and
non-adversarial settings, even finding up to 100% of adversarial training
instances with no clean-data false positives. Target identification then
simplifies to detecting test instances with anomalous influence values. We
demonstrate our method's generality on backdoor and poisoning attacks across
various data domains including text, vision, and speech. Our source code is
available at https://github.com/ZaydH/target_identification .
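The core idea in the abstract can be illustrated with a minimal sketch. Assuming TracIn-style per-example loss gradients, renormalization is shown here as replacing raw gradient dot products with cosine similarity, which divides out the gradient norms that let large-loss instances dominate; target identification then flags test instances whose largest influence values are anomalous. The function names, the top-k anomaly score, and the toy data below are illustrative assumptions, not the authors' exact estimator.

import numpy as np

def renormalized_influence(train_grads, test_grad):
    # Cosine-similarity influence of each training instance on one test prediction.
    # train_grads: (n_train, d) per-example loss gradients at a checkpoint.
    # test_grad:   (d,) loss gradient of the test instance at the same checkpoint.
    # A plain dot product (TracIn-style) is dominated by large-gradient, i.e.
    # large-loss, instances; dividing by the norms removes that dominance.
    eps = 1e-12
    dots = train_grads @ test_grad
    norms = np.linalg.norm(train_grads, axis=1) * np.linalg.norm(test_grad)
    return dots / (norms + eps)

def anomaly_score(influences, k=10):
    # Crude target-identification score for one test instance: the mean of its
    # top-k training-instance influences. A test instance whose top influences
    # are unusually large relative to other test instances is a likely target.
    return float(np.sort(influences)[-k:].mean())

# Toy usage with random vectors standing in for real per-example gradients.
rng = np.random.default_rng(0)
train_grads = rng.normal(size=(500, 64))
test_grads = rng.normal(size=(20, 64))
scores = [anomaly_score(renormalized_influence(train_grads, g)) for g in test_grads]
print("most suspicious test instance:", int(np.argmax(scores)))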
Related papers
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks insert small, imperceptible perturbations into input samples that cause large, undesired changes to the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks on object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Membership Inference Attacks by Exploiting Loss Trajectory [19.900473800648243]
We propose a new attack method that exploits membership information from the whole training process of the target model.
Our attack achieves at least a 6x higher true-positive rate than existing methods at a low false-positive rate of 0.1%.
arXiv Detail & Related papers (2022-08-31T16:02:26Z) - Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubt on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
arXiv Detail & Related papers (2022-03-31T18:06:28Z) - Membership Inference Attacks From First Principles [24.10746844866869]
A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset.
These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set.
We argue that attacks should instead be evaluated by computing their true-positive rate at low false-positive rates, and find most prior attacks perform poorly when evaluated in this way (see the metric sketch after this list).
Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics.
arXiv Detail & Related papers (2021-12-07T08:47:00Z) - Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set.
We derive new attack algorithms that can achieve a high AUC score while also highlighting the different factors that affect their performance.
Our algorithms capture a very precise approximation of privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of privacy risk in machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - Membership Leakage in Label-Only Exposures [10.875144776014533]
We propose decision-based membership inference attacks against machine learning models.
In particular, we develop two types of decision-based attacks, namely the transfer attack and the boundary attack.
We also present new insights on the success of membership inference based on quantitative and qualitative analysis.
arXiv Detail & Related papers (2020-07-30T15:27:55Z)
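Several entries above, including "Membership Inference Attacks From First Principles" and the loss-trajectory attack, evaluate membership inference by the true-positive rate achieved at a low false-positive rate rather than by average accuracy. The following is a minimal sketch of that metric, assuming attack confidence scores and ground-truth membership labels; the threshold search is a plain assumption, not code from either paper.

import numpy as np

def tpr_at_fpr(scores, is_member, target_fpr=0.001):
    # scores: attack confidence that each example was a training member.
    # is_member: 1 for true members, 0 for non-members.
    # Returns the fraction of members detected when the decision threshold is
    # chosen so that at most target_fpr of non-members are falsely flagged.
    non_member = np.sort(scores[is_member == 0])[::-1]   # descending
    cutoff = min(int(target_fpr * len(non_member)), len(non_member) - 1)
    threshold = non_member[cutoff]                       # strict ">" keeps FPR <= target
    return float((scores[is_member == 1] > threshold).mean())

# Toy usage: members receive slightly higher attack scores on average.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)])
labels = np.concatenate([np.ones(1000, dtype=int), np.zeros(1000, dtype=int)])
print("TPR at 0.1% FPR:", round(tpr_at_fpr(scores, labels), 3))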