Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated
Learning
- URL: http://arxiv.org/abs/2308.04373v1
- Date: Tue, 8 Aug 2023 16:22:44 GMT
- Title: Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated
Learning
- Authors: Simon Queyrut, Yérom-David Bromberg, Valerio Schiavoni
- Abstract summary: We introduce Pelta, a novel shielding mechanism leveraging trusted hardware.
We evaluate Pelta on a state-of-the-art ensemble model and demonstrate its effectiveness against the Self-Attention Gradient adversarial attack.
- Score: 0.6445605125467573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The main premise of federated learning is that machine learning model updates
are computed locally, in particular to preserve user data privacy, as those
never leave the perimeter of the user's device. This mechanism assumes that the
aggregated global model is broadcast to collaborating, non-malicious nodes.
However, without proper defenses, compromised clients can easily probe the
model inside their local memory in search of adversarial examples. For
instance, in image-based applications, adversarial examples are images
perturbed imperceptibly to the human eye yet misclassified by the local model;
such examples can later be presented to a victim node's counterpart model to
replicate the attack. To mitigate such malicious probing, we introduce Pelta, a
novel shielding mechanism leveraging trusted hardware. By harnessing the
capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the
back-propagation chain rule, otherwise typically exploited by attackers for the
design of malicious samples. We evaluate Pelta on a state-of-the-art ensemble
model and demonstrate its effectiveness against the Self-Attention Gradient
adversarial attack.
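The abstract describes the mechanism only at a high level. As a minimal sketch, assuming a PyTorch setting, the snippet below illustrates what masking part of the back-propagation chain rule could look like from the attacker's side: a layer whose backward pass withholds the true gradient with respect to its input, much as a TEE-shielded component would refuse to expose intermediate gradients to untrusted memory. All class names are hypothetical and this is not the authors' implementation; in the actual design, the shielded computation runs inside the enclave.

```python
# Illustrative sketch only (not the Pelta implementation): a layer whose
# backward pass withholds the true input gradient, mimicking the attacker's
# view of a TEE-shielded component. Names are hypothetical.
import torch
import torch.nn as nn


class ShieldedBackward(torch.autograd.Function):
    """Forward behaves normally; backward masks the gradient sent to the input."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors
        # The weight still receives its true gradient (local training proceeds),
        # but the gradient flowing back toward the input is zeroed, so an attacker
        # outside the shielded layer cannot back-propagate to the input pixels.
        grad_weight = grad_output.t() @ x
        grad_input = torch.zeros_like(x)
        return grad_input, grad_weight


class ShieldedLinear(nn.Module):
    """Drop-in linear layer that hides its contribution to the chain rule."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))

    def forward(self, x):
        return ShieldedBackward.apply(x, self.weight)
```

A gradient-based attacker crafting adversarial examples through such a layer would observe a zero (or otherwise obfuscated) input gradient and could no longer follow the full chain rule back to the image.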
Related papers
- Protecting Feed-Forward Networks from Adversarial Attacks Using Predictive Coding [0.20718016474717196]
An adversarial example is a modified input image designed to cause a Machine Learning (ML) model to make a mistake.
This study presents a practical and effective solution -- using predictive coding networks (PCnets) as an auxiliary step for adversarial defence.
arXiv Detail & Related papers (2024-10-31T21:38:05Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Mitigating Adversarial Attacks in Federated Learning with Trusted
Execution Environments [1.8240624028534085]
In image-based applications, adversarial examples consist of images that are only slightly perturbed to the human eye yet are misclassified by the local model.
Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) to reduce the ability of attackers to craft adversarial samples.
We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2023-09-13T14:19:29Z) - Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared
Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z) - FedDefender: Client-Side Attack-Tolerant Federated Learning [60.576073964874]
Federated learning enables learning from decentralized data sources without compromising privacy.
However, it is vulnerable to model poisoning attacks, in which malicious clients interfere with the training process.
We propose a new defense mechanism that focuses on the client-side, called FedDefender, to help benign clients train robust local models.
arXiv Detail & Related papers (2023-07-18T08:00:41Z) - Adversarial Pixel Restoration as a Pretext Task for Transferable
Perturbations [54.1807206010136]
Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and known label space to fool the unknown black-box models.
We propose Adversarial Pixel Restoration as a self-supervised alternative to train an effective surrogate model from scratch.
Our training approach is based on a min-max objective which reduces overfitting via an adversarial objective.
arXiv Detail & Related papers (2022-07-18T17:59:58Z) - DeepSight: Mitigating Backdoor Attacks in Federated Learning Through
Deep Model Inspection [26.593268413299228]
Federated Learning (FL) allows multiple clients to collaboratively train a Neural Network (NN) model on their private data without revealing the data.
DeepSight is a novel model filtering approach for mitigating backdoor attacks.
We show that it can mitigate state-of-the-art backdoor attacks with a negligible impact on the model's performance on benign data.
arXiv Detail & Related papers (2022-01-03T17:10:07Z) - Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can neither access the model information nor the training set, nor query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Evaluating Ensemble Robustness Against Adversarial Attacks [0.0]
Adversarial examples, which are slightly perturbed inputs generated with the aim of fooling a neural network, are known to transfer between models.
This concept of transferability poses grave security concerns as it leads to the possibility of attacking models in a black box setting.
We introduce a gradient based measure of how effectively an ensemble's constituent models collaborate to reduce the space of adversarial examples targeting the ensemble itself.
arXiv Detail & Related papers (2020-05-12T13:20:54Z)
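The last entry above introduces a gradient-based measure of how effectively an ensemble's members collaborate to shrink the space of adversarial examples, but the summary does not spell out the metric. As a rough, assumed proxy only (not that paper's actual measure), the sketch below scores an ensemble by the average pairwise cosine similarity of its members' loss gradients with respect to the input: lower alignment suggests adversarial directions transfer less readily between members. Function and variable names are hypothetical.

```python
# Hedged illustration: an assumed gradient-alignment proxy, not the cited
# paper's actual measure. Lower average cosine similarity between members'
# input gradients suggests a smaller shared space of adversarial directions.
import torch
import torch.nn.functional as F


def gradient_alignment(models, x, y):
    """Average pairwise cosine similarity of per-model input gradients."""
    grads = []
    for model in models:
        x_in = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)
        grad, = torch.autograd.grad(loss, x_in)
        grads.append(grad.flatten(start_dim=1))  # one gradient row per sample

    sims = []
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            sims.append(F.cosine_similarity(grads[i], grads[j], dim=1).mean())
    return torch.stack(sims).mean()
```

Under this reading, an ensemble whose members produce dissimilar input gradients leaves attackers with fewer perturbation directions that fool all members at once.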