SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics
- URL: http://arxiv.org/abs/2104.11315v1
- Date: Thu, 22 Apr 2021 20:49:40 GMT
- Title: SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics
- Authors: Jonathan Hayase, Weihao Kong, Raghav Somani, Sewoong Oh
- Abstract summary: A small fraction of poisoned data changes the behavior of a trained model when triggered by an attacker-specified watermark.
We propose a novel defense algorithm using robust covariance estimation to amplify the spectral signature of corrupted data.
- Score: 44.487762480349765
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern machine learning increasingly requires training on a large collection
of data from multiple sources, not all of which can be trusted. A particularly
concerning scenario is when a small fraction of poisoned data changes the
behavior of the trained model when triggered by an attacker-specified
watermark. Such a compromised model will be deployed unnoticed as the model is
accurate otherwise. There have been promising attempts to use the intermediate
representations of such a model to separate corrupted examples from clean ones.
However, these defenses work only when a certain spectral signature of the
poisoned examples is large enough for detection. There is a wide range of
attacks that cannot be protected against by the existing defenses. We propose a
novel defense algorithm using robust covariance estimation to amplify the
spectral signature of corrupted data. This defense provides a clean model,
completely removing the backdoor, even in regimes where previous methods have
no hope of detecting the poisoned examples. Code and pre-trained models are
available at https://github.com/SewoongLab/spectre-defense .
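A minimal sketch of the whitening-plus-spectral-scoring idea described in the abstract, assuming per-class intermediate-layer representations are already extracted. This is not the authors' implementation (the paper's robust covariance estimator and scoring differ; see the linked repository): scikit-learn's MinCovDet is used here as a stand-in robust estimator, and the function and variable names are illustrative.

```python
import numpy as np
from sklearn.covariance import MinCovDet  # stand-in robust covariance estimator


def spectral_outlier_scores(reps, contamination=0.05):
    """Score examples of one class by their projection onto the top spectral
    direction after whitening with a robustly estimated covariance.

    reps: (n, d) array of intermediate-layer representations.
    Returns one outlier score per example; the highest-scoring examples are
    candidates for removal before retraining.
    """
    # Robustly estimate mean and covariance so a small poisoned fraction
    # cannot mask the spectral signature of the corrupted examples.
    mcd = MinCovDet(support_fraction=1.0 - contamination).fit(reps)
    mu, cov = mcd.location_, mcd.covariance_

    # Whiten the centered representations with the robust estimate.
    d = reps.shape[1]
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(d))
    whitened = np.linalg.solve(chol, (reps - mu).T).T

    # Large projections onto the top right-singular vector indicate examples
    # that deviate along a common direction, i.e. a spectral signature.
    _, _, vt = np.linalg.svd(whitened, full_matrices=False)
    return np.abs(whitened @ vt[0])


# Usage sketch (hypothetical arrays): score one label's examples, drop the
# top-scoring fraction, then retrain the model on the remaining data.
# scores = spectral_outlier_scores(hidden_feats[train_labels == target_label])
# keep_idx = np.argsort(scores)[: int(0.85 * len(scores))]
```

The point emphasized by the abstract is the whitening step: using a robust covariance estimate rather than the empirical one prevents the poisoned minority from suppressing its own spectral signature.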
Related papers
- Backdoor Defense through Self-Supervised and Generative Learning [0.0]
Training on such data injects a backdoor which causes malicious inference in selected test samples.
This paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space.
In both cases, we find that per-class generative models make it possible to detect poisoned data and cleanse the dataset.
arXiv Detail & Related papers (2024-09-02T11:40:01Z)
- Partial train and isolate, mitigate backdoor attack [6.583682264938882]
We provide a new training method (PT) that freezes part of the model so that the trained model can isolate suspicious samples.
Then, on this basis, a clean model is fine-tuned to resist backdoor attacks.
arXiv Detail & Related papers (2024-05-26T08:54:43Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- A Data-Driven Defense against Edge-case Model Poisoning Attacks on Federated Learning [13.89043799280729]
We propose an effective defense against model poisoning attacks in Federated Learning systems.
DataDefense learns a poisoned data detector model which marks each example in the defense dataset as poisoned or clean.
It is able to reduce the attack success rate by at least 40% on standard attack setups and by more than 80% on some setups.
arXiv Detail & Related papers (2023-05-03T10:20:26Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline (a generic sketch of this style of defense appears after this list).
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can "steal" deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Concealed Data Poisoning Attacks on NLP Models [56.794857982509455]
Adversarial attacks alter NLP model predictions by perturbing test-time inputs.
We develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input.
arXiv Detail & Related papers (2020-10-23T17:47:06Z)
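For contrast with the spectral sketch above, here is a generic illustration of the clustering-based filtering style referenced in the CUBE entry: cluster one class's representations into two groups and treat the minority cluster as suspected poison. This is not CUBE's algorithm (its clustering procedure and features differ); the inputs and names are assumed for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_filter(reps, n_components=10, seed=0):
    """Generic clustering-based poison filter (illustrative only).

    reps: (n, d) intermediate-layer representations for one class.
    Returns a boolean mask of examples to keep (the majority cluster).
    """
    # Reduce dimensionality so the two-way clustering is better conditioned.
    reduced = PCA(n_components=min(n_components, reps.shape[1])).fit_transform(reps)
    assignments = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(reduced)
    # Poisoned examples are assumed to be a small fraction, so flag the
    # minority cluster as suspect.
    minority = np.argmin(np.bincount(assignments))
    return assignments != minority


# Usage sketch (hypothetical arrays): filter each class, then retrain.
# keep_mask = cluster_filter(hidden_feats[train_labels == c])
```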