Backdoor defense, learnability and obfuscation
- URL: http://arxiv.org/abs/2409.03077v1
- Date: Wed, 4 Sep 2024 21:05:42 GMT
- Title: Backdoor defense, learnability and obfuscation
- Authors: Paul Christiano, Jacob Hilton, Victor Lecomte, Mark Xu
- Abstract summary: We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender.
Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability.
- Score: 8.905450847393132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.
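As a rough illustration of the game described in the abstract (not taken from the paper's formal definitions), the sketch below plays the attacker and defender against each other over a toy function class; the clean function, the backdooring strategy, and the (deliberately trivial) detector are hypothetical placeholders.

```python
# Illustrative sketch only: a toy version of the attacker/defender game from the
# abstract. The function class, the backdooring strategy, and the detector are
# hypothetical placeholders, not the paper's constructions.
import random

N_BITS = 16  # inputs are length-16 bit strings


def random_input():
    return tuple(random.randint(0, 1) for _ in range(N_BITS))


def clean_function(x):
    # Toy "original" function: parity of the input bits.
    return sum(x) % 2


def plant_backdoor(f, trigger):
    # Attacker: flip the output on the trigger, agree with f everywhere else.
    def backdoored(x):
        return 1 - f(x) if x == trigger else f(x)
    return backdoored


def detect_trigger(f_backdoored, x):
    # Defender: decide at evaluation time whether x is the trigger.
    # A placeholder strategy that never flags anything.
    return False


def play_game(n_rounds=1000):
    # Key constraint from the definition: the trigger is chosen at random,
    # so the attacker's strategy must work for a randomly-chosen trigger.
    wins = 0
    for _ in range(n_rounds):
        trigger = random_input()
        f_star = plant_backdoor(clean_function, trigger)
        wins += detect_trigger(f_star, trigger)
    return wins / n_rounds


if __name__ == "__main__":
    print("defender success rate:", play_game())
```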
Related papers
- The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses [21.975560789792073]
We formalize and extend existing definitions of backdoor-based watermarks and adversarial defenses as interactive protocols between two players.
For almost every discriminative learning task, at least one of the two -- a watermark or an adversarial defense -- exists.
We show that any task that satisfies our notion of a transferable attack implies a cryptographic primitive.
arXiv Detail & Related papers (2024-10-11T14:44:05Z)
- Improving Adversarial Robustness via Decoupled Visual Representation Masking [65.73203518658224]
In this paper, we highlight two novel properties of robust features from the feature distribution perspective.
We find that state-of-the-art defense methods fail to address both of these issues well.
Specifically, we propose a simple but effective defense based on decoupled visual representation masking.
arXiv Detail & Related papers (2024-06-16T13:29:41Z)
- Hindering Adversarial Attacks with Multiple Encrypted Patch Embeddings [13.604830818397629]
We propose a new key-based defense focusing on both efficiency and robustness.
We build upon the previous defense with two major improvements: (1) efficient training and (2) optional randomization.
Experiments were carried out on the ImageNet dataset, and the proposed defense was evaluated against an arsenal of state-of-the-art attacks.
arXiv Detail & Related papers (2023-09-04T14:08:34Z)
- Adversary Aware Continual Learning [3.3439097577935213]
An adversary can introduce a small amount of misinformation into the model to cause deliberate forgetting of a specific task or class at test time.
We turn the attacker's primary strength, hiding the backdoor pattern by making it imperceptible to humans, against it, and propose to learn a perceptible (stronger) pattern that can overpower the attacker's imperceptible pattern.
We show that our proposed defensive framework considerably improves the performance of class incremental learning algorithms with no knowledge of the attacker's target task, target class, or imperceptible pattern.
arXiv Detail & Related papers (2023-04-27T19:49:50Z)
- Planning for Attacker Entrapment in Adversarial Settings [16.085007590604327]
We propose a framework to generate a defense strategy against an attacker who is working in an environment where a defender can operate without the attacker's knowledge.
Our problem formulation allows us to capture the problem as a much simpler infinite-horizon discounted MDP, whose optimal policy gives the defender's strategy against the attacker's actions.
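As a generic illustration of that kind of formulation, the sketch below runs value iteration on a tiny infinite-horizon discounted MDP; the states, actions, transitions, and rewards are invented placeholders, not the paper's model.

```python
# Generic value iteration for an infinite-horizon discounted MDP -- a sketch of
# the kind of formulation described above, not the paper's specific model.
# The states, actions, transitions, and rewards are illustrative placeholders.

GAMMA = 0.9  # discount factor

# transition[s][a] = list of (next_state, probability); reward[s][a] = immediate reward
transition = {
    "patrol": {"wait": [("patrol", 1.0)], "trap": [("engaged", 0.6), ("patrol", 0.4)]},
    "engaged": {"wait": [("engaged", 1.0)], "trap": [("engaged", 1.0)]},
}
reward = {
    "patrol": {"wait": 0.0, "trap": -1.0},
    "engaged": {"wait": 5.0, "trap": 5.0},
}


def value_iteration(tol=1e-6):
    # Iterate the Bellman optimality update until the values stop changing.
    values = {s: 0.0 for s in transition}
    while True:
        delta = 0.0
        for s in transition:
            best = max(
                reward[s][a] + GAMMA * sum(p * values[s2] for s2, p in transition[s][a])
                for a in transition[s]
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


print(value_iteration())
```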
arXiv Detail & Related papers (2023-03-01T21:08:27Z)
- Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition [56.69875958980474]
This work considers approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations.
We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training.
We present a comparison of equality between two rejection-based defenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups.
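For context, the sketch below shows the basic randomized-smoothing prediction rule, one of the two rejection-based defenses being compared: classify many Gaussian-noised copies of the input and take a majority vote, abstaining when the vote is not decisive. The base classifier and thresholds are illustrative placeholders, not the paper's setup.

```python
# Minimal sketch of a randomized-smoothing prediction rule with abstention.
# The base classifier and thresholds are stand-ins for illustration only.
import numpy as np


def base_classifier(x):
    # Placeholder classifier over small vectors: predicts class 0 or 1.
    return int(x.sum() > 0)


def smoothed_predict(x, sigma=0.5, n_samples=200, abstain_below=0.7):
    # Classify Gaussian-perturbed copies of the input and take a majority vote.
    noisy = x + sigma * np.random.randn(n_samples, *x.shape)
    votes = np.array([base_classifier(z) for z in noisy])
    top_class = int(np.round(votes.mean()))
    agreement = max(votes.mean(), 1 - votes.mean())
    if agreement < abstain_below:
        return None  # reject / abstain when the vote is too close
    return top_class


print(smoothed_predict(np.array([0.3, -0.1])))
```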
arXiv Detail & Related papers (2023-02-17T16:19:26Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Adversarial Classification of the Attacks on Smart Grids Using Game Theory and Deep Learning [27.69899235394942]
This paper proposes a game-theoretic approach to evaluate the variations an attacker can cause in the power measurements.
A zero-sum game is used to model the interactions between the attacker and defender.
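As a generic illustration of such a zero-sum formulation (the payoff matrix below is a placeholder, not taken from the paper), pure-strategy security levels for the defender and attacker can be read directly off the matrix:

```python
# Toy zero-sum attacker/defender game: compute pure-strategy security levels
# (maximin for the defender, minimax for the attacker). Payoffs are placeholders.
import numpy as np

# payoff[d, a] = defender's payoff when the defender plays d and the attacker plays a;
# the attacker's payoff is the negative of this (zero-sum).
payoff = np.array([
    [ 2.0, -1.0,  0.5],
    [ 0.0,  1.0, -0.5],
    [-1.0,  0.5,  1.5],
])

# Defender can guarantee at least the maximin value with a pure strategy.
defender_maximin = payoff.min(axis=1).max()
# Attacker can limit the defender to at most the minimax value with a pure strategy.
attacker_minimax = payoff.max(axis=0).min()

print("defender maximin:", defender_maximin)
print("attacker minimax:", attacker_minimax)
# If the two coincide, the game has a pure-strategy saddle point; otherwise the
# equilibrium requires mixed strategies (solvable as a linear program).
```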
arXiv Detail & Related papers (2021-06-06T18:43:28Z)
- Attack Agnostic Adversarial Defense via Visual Imperceptible Bound [70.72413095698961]
This research aims to design a defense model that is robust within a certain bound against both seen and unseen adversarial attacks.
The proposed defense model is evaluated on the MNIST, CIFAR-10, and Tiny ImageNet databases.
The proposed algorithm is attack agnostic, i.e. it does not require any knowledge of the attack algorithm.
arXiv Detail & Related papers (2020-10-25T23:14:26Z)
- Harnessing adversarial examples with a surprisingly simple defense [47.64219291655723]
I introduce a very simple method to defend against adversarial examples.
The basic idea is to raise the slope of the ReLU function at test time.
Experiments over MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense.
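A minimal sketch of that idea, assuming a tiny fully-connected network with invented weights (not the paper's architecture): evaluate the same network at test time with a steeper slope on its ReLU activations.

```python
# Sketch of the test-time ReLU-slope idea on a toy network. The network shape,
# weights, and slope value are illustrative placeholders.
import numpy as np


def sloped_relu(x, slope=1.0):
    # Standard ReLU when slope == 1.0; steeper on the positive side otherwise.
    return slope * np.maximum(x, 0.0)


def forward(x, weights, slope=1.0):
    # Apply each linear layer followed by the (possibly steepened) ReLU.
    h = x
    for w in weights:
        h = sloped_relu(h @ w, slope=slope)
    return h


rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]
x = rng.standard_normal(4)

print("train-time slope:", forward(x, weights, slope=1.0))
print("steeper test-time slope:", forward(x, weights, slope=2.0))
```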
arXiv Detail & Related papers (2020-04-26T03:09:42Z)
- Block Switching: A Stochastic Approach for Deep Learning Security [75.92824098268471]
Recent studies of adversarial attacks have revealed the vulnerability of modern deep learning models.
In this paper, we introduce Block Switching (BS), a defense strategy against adversarial attacks based on stochasticity.
arXiv Detail & Related papers (2020-02-18T23:14:25Z)