Provable Defense against Backdoor Policies in Reinforcement Learning
- URL: http://arxiv.org/abs/2211.10530v1
- Date: Fri, 18 Nov 2022 23:12:24 GMT
- Title: Provable Defense against Backdoor Policies in Reinforcement Learning
- Authors: Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Xiaojin Zhu
- Abstract summary: A backdoor policy is a security threat where an adversary publishes a seemingly well-behaved policy which in fact allows hidden triggers.
We propose a provable defense mechanism against backdoor policies in reinforcement learning under the subspace trigger assumption.
- Score: 35.908468039596734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a provable defense mechanism against backdoor policies in
reinforcement learning under the subspace trigger assumption. A backdoor policy is
a security threat where an adversary publishes a seemingly well-behaved policy
which in fact allows hidden triggers. During deployment, the adversary can
modify observed states in a particular way to trigger unexpected actions and
harm the agent. We assume the agent does not have the resources to re-train a
good policy. Instead, our defense mechanism sanitizes the backdoor policy by
projecting observed states to a 'safe subspace', estimated from a small number
of interactions with a clean (non-triggered) environment. Our sanitized policy
achieves $\epsilon$ approximate optimality in the presence of triggers,
provided the number of clean interactions is $O\left(\frac{D}{(1-\gamma)^4
\epsilon^2}\right)$ where $\gamma$ is the discount factor and $D$ is the
dimension of state space. Empirically, we show that our sanitization defense
performs well on two Atari game environments.
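The sanitization idea described in the abstract can be sketched as follows: estimate a low-dimensional "safe subspace" from a batch of clean (non-triggered) observations, then orthogonally project every deployed observation onto that subspace before passing it to the published policy. This is a minimal sketch, not the paper's exact implementation; the SVD-based estimator and the function names below are illustrative assumptions.

```python
import numpy as np

def estimate_safe_subspace(clean_states, k):
    """Estimate an orthonormal basis of a rank-k 'safe subspace' from
    clean (non-triggered) observations via SVD (illustrative estimator)."""
    X = np.asarray(clean_states, dtype=float)      # (n, D) clean observations
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k]                                  # (k, D) top-k right singular vectors

def sanitize(state, basis):
    """Project an observed (possibly triggered) state onto the safe
    subspace, removing components along potential trigger directions."""
    s = np.asarray(state, dtype=float)
    return basis.T @ (basis @ s)                   # orthogonal projection onto the subspace
```

A sanitized policy would then act via `pi(sanitize(s, basis))` instead of `pi(s)`, so a trigger added in a direction outside the safe subspace is removed before the policy sees the state.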
Related papers
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Cooperative Backdoor Attack in Decentralized Reinforcement Learning with Theoretical Guarantee [21.596629203866925]
The paper investigates a cooperative backdoor attack in a decentralized reinforcement learning scenario.
Our method decomposes the backdoor behavior into multiple components according to the state space of RL.
To the best of our knowledge, this is the first paper presenting a provable cooperative backdoor attack in decentralized reinforcement learning.
arXiv Detail & Related papers (2024-05-24T06:13:31Z)
- Preventing Reward Hacking with Occupancy Measure Regularization [13.02511938180832]
Reward hacking occurs when an agent exploits a misspecified proxy reward and performs poorly with respect to the unknown true reward.
We propose regularizing based on the occupancy measure (OM) divergence between policies instead of the action distribution (AD) divergence to prevent reward hacking.
arXiv Detail & Related papers (2024-03-05T18:22:15Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL [46.32591437241358]
In this paper, we consider a multi-agent setting where a well-trained victim agent is exploited by an attacker controlling another agent.
Previous models do not account for the possibility that the attacker may only have partial control over $\alpha$ or that the attack may produce easily detectable "abnormal" behaviors.
We introduce a generalized attack framework that can flexibly model the extent to which the adversary is able to control the agent.
We offer a provably efficient defense with convergence to the most robust victim policy through adversarial training with timescale separation.
arXiv Detail & Related papers (2023-05-27T02:54:07Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning [23.94769537680776]
A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thus inserting a backdoor into the trained agent.
This paper proposes the Recovery Triggered States (RTS) method, a novel approach that effectively protects the victim agents from backdoor attacks.
arXiv Detail & Related papers (2023-04-01T08:00:32Z)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
arXiv Detail & Related papers (2022-06-03T14:18:05Z)
- Defense Against Reward Poisoning Attacks in Reinforcement Learning [29.431349181232203]
We study defense strategies against reward poisoning attacks in reinforcement learning.
We propose an optimization framework for deriving optimal defense policies.
We show that defense policies that are solutions to the proposed optimization problems have provable performance guarantees.
arXiv Detail & Related papers (2021-02-10T23:31:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.