Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning
- URL: http://arxiv.org/abs/2505.19532v1
- Date: Mon, 26 May 2025 05:39:35 GMT
- Title: Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning
- Authors: Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein
- Abstract summary: Current state-of-the-art backdoor attacks against Reinforcement Learning (RL) rely upon unrealistically permissive access models. We propose the Supply-Chain Backdoor (SCAB) attack. Our attack can successfully activate over $90\%$ of triggered actions, reducing the average episodic return by $80\%$ for the victim.
- Score: 20.41701122824956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current state-of-the-art backdoor attacks against Reinforcement Learning (RL) rely upon unrealistically permissive access models, which assume that the attacker can read (or even write) the victim's policy parameters, observations, or rewards. In this work, we question whether such a strong assumption is required to launch backdoor attacks against RL. To answer this question, we propose the \underline{S}upply-\underline{C}h\underline{a}in \underline{B}ackdoor (SCAB) attack, which targets a common RL workflow: training agents using external agents that are provided separately or embedded within the environment. In contrast to prior works, our attack only relies on legitimate interactions of the RL agent with the supplied agents. Despite this limited access model, by poisoning a mere $3\%$ of training experiences, our attack can successfully activate over $90\%$ of triggered actions, reducing the average episodic return by $80\%$ for the victim. Our novel attack demonstrates that RL attacks are likely to become a reality under untrusted RL training supply-chains.
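To make the access model concrete, below is a minimal, hypothetical sketch (not the authors' SCAB implementation) of how a supplied opponent agent could poison a small fraction of training experiences purely through legitimate interactions; the names `PoisonedOpponent`, `trigger_behaviour`, and `poison_rate` are illustrative assumptions.

```python
import random

class PoisonedOpponent:
    """Hypothetical supply-chain agent: it interacts with the victim only through
    ordinary environment actions, but on a small fraction of steps it plays an
    attacker-chosen trigger behaviour, aiming to implant a backdoor association
    in the victim's policy during training."""

    def __init__(self, clean_policy, trigger_behaviour, poison_rate=0.03):
        self.clean_policy = clean_policy            # the benign policy the agent is advertised to provide
        self.trigger_behaviour = trigger_behaviour  # attacker-chosen action pattern acting as the trigger
        self.poison_rate = poison_rate              # roughly the 3% of experiences cited in the abstract

    def act(self, observation):
        if random.random() < self.poison_rate:
            # The trigger is delivered through a perfectly legal action, so the
            # victim's training pipeline sees nothing outside its normal interface.
            return self.trigger_behaviour(observation)
        return self.clean_policy(observation)
```

At deployment time, replaying the same trigger behaviour would then steer a backdoored victim toward the low-return actions it has learned to associate with the trigger.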
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We evaluate the proposed attack against three widely used continuous-control algorithms, DDPG, PPO, and TD3, and observe promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z) - SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents [16.350898218047405]
Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications.
In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning.
We formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy.
arXiv Detail & Related papers (2024-05-30T23:31:25Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning [37.19070609394519]
Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success.
In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing.
Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection.
arXiv Detail & Related papers (2023-12-19T20:29:29Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging and serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Recover Triggered States: Protect Model Against Backdoor Attack in
Reinforcement Learning [23.94769537680776]
A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thus inserting a backdoor into the trained agent.
This paper proposes the Recovery Triggered States (RTS) method, a novel approach that effectively protects the victim agents from backdoor attacks.
arXiv Detail & Related papers (2023-04-01T08:00:32Z) - BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets [31.122826345966065]
Reinforcement learning (RL) makes an agent learn from trial-and-error experiences gathered during interaction with the environment.
Recently, offline RL has become a popular RL paradigm because it removes the need for interaction with the environment.
This paper focuses on backdoor attacks, where perturbations are added to the data (observations).
We propose Baffle, an approach that automatically implants backdoors into RL agents by poisoning the offline RL dataset (a generic sketch of such dataset poisoning appears after this list).
arXiv Detail & Related papers (2022-10-07T07:56:17Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning [80.99426477001619]
We extend backdoor attacks to more complex RL systems involving multiple agents.
As a proof of concept, we demonstrate that an adversary agent can trigger the backdoor of the victim agent with its own action.
The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated.
arXiv Detail & Related papers (2021-05-02T23:47:55Z) - Adaptive Reward-Poisoning Attacks against Reinforcement Learning [43.07944714475278]
In reward-poisoning attacks against reinforcement learning, an attacker can perturb the environment reward $r_t$ into $r_t + \delta_t$ at each step.
We show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$.
We also show that an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques (a minimal wrapper sketch of this access model appears after this list).
arXiv Detail & Related papers (2020-03-27T19:46:23Z)