Related papers: Adversarial Inception for Bounded Backdoor Poisoning in Deep Reinforcement Learning

Related papers

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning [89.1856483797116]
We introduce BEAT, the first framework to inject visual backdoors into MLLM-based embodied agents.<n>Unlike textual triggers, object triggers exhibit wide variation across viewpoints and lighting, making them difficult to implant reliably.<n>BEAT achieves attack success rates up to 80%, while maintaining strong benign task performance.
arXiv Detail & Related papers (2025-10-31T16:50:49Z)
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z)
MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.<n>We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z)
CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems [13.776447110639193]
We introduce a novel method that involves injecting traitor agents into the CMARL system. In TMDP, traitors are trained using the same MARL algorithm as the victim agents, with their reward function set as the negative of the victim agents' reward. CuDA2 enhances the efficiency and aggressiveness of attacks on the specified victim agents' policies.
arXiv Detail & Related papers (2024-06-25T09:59:31Z)
SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents [16.350898218047405]
Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications. In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning. We formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy.
arXiv Detail & Related papers (2024-05-30T23:31:25Z)
SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources. Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker. Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms. Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning [45.408568528354216]
We investigate the impact of adversarial attacks on multi-agent reinforcement learning (MARL) In the considered setup, there is an attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
arXiv Detail & Related papers (2023-07-15T00:38:55Z)
Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning [23.94769537680776]
A backdoor attack allows a malicious user to manipulate the environment or corrupt the training data, thus inserting a backdoor into the trained agent. This paper proposes the Recovery Triggered States (RTS) method, a novel approach that effectively protects the victim agents from backdoor attacks.
arXiv Detail & Related papers (2023-04-01T08:00:32Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern. In general, adversarial training is believed to defend against backdoor attacks. We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That Backfire [8.782809316491948]
We investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate. The results motivate the re-evaluation of backdoor defense research for practical environments.
arXiv Detail & Related papers (2022-01-28T16:11:40Z)
Widen The Backdoor To Let More Attackers In [24.540853975732922]
We investigate the scenario of a multi-agent backdoor attack, where multiple non-colluding attackers craft and insert triggered samples in a shared dataset. We discover a clear backfiring phenomenon: increasing the number of attackers shrinks each attacker's attack success rate. We then exploit this phenomenon to minimize the collective ASR of attackers and maximize defender's robustness accuracy.
arXiv Detail & Related papers (2021-10-09T13:53:57Z)
Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations. We reformulate the problem of adversarial attacks in function space and separate the previous gradient based attacks into several subspaces. In the first stage, we train a deceptive policy by hacking the environment, and discover a set of trajectories routing to the lowest reward. Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z)
Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs. We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial of perturbation the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses [59.58128343334556]
We introduce a relaxation term to the standard loss, that finds more suitable gradient-directions, increases attack efficacy and leads to more efficient adversarial training. We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries. We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs. We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class. We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.