Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving
- URL: http://arxiv.org/abs/2509.16950v2
- Date: Sat, 11 Oct 2025 21:46:17 GMT
- Title: Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving
- Authors: Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang,
- Abstract summary: We introduce a novel backdoor attack against the end-to-end AD systems that leverage one or more other vehicles' trajectories as triggers.<n>We develop a negative training strategy by incorporating patch trajectories that are similar to triggers but are designated not to activate the backdoor.<n>We demonstrate the flexibility and effectiveness of our proposed attack, showing the under-exploration of existing end-to-end AD systems' vulnerabilities.
- Score: 26.094398456088403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assessing the safety of autonomous driving (AD) systems against security threats, particularly backdoor attacks, is a stepping stone for real-world deployment. However, existing works mainly focus on pixel-level triggers that are impractical to deploy in the real world. We address this gap by introducing a novel backdoor attack against the end-to-end AD systems that leverage one or more other vehicles' trajectories as triggers. To generate precise trigger trajectories, we first use temporal logic (TL) specifications to define the behaviors of attacker vehicles. Configurable behavior models are then used to generate these trajectories, which are quantitatively evaluated and iteratively refined based on the TL specifications. We further develop a negative training strategy by incorporating patch trajectories that are similar to triggers but are designated not to activate the backdoor. It enhances the stealthiness of the attack and refines the system's responses to trigger scenarios. Through extensive experiments on 5 offline reinforcement learning (RL) driving agents with 6 trigger patterns and target action combinations, we demonstrate the flexibility and effectiveness of our proposed attack, showing the under-exploration of existing end-to-end AD systems' vulnerabilities to such trajectory-based backdoor attacks.
Related papers
- Defense Against Indirect Prompt Injection via Tool Result Parsing [5.69701430275527]
LLM agents face an escalating threat from indirect prompt injection.<n>This vulnerability poses a significant risk as agents gain more direct control over physical environments.<n>We propose a novel method that provides LLMs with precise data via tool result parsing while effectively filtering out injected malicious code.
arXiv Detail & Related papers (2026-01-08T10:21:56Z) - Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS [2.179313476241343]
Black box adversarial attacks are increasingly targeted by machine learning based intrusion detection systems.<n>We propose Adaptive Feature Poisoning, a lightweight and proactive defense mechanism designed specifically for realistic black box scenarios.<n>We evaluate its ability to confuse attackers, degrade attack effectiveness, and preserve detection performance.
arXiv Detail & Related papers (2025-12-15T16:29:23Z) - Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning [89.1856483797116]
We introduce BEAT, the first framework to inject visual backdoors into MLLM-based embodied agents.<n>Unlike textual triggers, object triggers exhibit wide variation across viewpoints and lighting, making them difficult to implant reliably.<n>BEAT achieves attack success rates up to 80%, while maintaining strong benign task performance.
arXiv Detail & Related papers (2025-10-31T16:50:49Z) - Exploiting Web Search Tools of AI Agents for Data Exfiltration [0.46664938579243564]
Large language models (LLMs) are now routinely used to execute complex tasks, from natural language processing to dynamic like web searches.<n>The usage of tool-calling and Retrieval Augmented Generation (RAG) allows LLMs to process and retrieve sensitive corporate data, amplifying both their functionality and vulnerability to abuse.<n>We analyze how susceptible current LLMs are to indirect prompt injection attacks, which parameters, including model size and manufacturer, shape their vulnerability, and which attack methods remain most effective.
arXiv Detail & Related papers (2025-10-10T07:39:01Z) - Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection [1.102914654802229]
Advanced Persistent Threats (APTs) represent a growing menace to modern digital infrastructure.<n>Unlike traditional cyberattacks, APTs are stealthy, adaptive, and long-lasting, often bypassing signature-based detection systems.<n>This paper introduces a novel framework for APT detection that unites deep learning, reinforcement learning, and active learning into a cohesive, adaptive defense system.
arXiv Detail & Related papers (2025-08-26T14:29:10Z) - TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models [67.06525001375722]
TrojanTO is the first action-level backdoor attack against TO models.<n>It implants backdoor attacks across diverse tasks and attack objectives with a low attack budget.<n>TrojanTO exhibits broad applicability to DT, GDT, and DC.
arXiv Detail & Related papers (2025-06-15T11:27:49Z) - Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems.<n>We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics.<n>We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z) - AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection [9.539021752700823]
AnywhereDoor is a multi-target backdoor attack for object detection.<n>It allows adversaries to make objects disappear, fabricate new ones or mislabel them, either across all object classes or specific ones.<n>It improves attack success rates by 26% compared to adaptations of existing methods for such flexible control.
arXiv Detail & Related papers (2024-11-21T15:50:59Z) - ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs [17.853862145962292]
We introduce a novel backdoor attack that systematically bypasses system prompts.
Our method achieves an attack success rate (ASR) of up to 99.50% while maintaining a clean accuracy (CACC) of 98.58%.
arXiv Detail & Related papers (2024-10-05T02:58:20Z) - Pre-trained Trojan Attacks for Visual Recognition [106.13792185398863]
Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks.
We propose the Pre-trained Trojan attack, which embeds backdoors into a PVM, enabling attacks across various downstream vision tasks.
We highlight the challenges posed by cross-task activation and shortcut connections in successful backdoor attacks.
arXiv Detail & Related papers (2023-12-23T05:51:40Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment [31.24530091590395]
We study an attack scenario called Trojan Activation Attack (TA2), which injects trojan steering vectors into the activation layers of Large Language Models.
Our experiment results show that TA2 is highly effective and adds little or no overhead to attack efficiency.
arXiv Detail & Related papers (2023-11-15T23:07:40Z) - Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory
Prediction in Autonomous Driving [18.72382517467458]
We propose a novel adversarial backdoor attack against trajectory prediction models.
Our attack affects the victim at training time via naturalistic, hence stealthy, poisoned samples crafted using a novel two-step approach.
We show that the proposed attack is highly effective, as it can significantly hinder the performance of prediction models.
arXiv Detail & Related papers (2023-06-27T19:15:06Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.