When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models
- URL: http://arxiv.org/abs/2602.18739v1
- Date: Sat, 21 Feb 2026 07:22:37 GMT
- Title: When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models
- Authors: Zhixiang Guo, Siyuan Liang, Andras Balogh, Noah Lunberry, Rong-Cheng Tu, Mark Jelasity, Dacheng Tao
- Abstract summary: We present Physical-Conditioned World Model Attack (PhysCond-WMA), the first white-box world model attack that perturbs physical-condition channels. PhysCond-WMA induces semantic, logic, or decision-level distortion while preserving perceptual fidelity.
- Score: 54.08784776767683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative world models (WMs) are increasingly used to synthesize controllable, sensor-conditioned driving videos, yet their reliance on physical priors exposes novel attack surfaces. In this paper, we present Physical-Conditioned World Model Attack (PhysCond-WMA), the first white-box world model attack that perturbs physical-condition channels, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity. PhysCond-WMA is optimized in two stages: (1) a quality-preserving guidance stage that constrains reverse-diffusion loss below a calibrated threshold, and (2) a momentum-guided denoising stage that accumulates target-aligned gradients along the denoising trajectory for stable, temporally coherent semantic shifts. Extensive experimental results demonstrate that our approach remains effective while increasing FID by about 9% and FVD by about 3.9% on average. Under the targeted attack setting, the attack success rate (ASR) reaches 0.55. Downstream studies further show tangible risk: using attacked videos for training decreases 3D detection performance by about 4% and worsens open-loop planning performance by about 20%. These findings reveal and quantify, for the first time, security vulnerabilities in generative world models, motivating more comprehensive security checks.
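The two-stage optimization described in the abstract can be pictured with a short PyTorch-style sketch, given below under loose assumptions: a differentiable world model `wm` exposing a reverse-diffusion loss and a target-alignment loss over the denoising trajectory. All names (`wm.reverse_diffusion_loss`, `wm.target_alignment_loss`, the threshold calibration, and the hyperparameters) are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
# Minimal sketch of the two-stage attack structure described in the abstract.
# Every function name and constant here is an assumption, not the authors' API.
import torch

def physcond_wma_sketch(wm, cond, target, steps=50, alpha=1e-2, mu=0.9, eps=0.05):
    """Perturb physical-condition embeddings `cond` (e.g. HDMap / 3D-box features)."""
    delta = torch.zeros_like(cond, requires_grad=True)   # perturbation on the condition channels
    momentum = torch.zeros_like(cond)

    # Stage 1 (assumed): calibrate a quality threshold from the clean conditions so the
    # reverse-diffusion loss of the attacked rollout stays below it.
    with torch.no_grad():
        tau = wm.reverse_diffusion_loss(cond).item() * 1.1   # calibrated threshold (assumption)

    for _ in range(steps):
        adv_cond = cond + delta
        quality_loss = wm.reverse_diffusion_loss(adv_cond)        # perceptual-fidelity proxy
        attack_loss = wm.target_alignment_loss(adv_cond, target)  # accumulated along denoising steps

        # Push toward the target only as long as the quality constraint is respected.
        loss = attack_loss + torch.relu(quality_loss - tau)
        grad, = torch.autograd.grad(loss, delta)

        # Stage 2: momentum-guided gradient accumulation for stable, temporally coherent shifts.
        momentum = mu * momentum + grad / (grad.abs().mean() + 1e-12)
        with torch.no_grad():
            delta -= alpha * momentum.sign()   # descend the combined loss
            delta.clamp_(-eps, eps)            # keep the condition perturbation small

    return (cond + delta).detach()
```

The soft-constraint term `relu(quality_loss - tau)` is one plausible way to realize the "constrains reverse-diffusion loss below a calibrated threshold" condition; the paper may use a different projection or guidance mechanism.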
Related papers
- BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning [73.46118996284888]
Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment.
arXiv Detail & Related papers (2026-02-19T08:31:16Z) - FP-AbDiff: Improving Score-based Antibody Design by Capturing Nonequilibrium Dynamics through the Underlying Fokker-Planck Equation [19.153777175873547]
We introduce FP-AbDiff, the first antibody generator to enforce Fokker-Planck Equation (FPE) physics along the entire generative trajectory. By aligning generative dynamics with physical laws, FP-AbDiff enhances robustness and generalizability, establishing a principled approach for physically faithful and functionally viable antibody design.
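The summary does not state FP-AbDiff's exact physics residual; as standard background only (an assumption about the relevant form), the Fokker-Planck equation governing the marginal density of a forward diffusion SDE dx = f(x,t) dt + g(t) dw is:

```latex
% Standard Fokker-Planck equation for the marginals p_t of a diffusion SDE
% (general background, not FP-AbDiff's specific training objective).
\frac{\partial p_t(x)}{\partial t}
  = -\nabla \cdot \bigl( f(x,t)\, p_t(x) \bigr)
  + \tfrac{1}{2}\, g(t)^2 \, \Delta_x\, p_t(x)
```

Penalizing deviation from this identity along the denoising trajectory is one natural reading of "enforcing FPE physics", though the paper's exact residual term is not given in the summary.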
arXiv Detail & Related papers (2025-11-05T01:44:37Z) - DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models [59.45605332033458]
Safety mechanisms can backfire, causing over-refusal, where models decline benign requests out of excessive caution. No existing benchmark has systematically addressed over-refusal in the visual modality. This setting introduces unique challenges, such as dual-use cases where an instruction is harmless, but the accompanying image contains harmful content.
arXiv Detail & Related papers (2025-10-12T23:21:34Z) - Sequence-Preserving Dual-FoV Defense for Traffic Sign and Light Recognition in Autonomous Vehicles [0.07646713951724012]
This study proposes a dual-FoV, sequence-preserving robustness framework for traffic lights and signs in the USA. Over a series of experiments on a real-life application of anomaly detection, this study outlines a unified three-layer defense stack framework.
arXiv Detail & Related papers (2025-10-03T00:43:25Z) - Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains. We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness. We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z) - Exploring the Physical World Adversarial Robustness of Vehicle Detection [13.588120545886229]
Adversarial attacks can compromise the robustness of real-world detection models.
We propose an innovative instant-level data generation pipeline using the CARLA simulator.
Our findings highlight diverse model performances under adversarial conditions.
arXiv Detail & Related papers (2023-08-07T11:09:12Z) - Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection [14.202833467294765]
Adversarial attacks in the physical world can harm the robustness of detection models.
Yolo v6 had the strongest resistance, with only a 6.59% average AP drop, and ASA was the most effective attack algorithm with a 14.51% average AP reduction.
arXiv Detail & Related papers (2023-04-11T09:48:25Z) - Robust Trajectory Prediction against Adversarial Attacks [84.10405251683713]
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving systems.
These methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions.
In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks.
arXiv Detail & Related papers (2022-07-29T22:35:05Z) - Evaluating the Robustness of Semantic Segmentation for Autonomous Driving against Real-World Adversarial Patch Attacks [62.87459235819762]
In a real-world scenario like autonomous driving, more attention should be devoted to real-world adversarial examples (RWAEs).
This paper presents an in-depth evaluation of the robustness of popular SS models by testing the effects of both digital and real-world adversarial patches.
arXiv Detail & Related papers (2021-08-13T11:49:09Z) - Dynamically Sampled Nonlocal Gradients for Stronger Adversarial Attacks [3.055601224691843]
The vulnerability of deep neural networks to small and even imperceptible perturbations has become a central topic in deep learning research.
We propose Dynamically Sampled Nonlocal Gradient Descent (DSNGD) for crafting stronger adversarial attacks (see the sketch after this list).
We show that DSNGD-based attacks are on average 35% faster while achieving 0.9% to 27.1% higher success rates compared to their gradient descent-based counterparts.
arXiv Detail & Related papers (2020-11-05T08:55:24Z)
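The DSNGD entry above gives no algorithmic detail; the following minimal sketch illustrates one plausible reading of "nonlocal gradients": a PGD-style loop whose step direction averages gradients sampled from the recent optimization history. Every name, the sliding-window size, and the weighting scheme are assumptions for illustration, not the paper's algorithm.

```python
# Hedged sketch of a history-averaged ("nonlocal gradient") adversarial attack.
# The window size and weighting below are illustrative assumptions only.
import torch

def nonlocal_gradient_attack(model, loss_fn, x, y, steps=40, alpha=2/255,
                             eps=8/255, history=5):
    """PGD-style attack whose step direction averages recent gradients."""
    x_adv = x.clone().detach()
    grads = []  # sliding window of past gradients: the "nonlocal" part

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        g, = torch.autograd.grad(loss, x_adv)
        grads = (grads + [g.detach()])[-history:]

        # Weighted average over the history; recent gradients count more (assumed weighting).
        weights = [0.5 + 0.5 * i / max(len(grads) - 1, 1) for i in range(len(grads))]
        direction = sum(w * gi for w, gi in zip(weights, grads)) / sum(weights)

        with torch.no_grad():
            x_adv = x_adv + alpha * direction.sign()   # ascend the loss (untargeted attack)
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0).detach()     # keep a valid pixel range

    return x_adv
```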