TRAIL: Transferable Robust Adversarial Images via Latent Diffusion
- URL: http://arxiv.org/abs/2505.16166v1
- Date: Thu, 22 May 2025 03:11:35 GMT
- Title: TRAIL: Transferable Robust Adversarial Images via Latent Diffusion
- Authors: Yuhao Xue, Zhifei Zhang, Xinyang Jiang, Yifei Shen, Junyao Gao, Wentao Gu, Jiale Zhao, Miaojing Shi, Cairong Zhao
- Abstract summary: Adversarial attacks present severe security risks to deep learning systems. Their transferability across models remains limited due to distribution mismatches between generated adversarial features and real-world data. We propose Transferable Robust Adversarial Images via Latent Diffusion (TRAIL), a test-time adaptation framework.
- Score: 35.54430200195499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks exploiting unrestricted natural perturbations present severe security risks to deep learning systems, yet their transferability across models remains limited due to distribution mismatches between generated adversarial features and real-world data. While recent works utilize pre-trained diffusion models as adversarial priors, they still face a distribution shift between ideal adversarial samples and the natural image distribution learned by the diffusion model. To address this challenge, we propose Transferable Robust Adversarial Images via Latent Diffusion (TRAIL), a test-time adaptation framework that enables the model to generate images that carry adversarial features while closely resembling the target images. To mitigate the distribution shift, TRAIL updates the diffusion U-Net's weights during attacks by combining adversarial objectives (to mislead victim models) with perceptual constraints (to preserve image realism). The adapted model then generates adversarial samples through iterative noise injection and denoising guided by these objectives. Experiments demonstrate that TRAIL significantly outperforms state-of-the-art methods in cross-model attack transferability, validating that distribution-aligned adversarial feature synthesis is critical for practical black-box attacks.
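The test-time adaptation described in the abstract can be illustrated abstractly: take gradient steps on the generator's weights against a combined objective, an adversarial term (mislead the victim) plus a weighted perceptual term (stay close to the target image). The sketch below is a toy NumPy illustration with a linear "generator" standing in for the diffusion U-Net and a fixed logit direction standing in for the victim model; all names (`w_victim`, `lam`, the loss forms) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a linear "generator" g(z) = W @ z stands in for the diffusion
# U-Net, a fixed vector w_victim stands in for the victim classifier's
# logit direction, and x_target is the image whose appearance must survive.
W = rng.normal(size=(8, 8))
z = rng.normal(size=8)          # latent code (held fixed during adaptation)
w_victim = rng.normal(size=8)   # victim model's logit direction
x_target = rng.normal(size=8)   # target image to resemble
lam = 0.5                       # weight of the perceptual constraint

def combined_loss(W):
    x = W @ z
    adv = w_victim @ x                     # drive the victim's logit down
    percep = np.sum((x - x_target) ** 2)   # preserve the target appearance
    return adv + lam * percep

def grad(W):
    x = W @ z
    # d/dW of (w_victim @ W z) is outer(w_victim, z);
    # d/dW of ||W z - x_target||^2 is 2 * outer(W z - x_target, z).
    return np.outer(w_victim, z) + lam * 2.0 * np.outer(x - x_target, z)

before = combined_loss(W)
for _ in range(200):            # test-time adaptation of the weights
    W -= 0.01 * grad(W)
after = combined_loss(W)
print(after < before)           # combined objective decreased
```

The point of the sketch is the trade-off: with `lam` large, the output is pinned to the target image; with `lam` small, the adversarial term dominates and realism degrades.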
Related papers
- Towards more transferable adversarial attack in black-box manner [1.1417805445492082]
Black-box attacks based on transferability have received significant attention due to their practical applicability in real-world scenarios. The recent state-of-the-art approach DiffPGD has demonstrated enhanced transferability by employing diffusion-based adversarial purification models for adaptive attacks. We propose a novel loss function coupled with a unique surrogate model to validate our hypothesis.
arXiv Detail & Related papers (2025-05-23T16:49:20Z)
- StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
- Deceptive Diffusion: Generating Synthetic Adversarial Examples [2.7309692684728617]
We introduce the concept of deceptive diffusion -- training a generative AI model to produce adversarial images.
A traditional adversarial attack algorithm aims to perturb an existing image to induce a misclassification.
The deceptive diffusion model can create an arbitrary number of new, misclassified images that are not directly associated with training or test images.
arXiv Detail & Related papers (2024-06-28T10:30:46Z)
- Predicting Cascading Failures with a Hyperparametric Diffusion Model [66.89499978864741]
We study cascading failures in power grids through the lens of diffusion models.
Our model integrates viral diffusion principles with physics-based concepts.
We show that this diffusion model can be learned from traces of cascading failures.
arXiv Detail & Related papers (2024-06-12T02:34:24Z)
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving predictions.
We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z)
- Diffusion Models for Imperceptible and Transferable Adversarial Attack [23.991194050494396]
We propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models.
Our proposed method, DiffAttack, is the first that introduces diffusion models into the adversarial attack field.
arXiv Detail & Related papers (2023-05-14T16:02:36Z)
- Towards Understanding and Boosting Adversarial Transferability from a Distribution Perspective [80.02256726279451]
Adversarial attacks against deep neural networks (DNNs) have received broad attention in recent years.
We propose a novel method that crafts adversarial examples by manipulating the distribution of the image.
Our method can significantly improve the transferability of the crafted attacks and achieves state-of-the-art performance in both untargeted and targeted scenarios.
arXiv Detail & Related papers (2022-10-09T09:58:51Z)
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
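The idea of perturbing feature statistics rather than pixels can be made concrete with a toy sketch: renormalize a feature map per channel, then re-apply a shifted mean and scaled standard deviation, so the spatial pattern is preserved while the statistics move. This is a loose NumPy illustration of the mechanism, assuming a simple channel-wise normalization; it is not the AdvBN layer itself, and the perturbation amounts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy feature map: (channels, spatial) activations from some layer.
feat = rng.normal(loc=2.0, scale=3.0, size=(4, 64))

def perturb_stats(feat, d_mean, d_scale):
    """Renormalize per channel, then apply a perturbed mean/std.

    The spatial pattern of each channel is preserved; only the channel
    statistics (mean, std) move -- the quantity that feature-statistics
    perturbation targets.
    """
    mu = feat.mean(axis=1, keepdims=True)
    sigma = feat.std(axis=1, keepdims=True)
    normed = (feat - mu) / sigma
    return normed * (sigma * d_scale) + (mu + d_mean)

out = perturb_stats(feat, d_mean=0.5, d_scale=1.2)

# Channel means shift by d_mean; channel stds scale by d_scale.
print(np.allclose(out.mean(axis=1), feat.mean(axis=1) + 0.5))
print(np.allclose(out.std(axis=1), feat.std(axis=1) * 1.2))
```

In an adversarial-training setting, `d_mean` and `d_scale` would themselves be chosen by gradient ascent on the task loss to produce worst-case statistics, rather than fixed as here.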
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.