A Low-Rank Defense Method for Adversarial Attack on Diffusion Models
- URL: http://arxiv.org/abs/2602.10319v1
- Date: Tue, 10 Feb 2026 21:56:59 GMT
- Title: A Low-Rank Defense Method for Adversarial Attack on Diffusion Models
- Authors: Jiaxuan Zhu, Siyu Huang,
- Abstract summary: We propose an efficient defensive strategy, named Low-Rank Defense (LoRD), to defend the adversarial attack on Latent Diffusion Models (LDMs)<n>LoRD introduces the merging idea and a balance parameter, combined with the low-rank adaptation (LoRA) modules, to detect and defend the adversarial samples.<n>Our method ensures that the LDM fine-tuned on both adversarial and clean samples can still generate high-quality images.
- Score: 13.070143225543724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, adversarial attacks for diffusion models as well as their fine-tuning process have been developed rapidly. To prevent the abuse of these attack algorithms from affecting the practical application of diffusion models, it is critical to develop corresponding defensive strategies. In this work, we propose an efficient defensive strategy, named Low-Rank Defense (LoRD), to defend the adversarial attack on Latent Diffusion Models (LDMs). LoRD introduces the merging idea and a balance parameter, combined with the low-rank adaptation (LoRA) modules, to detect and defend the adversarial samples. Based on LoRD, we build up a defense pipeline that applies the learned LoRD modules to help diffusion models defend against attack algorithms. Our method ensures that the LDM fine-tuned on both adversarial and clean samples can still generate high-quality images. To demonstrate the effectiveness of our approach, we conduct extensive experiments on facial and landscape images, and our method shows significantly better defense performance compared to the baseline methods.
Related papers
- DLADiff: A Dual-Layer Defense Framework against Fine-Tuning and Zero-Shot Customization of Diffusion Models [74.9510349783152]
Malicious actors can exploit diffusion model customization with just a few or even one image of a person to create synthetic identities nearly identical to the original identity.<n>This paper proposes Dual-Layer Anti-Diffusion (DLADiff) to defend both fine-tuning methods and zero-shot methods.
arXiv Detail & Related papers (2025-11-25T04:35:55Z) - MARS: A Malignity-Aware Backdoor Defense in Federated Learning [51.77354308287098]
Recently proposed state-of-the-art (SOTA) attack, 3DFed, uses an indicator mechanism to determine whether backdoor models have been accepted by the defender.<n>We propose a Malignity-Aware backdooR defenSe (MARS) that leverages backdoor energy to indicate the malicious extent of each neuron.<n>Experiments demonstrate that MARS can defend against SOTA backdoor attacks and significantly outperforms existing defenses.
arXiv Detail & Related papers (2025-09-21T14:50:02Z) - MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks [61.06621533874629]
Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set.<n>We first propose a defense framework, CopyrightShield, to defend against the above attack.<n> Experimental results demonstrate that CopyrightShield significantly improves poisoned sample detection performance across two attack scenarios.
arXiv Detail & Related papers (2024-12-02T14:19:44Z) - Real-world Adversarial Defense against Patch Attacks based on Diffusion Model [34.86098237949215]
This paper introduces DIFFender, a novel DIFfusion-based DeFender framework to counter adversarial patch attacks.
At the core of our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon.
DIFFender seamlessly integrates the tasks of patch localization and restoration within a unified diffusion model framework.
arXiv Detail & Related papers (2024-09-14T10:38:35Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR)
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Toward effective protection against diffusion based mimicry through
score distillation [15.95715097030366]
Efforts have been made to add perturbations to protect images from diffusion-based mimicry pipelines.
Most of the existing methods are too ineffective and even impractical to be used by individual users.
We present novel findings on attacking latent diffusion models and propose new plug-and-play strategies for more effective protection.
arXiv Detail & Related papers (2023-10-02T18:56:12Z) - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z) - DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks [34.86098237949214]
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models.
This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks.
DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework.
arXiv Detail & Related papers (2023-06-15T13:33:27Z) - Adversarial example generation with AdaBelief Optimizer and Crop
Invariance [8.404340557720436]
Adversarial attacks can be an important method to evaluate and select robust models in safety-critical applications.
We propose AdaBelief Iterative Fast Gradient Method (ABI-FGM) and Crop-Invariant attack Method (CIM) to improve the transferability of adversarial examples.
Our method has higher success rates than state-of-the-art gradient-based attack methods.
arXiv Detail & Related papers (2021-02-07T06:00:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.