LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes
- URL: http://arxiv.org/abs/2510.03747v1
- Date: Sat, 04 Oct 2025 09:22:26 GMT
- Title: LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes
- Authors: Zuomin Qu, Yimao Guo, Qianyue Hu, Wei Lu
- Abstract summary: Low-Rank Adaptation (LoRA) patching injects a plug-and-play LoRA patch into Deepfake generators to bypass state-of-the-art defenses. A learnable gating mechanism adaptively controls the effect of the LoRA patch and prevents gradient explosions during fine-tuning. With only 1,000 facial examples and a single epoch of fine-tuning, LoRA patching successfully defeats multiple proactive defenses.
- Score: 4.217198925206348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deepfakes pose significant societal risks, motivating the development of proactive defenses that embed adversarial perturbations in facial images to prevent manipulation. However, in this paper, we show that these preemptive defenses often lack robustness and reliability. We propose a novel approach, Low-Rank Adaptation (LoRA) patching, which injects a plug-and-play LoRA patch into Deepfake generators to bypass state-of-the-art defenses. A learnable gating mechanism adaptively controls the effect of the LoRA patch and prevents gradient explosions during fine-tuning. We also introduce a Multi-Modal Feature Alignment (MMFA) loss, encouraging the features of adversarial outputs to align with those of the desired outputs at the semantic level. Beyond bypassing, we present defensive LoRA patching, embedding visible warnings in the outputs as a complementary solution to mitigate this newly identified security vulnerability. With only 1,000 facial examples and a single epoch of fine-tuning, LoRA patching successfully defeats multiple proactive defenses. These results reveal a critical weakness in current paradigms and underscore the need for more robust Deepfake defense strategies. Our code is available at https://github.com/ZOMIN28/LoRA-Patching.
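To make the mechanism concrete, below is a minimal sketch of the two core ingredients the abstract describes: a frozen generator layer wrapped with a gated low-rank update, and a feature-alignment loss that pulls the features of outputs on protected inputs toward those of the desired outputs. This is an illustrative reconstruction under stated assumptions, not the authors' released code (see the repository linked above); the rank, the sigmoid gate and its initialization, and the cosine form of the loss are all assumptions.
```python
# Illustrative sketch only: a gated LoRA wrapper and an MMFA-style
# feature-alignment loss. Hyperparameters and the gate design are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLoRALinear(nn.Module):
    """A frozen linear layer plus a low-rank update scaled by a learnable gate."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # the generator's weights stay frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # the patch starts as a zero update
        self.scale = alpha / rank
        # Learnable gate in (0, 1), initialized near zero so early updates are
        # small -- one plausible way a gate could damp gradient explosions.
        self.gate_logit = nn.Parameter(torch.tensor(-4.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_logit)
        return self.base(x) + gate * self.scale * self.up(self.down(x))

def feature_alignment_loss(feats_adv: list[torch.Tensor],
                           feats_ref: list[torch.Tensor]) -> torch.Tensor:
    """Pull features of outputs on protected (perturbed) inputs toward features
    of the desired outputs, layer by layer (an MMFA-style objective)."""
    losses = [1.0 - F.cosine_similarity(a.flatten(1), r.flatten(1), dim=1).mean()
              for a, r in zip(feats_adv, feats_ref)]
    return torch.stack(losses).mean()
```
Because the gate starts almost closed, the patched generator initially behaves like the original model and can open the gate gradually over the single epoch of fine-tuning, which is consistent with the stabilizing role the abstract attributes to the gating mechanism.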
Related papers
- The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections [74.60337113759313]
Current defenses against jailbreaks and prompt injections are typically evaluated against a static set of harmful attack strings. We argue that this evaluation process is flawed. Instead, we should evaluate defenses against adaptive attackers who explicitly modify their attack strategy to counter a defense's design.
arXiv Detail & Related papers (2025-10-10T05:51:04Z)
- Boosting Active Defense Persistence: A Two-Stage Defense Framework Combining Interruption and Poisoning Against Deepfake [14.10448497174767]
We argue that an effective defense should not only distort forged content but also block the model's ability to adapt. To achieve this, we propose an innovative Two-Stage Defense Framework (TSDF). Our framework shows a strong dual-defense capability, improving the persistence of active defense.
arXiv Detail & Related papers (2025-08-11T09:26:48Z)
- LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution [84.2846064139183]
Large Language Models (LLMs) face threats from jailbreak prompts. We propose LightDefense, a lightweight defense mechanism targeted at white-box models.
arXiv Detail & Related papers (2025-04-02T09:21:26Z)
- The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense [56.32083100401117]
The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks comes as no surprise. Recent defense mechanisms against these attacks have reached near-saturation performance on benchmark evaluations.
arXiv Detail & Related papers (2024-11-13T07:57:19Z)
- Representation Noising: A Defence Mechanism Against Harmful Finetuning [28.451676139178687]
Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes.
We propose Representation Noising (RepNoise), a defence mechanism that operates even when attackers have access to the weights.
arXiv Detail & Related papers (2024-05-23T13:51:55Z)
- LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem [55.2986934528672]
We study how backdoors can be injected into task-enhancing LoRAs. We find that with a simple, efficient, yet specific recipe, a backdoor LoRA can be trained once and then seamlessly merged with multiple LoRAs. Our work is among the first to study this new threat model of training-free distribution of downstream-capable-yet-backdoor-injected LoRAs.
arXiv Detail & Related papers (2024-02-29T20:25:16Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threat that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Defending Object Detectors against Patch Attacks with Out-of-Distribution Smoothing [21.174037250133622]
We introduce OODSmoother, which characterizes the properties of approaches that aim to remove adversarial patches. This framework naturally guides us to design 1) a novel adaptive attack that breaks existing patch attack defenses on object detectors, and 2) a novel defense approach, SemPrior, that takes advantage of semantic priors. We find that SemPrior alone provides up to a 40% gain, or up to a 60% gain when combined with existing defenses.
arXiv Detail & Related papers (2022-05-18T15:20:18Z)
- LAFEAT: Piercing Through Adversarial Defenses with Latent Features [15.189068478164337]
We show that latent features in certain "robust" models are surprisingly susceptible to adversarial attacks.
We introduce LAFEAT, a unified $\ell_\infty$-norm white-box attack algorithm that harnesses latent features in its gradient-descent steps (a minimal illustrative sketch follows this list).
arXiv Detail & Related papers (2021-04-19T13:22:20Z)
- Attack Agnostic Adversarial Defense via Visual Imperceptible Bound [70.72413095698961]
This research aims to design a defense model that is robust within a certain bound against both seen and unseen adversarial attacks.
The proposed defense model is evaluated on the MNIST, CIFAR-10, and Tiny ImageNet databases.
The proposed algorithm is attack agnostic, i.e., it does not require any knowledge of the attack algorithm.
arXiv Detail & Related papers (2020-10-25T23:14:26Z)
- PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking [46.03749650789915]
Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image.
We propose a general defense framework called PatchGuard that can achieve high provable robustness while maintaining high clean accuracy against localized adversarial patches.
arXiv Detail & Related papers (2020-05-17T03:38:34Z)
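As a companion to the LAFEAT entry above, here is a minimal sketch of a single $\ell_\infty$-bounded attack step whose loss mixes the model's final logits with logits computed from intermediate ("latent") features. The feature/classifier split, the auxiliary head, and the weighting `lam` are assumptions for illustration only, not the paper's exact algorithm.
```python
# Illustrative only: one l_inf gradient-sign step that also exploits latent
# features via an auxiliary head (LAFEAT-style idea, not the paper's code).
# `model.features`, `model.classifier`, and `aux_head` are assumed names.
import torch
import torch.nn.functional as F

def latent_feature_step(model, aux_head, x_adv, x_orig, y,
                        eps=8 / 255, step=2 / 255, lam=1.0):
    x_adv = x_adv.clone().detach().requires_grad_(True)
    feats = model.features(x_adv)                         # latent features (assumed API)
    loss = (F.cross_entropy(model.classifier(feats), y)
            + lam * F.cross_entropy(aux_head(feats), y))  # latent-feature logits
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + step * x_adv.grad.sign()            # gradient-sign ascent
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project to l_inf ball
        return x_adv.clamp(0.0, 1.0)                        # keep a valid image
```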
This list is automatically generated from the titles and abstracts of the papers on this site.