LoRAGuard: An Effective Black-box Watermarking Approach for LoRAs
- URL: http://arxiv.org/abs/2501.15478v1
- Date: Sun, 26 Jan 2025 10:46:59 GMT
- Title: LoRAGuard: An Effective Black-box Watermarking Approach for LoRAs
- Authors: Peizhuo Lv, Yiran Xiahou, Congyi Li, Mengjie Sun, Shengzhi Zhang, Kai Chen, Yingjun Zhang
- Abstract summary: We introduce LoRAGuard, a novel black-box watermarking technique for detecting unauthorized misuse of LoRAs. LoRAGuard achieves nearly 100% watermark verification success and demonstrates strong effectiveness.
- Score: 14.199095322820314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LoRA (Low-Rank Adaptation) has achieved remarkable success in the parameter-efficient fine-tuning of large models. The trained LoRA matrix can be integrated with the base model through addition or negation operation to improve performance on downstream tasks. However, the unauthorized use of LoRAs to generate harmful content highlights the need for effective mechanisms to trace their usage. A natural solution is to embed watermarks into LoRAs to detect unauthorized misuse. However, existing methods struggle when multiple LoRAs are combined or negation operation is applied, as these can significantly degrade watermark performance. In this paper, we introduce LoRAGuard, a novel black-box watermarking technique for detecting unauthorized misuse of LoRAs. To support both addition and negation operations, we propose the Yin-Yang watermark technique, where the Yin watermark is verified during negation operation and the Yang watermark during addition operation. Additionally, we propose a shadow-model-based watermark training approach that significantly improves effectiveness in scenarios involving multiple integrated LoRAs. Extensive experiments on both language and diffusion models show that LoRAGuard achieves nearly 100% watermark verification success and demonstrates strong effectiveness.
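The addition and negation operations the abstract refers to can be illustrated with a minimal sketch. All names here (`W`, `B`, `A`, the dimensions) are illustrative, not from the paper: a trained LoRA contributes a low-rank update `B @ A` that is either added to the base weight (to acquire a skill) or subtracted from it (to suppress one), which is why LoRAGuard's Yin-Yang design verifies a separate watermark under each operation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 8, 2  # r is the LoRA rank, much smaller than d_out/d_in

# Hypothetical base weight and one trained LoRA factor pair (B, A).
W = rng.standard_normal((d_out, d_in))
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))

delta = B @ A  # the LoRA update; its rank is at most r

W_add = W + delta  # addition: integrate the LoRA's learned behavior
W_neg = W - delta  # negation: remove/unlearn that behavior

# Multiple LoRAs can be combined by summing their updates,
# the scenario the shadow-model training is designed to handle.
delta2 = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
W_multi = W + delta + delta2
```

Because the base model only ever sees `W ± delta`, a black-box watermark must survive both signs of the update, and sums of several updates, which is the failure mode of prior methods that the paper targets.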
Related papers
- AuthenLoRA: Entangling Stylization with Imperceptible Watermarks for Copyright-Secure LoRA Adapters [52.556959321030966]
Low-Rank Adaptation (LoRA) offers an efficient paradigm for customizing diffusion models. Existing watermarking techniques either target base models or verify LoRA modules themselves. We propose AuthenLoRA, a unified watermarking framework that embeds imperceptible, traceable watermarks directly into the LoRA training process.
arXiv Detail & Related papers (2025-11-26T09:48:11Z) - EditMark: Watermarking Large Language Models based on Model Editing [76.04893766374221]
We propose EditMark, the first watermarking method that leverages model editing to embed a training-free, stealthy, and performance-lossless watermark. Experiments indicate that EditMark can embed 32-bit watermarks into LLMs within 20 seconds with a watermark extraction success rate of 100%.
arXiv Detail & Related papers (2025-10-18T06:25:17Z) - Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained by limited access to the watermark detector. We demonstrate that character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z) - SEAL: Entangled White-box Watermarks on Low-Rank Adaptation [14.478685983719128]
SEAL embeds a secret, non-trainable matrix between trainable LoRA weights, serving as a passport to claim ownership. When applying SEAL, we observed no performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks.
arXiv Detail & Related papers (2025-01-16T04:17:56Z) - ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization [15.570148419846175]
Existing watermarking methods face the challenge of balancing robustness and concealment.
This paper introduces a watermark hiding process to actively achieve concealment, thus allowing the embedding of stronger watermarks.
Experiments on various diffusion models demonstrate the watermark remains verifiable even under significant image tampering.
arXiv Detail & Related papers (2024-11-06T12:14:23Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.
Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.
We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z) - Watermarking Recommender Systems [52.207721219147814]
We introduce Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems.
Our approach entails selecting an initial item and querying it through the oracle model, followed by the selection of subsequent items with small prediction scores.
To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence.
arXiv Detail & Related papers (2024-07-17T06:51:24Z) - AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA [67.68750063537482]
Diffusion models have achieved remarkable success in generating high-quality images.
Recent works aim to let SD models output watermarked content for post-hoc forensics.
We propose AquaLoRA as the first implementation under this scenario.
arXiv Detail & Related papers (2024-05-18T01:25:47Z) - Mixture of LoRA Experts [87.50120181861362]
This paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection.
The MoLE approach achieves superior LoRA fusion performance in comparison to direct arithmetic merging.
arXiv Detail & Related papers (2024-04-21T11:59:53Z) - LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation [27.123271324468657]
Low-Rank Adaptation (LoRA) is currently the most commonly used parameter-efficient fine-tuning (PEFT) method.
It introduces auxiliary parameters for each layer to fine-tune the pre-trained model under limited computing resources.
However, it still faces resource consumption challenges when scaling up to larger models.
arXiv Detail & Related papers (2024-02-12T15:34:56Z) - Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs [23.639074918667625]
We propose a novel multi-bit box-free watermarking method for GANs with improved robustness against white-box attacks.
The watermark is embedded by adding an extra watermarking loss term during GAN training.
We show that the presence of the watermark has a negligible impact on the quality of the generated images.
arXiv Detail & Related papers (2023-10-25T18:38:10Z) - Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z) - Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z) - From Attack to Protection: Leveraging Watermarking Attack Network for Advanced Add-on Watermarking [7.101522983541308]
Multi-bit watermarking (MW) has been designed to enhance resistance against watermarking attacks. Benchmark tools exist to assess this robustness through simulated attacks on watermarked images. We introduce a watermarking attack network (WAN), a fully trainable watermarking benchmark tool designed to exploit vulnerabilities within MW systems.
arXiv Detail & Related papers (2020-08-14T09:11:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.