DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack
- URL: http://arxiv.org/abs/2512.16182v1
- Date: Thu, 18 Dec 2025 05:08:19 GMT
- Title: DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack
- Authors: Hao Li, Yubing Ren, Yanan Cao, Yingjie Li, Fang Fang, Shi Wang, Li Guo
- Abstract summary: Cloud-based services have led to growing risks of model abuse in large language models (LLMs). Existing watermarking algorithms primarily focus on defending against paraphrase attacks. We propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks.
- Score: 25.681637904431142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of cloud-based services, large language models (LLMs) have become increasingly accessible through various web platforms. However, this accessibility has also led to growing risks of model abuse. LLM watermarking has emerged as an effective approach to mitigate such misuse and protect intellectual property. Existing watermarking algorithms, however, primarily focus on defending against paraphrase attacks while overlooking piggyback spoofing attacks, which can inject harmful content, compromise watermark reliability, and undermine trust in attribution. To address this limitation, we propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks. DualGuard employs the adaptive dual-stream watermarking mechanism, in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection. Extensive experiments conducted across multiple datasets and language models demonstrate that DualGuard achieves excellent detectability, robustness, traceability, and text quality, effectively advancing the state of LLM watermarking for real-world applications.
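The abstract does not specify DualGuard's detection statistic, but detectors in this line of work typically test for a pseudo-random "green-list" bias in the token stream. A minimal sketch of that general detection idea, using a hash-seeded vocabulary partition and a z-score test (the function names, the SHA-256 seeding, and the 0.5 green fraction are illustrative assumptions, not DualGuard's actual dual-stream algorithm):

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of vocabulary favored at each step (assumption)

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeding the
    partition on the previous token (illustrative, not DualGuard)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GREEN_FRACTION

def detect(tokens: list[str]) -> float:
    """Return the z-score of the observed green-token count; a large
    positive value indicates a watermark is likely present."""
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std
```

On a long watermarked text the z-score grows well above zero, while unwatermarked text stays near zero; spoofing-resistant schemes such as DualGuard additionally tie the signal to semantic content so that meaning-altering edits become detectable and traceable.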
Related papers
- AuthenLoRA: Entangling Stylization with Imperceptible Watermarks for Copyright-Secure LoRA Adapters [52.556959321030966]
Low-Rank Adaptation (LoRA) offers an efficient paradigm for customizing diffusion models. Existing watermarking techniques either target base models or verify LoRA modules themselves. We propose AuthenLoRA, a unified watermarking framework that embeds imperceptible, traceable watermarks directly into the LoRA training process.
arXiv Detail & Related papers (2025-11-26T09:48:11Z)
- SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP). SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes. Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z)
- Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained by limited access to the watermark detector. We demonstrate that character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
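To make the threat concrete: a character-level perturbation can change tokenization without visibly changing the text, which disrupts detectors seeded on local n-grams. A toy homoglyph-substitution sketch (the mapping, rate, and function name are illustrative assumptions; the paper's GA-driven attack is more sophisticated):

```python
import random

# Latin -> Cyrillic look-alike characters (illustrative subset)
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def perturb(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap a fraction `rate` of eligible characters for visually similar
    homoglyphs; the text looks unchanged but tokenizes differently,
    which can break n-gram-seeded watermark detection."""
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Because the perturbed string is byte-for-byte different wherever a substitution lands, any detector that hashes surface tokens sees an unrelated partition there, even though a human reader perceives the same text.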
arXiv Detail & Related papers (2025-09-11T02:50:07Z)
- Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models [52.877452505561706]
We propose the first copyright evasion attack specifically designed to undermine dataset ownership verification (DOV). Our attack, CEAT2I, comprises three stages: watermarked sample detection, trigger identification, and efficient watermark mitigation. Experiments show that CEAT2I effectively evades DOV mechanisms while preserving model performance.
arXiv Detail & Related papers (2025-05-05T17:51:55Z) - Unified attacks to large language model watermarks: spoofing and scrubbing in unauthorized knowledge distillation [33.394877468499395]
We propose Contrastive Decoding-Guided Knowledge Distillation (CDG-KD), a unified framework that enables bidirectional attacks under unauthorized knowledge distillation. Our approach employs contrastive decoding to extract corrupted or amplified watermark texts by comparing outputs from the student model and weakly watermarked references. Our findings underscore the critical need for developing watermarking schemes that are robust and unforgeable.
arXiv Detail & Related papers (2025-04-24T12:15:46Z) - Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning [34.76886510334969]
A piggyback attack can maliciously alter the meaning of watermarked text (for example, transforming it into hate speech) while preserving the original watermark. We propose a semantic-aware watermarking algorithm that embeds watermarks into a given target text while preserving its original meaning.
arXiv Detail & Related papers (2025-04-09T04:38:17Z)
- ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks. However, adversaries can still use model extraction attacks to steal the model intelligence encoded in generated content. Watermarking technology offers a promising defense against such attacks by embedding unique identifiers into model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
- DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark [13.007649270429493]
Face Recognition (FR) systems pose privacy risks.
One countermeasure is the adversarial attack, which deceives unauthorized malicious FR systems.
We propose the first double identity protection scheme based on traceable adversarial watermarking.
arXiv Detail & Related papers (2024-04-23T02:50:38Z)
- Dual Defense: Adversarial, Traceable, and Invisible Robust Watermarking against Face Swapping [13.659927216999407]
Malicious applications of deep forgery, represented by face swapping, have introduced security threats such as misinformation dissemination and identity fraud.
We propose a novel active defense mechanism that combines traceability and adversariality, called Dual Defense.
It invisibly embeds a single robust watermark within the target face to actively respond to sudden cases of malicious face swapping.
arXiv Detail & Related papers (2023-10-25T10:39:51Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, in which the model owner watermarks the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.