Towards Generalized and Stealthy Watermarking for Generative Code Models
- URL: http://arxiv.org/abs/2506.20926v2
- Date: Sun, 29 Jun 2025 13:42:05 GMT
- Title: Towards Generalized and Stealthy Watermarking for Generative Code Models
- Authors: Haoxuan Li, Jiale Zhang, Xiaobing Sun, Xiapu Luo
- Abstract summary: Experimental results demonstrate that CodeGuard achieves up to 100% watermark verification rates in both code summarization and code generation tasks. In terms of stealthiness, CodeGuard performs exceptionally well, with a maximum detection rate of only 0.078 against the ONION detection method.
- Score: 35.78974773421725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative code models (GCMs) significantly enhance development efficiency through automated code generation and code summarization. However, building and training these models require substantial computational resources and time, necessitating effective digital copyright protection to prevent unauthorized leaks and misuse. Backdoor watermarking embeds hidden identifiers in the model so that ownership can be verified from the model's outputs alone, even in black-box settings. Current backdoor watermarking techniques face two main challenges: first, limited generalization across different tasks and datasets, causing fluctuating verification rates; second, insufficient stealthiness, as watermarks are easily detected and removed by automated methods. To address these issues, we propose CodeGuard, a novel watermarking method that combines attention mechanisms with a distributed trigger embedding strategy. Specifically, CodeGuard employs attention mechanisms to identify watermark embedding positions, ensuring verifiability. Moreover, by using homomorphic character replacement it evades manual inspection, while distributed trigger embedding reduces the likelihood of automated detection. Experimental results demonstrate that CodeGuard achieves up to 100% watermark verification rates in both code summarization and code generation tasks, with no impact on primary task performance. In terms of stealthiness, CodeGuard performs exceptionally well, with a maximum detection rate of only 0.078 against the ONION detection method, significantly lower than baseline methods.
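The abstract's "homomorphic character replacement" describes swapping source characters for visually identical Unicode look-alikes so the trigger is invisible to a human reviewer, while distributing the substitutions across positions keeps any single token from standing out to automated defenses such as ONION. Below is a minimal Python sketch of that general idea under stated assumptions: the homoglyph table, the `embed_trigger` helper, and the hand-picked positions are illustrative inventions, not the paper's implementation (CodeGuard selects embedding positions via attention scores).

```python
# Minimal sketch of homoglyph-based trigger embedding. Illustrative only:
# the homoglyph table, helper name, and hand-picked positions are
# assumptions, not CodeGuard's implementation.

# ASCII characters -> visually identical Unicode look-alikes (Cyrillic).
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}

def embed_trigger(code: str, positions: list[int]) -> str:
    """Swap replaceable characters at the given positions for homoglyphs.

    Spreading substitutions over several positions ("distributed trigger
    embedding") avoids the single rare token that perplexity-based
    defenses such as ONION flag as an outlier.
    """
    chars = list(code)
    for i in positions:
        if 0 <= i < len(chars) and chars[i] in HOMOGLYPHS:
            chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

sample = "def compute_area(radius): return 3.14159 * radius ** 2"
marked = embed_trigger(sample, positions=[4, 5, 12])
print(marked == sample)  # False: the bytes differ, even though the strings
print(marked)            # render identically in most fonts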
Related papers
- Disappearing Ink: Obfuscation Breaks N-gram Code Watermarks in Theory and Practice [23.788321123219244]
Distinguishing AI-generated code from human-written code is crucial for authorship attribution, content tracking, and misuse detection. N-gram-based watermarking schemes, which inject secret watermarks to be detected during generation, have emerged as prominent. Most robustness claims, however, are validated only against simple code transformations or code optimizations as simulated attacks, creating a questionable sense of robustness.
arXiv Detail & Related papers (2025-07-07T22:18:19Z)
- TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity [68.95168727940973]
This paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM.
arXiv Detail & Related papers (2025-06-30T03:14:07Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in retrieval-augmented generation (RAG) introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method for 'harmless' copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign [15.153228808457628]
RoSeMary regulates LLM-generated code to avoid intellectual property rights violations and inappropriate misuse in software development. High-quality watermarks that satisfy the detectability-fidelity-robustness tri-objective are hard to achieve because of code's low-entropy nature. RoSeMary achieves high detection accuracy while preserving code functionality.
arXiv Detail & Related papers (2025-02-04T07:35:28Z)
- Beyond Dataset Watermarking: Model-Level Copyright Protection for Code Summarization Models [37.817691840557984]
Code summarization models (CSMs) face risks of exploitation by unauthorized users. Traditional watermarking methods require the separate design of triggers and watermark features. We propose ModMark, a novel model-level digital watermark embedding method.
arXiv Detail & Related papers (2024-10-18T00:48:00Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark [13.007649270429493]
Face Recognition (FR) systems pose privacy risks.
One countermeasure is the adversarial attack, which deceives unauthorized malicious FR systems.
We propose the first double identity protection scheme based on traceable adversarial watermarking.
arXiv Detail & Related papers (2024-04-23T02:50:38Z)
- Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created when models memorize watermark samples. By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
- Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text; a minimal sketch of the entropy-thresholding idea appears after this list.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
- SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders [9.070481370120905]
We propose SSLGuard, the first watermarking algorithm for pre-trained encoders.
SSLGuard is effective in watermark injection and verification, and is robust against model stealing and other watermark removal attacks.
arXiv Detail & Related papers (2022-01-27T17:41:54Z)
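As promised above, here is a minimal sketch of the entropy-thresholding idea behind SWEET, assuming a generic tokenized sequence with per-position entropies taken from the generating model; the keyed green-list construction is the same n-gram scheme the "Disappearing Ink" entry above analyzes. It is a simplification under stated assumptions: the constant names, the hash-based green list, and the use of a raw green-token fraction instead of the paper's z-statistic are illustrative, not the authors' code.

```python
import hashlib
import math

# Assumed constants for illustration; SWEET's actual hyperparameters differ.
GREEN_FRACTION = 0.5      # fraction of the vocabulary treated as "green"
ENTROPY_THRESHOLD = 2.0   # bits; only high-entropy positions are scored
KEY = b"secret-watermark-key"

def is_green(token_id: int, prev_token_id: int) -> bool:
    """Keyed pseudo-random green-list membership, seeded by the previous token."""
    digest = hashlib.sha256(
        KEY + prev_token_id.to_bytes(4, "big") + token_id.to_bytes(4, "big")
    ).digest()
    return digest[0] / 256.0 < GREEN_FRACTION

def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits of a next-token distribution (how the
    per-position entropies below would be derived from model outputs)."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def detect(token_ids: list[int], entropies: list[float]) -> float:
    """Fraction of green tokens among high-entropy positions.

    During generation, green tokens would be up-weighted only where the
    model's entropy exceeds the threshold, so low-entropy (functionally
    constrained) code tokens stay untouched. A fraction well above
    GREEN_FRACTION suggests a watermark; the paper scores this with a
    z-statistic rather than a raw fraction.
    """
    scored = [
        (tok, prev)
        for prev, tok, h in zip(token_ids, token_ids[1:], entropies[1:])
        if h >= ENTROPY_THRESHOLD
    ]
    if not scored:
        return 0.0
    return sum(is_green(tok, prev) for tok, prev in scored) / len(scored)

# Toy usage: token ids and entropies would come from the generating model.
tokens = [17, 42, 5, 99, 42, 7, 63]
ents = [0.1, 3.2, 2.5, 0.4, 3.0, 2.9, 1.1]
print(f"green fraction at high-entropy positions: {detect(tokens, ents):.2f}")
```

Restricting the bias to high-entropy positions is what lets such a scheme preserve code quality: positions where the model is nearly certain (e.g. a closing bracket) carry no watermark signal and are never perturbed.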