SWaRL: Safeguard Code Watermarking via Reinforcement Learning
- URL: http://arxiv.org/abs/2601.02602v1
- Date: Mon, 05 Jan 2026 23:35:39 GMT
- Title: SWaRL: Safeguard Code Watermarking via Reinforcement Learning
- Authors: Neusha Javidnia, Ruisi Zhang, Ashish Kundu, Farinaz Koushanfar,
- Abstract summary: We present SWaRL, a robust and fidelity-preserving watermarking framework.<n> SWaRL embeds unique and verifiable signatures in the generated output.<n>We show that SWaRL achieves higher watermark detection accuracy compared to prior methods.
- Score: 16.888582821315257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SWaRL, a robust and fidelity-preserving watermarking framework designed to protect the intellectual property of code LLM owners by embedding unique and verifiable signatures in the generated output. Existing approaches rely on manually crafted transformation rules to preserve watermarked code functionality or manipulate token-generation probabilities at inference time, which are prone to compilation errors. To address these challenges, SWaRL employs a reinforcement learning-based co-training framework that uses compiler feedback for functional correctness and a jointly trained confidential verifier as a reward signal to maintain watermark detectability. Furthermore, SWaRL employs low-rank adaptation (LoRA) during fine-tuning, allowing the learned watermark information to be transferable across model updates. Extensive experiments show that SWaRL achieves higher watermark detection accuracy compared to prior methods while fully maintaining watermarked code functionality. The LoRA-based signature embedding steers the base model to generate and solve code in a watermark-specific manner without significant computational overhead. Moreover, SWaRL exhibits strong resilience against refactoring and adversarial transformation attacks.
Related papers
- AuthenLoRA: Entangling Stylization with Imperceptible Watermarks for Copyright-Secure LoRA Adapters [52.556959321030966]
Low-Rank Adaptation (LoRA) offers an efficient paradigm for customizing diffusion models.<n>Existing watermarking techniques either target base models or verify LoRA modules themselves.<n>We propose AuthenLoRA, a unified watermarking framework that embeds imperceptible, traceable watermarks directly into the LoRA training process.
arXiv Detail & Related papers (2025-11-26T09:48:11Z) - SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP)<n>SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes.<n>Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z) - StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models [55.05404953041403]
We propose a novel framework that seamlessly integrates a binary watermark into the diffusion generation process.<n>We show that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.
arXiv Detail & Related papers (2025-09-22T16:35:19Z) - Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking.<n>We characterize two realistic threat models constrained on limited access to the watermark detector.<n>We demonstrate character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model.<n> Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z) - Optimizing Token Choice for Code Watermarking: An RL Approach [41.184827829989494]
We introduce CodeTracer, an adaptive code watermarking framework underpinned by a novel reinforcement learning paradigm.<n>CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction.<n>To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals.
arXiv Detail & Related papers (2025-08-16T06:11:29Z) - TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity [76.98973481600002]
This paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM.<n>The proposed method comprises four key modules: a dual-mark joint sampling (DMJS) algorithm for embedding copyright and localization watermarks into the latent space while preserving generative quality.<n>The experimental results demonstrate that TAG-WM achieves state-of-the-art performance in both tampering robustness and localization capability even under distortion.
arXiv Detail & Related papers (2025-06-30T03:14:07Z) - Towards Generalized and Stealthy Watermarking for Generative Code Models [35.78974773421725]
Experimental results demonstrate that CodeGuard achieves up to 100% watermark verification rates in both code summarization and code generation tasks.<n>In terms of stealthiness, CodeGuard performs exceptionally, with a maximum detection rate of only 0.078 against ONION detection methods.
arXiv Detail & Related papers (2025-06-26T01:14:35Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications.<n>The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries.<n>Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks.<n>We propose name for harmless' copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign [15.153228808457628]
RoSeMary regulates LLM-generated code to avoid intellectual property rights violations and inappropriate misuse in software development.<n>High-quality watermarks adhering to the detectability-fidelity-robustness tri-objective are limited due to codes' low-entropy nature.<n>RoSeMary achieves high detection accuracy while preserving the code functionality.
arXiv Detail & Related papers (2025-02-04T07:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.