Is The Watermarking Of LLM-Generated Code Robust?
- URL: http://arxiv.org/abs/2403.17983v3
- Date: Sun, 16 Feb 2025 22:31:00 GMT
- Title: Is The Watermarking Of LLM-Generated Code Robust?
- Authors: Tarun Suresh, Shubham Ugare, Gagandeep Singh, Sasa Misailovic
- Abstract summary: We show that watermarking techniques are significantly more fragile in code-based contexts. Specifically, we show that simple semantics-preserving transformations, such as variable renaming and dead code insertion, can effectively erase watermarks.
- Score: 5.48277165801539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first in-depth study of the robustness of existing watermarking techniques applied to code generated by large language models (LLMs). As LLMs increasingly contribute to software development, watermarking has emerged as a potential solution for detecting AI-generated code and mitigating misuse, such as plagiarism or the automated generation of malicious programs. While previous research has demonstrated the resilience of watermarking in the text setting, our work reveals that watermarking techniques are significantly more fragile in code-based contexts. Specifically, we show that simple semantics-preserving transformations, such as variable renaming and dead code insertion, can effectively erase watermarks without altering the program's functionality. To systematically evaluate watermark robustness, we develop an algorithm that traverses the Abstract Syntax Tree (AST) of a watermarked program and applies a sequence of randomized, semantics-preserving transformations. Our experimental results, conducted on Python code generated by different LLMs, indicate that even minor modifications can drastically reduce watermark detectability, with true positive rates (TPR) dropping below 50% in many cases. Our code is publicly available at https://github.com/uiuc-arc/llm-code-watermark.
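To make the attack concrete, below is a minimal sketch of the two transformations named in the abstract (variable renaming and dead code insertion), written as passes over Python's standard `ast` module. This is an illustrative assumption of how such passes could look, not the authors' released tool (their implementation is in the linked repository); the class and function names here are invented for the example.

```python
# Minimal sketch of semantics-preserving transformations (illustrative only;
# see https://github.com/uiuc-arc/llm-code-watermark for the paper's tool).
import ast
import random
import string


class VariableRenamer(ast.NodeTransformer):
    """Rename locally assigned variables to fresh random identifiers."""

    def __init__(self):
        self.mapping = {}

    def _fresh_name(self):
        return "v_" + "".join(random.choices(string.ascii_lowercase, k=8))

    def visit_Name(self, node):
        # Map a name the first time it is assigned; rewrite later uses.
        # Builtins and imports are never assigned here, so they stay untouched.
        if isinstance(node.ctx, ast.Store) and node.id not in self.mapping:
            self.mapping[node.id] = self._fresh_name()
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node


def rename_variables(source: str) -> str:
    """Return the program with all assigned variables renamed."""
    tree = VariableRenamer().visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


def insert_dead_code(source: str) -> str:
    """Prepend a branch that can never execute; behavior is unchanged."""
    tree = ast.parse(source)
    dead = ast.parse("if False:\n    _unreachable = 0").body[0]
    tree.body.insert(0, dead)
    return ast.unparse(ast.fix_missing_locations(tree))


print(rename_variables("x = 1\ny = x + 2\nprint(y)"))
print(insert_dead_code("total = sum(range(10))"))
```

Neither pass changes the program's observable behavior, yet both rewrite much of the generated token sequence, which is exactly the surface signal that sequence-level watermark detectors rely on.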
Related papers
- Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code [4.608510640547953]
Code watermarking identifies AI-generated code by embedding patterns into the code during generation.
Existing methods often modify tokens that are critical for program logic, such as keywords in conditional expressions.
We present STONE, a method that preserves functional integrity by selectively inserting watermarks only into non-syntax tokens.
arXiv Detail & Related papers (2025-02-26T05:46:13Z)
- Revisiting the Robustness of Watermarking to Paraphrasing Attacks [10.68370011459729]
Many recent watermarking techniques modify the output probabilities of LMs to embed a signal in the generated output that can later be detected.
We show that with access to only a limited number of generations from a black-box watermarked model, we can drastically increase the effectiveness of paraphrasing attacks to evade watermark detection.
arXiv Detail & Related papers (2024-11-08T02:22:30Z)
- Beyond Dataset Watermarking: Model-Level Copyright Protection for Code Summarization Models [37.817691840557984]
Code summarization models (CSMs) face risks of exploitation by unauthorized users.
Traditional watermarking methods require separate design of triggers and watermark features.
We propose ModMark, a novel model-level digital watermark embedding method.
arXiv Detail & Related papers (2024-10-18T00:48:00Z)
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models [48.409979469683975]
We introduce the concept of cross-lingual consistency in text watermarking.
Preliminary empirical results reveal that current text watermarking technologies lack consistency when texts are translated into various languages.
We propose a Cross-lingual Watermark Removal Attack (CWRA) to bypass watermarking.
arXiv Detail & Related papers (2024-02-21T18:48:38Z)
- On the Learnability of Watermarks for Language Models [80.97358663708592]
We ask whether language models can directly learn to generate watermarked text.
We propose watermark distillation, which trains a student model to behave like a teacher model.
We find that models can learn to generate watermarked text with high detectability.
arXiv Detail & Related papers (2023-12-07T17:41:44Z)
- A Robust Semantics-based Watermark for Large Language Model against Paraphrasing [50.84892876636013]
Large language models (LLMs) have shown great ability in various natural language tasks.
There are concerns that LLMs may be used improperly or even illegally.
We propose a semantics-based watermark framework SemaMark.
arXiv Detail & Related papers (2023-11-15T06:19:02Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet diverse information-encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
- Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines; a sketch of the entropy-gating idea follows this entry.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
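Since the SWEET entry hinges on entropy thresholding, here is a minimal sketch of that selection rule, assuming a next-token probability distribution is available at each generation step. The function names and the threshold value are illustrative assumptions, not SWEET's published configuration.

```python
# Minimal sketch of entropy-gated selective watermarking (illustrative).
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_watermark(probs: list[float], threshold: float = 1.0) -> bool:
    """Bias the green list only at high-entropy positions, so that
    low-entropy (logic-critical) tokens such as keywords are left alone."""
    return entropy(probs) > threshold


# A near-deterministic position (e.g. a forced keyword) is skipped,
# while an uncertain position (e.g. a fresh variable name) is watermarked.
print(should_watermark([0.97, 0.01, 0.01, 0.01]))  # False
print(should_watermark([0.25, 0.25, 0.25, 0.25]))  # True
```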
- Towards Tracing Code Provenance with Code Watermarking [37.41260851333952]
We propose CodeMark, a watermarking system that hides bit strings in variables while respecting the natural and operational semantics of the code.
For naturalness, we introduce a contextual watermarking scheme, built on graph neural networks, that generates watermarked variables that are more coherent with their surrounding context.
We show that CodeMark outperforms SOTA watermarking systems, striking a better balance among the watermarking requirements.
arXiv Detail & Related papers (2023-05-21T13:53:12Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters; a sketch of this detection idea follows the entry.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
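Several entries above rely on the same green-list detection idea, so here is a minimal sketch of the detector side, assuming a simplified variant in which each token is classified as green by a keyed hash of the preceding token. The hash construction, the function names, and the default gamma = 0.25 are illustrative assumptions, not the exact algorithm of any paper listed here.

```python
# Minimal sketch of green-list watermark detection (simplified variant,
# not the exact algorithm of the papers above).
import hashlib
import math


def is_green(prev_token: int, token: int, key: bytes, gamma: float = 0.25) -> bool:
    """Keyed hash of (prev_token, token); lowest gamma fraction counts as green."""
    h = hashlib.sha256(key + prev_token.to_bytes(8, "big") + token.to_bytes(8, "big"))
    return int.from_bytes(h.digest()[:8], "big") / 2**64 < gamma


def detection_z_score(tokens: list[int], key: bytes, gamma: float = 0.25) -> float:
    """z-score of the observed green count against the unwatermarked null."""
    n = len(tokens) - 1  # number of (prev, current) token pairs
    hits = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Unwatermarked token ids should score near 0; a generator that boosted
# green tokens would push the score well above a threshold such as 4.
print(detection_z_score(list(range(200)), key=b"secret-key"))
```

The semantics-preserving transformations studied in the main paper defeat exactly this kind of detector: renaming variables and inserting dead code replaces many (prev, current) token pairs, so the green-hit count falls back toward its null expectation even though the program still works.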