Related papers: EditMark: Watermarking Large Language Models based on Model Editing

EditMark: Watermarking Large Language Models based on Model Editing

URL: http://arxiv.org/abs/2510.16367v1
Date: Sat, 18 Oct 2025 06:25:17 GMT
Title: EditMark: Watermarking Large Language Models based on Model Editing
Authors: Shuai Li, Kejiang Chen, Jun Jiang, Jie Zhang, Qiyi Yao, Kai Zeng, Weiming Zhang, Nenghai Yu,
Abstract summary: We propose EditMark, the first watermarking method that leverages model editing to embed a training-free, stealthy, and performance-lossless watermark.<n>Experiments indicate that EditMark can embed 32-bit watermarks into LLMs within 20 seconds with a watermark extraction success rate of 100%.
Score: 76.04893766374221
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities, but their training requires extensive data and computational resources, rendering them valuable digital assets. Therefore, it is essential to watermark LLMs to protect their copyright and trace unauthorized use or resale. Existing methods for watermarking LLMs primarily rely on training LLMs with a watermarked dataset, which entails burdensome training costs and negatively impacts the LLM's performance. In addition, their watermarked texts are not logical or natural, thereby reducing the stealthiness of the watermark. To address these issues, we propose EditMark, the first watermarking method that leverages model editing to embed a training-free, stealthy, and performance-lossless watermark for LLMs. We observe that some questions have multiple correct answers. Therefore, we assign each answer a unique watermark and update the weights of LLMs to generate corresponding questions and answers through the model editing technique. In addition, we refine the model editing technique to align with the requirements of watermark embedding. Specifically, we introduce an adaptive multi-round stable editing strategy, coupled with the injection of a noise matrix, to improve both the effectiveness and robustness of the watermark embedding. Extensive experiments indicate that EditMark can embed 32-bit watermarks into LLMs within 20 seconds (Fine-tuning: 6875 seconds) with a watermark extraction success rate of 100%, which demonstrates its effectiveness and efficiency. External experiments further demonstrate that EditMark has fidelity, stealthiness, and a certain degree of robustness against common attacks.

Related papers

Breaking Distortion-free Watermarks in Large Language Models [17.56485872604935]
There are growing concerns that the current LLM watermarking schemes are vulnerable to expert adversaries.<n>We propose using adaptive prompting and a sorting-based algorithm to accurately recover the underlying secret key for watermarking the LLM.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs)<n>We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts.<n> Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking [28.53032132891346]
We introduce two new easy-to-use methods for evaluating watermarking algorithms for large-language models (LLMs) Our experiments, conducted across various datasets, reveal that current watermarking methods are detectable by even simple classifiers. Our findings underscore the trade-off between watermark robustness and text quality and highlight the importance of having more informative metrics to assess watermarking quality.
arXiv Detail & Related papers (2023-12-04T22:56:31Z)
Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection. In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge. In the watermark extraction stage, questions related to the watermarked knowledge are designed, for querying the suspect LLM. Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training [50.77001916246691]
This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment. ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds. It shows an 8,544-bit watermark capacity comparable to the strongest existing work.
arXiv Detail & Related papers (2023-10-25T08:16:55Z)
Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs. It is possible to integrate watermarks without affecting the output probability distribution. The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism. Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs. We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark. We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.