MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages
- URL: http://arxiv.org/abs/2602.14030v1
- Date: Sun, 15 Feb 2026 07:29:06 GMT
- Title: MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages
- Authors: Xuehao Cui, Ruibo Chen, Yihan Wu, Heng Huang
- Abstract summary: Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to keep both text quality and watermark strength while carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework for reliable embedding and decoding of long messages.
- Score: 62.982950935139534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models now produce text indistinguishable from human writing, which increases the need for reliable provenance tracing. Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to keep both text quality and watermark strength while carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework designed for reliable embedding and decoding of long messages. Our key technical idea is Multi-Channel Colored Reweighting, which encodes bits through structured token reweighting while keeping the token distribution unbiased, together with Multi-Layer Sequential Reweighting to strengthen the watermark signal and an evidence-accumulation detector for message recovery. Experiments show that MC$^2$Mark improves detectability and robustness over prior multi-bit watermarking methods while preserving generation quality, achieving near-perfect accuracy for short messages and exceeding the second-best method by nearly 30% for long messages.
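The abstract names the moving parts (message-dependent reweighting, unbiased per-step distributions, evidence accumulation at decode time) without giving the construction itself. As a rough, hypothetical illustration of how such parts can fit together, here is a minimal sketch built on the well-known Gumbel-max trick, which is distortion-free per step when the pseudorandom numbers behave like fresh uniforms; the `prf_uniforms` helper, the message-as-key scheme, and the brute-force decoder are illustrative assumptions, not MC$^2$Mark's actual design.

```python
import hashlib
import math
import random

def prf_uniforms(key: int, context: tuple, vocab_size: int) -> list:
    """Keyed pseudorandom uniforms in (0,1), one per vocabulary token.
    Deterministic in (key, context), so a detector can recompute them."""
    rng = random.Random(hashlib.sha256(f"{key}|{context}".encode()).digest())
    return [rng.random() for _ in range(vocab_size)]

def watermarked_sample(probs, key, context):
    """Gumbel-max sampling: argmax_x r_x^(1/p_x) is distributed exactly
    according to `probs` when r is uniform, so each step is distortion-free."""
    r = prf_uniforms(key, context, len(probs))
    return max(range(len(probs)),
               key=lambda x: r[x] ** (1.0 / max(probs[x], 1e-12)))

def accumulate_evidence(tokens, contexts, key, vocab_size):
    """Per-token score -log(1 - r_token); a large total is evidence that
    `key` steered generation (evidence accumulation)."""
    total = 0.0
    for tok, ctx in zip(tokens, contexts):
        r = prf_uniforms(key, ctx, vocab_size)
        total += -math.log(max(1.0 - r[tok], 1e-12))
    return total

def decode_message(tokens, contexts, num_messages, vocab_size):
    """Multi-bit decoding: try each candidate message as the key and
    return the one with the strongest accumulated evidence."""
    scores = [accumulate_evidence(tokens, contexts, m, vocab_size)
              for m in range(num_messages)]
    return max(range(num_messages), key=lambda m: scores[m])
```

Note that decoding by enumerating every candidate message only scales to short messages; long-message schemes such as MC$^2$Mark structure the embedding across channels and layers, presumably in part to avoid exactly this blow-up.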
Related papers
- MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models [5.735801967350819]
We propose MirrorMark, a distortion-free watermark for large language models (LLMs). MirrorMark embeds multi-bit messages without altering the token probability distribution, preserving text quality by design. Experiments show that MirrorMark matches the text quality of non-watermarked generation while achieving substantially stronger detectability.
arXiv Detail & Related papers (2026-01-29T19:10:48Z)
- Majority Bit-Aware Watermarking For Large Language Models [7.200910949076064]
MajorMark is a novel watermarking method that improves the trade-off between message capacity and text quality through majority bit-aware encoding. In contrast to prior methods that rely on token frequency analysis for decoding, MajorMark employs a clustering-based decoding strategy. Extensive experiments on state-of-the-art LLMs demonstrate that our methods significantly enhance both decoding accuracy and text generation quality.
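The summary contrasts clustering-based decoding with the token-frequency analysis used by prior methods. For context, here is a minimal sketch of that frequency-based baseline, assuming a generic keyed green list and that each token has already been attributed to a message bit position; both assumptions are illustrative and not MajorMark's design.

```python
import hashlib

def in_green_list(token_id: int, bit_pos: int, bit_val: int, key: int) -> bool:
    """Generic keyed green list: roughly half of the vocabulary counts as
    'green' for each (bit position, bit value) pair."""
    digest = hashlib.sha256(f"{key}|{bit_pos}|{bit_val}|{token_id}".encode()).digest()
    return digest[0] % 2 == 0

def frequency_decode(tokens, positions, num_bits, key):
    """Frequency-analysis decoding: for each bit position, count green-list
    hits under bit value 0 vs. bit value 1 and take the majority."""
    counts = [[0, 0] for _ in range(num_bits)]
    for tok, pos in zip(tokens, positions):
        for val in (0, 1):
            if in_green_list(tok, pos, val, key):
                counts[pos][val] += 1
    return [0 if c0 >= c1 else 1 for c0, c1 in counts]
```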
arXiv Detail & Related papers (2025-08-05T18:19:00Z)
- BiMark: Unbiased Multilayer Watermarking for Large Language Models [68.64050157343334]
We propose BiMark, a novel watermarking framework that balances text quality preservation and message embedding capacity. BiMark achieves up to 30% higher extraction rates for short texts while maintaining text quality, as indicated by lower perplexity.
arXiv Detail & Related papers (2025-06-19T11:08:59Z)
- Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks. MCmark preserves the original distribution of the language model and offers significant improvements in detectability and robustness over existing unbiased watermarks.
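MCmark's reweighting rule is not reproduced in this summary; below is only a sketch of the keyed multi-channel partition that such methods start from, splitting the vocabulary into pseudorandom segments that a detector holding the key can recompute. The helper names are illustrative assumptions.

```python
import hashlib

def channel_of(token_id: int, key: int, num_channels: int) -> int:
    """Keyed pseudorandom channel assignment: deterministically maps each
    vocabulary token to one of `num_channels` segments."""
    digest = hashlib.sha256(f"{key}|{token_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_channels

def partition_vocab(vocab_size: int, key: int, num_channels: int):
    """Recover the full partition V_1..V_l from the key alone, exactly as
    a detector would."""
    channels = [[] for _ in range(num_channels)]
    for t in range(vocab_size):
        channels[channel_of(t, key, num_channels)].append(t)
    return channels
```

A generator would then promote one keyed channel per step while keeping the distribution unbiased on average, and the detector would check how often sampled tokens land in the promoted channel.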
arXiv Detail & Related papers (2025-02-16T21:02:36Z)
- DERMARK: A Dynamic, Efficient and Robust Multi-bit Watermark for Large Language Models [18.023143082876015]
We propose a dynamic, efficient, and robust multi-bit watermarking method that divides the text into variable-length segments, one for each watermark bit. Our method reduces the number of tokens required per embedded bit by 25%, reduces watermark embedding time by 50%, and maintains high robustness against text modifications and watermark-erasure attacks.
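The summary describes splitting text into variable-length segments, one per bit, but not the boundary rule. As a purely hypothetical illustration (not DERMARK's actual criterion), the sketch below grows each segment until its accumulated per-token watermark evidence clears a confidence threshold.

```python
def segment_by_evidence(token_scores, num_bits, threshold=4.0):
    """Greedy variable-length segmentation: extend the current segment until
    enough watermark evidence has accumulated to decode one bit, then start
    the next segment. Illustrative only; DERMARK's real boundary criterion
    is not given in this summary."""
    boundaries, acc = [], 0.0
    for i, score in enumerate(token_scores):
        acc += abs(score)                        # per-token watermark evidence
        if acc >= threshold and len(boundaries) < num_bits - 1:
            boundaries.append(i + 1)             # close the segment after token i
            acc = 0.0
    return boundaries  # segments between consecutive boundaries map to bits
```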
arXiv Detail & Related papers (2025-02-04T11:23:49Z)
- Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2 against four state-of-the-art watermarking techniques and their combinations.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
- Advancing Beyond Identification: Multi-bit Watermark for Large Language Models [31.066140913513035]
We show the viability of tackling misuse of large language models beyond merely identifying machine-generated text.
We propose Multi-bit Watermark via Position Allocation, embedding traceable multi-bit information during language model generation.
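Position allocation, as the title suggests, assigns each generation step to one bit position of the message. A minimal sketch under assumed details (a keyed hash of the preceding context) follows; the exact allocation rule here is an illustration, not necessarily the paper's.

```python
import hashlib

def allocate_position(context_ids, key: int, num_bits: int) -> int:
    """Pseudorandomly allocate the current generation step to one of the
    message's bit positions; deterministic in (key, context), so the
    detector can re-derive the same allocation from the text alone."""
    digest = hashlib.sha256(f"{key}|{tuple(context_ids)}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_bits

# At embed time, the bit value at the allocated position would steer the
# token choice (e.g., via a bit-value-specific green list); at decode time,
# the detector re-derives each token's position and votes per bit.
```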
arXiv Detail & Related papers (2023-08-01T01:27:40Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL), which allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
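For detecting short watermarked spans inside long documents, one standard approach in the spirit of the schemes studied here (the summary does not give the exact methods) is to scan all windows and take the maximum green-token z-score; a minimal sketch, assuming a green/red-list watermark with green fraction `gamma`:

```python
import math

def window_max_z(green_flags, window: int, gamma: float = 0.5):
    """Scan every length-`window` span and return the maximum green-token
    z-score. A short watermarked span buried in a long document yields a
    high z only in the windows that cover it, which a whole-document test
    would dilute. green_flags[i] is 1 if token i is on the green list."""
    assert len(green_flags) >= window
    best = float("-inf")
    hits = sum(green_flags[:window])           # green count in current window
    for start in range(len(green_flags) - window + 1):
        z = (hits - gamma * window) / math.sqrt(gamma * (1 - gamma) * window)
        best = max(best, z)
        if start + window < len(green_flags):  # slide the window right by one
            hits += green_flags[start + window] - green_flags[start]
    return best
```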
arXiv Detail & Related papers (2023-06-07T17:58:48Z)