MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages
- URL: http://arxiv.org/abs/2602.14030v1
- Date: Sun, 15 Feb 2026 07:29:06 GMT
- Title: MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages
- Authors: Xuehao Cui, Ruibo Chen, Yihan Wu, Heng Huang
- Abstract summary: Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to keep both text quality and watermark strength while carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework for reliable embedding and decoding of long messages.
- Score: 62.982950935139534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models now produce text indistinguishable from human writing, which increases the need for reliable provenance tracing. Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to keep both text quality and watermark strength while carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework designed for reliable embedding and decoding of long messages. Our key technical idea is Multi-Channel Colored Reweighting, which encodes bits through structured token reweighting while keeping the token distribution unbiased, together with Multi-Layer Sequential Reweighting to strengthen the watermark signal and an evidence-accumulation detector for message recovery. Experiments show that MC$^2$Mark improves detectability and robustness over prior multi-bit watermarking methods while preserving generation quality, achieving near-perfect accuracy for short messages and exceeding the second-best method by nearly 30% for long messages.
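The abstract names the moving parts (message-dependent reweighting, unbiased per-step distributions, evidence accumulation at decode time) without giving the construction itself. As a rough, hypothetical illustration of how such parts can fit together, here is a minimal sketch built on the well-known Gumbel-max trick, which is distortion-free per step when the pseudorandom numbers behave like fresh uniforms; the `prf_uniforms` helper, the message-as-key scheme, and the brute-force decoder are illustrative assumptions, not MC$^2$Mark's actual design.

```python
import hashlib
import math
import random

def prf_uniforms(key: int, context: tuple, vocab_size: int) -> list:
    """Keyed pseudorandom uniforms in (0,1), one per vocabulary token.
    Deterministic in (key, context), so a detector can recompute them."""
    rng = random.Random(hashlib.sha256(f"{key}|{context}".encode()).digest())
    return [rng.random() for _ in range(vocab_size)]

def watermarked_sample(probs, key, context):
    """Gumbel-max sampling: argmax_x r_x^(1/p_x) is distributed exactly
    according to `probs` when r is uniform, so each step is distortion-free."""
    r = prf_uniforms(key, context, len(probs))
    return max(range(len(probs)),
               key=lambda x: r[x] ** (1.0 / max(probs[x], 1e-12)))

def accumulate_evidence(tokens, contexts, key, vocab_size):
    """Per-token score -log(1 - r_token); a large total is evidence that
    `key` steered generation (evidence accumulation)."""
    total = 0.0
    for tok, ctx in zip(tokens, contexts):
        r = prf_uniforms(key, ctx, vocab_size)
        total += -math.log(max(1.0 - r[tok], 1e-12))
    return total

def decode_message(tokens, contexts, num_messages, vocab_size):
    """Multi-bit decoding: try each candidate message as the key and
    return the one with the strongest accumulated evidence."""
    scores = [accumulate_evidence(tokens, contexts, m, vocab_size)
              for m in range(num_messages)]
    return max(range(num_messages), key=lambda m: scores[m])
```

Note that decoding by enumerating every candidate message only scales to short messages; long-message schemes such as MC$^2$Mark structure the embedding across channels and layers, presumably in part to avoid exactly this blow-up.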
Related papers
- MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models [5.735801967350819]
We propose MirrorMark, a distortion-free watermark for large language models (LLMs). MirrorMark embeds multi-bit messages without altering the token probability distribution, preserving text quality by design. Experiments show that MirrorMark matches the text quality of non-watermarked generation while achieving substantially stronger detectability.
arXiv Detail & Related papers (2026-01-29T19:10:48Z)
- Majority Bit-Aware Watermarking For Large Language Models [7.200910949076064]
MajorMark is a novel watermarking method that improves the trade-off between message capacity and text quality through majority bit-aware encoding. In contrast to prior methods that rely on token frequency analysis for decoding, MajorMark employs a clustering-based decoding strategy. Extensive experiments on state-of-the-art LLMs demonstrate that our methods significantly enhance both decoding accuracy and text generation quality.
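The summary contrasts clustering-based decoding with the token-frequency analysis used by prior methods. For context, here is a minimal sketch of that frequency-based baseline, assuming a generic keyed green list and that each token has already been attributed to a message bit position; both assumptions are illustrative and not MajorMark's design.

```python
import hashlib

def in_green_list(token_id: int, bit_pos: int, bit_val: int, key: int) -> bool:
    """Generic keyed green list: roughly half of the vocabulary counts as
    'green' for each (bit position, bit value) pair."""
    digest = hashlib.sha256(f"{key}|{bit_pos}|{bit_val}|{token_id}".encode()).digest()
    return digest[0] % 2 == 0

def frequency_decode(tokens, positions, num_bits, key):
    """Frequency-analysis decoding: for each bit position, count green-list
    hits under bit value 0 vs. bit value 1 and take the majority."""
    counts = [[0, 0] for _ in range(num_bits)]
    for tok, pos in zip(tokens, positions):
        for val in (0, 1):
            if in_green_list(tok, pos, val, key):
                counts[pos][val] += 1
    return [0 if c0 >= c1 else 1 for c0, c1 in counts]
```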
arXiv Detail & Related papers (2025-08-05T18:19:00Z)
- BiMark: Unbiased Multilayer Watermarking for Large Language Models [68.64050157343334]
We propose BiMark, a novel watermarking framework that balances text quality preservation and message embedding capacity. BiMark achieves up to 30% higher extraction rates for short texts while maintaining text quality, as indicated by lower perplexity.
arXiv Detail & Related papers (2025-06-19T11:08:59Z)
- Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks. MCmark preserves the original distribution of the language model and offers significant improvements in detectability and robustness over existing unbiased watermarks.
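MCmark's reweighting rule is not reproduced in this summary; below is only a sketch of the keyed multi-channel partition that such methods start from, splitting the vocabulary into pseudorandom segments that a detector holding the key can recompute. The helper names are illustrative assumptions.

```python
import hashlib

def channel_of(token_id: int, key: int, num_channels: int) -> int:
    """Keyed pseudorandom channel assignment: deterministically maps each
    vocabulary token to one of `num_channels` segments."""
    digest = hashlib.sha256(f"{key}|{token_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_channels

def partition_vocab(vocab_size: int, key: int, num_channels: int):
    """Recover the full partition V_1..V_l from the key alone, exactly as
    a detector would."""
    channels = [[] for _ in range(num_channels)]
    for t in range(vocab_size):
        channels[channel_of(t, key, num_channels)].append(t)
    return channels
```

A generator would then promote one keyed channel per step while keeping the distribution unbiased on average, and the detector would check how often sampled tokens land in the promoted channel.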
arXiv Detail & Related papers (2025-02-16T21:02:36Z)
- DERMARK: A Dynamic, Efficient and Robust Multi-bit Watermark for Large Language Models [18.023143082876015]
We propose a dynamic, efficient, and robust multi-bit watermarking method that divides the text into variable-length segments, one for each watermark bit. Our method reduces the number of tokens required per embedded bit by 25%, reduces watermark embedding time by 50%, and maintains high robustness against text modifications and watermark-erasure attacks.
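The summary describes splitting text into variable-length segments, one per bit, but not the boundary rule. As a purely hypothetical illustration (not DERMARK's actual criterion), the sketch below grows each segment until its accumulated per-token watermark evidence clears a confidence threshold.

```python
def segment_by_evidence(token_scores, num_bits, threshold=4.0):
    """Greedy variable-length segmentation: extend the current segment until
    enough watermark evidence has accumulated to decode one bit, then start
    the next segment. Illustrative only; DERMARK's real boundary criterion
    is not given in this summary."""
    boundaries, acc = [], 0.0
    for i, score in enumerate(token_scores):
        acc += abs(score)                        # per-token watermark evidence
        if acc >= threshold and len(boundaries) < num_bits - 1:
            boundaries.append(i + 1)             # close the segment after token i
            acc = 0.0
    return boundaries  # segments between consecutive boundaries map to bits
```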
arXiv Detail & Related papers (2025-02-04T11:23:49Z)
- Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2 against four state-of-the-art watermarking techniques and their combinations.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
- Advancing Beyond Identification: Multi-bit Watermark for Large Language Models [31.066140913513035]
We show the viability of tackling misuse of large language models beyond merely identifying machine-generated text.
We propose Multi-bit Watermark via Position Allocation, embedding traceable multi-bit information during language model generation.
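Position allocation, as the title suggests, assigns each generation step to one bit position of the message. A minimal sketch under assumed details (a keyed hash of the preceding context) follows; the exact allocation rule here is an illustration, not necessarily the paper's.

```python
import hashlib

def allocate_position(context_ids, key: int, num_bits: int) -> int:
    """Pseudorandomly allocate the current generation step to one of the
    message's bit positions; deterministic in (key, context), so the
    detector can re-derive the same allocation from the text alone."""
    digest = hashlib.sha256(f"{key}|{tuple(context_ids)}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_bits

# At embed time, the bit value at the allocated position would steer the
# token choice (e.g., via a bit-value-specific green list); at decode time,
# the detector re-derives each token's position and votes per bit.
```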
arXiv Detail & Related papers (2023-08-01T01:27:40Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL), which allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
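For detecting short watermarked spans inside long documents, one standard approach in the spirit of the schemes studied here (the summary does not give the exact methods) is to scan all windows and take the maximum green-token z-score; a minimal sketch, assuming a green/red-list watermark with green fraction `gamma`:

```python
import math

def window_max_z(green_flags, window: int, gamma: float = 0.5):
    """Scan every length-`window` span and return the maximum green-token
    z-score. A short watermarked span buried in a long document yields a
    high z only in the windows that cover it, which a whole-document test
    would dilute. green_flags[i] is 1 if token i is on the green list."""
    assert len(green_flags) >= window
    best = float("-inf")
    hits = sum(green_flags[:window])           # green count in current window
    for start in range(len(green_flags) - window + 1):
        z = (hits - gamma * window) / math.sqrt(gamma * (1 - gamma) * window)
        best = max(best, z)
        if start + window < len(green_flags):  # slide the window right by one
            hits += green_flags[start + window] - green_flags[start]
    return best
```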
arXiv Detail & Related papers (2023-06-07T17:58:48Z)