DMark: Order-Agnostic Watermarking for Diffusion Large Language Models
- URL: http://arxiv.org/abs/2510.02902v1
- Date: Fri, 03 Oct 2025 11:14:16 GMT
- Title: DMark: Order-Agnostic Watermarking for Diffusion Large Language Models
- Authors: Linyu Wu, Linhao Zhong, Wenjie Qu, Yuexin Li, Yue Liu, Shengfang Zhai, Chunhua Shen, Jiaheng Zhang,
- Abstract summary: Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality. We present DMark, the first watermarking framework designed specifically for dLLMs.
- Score: 46.07844536066178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality, but existing watermarking methods fail on them due to their non-sequential decoding. Unlike autoregressive models that generate tokens left-to-right, dLLMs can finalize tokens in arbitrary order, breaking the causal design underlying traditional watermarks. We present DMark, the first watermarking framework designed specifically for dLLMs. DMark introduces three complementary strategies to restore watermark detectability: predictive watermarking uses model-predicted tokens when actual context is unavailable; bidirectional watermarking exploits both forward and backward dependencies unique to diffusion decoding; and predictive-bidirectional watermarking combines both approaches to maximize detection strength. Experiments across multiple dLLMs show that DMark achieves 92.0-99.5% detection rates at 1% false positive rate while maintaining text quality, compared to only 49.6-71.2% for naive adaptations of existing methods. DMark also demonstrates robustness against text manipulations, establishing that effective watermarking is feasible for non-autoregressive language models.
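The bidirectional idea in the abstract can be illustrated with a toy red-green detector in the style of the keyed green-list watermarks referenced below ("A Watermark for Large Language Models"): each token is checked against a green list seeded by its left neighbor (forward) and one seeded by its right neighbor (backward), and a z-score summarizes the excess of green hits. Everything here, including the toy vocabulary, the hash-based green lists, and the direction labels, is an illustrative assumption, not DMark's published implementation:

```python
import hashlib

GREEN_FRACTION = 0.5          # fraction of the vocabulary in each green list (assumed)
VOCAB = range(1000)           # toy vocabulary of token ids

def green_list(seed_token: int, direction: str) -> set:
    """Pseudo-randomly partition the vocabulary into a 'green' set,
    keyed by a neighboring token and the hashing direction."""
    greens = set()
    for tok in VOCAB:
        h = hashlib.sha256(f"{seed_token}:{direction}:{tok}".encode()).digest()
        if h[0] < 256 * GREEN_FRACTION:   # first hash byte decides membership
            greens.add(tok)
    return greens

def detect(tokens: list) -> float:
    """z-score of green hits over interior tokens, counting a hit when a
    token is green under its left (forward) or right (backward) neighbor."""
    hits, trials = 0, 0
    for i in range(1, len(tokens) - 1):
        fwd = tokens[i] in green_list(tokens[i - 1], "fwd")
        bwd = tokens[i] in green_list(tokens[i + 1], "bwd")
        hits += int(fwd or bwd)
        trials += 1
    # Under the no-watermark null, a token is green for at least one
    # neighbor with probability p = 1 - (1 - g)^2.
    p = 1 - (1 - GREEN_FRACTION) ** 2
    mean, var = trials * p, trials * p * (1 - p)
    return (hits - mean) / var ** 0.5
```

A one-sided threshold near z ≈ 2.33 corresponds roughly to the 1% false positive rate used in the abstract's detection figures.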
Related papers
- dgMARK: Decoding-Guided Watermarking for Diffusion Language Models [5.43345665278304]
dgMARK is a decoding-guided watermarking method for discrete diffusion language models. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint. Watermarks are detected via elevated parity-matching statistics.
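A parity-matching statistic of the kind this abstract describes can be sketched as follows; the key, target bit, and hashing scheme are illustrative assumptions, not dgMARK's actual design:

```python
import hashlib

def parity_bit(token_id: int, position: int, key: str = "demo-key") -> int:
    """Keyed pseudo-random bit for a (token, position) pair."""
    h = hashlib.sha256(f"{key}:{position}:{token_id}".encode()).digest()
    return h[0] & 1

def parity_match_z(tokens: list) -> float:
    """z-score of how often the keyed parity bit equals the target (0 here).
    A generator that favors candidates with bit 0 elevates this statistic;
    unwatermarked text sits near z = 0."""
    hits = sum(1 for i, t in enumerate(tokens) if parity_bit(t, i) == 0)
    n = len(tokens)
    return (hits - n * 0.5) / (n * 0.25) ** 0.5
```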
arXiv Detail & Related papers (2026-01-30T13:51:20Z)
- LR-DWM: Efficient Watermarking for Diffusion Language Models [40.70709965738489]
Diffusion Language Models (DLMs) generate text via non-sequential iterative denoising. Recent work proposed to watermark DLMs by inverting the process when needed, but this suffers significant computational or memory overhead. We introduce Left-Right Diffusion Watermarking (LR-DWM), a scheme that biases the generated token based on both left and right neighbors.
arXiv Detail & Related papers (2026-01-18T12:08:51Z)
- T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models [89.29541056113442]
T2SMark is a two-stage watermarking scheme based on Tail-Truncated Sampling (TTS). We evaluate T2SMark on diffusion models with both U-Net and DiT backbones.
arXiv Detail & Related papers (2025-10-25T16:55:55Z)
- Watermarking Diffusion Language Models [9.515480957792542]
We introduce the first watermark tailored for diffusion language models (DLMs). This is an emergent LLM paradigm able to generate tokens in arbitrary order, in contrast to standard autoregressive language models (ARLMs), which generate tokens sequentially.
arXiv Detail & Related papers (2025-09-29T07:11:40Z)
- Towards Robust Red-Green Watermarking for Autoregressive Image Generators [17.784976310663104]
In this paper, we explore the use of in-generation watermarks in autoregressive (AR) image models. AR models generate images by autoregressively predicting a sequence of visual tokens that are then decoded into pixels. Inspired by red-green watermarks for large language models, we examine token-level watermarking schemes that bias the next-token prediction. We propose two novel watermarking methods that rely on visual token clustering to assign similar tokens to the same set.
arXiv Detail & Related papers (2025-08-08T19:14:22Z)
- Training-Free Watermarking for Autoregressive Image Generation [24.86897985016275]
IndexMark is a training-free watermarking framework for autoregressive image generation models. We show IndexMark achieves state-of-the-art performance in terms of image quality and verification accuracy.
arXiv Detail & Related papers (2025-05-20T17:58:02Z)
- Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks. MCmark preserves the original distribution of the language model. It offers significant improvements in detectability and robustness over existing unbiased watermarks.
arXiv Detail & Related papers (2025-02-16T21:02:36Z)
- TokenMark: A Modality-Agnostic Watermark for Pre-trained Transformers [67.57928750537185]
TokenMark is a robust, modality-agnostic watermarking system for pre-trained models. It embeds the watermark by fine-tuning the pre-trained model on a set of specifically permuted data samples. It significantly improves the robustness, efficiency, and universality of model watermarking.
arXiv Detail & Related papers (2024-03-09T08:54:52Z)
- ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training [50.77001916246691]
This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment.
ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds.
It shows an 8,544-bit watermark capacity comparable to the strongest existing work.
arXiv Detail & Related papers (2023-10-25T08:16:55Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)