dgMARK: Decoding-Guided Watermarking for Diffusion Language Models
- URL: http://arxiv.org/abs/2601.22985v1
- Date: Fri, 30 Jan 2026 13:51:20 GMT
- Title: dgMARK: Decoding-Guided Watermarking for Diffusion Language Models
- Authors: Pyo Min Hong, Albert No
- Abstract summary: dgMARK is a decoding-guided watermarking method for discrete diffusion language models. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint. Watermarks are detected via elevated parity-matching statistics.
- Score: 5.43345665278304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose dgMARK, a decoding-guided watermarking method for discrete diffusion language models (dLLMs). Unlike autoregressive models, dLLMs can generate tokens in arbitrary order. While an ideal conditional predictor would be invariant to this order, practical dLLMs exhibit strong sensitivity to the unmasking order, creating a new channel for watermarking. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint induced by a binary hash, without explicitly reweighting the model's learned probabilities. The method is plug-and-play with common decoding strategies (e.g., confidence, entropy, and margin-based ordering) and can be strengthened with a one-step lookahead variant. Watermarks are detected via elevated parity-matching statistics, and a sliding-window detector ensures robustness under post-editing operations including insertion, deletion, substitution, and paraphrasing.
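To make the mechanism concrete, the following is a minimal sketch of what a parity-guided unmasking step and the sliding-window parity detector could look like. The keyed hash, the confidence-based reward, and the selection rule are illustrative assumptions that follow the abstract's description, not the authors' implementation.

```python
# Illustrative sketch only: the hash, reward, and selection rule are assumed
# for exposition and are not dgMARK's released implementation.
import hashlib
import numpy as np

def keyed_bit(token_id: int, key: bytes) -> int:
    """Binary hash of a token id under a secret key (assumed construction)."""
    digest = hashlib.sha256(key + int(token_id).to_bytes(4, "little")).digest()
    return digest[0] & 1

def pick_positions_to_unmask(probs, masked, key, k=1):
    """Choose which masked positions to unmask at this decoding step.

    probs  : (L, V) array of per-position token distributions from the dLLM
    masked : (L,) boolean array, True where a position is still masked
    Prefers high-confidence positions whose top candidate hits parity 1.
    """
    scored = []
    for pos in np.where(masked)[0]:
        tok = int(np.argmax(probs[pos]))     # high-reward candidate token
        conf = float(probs[pos, tok])        # reward = confidence-based ordering
        hit = keyed_bit(tok, key)            # parity constraint from the hash
        scored.append((hit, conf, int(pos), tok))
    # Parity-matching, high-confidence positions are unmasked first.
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [(pos, tok) for _, _, pos, tok in scored[:k]]

def sliding_window_z(token_ids, key, window=64):
    """Detect the watermark via an elevated parity-matching rate.

    For unwatermarked text each keyed bit is roughly Bernoulli(0.5), so a
    window's match count is approximately Binomial(window, 0.5); a large
    z-score in any window flags the watermark even after local edits.
    """
    bits = np.array([keyed_bit(int(t), key) for t in token_ids], dtype=float)
    if len(bits) == 0:
        return 0.0
    best = -np.inf
    for start in range(max(1, len(bits) - window + 1)):
        w = bits[start:start + window]
        z = (w.sum() - 0.5 * len(w)) / np.sqrt(0.25 * len(w))
        best = max(best, z)
    return best  # compare against a threshold set for the target false-positive rate
```

Because the watermark enters only through which positions are unmasked first, the per-position distributions themselves are left untouched, which matches the abstract's point that dgMARK does not reweight the model's learned probabilities.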
Related papers
- LR-DWM: Efficient Watermarking for Diffusion Language Models [40.70709965738489]
Diffusion Language Models (DLMs) generate text via non-sequential iterative denoising. Recent work proposed to watermark DLMs by inverting the process when needed, but this incurs significant computational or memory overhead. We introduce Left-Right Diffusion Watermarking (LR-DWM), a scheme that biases the generated token based on both left and right neighbors (a generic sketch of this neighbor-keyed idea follows this entry).
arXiv Detail & Related papers (2026-01-18T12:08:51Z)
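As one reading of the LR-DWM summary above, here is a generic sketch of a green-list bias keyed on both neighbors of a masked position; the hash, the green fraction gamma, and the bias strength delta are assumptions for exposition, not LR-DWM's actual construction.

```python
# Generic neighbor-keyed green-list bias; the seeding and bias are assumptions.
import hashlib
import numpy as np

def green_mask(left_id: int, right_id: int, key: bytes, vocab_size: int,
               gamma: float = 0.5) -> np.ndarray:
    """Pseudorandom green list seeded by the left and right neighbor tokens."""
    seed_bytes = hashlib.sha256(
        key + int(left_id).to_bytes(4, "little") + int(right_id).to_bytes(4, "little")
    ).digest()
    rng = np.random.default_rng(int.from_bytes(seed_bytes[:8], "little"))
    return rng.random(vocab_size) < gamma

def biased_logits(logits: np.ndarray, left_id: int, right_id: int,
                  key: bytes, delta: float = 2.0) -> np.ndarray:
    """Add a bias toward green tokens before sampling a masked position."""
    mask = green_mask(left_id, right_id, key, logits.shape[-1])
    return logits + delta * mask
```

Keying the list on both neighbors is what makes such a bias compatible with non-sequential denoising, where a position's right context may already be committed.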
- DMark: Order-Agnostic Watermarking for Diffusion Large Language Models [46.07844536066178]
Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality. We present DMark, the first watermarking framework designed specifically for dLLMs.
arXiv Detail & Related papers (2025-10-03T11:14:16Z)
- StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models [4.76514657698929]
StealthInk is a stealthy multi-bit watermarking scheme for large language models (LLMs). It preserves the original text distribution while enabling the embedding of provenance data. We derive a lower bound on the number of tokens necessary for watermark detection at a fixed equal error rate.
arXiv Detail & Related papers (2025-06-05T18:37:38Z)
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding [60.407727995313074]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradation on downstream benchmarks (a toy sketch of adaptive parallelism follows this entry).
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
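For contrast with fixed-schedule decoding, here is a toy illustration of adaptive parallelism in a masked-diffusion step: commit every position whose confidence clears a threshold, so the per-step parallelism tracks model confidence. The threshold rule is an assumption for illustration, not APD's actual acceptance criterion.

```python
# Toy adaptive-parallelism step: the confidence threshold below is an assumed
# stand-in for APD's acceptance criterion, used only to illustrate the idea.
import numpy as np

def adaptive_parallel_step(probs: np.ndarray, masked: np.ndarray, tau: float = 0.9):
    """Return (position, token) pairs to commit in this step.

    probs : (L, V) per-position distributions; masked : (L,) bool.
    The number of committed tokens grows or shrinks with model confidence.
    """
    cand = [(int(np.argmax(probs[p])), float(probs[p].max()), int(p))
            for p in np.where(masked)[0]]
    if not cand:
        return []
    confident = [(p, t) for t, c, p in cand if c >= tau]
    if confident:
        return confident
    # Fall back to the single most confident position to guarantee progress.
    t, c, p = max(cand, key=lambda x: x[1])
    return [(p, t)]
```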
- SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models [4.069844339028727]
SimMark is a robust sentence-level watermarking algorithm for large language models (LLMs). It embeds detectable statistical patterns imperceptible to humans and employs a soft counting mechanism. We show that SimMark sets a new benchmark for robust watermarking of LLM-generated content.
arXiv Detail & Related papers (2025-02-05T00:21:01Z)
- A Watermark for Order-Agnostic Language Models [55.89285889529492]
Pattern-mark is a pattern-based watermarking framework specifically designed for order-agnostic LMs.
We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns.
Our evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness (a generic sketch of the pattern-count idea follows this entry).
arXiv Detail & Related papers (2024-10-17T17:41:28Z)
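In the spirit of that summary, the sketch below uses binary watermark keys drawn from a Markov chain biased toward a repeated-key pattern; generation would favor tokens whose keyed bit equals the current key, and detection counts adjacent equal-key pairs against the roughly 50% rate expected for unwatermarked text. The key-to-token coupling and the statistic are assumptions for exposition, not Pattern-mark's construction.

```python
# Generic pattern-count sketch; not Pattern-mark's actual generator or detector.
import hashlib
import numpy as np

def next_key(prev_key: int, rng, p_repeat: float = 0.8) -> int:
    """Markov chain over binary keys, biased toward the 'repeat' pattern."""
    return prev_key if rng.random() < p_repeat else 1 - prev_key

def token_bit(token_id: int, secret: bytes) -> int:
    """Keyed binary hash so the detector can recover a key bit from a token."""
    d = hashlib.sha256(secret + int(token_id).to_bytes(4, "little")).digest()
    return d[0] & 1

def pattern_score(token_ids, secret: bytes) -> float:
    """Count adjacent equal-key pairs; about half the pairs match on plain text."""
    bits = [token_bit(int(t), secret) for t in token_ids]
    pairs = sum(int(a == b) for a, b in zip(bits, bits[1:]))
    n = max(len(bits) - 1, 1)
    return (pairs - 0.5 * n) / np.sqrt(0.25 * n)  # one-sided z-statistic
```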
- Edit Distance Robust Watermarks via Indexing Pseudorandom Codes [29.69428894587431]
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024), and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions.
arXiv Detail & Related papers (2024-06-04T04:03:17Z)
- Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses that may nevertheless contain misinformation.
Watermarking is pivotal in this context, which involves embedding hidden markers in texts.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (a toy sketch of adjoint backpropagation through an ODE solve follows this entry).
arXiv Detail & Related papers (2023-07-20T09:06:21Z)
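As a generic illustration of that second step, the sketch below backpropagates a differentiable metric through an ODE solve with the adjoint method; torchdiffeq's odeint_adjoint and the tiny drift network are assumptions standing in for a pretrained diffusion model's probability-flow field, and AdjointDPM derives its own adjoint formulation rather than relying on this library.

```python
# Toy adjoint backprop through an ODE solve; the drift net and torchdiffeq are
# stand-ins for exposition, not AdjointDPM's implementation.
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # memory-efficient backprop

class ToyDrift(nn.Module):
    """Stand-in for the probability-flow drift of a pretrained diffusion model."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(),
                                 nn.Linear(64, dim))

    def forward(self, t, x):
        t_feat = torch.full_like(x[:, :1], float(t))   # broadcast time feature
        return self.net(torch.cat([x, t_feat], dim=-1))

drift = ToyDrift()
x_T = torch.randn(8, 16)                 # noise sample at t = T
t_span = torch.tensor([1.0, 0.0])        # integrate from noise toward data
x_0 = odeint(drift, x_T, t_span, rtol=1e-4, atol=1e-4)[-1]

# Any differentiable metric on the generated content can act as supervision.
loss = (x_0 ** 2).mean()                 # placeholder metric
loss.backward()                          # adjoint pass -> gradients w.r.t. drift parameters
print(drift.net[0].weight.grad.norm())
```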
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition [57.51793420986745]
We propose an unconventional network for handwritten mathematical expression recognition (HMER) named Counting-Aware Network (CAN).
We design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations.
Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
arXiv Detail & Related papers (2022-07-23T08:39:32Z)