AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing
Abstract Overview
The paper studies why existing sentence-level watermarks remain brittle under paraphrasing despite encoding signals in semantics rather than surface tokens. It argues that prefix-based designs are especially vulnerable to structural perturbations such as sentence splitting and merging, because changes to one sentence can disrupt watermark detection for subsequent sentences. To address this, the authors propose AliMark, which reformulates sentence-level watermarking as bit-sequence encoding during generation and sequence alignment during detection. The detector combines a Re-Structurer that generates alternative sentence segmentations with Adaptive Bit Sequence Alignment based on a block-level edit metric, and the method is evaluated on Booksum and C4 with OPT-1.3B and Qwen3-1.7B under multiple paraphrasing attacks.
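To make the idea of alternative sentence segmentations concrete, here is a minimal sketch that enumerates candidates by merging adjacent sentences. It is not the paper's Re-Structurer: the function name `merge_candidates` and the `max_merges` parameter are hypothetical, and the real component presumably also handles sentence splits, which this sketch does not.

```python
from itertools import combinations
from typing import List


def merge_candidates(sentences: List[str], max_merges: int = 1) -> List[List[str]]:
    """Enumerate segmentations obtained by removing up to `max_merges`
    boundaries between adjacent sentences (hypothetical helper)."""
    if not sentences:
        return [[]]
    n = len(sentences)
    candidates = [list(sentences)]  # the unmodified segmentation comes first
    # Boundary i sits between sentences[i] and sentences[i + 1].
    for k in range(1, max_merges + 1):
        for removed in combinations(range(n - 1), k):
            removed_set = set(removed)
            merged, current = [], sentences[0]
            for i in range(1, n):
                if (i - 1) in removed_set:
                    current = current + " " + sentences[i]  # glue across the removed boundary
                else:
                    merged.append(current)
                    current = sentences[i]
            merged.append(current)
            candidates.append(merged)
    return candidates


if __name__ == "__main__":
    for cand in merge_candidates(["A.", "B.", "C."], max_merges=1):
        print(cand)
```

Each candidate can then be scored by the detector, keeping the segmentation that best matches the expected watermark signal. The number of candidates grows combinatorially with `max_merges`, so a practical detector would need to bound or prune the search.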
Novelty
The main novelty is the reframing of sentence-level watermarking from a prefix-conditioned detection problem into a global bit-sequence encoding and alignment problem. AliMark also introduces a two-part detection design: candidate text restructuring combined with adaptive alignment against a secret bit sequence, scored with Block Edit Rate. This design explicitly handles sentence merges, splits, insertions, and deletions.
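The alignment idea can be illustrated with a standard edit distance computed over per-sentence bit blocks rather than individual bits. This is a sketch under assumptions: the summary does not give the exact definition of Block Edit Rate or of the adaptive alignment, so the normalization by the secret sequence length and the names `block_edit_distance` and `block_edit_rate` are hypothetical.

```python
from typing import Sequence, Tuple

Block = Tuple[int, ...]  # the bits recovered from one sentence


def block_edit_distance(observed: Sequence[Block], secret: Sequence[Block]) -> int:
    """Levenshtein distance where each unit is a whole bit block."""
    m, n = len(observed), len(secret)
    prev = list(range(n + 1))  # distance from an empty observed prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if observed[i - 1] == secret[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # drop an extra observed block (inserted sentence)
                curr[j - 1] + 1,     # skip a secret block (deleted sentence)
                prev[j - 1] + cost,  # match or substitute a block
            )
        prev = curr
    return prev[n]


def block_edit_rate(observed: Sequence[Block], secret: Sequence[Block]) -> float:
    """Edit distance normalized by the secret length (assumed normalization)."""
    if not secret:
        raise ValueError("secret sequence must be non-empty")
    return block_edit_distance(observed, secret) / len(secret)


if __name__ == "__main__":
    secret = [(0, 1), (1, 1), (1, 0), (0, 0)]
    observed = [(0, 1), (1, 0), (0, 0)]  # one sentence was deleted
    print(block_edit_rate(observed, secret))
```

Because a deleted or inserted sentence costs a single edit instead of shifting every later position, a low rate remains achievable after structural perturbations, which is the property a prefix-conditioned detector lacks.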
Results
Across Booksum and C4, AliMark consistently achieves the strongest or near-strongest detection performance, with the largest gains under stronger paraphrasers such as DIPPER and GPT-3.5. For example, with OPT-1.3B on Booksum, AliMark reaches TPR@5% of 61.6% under DIPPER and 66.6% under GPT-3.5, while the other sentence-level baselines reported in the paper's results table stay at or below 30.4% and 33.0%, respectively. The paper also reports stronger robustness under controlled insertion, deletion, and reordering perturbations, while maintaining perplexity distributions comparable to unwatermarked generation.
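For readers unfamiliar with the metric, the sketch below shows one common way to compute a true positive rate at a fixed false positive rate, assuming TPR@5% has its usual meaning of TPR at a threshold calibrated to 5% FPR on unwatermarked text. The function name `tpr_at_fpr` and the toy scores are illustrative, not taken from the paper, and higher scores are assumed to mean "more likely watermarked" (a distance-like score such as an edit rate would be negated first).

```python
from typing import Sequence


def tpr_at_fpr(neg_scores: Sequence[float], pos_scores: Sequence[float],
               fpr: float = 0.05) -> float:
    """Fraction of watermarked texts flagged when the threshold lets at most
    a `fpr` fraction of unwatermarked texts through (hypothetical helper)."""
    if not neg_scores or not pos_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(neg_scores, reverse=True)
    # Number of negatives allowed above the threshold; the small epsilon
    # guards against floating-point error in the product.
    k = int(fpr * len(ranked) + 1e-9)
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    flagged = sum(1 for s in pos_scores if s > threshold)
    return flagged / len(pos_scores)


if __name__ == "__main__":
    unwatermarked = list(range(100))    # toy detector scores for human text
    watermarked = [90, 95, 96, 100]     # toy scores for watermarked text
    print(tpr_at_fpr(unwatermarked, watermarked, fpr=0.05))
```

Fixing the false positive rate first makes detectors with different score scales comparable, which is why results are reported this way rather than as raw accuracy.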
Key Points
- The paper identifies structural perturbations from paraphrasing—especially sentence splitting and merging—as a central failure mode for prefix-based sentence-level watermarking.
- AliMark embeds watermark information as per-sentence bit blocks and detects it using text restructuring and adaptive block-level sequence alignment to a secret bit sequence.
- Empirical results show the method is markedly more robust than prior baselines under strong paraphrasing attacks and controlled structural perturbations, with little reported text-quality degradation.