VLA-Mark: A cross modal watermark for large vision-language alignment model
- URL: http://arxiv.org/abs/2507.14067v2
- Date: Fri, 19 Sep 2025 06:54:08 GMT
- Title: VLA-Mark: A cross modal watermark for large vision-language alignment model
- Authors: Shuliang Liu, Qi Zheng, Jesse Jiaxi Xu, Yibo Yan, Junyan Zhang, He Geng, Aiwei Liu, Peijie Jiang, Jia Liu, Yik-Cheung Tam, Xuming Hu
- Abstract summary: VLA-Mark is a vision-aligned framework that embeds detectable watermarks while preserving semantic fidelity through cross-modal coordination. Our approach integrates multiscale visual-textual alignment metrics, combining localized patch affinity, global semantic coherence, and contextual attention patterns. Experiments show 7.4% lower PPL and 26.6% higher BLEU than conventional methods, with near-perfect detection.
- Score: 44.59029116115437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language models demand watermarking solutions that protect intellectual property without compromising multimodal coherence. Existing text watermarking methods disrupt visual-textual alignment through biased token selection and static strategies, leaving semantic-critical concepts vulnerable. We propose VLA-Mark, a vision-aligned framework that embeds detectable watermarks while preserving semantic fidelity through cross-modal coordination. Our approach integrates multiscale visual-textual alignment metrics, combining localized patch affinity, global semantic coherence, and contextual attention patterns, to guide watermark injection without model retraining. An entropy-sensitive mechanism dynamically balances watermark strength and semantic preservation, prioritizing visual grounding during low-uncertainty generation phases. Experiments show 7.4% lower PPL and 26.6% higher BLEU than conventional methods, with near-perfect detection (98.8% AUC). The framework demonstrates 96.1% resilience against attacks such as paraphrasing and synonym substitution, while maintaining text-visual consistency, establishing new standards for quality-preserving multimodal watermarking.
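The entropy-sensitive mechanism described above can be illustrated with a minimal sketch. This is not VLA-Mark's actual implementation (the paper's multiscale visual-textual alignment metrics are omitted); it assumes a standard keyed green-list watermark whose bias is scaled by the normalized entropy of the next-token distribution, so that low-uncertainty (semantically grounded) steps receive a weaker watermark push. All function names (`watermark_logits`, `green_list`) are illustrative.

```python
import hashlib
import numpy as np

def entropy(probs):
    """Shannon entropy of a probability vector."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def green_list(prev_token_id, vocab_size, key, frac=0.5):
    """Key- and context-seeded pseudo-random 'green' token subset."""
    seed = int.from_bytes(
        hashlib.sha256(f"{key}:{prev_token_id}".encode()).digest()[:8], "big"
    )
    rng = np.random.default_rng(seed)
    return set(rng.permutation(vocab_size)[: int(frac * vocab_size)].tolist())

def watermark_logits(logits, prev_token_id, key, delta_max=2.0):
    """Bias green-list logits, scaling the bias by normalized entropy.

    When the model is confident (low entropy), delta shrinks and the
    original distribution is left nearly intact.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    h = entropy(probs)
    h_max = np.log(len(logits))          # entropy of the uniform distribution
    delta = delta_max * (h / h_max)      # weak bias at low uncertainty
    biased = logits.copy()
    for t in green_list(prev_token_id, len(logits), key):
        biased[t] += delta
    return biased
```

A detector with the same key recomputes the green list per position and tests whether green tokens are over-represented; the entropy gating trades a little detection signal for semantic preservation on confident steps.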
Related papers
- AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models [28.393476667026523]
Vision-agnostic watermarks may introduce visually irrelevant tokens and disrupt visual grounding. We propose Attention-Guided Dynamic Watermarking (AGMark). AGMark embeds detectable signals while strictly preserving visual fidelity.
arXiv Detail & Related papers (2026-02-10T10:02:29Z) - WMVLM: Evaluating Diffusion Model Image Watermarking via Vision-Language Models [79.32764976020435]
Digital watermarking is essential for securing generated images from diffusion models. Previous watermark evaluation methods lack a unified framework for both residual and semantic watermarks. We propose WMVLM, the first unified and interpretable evaluation framework for diffusion model image watermarking via vision-language models.
arXiv Detail & Related papers (2026-01-29T12:14:32Z) - A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model [48.79816664229285]
VIsual Semantic Adaptive Watermark (VISA-Mark) is a novel framework that embeds detectable signals while strictly preserving visual fidelity. Our approach employs a lightweight, efficiently trained prefix-tuner to extract dynamic Visual-Evidence Weights. Empirical results confirm that VISA-Mark outperforms conventional methods with a 7.8% improvement in visual consistency.
arXiv Detail & Related papers (2026-01-12T07:55:13Z) - From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection [24.55335024940469]
Embeddings-as-a-Service (EaaS) has emerged as a successful commercial paradigm on the web platform. Prior studies have revealed that EaaS is vulnerable to imitation attacks. We propose SemMark, a novel semantic-based watermarking paradigm for EaaS copyright protection.
arXiv Detail & Related papers (2025-12-18T11:50:38Z) - An Ensemble Framework for Unbiased Language Model Watermarking [60.99969104552168]
We propose ENS, a novel ensemble framework that enhances the detectability and robustness of unbiased watermarks. ENS sequentially composes multiple independent watermark instances, each governed by a distinct key, to amplify the watermark signal. Empirical evaluations show that ENS substantially reduces the number of tokens needed for reliable detection and increases resistance to smoothing and paraphrasing attacks.
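The sequential composition idea in the ENS summary can be sketched with a simple keyed green-list scheme: each ensemble member biases its own key-derived token subset, and a detector scores hits against one key with a z-statistic. This is an illustrative sketch, not the paper's implementation; `green_mask`, `ensemble_bias`, and `z_score` are assumed names.

```python
import hashlib
import numpy as np

def green_mask(prev_token, key, vocab_size, frac=0.5):
    """Boolean mask of the key- and context-seeded green tokens."""
    seed = int.from_bytes(
        hashlib.sha256(f"{key}:{prev_token}".encode()).digest()[:8], "big"
    )
    rng = np.random.default_rng(seed)
    mask = np.zeros(vocab_size, dtype=bool)
    mask[rng.permutation(vocab_size)[: int(frac * vocab_size)]] = True
    return mask

def ensemble_bias(logits, prev_token, keys, delta=1.0):
    """Sequentially apply one keyed green-list bias per ensemble member."""
    out = logits.copy()
    for key in keys:
        out[green_mask(prev_token, key, len(logits))] += delta
    return out

def z_score(tokens, prev_tokens, key, vocab_size, frac=0.5):
    """Detection statistic for a single key: normalized green-hit count."""
    hits = sum(
        green_mask(p, key, vocab_size, frac)[t]
        for t, p in zip(tokens, prev_tokens)
    )
    n = len(tokens)
    return (hits - frac * n) / np.sqrt(n * frac * (1 - frac))
```

Stacking several keys amplifies the aggregate signal, which is why an ensemble can reach a detection threshold with fewer tokens than any single instance.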
arXiv Detail & Related papers (2025-09-28T19:37:44Z) - StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models [55.05404953041403]
We propose a novel framework that seamlessly integrates a binary watermark into the diffusion generation process. We show that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.
arXiv Detail & Related papers (2025-09-22T16:35:19Z) - OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization [66.69924980864053]
We propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations. Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.
arXiv Detail & Related papers (2025-08-29T15:50:59Z) - IConMark: Robust Interpretable Concept-Based Watermark For AI Images [50.045011844765185]
We propose IConMark, a novel in-generation robust semantic watermarking method. IConMark embeds interpretable concepts into AI-generated images, making it resilient to adversarial manipulation. We demonstrate its superiority in terms of detection accuracy and maintaining image quality.
arXiv Detail & Related papers (2025-07-17T05:38:30Z) - BiMark: Unbiased Multilayer Watermarking for Large Language Models [54.58546293741373]
We propose BiMark, a novel watermarking framework that balances text quality preservation and message embedding capacity. BiMark achieves up to 30% higher extraction rates for short texts while maintaining text quality indicated by lower perplexity.
arXiv Detail & Related papers (2025-06-19T11:08:59Z) - Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation [58.85645136534301]
Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks. We propose a novel watermarking scheme that improves both detectability and text quality by introducing a cumulative watermark entropy threshold.
arXiv Detail & Related papers (2025-04-16T14:16:38Z) - Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking [18.251123923955397]
Autoregressive learning has become a dominant approach for text-to-image generation, offering high efficiency and visual quality. Existing watermarking methods, designed for diffusion models, often struggle to adapt to the sequential nature of VAR models. We propose Safe-VAR, the first watermarking framework specifically designed for autoregressive text-to-image generation.
arXiv Detail & Related papers (2025-03-14T11:45:10Z) - Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach [35.319577498993354]
We present a novel theoretical framework for watermarking Large Language Models (LLMs). Our approach focuses on maximizing detection performance while maintaining control over the worst-case Type-I error and text distortion. We propose an efficient, model-agnostic, distribution-adaptive watermarking algorithm, utilizing a surrogate model alongside the Gumbel-max trick.
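The Gumbel-max trick mentioned in this summary is the core sampling device behind a family of distribution-preserving watermarks: `argmax(log p + g)` with i.i.d. Gumbel noise `g` is an exact sample from `p`, and deriving `g` from a secret key and the context makes the choice reproducible for a detector. The sketch below is a generic illustration under those assumptions, not this paper's algorithm; `keyed_gumbel`, `gumbel_max_sample`, and `detection_score` are assumed names.

```python
import hashlib
import numpy as np

def keyed_gumbel(key, context, vocab_size):
    """Uniform noise derived deterministically from key and context."""
    seed = int.from_bytes(
        hashlib.sha256(f"{key}:{context}".encode()).digest()[:8], "big"
    )
    return np.random.default_rng(seed).random(vocab_size)

def gumbel_max_sample(log_probs, key, context):
    """Exact sample from softmax(log_probs), reproducible given the key."""
    u = keyed_gumbel(key, context, len(log_probs))
    g = -np.log(-np.log(u))              # Gumbelized uniforms
    return int(np.argmax(log_probs + g))

def detection_score(tokens, contexts, key, vocab_size):
    """Watermarked tokens land where u is large; sum -log(1 - u[t])."""
    return float(sum(
        -np.log(1.0 - keyed_gumbel(key, c, vocab_size)[t])
        for t, c in zip(tokens, contexts)
    ))
```

For unwatermarked text each per-token score is roughly exponentially distributed with mean 1, while watermarked tokens score noticeably higher, so a simple threshold on the sum separates the two.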
arXiv Detail & Related papers (2024-10-03T18:28:10Z) - Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses with potential misinformation.
Watermarking is pivotal in this context, which involves embedding hidden markers in texts.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z) - Adaptive Text Watermark for Large Language Models [8.100123266517299]
It is challenging to generate high-quality watermarked text while maintaining strong security, robustness, and the ability to detect watermarks without prior knowledge of the prompt or model.
This paper proposes an adaptive watermarking strategy to address this problem.
arXiv Detail & Related papers (2024-01-25T03:57:12Z) - RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees [33.61946642460661]
This paper introduces a robust and agile watermark detection framework, dubbed as RAW.
We employ a classifier that is jointly trained with the watermark to detect the presence of the watermark.
We show that the framework provides provable guarantees regarding the false positive rate for misclassifying a watermarked image.
arXiv Detail & Related papers (2024-01-23T22:00:49Z) - Cross-Attention Watermarking of Large Language Models [8.704964543257246]
A new approach to linguistic watermarking of language models is presented.
Information is imperceptibly inserted into the output text while preserving its readability and original meaning.
A cross-attention mechanism is used to embed watermarks in the text during inference.
arXiv Detail & Related papers (2024-01-12T09:39:50Z) - WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z) - A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a Distribution-Preserving (DiP) watermark.
Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking.
It is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z) - T2IW: Joint Text to Image & Watermark Generation [74.20148555503127]
We introduce a novel task for the joint generation of text to image and watermark (T2IW).
This T2IW scheme ensures minimal damage to image quality when generating a compound image by forcing the semantic feature and the watermark signal to be compatible in pixels.
We demonstrate remarkable achievements in image quality, watermark invisibility, and watermark robustness, supported by our proposed set of evaluation metrics.
arXiv Detail & Related papers (2023-09-07T16:12:06Z) - Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.