SEAL: Subspace-Anchored Watermarks for LLM Ownership
- URL: http://arxiv.org/abs/2511.11356v1
- Date: Fri, 14 Nov 2025 14:44:11 GMT
- Title: SEAL: Subspace-Anchored Watermarks for LLM Ownership
- Authors: Yanbo Dai, Zongjie Li, Zhenlan Ji, Shuai Wang
- Abstract summary: We propose SEAL, a subspace-anchored watermarking framework for large language models. SEAL embeds multi-bit signatures directly into the model's latent representational space, supporting both white-box and black-box verification scenarios. We conduct comprehensive experiments on multiple benchmark datasets and six prominent LLMs to demonstrate SEAL's superior effectiveness, fidelity, efficiency, and robustness.
- Score: 12.022506016268112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level performance in text generation, reasoning, and question answering. However, training such models requires substantial computational resources, large curated datasets, and sophisticated alignment procedures. As a result, they constitute highly valuable intellectual property (IP) assets that warrant robust protection mechanisms. Existing IP protection approaches suffer from critical limitations. Model fingerprinting techniques can identify model architectures but fail to establish ownership of specific model instances. In contrast, traditional backdoor-based watermarking methods embed behavioral anomalies that can be easily removed through common post-processing operations such as fine-tuning or knowledge distillation. We propose SEAL, a subspace-anchored watermarking framework that embeds multi-bit signatures directly into the model's latent representational space, supporting both white-box and black-box verification scenarios. Our approach leverages model editing techniques to align the hidden representations of selected anchor samples with predefined orthogonal bit vectors. This alignment embeds the watermark while preserving the model's original factual predictions, rendering the watermark functionally harmless and stealthy. We conduct comprehensive experiments on multiple benchmark datasets and six prominent LLMs, comparing SEAL with 11 existing fingerprinting and watermarking methods to demonstrate its superior effectiveness, fidelity, efficiency, and robustness. Furthermore, we evaluate SEAL under potential knowledgeable attacks and show that it maintains strong verification performance even when adversaries possess knowledge of the watermarking mechanism and the embedded signatures.
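The abstract describes the mechanism only at a high level: the owner picks anchor samples and a set of predefined orthogonal bit vectors, edits the model so that each anchor's hidden representation lines up with the bit vector for its signature bit, and verifies ownership white-box by projecting those hidden states back onto the secret directions. The following is an added toy illustration of that idea written for this page; it is not SEAL's code, and the paper's real editing procedure, anchor selection, fidelity constraints and black-box protocol are not on this page. It assumes a single linear "hidden layer" W in place of an LLM, orthonormal anchor inputs (chosen so that the minimum-norm rank-k edit has a closed form), a sign-based bit readout, a margin of 2.0, a 90% bit-agreement threshold, and Gaussian weight noise as a crude stand-in for post-processing; all of these are choices made here.

```python
"""Toy subspace-anchored watermark (illustration of the SEAL idea, not the paper's
code): embed a multi-bit signature into a linear "hidden layer" by a low-rank edit,
then verify it white-box by projecting anchor hidden states onto secret bit vectors."""
import math
import random
from typing import Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def matvec(w: Mat, x: Vec) -> Vec:
    return [dot(row, x) for row in w]

def orthonormalize(vectors: List[Vec]) -> List[Vec]:
    """Gram-Schmidt: an orthonormal set spanning the inputs (dependent ones dropped)."""
    basis: List[Vec] = []
    for v in vectors:
        u = list(v)
        for b in basis:
            c = dot(u, b)
            u = [ui - c * bi for ui, bi in zip(u, b)]
        norm = math.sqrt(dot(u, u))
        if norm > 1e-12:
            basis.append([ui / norm for ui in u])
    return basis

def random_matrix(rng: random.Random, rows: int, cols: int, scale: float = 1.0) -> Mat:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def random_model(seed: int, dim_hidden: int, dim_in: int) -> Mat:
    """Stand-in for an unwatermarked model: a single random linear layer (assumption)."""
    return random_matrix(random.Random(seed), dim_hidden, dim_in, 1.0 / math.sqrt(dim_in))

def make_key(seed: int, dim_in: int, dim_hidden: int, n_bits: int) -> Tuple[List[Vec], List[Vec]]:
    """Owner's secret: n_bits orthonormal anchor inputs x_i and n_bits orthonormal bit
    directions v_i in hidden space. Orthonormal anchors are a simplification made here
    so that the minimum-norm edit in embed() has a closed form."""
    rng = random.Random(seed)
    anchors = orthonormalize(random_matrix(rng, n_bits, dim_in))
    bit_vectors = orthonormalize(random_matrix(rng, n_bits, dim_hidden))
    assert len(anchors) == n_bits and len(bit_vectors) == n_bits
    return anchors, bit_vectors

def embed(w: Mat, anchors: List[Vec], bit_vectors: List[Vec], bits: List[int],
          margin: float = 2.0) -> Mat:
    """Rank-n_bits edit dW = sum_i shift_i * v_i x_i^T, after which
    <v_i, W' x_i> = +margin for bit 1 and -margin for bit 0. Because the anchors are
    orthonormal, any input orthogonal to all of them is mapped exactly as before
    (dW y = 0): the toy version of leaving the model's original predictions intact."""
    w_new = [list(row) for row in w]
    for x, v, bit in zip(anchors, bit_vectors, bits):
        shift = (margin if bit else -margin) - dot(v, matvec(w, x))
        for r, v_r in enumerate(v):
            row = w_new[r]
            for c, x_c in enumerate(x):
                row[c] += shift * v_r * x_c
    return w_new

def extract(w: Mat, anchors: List[Vec], bit_vectors: List[Vec]) -> List[int]:
    """White-box readout: sign of each anchor's hidden state along its bit direction."""
    return [1 if dot(v, matvec(w, x)) > 0.0 else 0 for x, v in zip(anchors, bit_vectors)]

def bit_agreement(a: List[int], b: List[int]) -> float:
    return sum(int(x == y) for x, y in zip(a, b)) / len(a)

def verify(w: Mat, anchors: List[Vec], bit_vectors: List[Vec], signature: List[int],
           threshold: float = 0.9) -> bool:
    """Claim succeeds if enough bits are recovered; an unrelated model agrees on
    roughly half of them, so 0.9 is far above chance for 32 bits."""
    return bit_agreement(extract(w, anchors, bit_vectors), signature) >= threshold

def demo(seed: int = 7) -> Dict[str, object]:
    dim_in, dim_hidden, n_bits = 64, 64, 32
    rng = random.Random(seed)
    w = random_model(seed, dim_hidden, dim_in)
    signature = [rng.randrange(2) for _ in range(n_bits)]
    anchors, bit_vectors = make_key(seed + 1, dim_in, dim_hidden, n_bits)
    w_marked = embed(w, anchors, bit_vectors, signature)
    # probe input orthogonal to every anchor: its hidden state must not move
    probe = orthonormalize(anchors + random_matrix(rng, 1, dim_in))[-1]
    probe_change = max(abs(a - b) for a, b in zip(matvec(w_marked, probe), matvec(w, probe)))
    # small weight noise as a crude stand-in for post-processing such as fine-tuning
    w_noisy = [[x + rng.gauss(0.0, 0.05) for x in row] for row in w_marked]
    w_other = random_model(seed + 99, dim_hidden, dim_in)
    return {
        "marked_verifies": verify(w_marked, anchors, bit_vectors, signature),
        "noisy_marked_verifies": verify(w_noisy, anchors, bit_vectors, signature),
        "other_model_verifies": verify(w_other, anchors, bit_vectors, signature),
        "other_model_agreement": bit_agreement(extract(w_other, anchors, bit_vectors), signature),
        "probe_max_change": probe_change,
    }

if __name__ == "__main__":
    print(demo())
```

Two things the toy makes concrete. First, the watermark lives in a k-dimensional subspace defined by the owner's secret (anchors plus bit directions), so an unrelated model decodes to a near-random bit string and a fixed agreement threshold separates the two cases; the margin is what buys tolerance to small weight perturbations. Second, the edit is confined to the span of the anchor inputs, which is why non-anchor behaviour is untouched here. In a real transformer the anchor hidden states are neither orthonormal nor the input of a single linear map, so the closed-form update would have to be replaced by a pseudo-inverse or the paper's own model-editing procedure, and fidelity would have to be measured on held-out tasks rather than proved by orthogonality.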
Related papers
- On Protecting Agentic Systems' Intellectual Property via Watermarking [17.334130453604313]
  AGENTWM is the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries.
  arXiv Detail & Related papers (2026-02-09T09:02:15Z)
- AuthenLoRA: Entangling Stylization with Imperceptible Watermarks for Copyright-Secure LoRA Adapters [52.556959321030966]
  Low-Rank Adaptation (LoRA) offers an efficient paradigm for customizing diffusion models. Existing watermarking techniques either target base models or verify LoRA modules themselves. We propose AuthenLoRA, a unified watermarking framework that embeds imperceptible, traceable watermarks directly into the LoRA training process.
  arXiv Detail & Related papers (2025-11-26T09:48:11Z)
- SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
  We propose sequential watermarking for soft prompts (SWAP). SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes. Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
  arXiv Detail & Related papers (2025-11-05T13:48:48Z)
- Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
  We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained on limited access to the watermark detector. We demonstrate character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
  arXiv Detail & Related papers (2025-09-11T02:50:07Z)
- Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model Distribution [14.60627694687767]
  We propose Hot-Swap MarkBoard, an efficient watermarking method. It encodes user-specific $n$-bit binary signatures by independently embedding multiple watermarks. The method supports black-box verification and is compatible with various model architectures.
  arXiv Detail & Related papers (2025-07-28T09:14:21Z)
- ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
  Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks. Adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
  arXiv Detail & Related papers (2024-05-03T06:41:48Z)
- Probabilistically Robust Watermarking of Neural Networks [4.332441337407564]
  We introduce a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks. Our approach does not require additional model training and can be applied to any model architecture.
  arXiv Detail & Related papers (2024-01-16T10:32:13Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
  The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attack. We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
  arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Model Watermarking for Image Processing Networks [120.918532981871]
  How to protect the intellectual property of deep models is a very important but seriously under-researched problem. We propose the first model watermarking framework for protecting image processing models.
  arXiv Detail & Related papers (2020-02-25T18:36:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.