iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
- URL: http://arxiv.org/abs/2511.08905v1
- Date: Thu, 13 Nov 2025 01:16:15 GMT
- Title: iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
- Authors: Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang,
- Abstract summary: iSeal is a fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner.<n>It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy.<n>iSeal achieves 100 percent Fingerprint Success Rate on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
- Score: 22.052342142871144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
Related papers
- Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models [14.909356150499297]
We propose two novel fingerprinting attack methods: token filter attack (TFA) and sentence verification attack (SVA)<n>The proposed methods effectively inhibit the fingerprint response while maintaining ensemble performance. Compared with state-of-the-art attack methods, the proposed method can achieve better performance.
arXiv Detail & Related papers (2026-01-07T06:06:56Z) - Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores.<n>Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z) - SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP)<n>SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes.<n>Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z) - Fingerprinting LLMs via Prompt Injection [16.907123772391213]
Large language models (LLMs) are often modified after release through post-processing such as post-training or quantization.<n>Existing provenance detection methods have two main limitations: (1) they embed signals into the base model before release, which is infeasible for already published models, or (2) they compare outputs across models using hand-crafted or random prompts, which are not robust to post-processing.<n>We propose LLMPrint, a novel detection framework that constructs fingerprints by exploiting LLMs' inherent vulnerability to prompt injection.
arXiv Detail & Related papers (2025-09-29T19:54:36Z) - SoK: Large Language Model Copyright Auditing via Fingerprinting [69.14570598973195]
We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches.<n>We propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios.
arXiv Detail & Related papers (2025-08-27T12:56:57Z) - EditMF: Drawing an Invisible Fingerprint for Your Large Language Models [11.691985114214162]
EditMF is a training-free fingerprinting paradigm that achieves highly imperceptible fingerprint embedding with minimal computational overhead.<n>We show that EditMF combines high imperceptibility with negligible model's performance loss, while delivering robustness far beyond LoRA-based fingerprinting.
arXiv Detail & Related papers (2025-08-12T10:52:48Z) - Robust Anti-Backdoor Instruction Tuning in LVLMs [53.766434746801366]
We introduce a lightweight, certified-agnostic defense framework for large visual language models (LVLMs)<n>Our framework finetunes only adapter modules and text embedding layers under instruction tuning.<n>Experiments against seven attacks on Flickr30k and MSCOCO demonstrate that ours reduces their attack success rate to nearly zero.
arXiv Detail & Related papers (2025-06-04T01:23:35Z) - ImF: Implicit Fingerprint for Large Language Models [14.580290415247385]
We introduce a novel adversarial attack named Generation Revision Intervention (GRI) attack.<n>GRI exploits the semantic fragility of current fingerprinting methods, effectively erasing fingerprints.<n>We propose a novel model fingerprint paradigm called Implicit Fingerprints (ImF)
arXiv Detail & Related papers (2025-03-25T05:47:34Z) - Understanding and Enhancing the Transferability of Jailbreaking Attacks [12.446931518819875]
Jailbreaking attacks can effectively manipulate open-source large language models (LLMs) to produce harmful responses.<n>This work investigates the transferability of jailbreaking attacks by analysing their impact on the model's intent perception.<n>We propose the Perceived-importance Flatten (PiF) method, which uniformly disperses the model's focus across neutral-intent tokens in the original input.
arXiv Detail & Related papers (2025-02-05T10:29:54Z) - Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z) - Instructional Fingerprinting of Large Language Models [57.72356846657551]
We present a pilot study on fingerprinting Large language models (LLMs) as a form of very lightweight instruction tuning.
Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model.
It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.