Fingerprint Vector: Enabling Scalable and Efficient Model Fingerprint Transfer via Vector Addition
- URL: http://arxiv.org/abs/2409.08846v3
- Date: Tue, 26 Aug 2025 07:56:10 GMT
- Title: Fingerprint Vector: Enabling Scalable and Efficient Model Fingerprint Transfer via Vector Addition
- Authors: Zhenhua Xu, Qichen Liu, Zhebo Wang, Wenpeng Xing, Dezhang Kong, Mohan Li, Meng Han
- Abstract summary: We propose a novel mechanism called the Fingerprint Vector. It embeds a fingerprint into the base model via backdoor-based fine-tuning, then extracts a task-specific parameter delta as a fingerprint vector. It achieves comparable or superior performance to direct injection across key desiderata.
- Score: 23.282821424581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor-based fingerprinting has emerged as an effective technique for tracing the ownership of large language models. However, in real-world deployment scenarios, developers often instantiate multiple downstream models from a shared base model, and applying fingerprinting to each variant individually incurs prohibitive computational overhead. While inheritance-based approaches -- where fingerprints are embedded into the base model and expected to persist through fine-tuning -- appear attractive, they suffer from three key limitations: late-stage fingerprinting, fingerprint instability, and interference with downstream adaptation. To address these challenges, we propose a novel mechanism called the Fingerprint Vector. Our method first embeds a fingerprint into the base model via backdoor-based fine-tuning, then extracts a task-specific parameter delta as a fingerprint vector by computing the difference between the fingerprinted and clean models. This vector can be directly added to any structurally compatible downstream model, allowing the fingerprint to be transferred post hoc without additional fine-tuning. Extensive experiments show that Fingerprint Vector achieves comparable or superior performance to direct injection across key desiderata. It maintains strong effectiveness across diverse model architectures as well as mainstream downstream variants within the same family. It also preserves harmlessness and robustness in most cases. Even when slight robustness degradation is observed, the impact remains within acceptable bounds and is outweighed by the scalability benefits of our approach.
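The extract-then-add step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `extract_fingerprint_vector` and `add_fingerprint_vector` are hypothetical, weights are represented as plain dicts mapping parameter names to lists of floats, and "structurally compatible" is assumed here to mean identical parameter names and sizes.

```python
def _check_compatible(a, b):
    """Raise ValueError if two weight dicts differ in parameter names or sizes."""
    if a.keys() != b.keys():
        raise ValueError("parameter names differ")
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for {name!r}")


def extract_fingerprint_vector(fingerprinted, clean):
    """Fingerprint vector = fingerprinted base weights - clean base weights."""
    _check_compatible(fingerprinted, clean)
    return {
        name: [f - c for f, c in zip(fingerprinted[name], clean[name])]
        for name in clean
    }


def add_fingerprint_vector(downstream, vector):
    """Add the vector to a downstream model's weights; no fine-tuning involved."""
    _check_compatible(downstream, vector)
    return {
        name: [w + d for w, d in zip(downstream[name], vector[name])]
        for name in downstream
    }


# Toy weights (made-up numbers, chosen only for illustration).
clean_base = {"layer.weight": [1.0, 2.0]}
fingerprinted_base = {"layer.weight": [1.5, 1.75]}
downstream = {"layer.weight": [3.0, 4.0]}

vector = extract_fingerprint_vector(fingerprinted_base, clean_base)
fingerprinted_downstream = add_fingerprint_vector(downstream, vector)
```

With real models the same arithmetic would be applied tensor by tensor over the models' weight dictionaries. The sketch only shows the parameter arithmetic; whether the transferred fingerprint still triggers in a given downstream model has to be checked empirically.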
Related papers
- A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors [43.11304710234668]
We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment. In a large-scale identification task across 76 offspring models, our method achieves 100% accuracy in identifying the correct base model family. We propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact.
arXiv Detail & Related papers (2026-02-10T05:57:35Z)
- Incremental Fingerprinting in an Open World [4.632697550690284]
Network protocol fingerprinting is used to identify a protocol implementation by analyzing its input-output behavior. Traditionally, fingerprinting operates under a closed-world assumption, where models of all implementations are assumed to be available. We propose an incremental fingerprinting approach to solve the problem by combining active automata learning with closed-world fingerprinting.
arXiv Detail & Related papers (2026-01-29T13:14:15Z)
- SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From [65.75182441010327]
We propose a stronger and more intrinsic notion of LLM fingerprinting: SeedPrints. We show that untrained models exhibit reproducible token selection biases conditioned solely on their parameters. Experiments on LLaMA-style and Qwen-style models show that SeedPrints achieves seed-level distinguishability and can provide birth-to-lifecycle identity verification akin to a biometric fingerprint.
arXiv Detail & Related papers (2025-09-30T15:34:08Z)
- DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection [9.849635250118913]
Large language models (LLMs) are considered valuable Intellectual Properties (IP) for legitimate owners. We propose DuFFin, a novel Dual-Level Fingerprinting Framework for black-box setting ownership verification.
arXiv Detail & Related papers (2025-05-22T11:16:46Z)
- ImF: Implicit Fingerprint for Large Language Models [0.0]
We propose a novel injected fingerprint paradigm called Implicit Fingerprints (ImF).
ImF constructs fingerprint pairs with strong semantic correlations, disguising them as natural question-answer pairs within large language models (LLMs).
Our experiment on multiple LLMs demonstrates that ImF retains high verification success rates under adversarial conditions.
arXiv Detail & Related papers (2025-03-25T05:47:34Z)
- Scalable Fingerprinting of Large Language Models [46.26999419117367]
We introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints.
We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model without degrading the model's utility.
arXiv Detail & Related papers (2025-02-11T18:43:07Z)
- FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint [22.398234847594242]
Model fingerprinting is a widely adopted approach to safeguard the intellectual property rights of open-source models. In this paper, we reveal that they are vulnerable to false claim attacks where adversaries falsely assert ownership of any third-party model. Motivated by these findings, we propose a targeted fingerprinting paradigm (i.e., FIT-Print) to counteract false claim attacks.
arXiv Detail & Related papers (2025-01-26T13:00:58Z)
- UTF:Undertrained Tokens as Fingerprints A Novel Approach to LLM Identification [23.164580168870682]
Fingerprinting large language models (LLMs) is essential for verifying model ownership, ensuring authenticity, and preventing misuse.
In this paper, we introduce a novel and efficient approach to fingerprinting LLMs by leveraging under-trained tokens.
Our method has minimal overhead and impact on the model's performance, and does not require white-box access to the target model for ownership identification.
arXiv Detail & Related papers (2024-10-16T07:36:57Z)
- Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique [2.7174461714624805]
Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting. We define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity.
arXiv Detail & Related papers (2024-07-15T16:38:56Z)
- Instructional Fingerprinting of Large Language Models [57.72356846657551]
We present a pilot study on fingerprinting large language models (LLMs) as a form of very lightweight instruction tuning.
Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model.
It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z)
- HuRef: HUman-REadable Fingerprint for Large Language Models [44.9820558213721]
HuRef is a human-readable fingerprint for large language models.
It uniquely identifies the base model without interfering with training or exposing model parameters to the public.
arXiv Detail & Related papers (2023-12-08T05:01:47Z)
- Robust Retraining-free GAN Fingerprinting via Personalized Normalization [21.63902009635896]
The proposed method can embed different fingerprints inside the GAN by just changing the input of the ParamGen Nets.
The performance of the proposed method in terms of robustness against both model-level and image-level attacks is superior to the state-of-the-art.
arXiv Detail & Related papers (2023-11-09T16:09:12Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- PrintsGAN: Synthetic Fingerprint Generator [39.804969475699345]
PrintsGAN is a synthetic fingerprint generator capable of generating unique fingerprints along with multiple impressions for a given fingerprint.
We show the utility of PrintsGAN-generated fingerprints by training a deep network to extract a fixed-length embedding from a fingerprint.
arXiv Detail & Related papers (2022-01-10T22:25:10Z)
- SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition [94.30084702921529]
Hand gestures play a critical role in sign language.
Current deep-learning-based sign language recognition methods may suffer from insufficient interpretability.
We introduce the first self-supervised pre-trainable SignBERT with incorporated hand prior for SLR.
arXiv Detail & Related papers (2021-10-11T16:18:09Z)
- Responsible Disclosure of Generative Models Using Scalable Fingerprinting [70.81987741132451]
Deep generative models have achieved a qualitatively new level of performance.
There are concerns about how this technology can be misused to spoof sensors, generate deep fakes, and enable misinformation at scale.
Our work enables a responsible disclosure of such state-of-the-art generative models, which allows researchers and companies to fingerprint their models.
arXiv Detail & Related papers (2020-12-16T03:51:54Z)
- Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data [64.65952078807086]
Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs).
Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation.
We seek a proactive and sustainable solution on deepfake detection by introducing artificial fingerprints into the models.
arXiv Detail & Related papers (2020-07-16T16:49:55Z)
- Latent Fingerprint Registration via Matching Densely Sampled Points [100.53031290339483]
Existing latent fingerprint registration approaches are mainly based on establishing correspondences between minutiae.
We propose a non-minutia latent fingerprint registration method which estimates the spatial transformation between a pair of fingerprints.
The proposed method achieves the state-of-the-art registration performance, especially under challenging conditions.
arXiv Detail & Related papers (2020-05-12T15:51:59Z)