Related papers: A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors

A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors

URL: http://arxiv.org/abs/2602.09434v1
Date: Tue, 10 Feb 2026 05:57:35 GMT
Title: A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors
Authors: Zhenyu Xu, Victor S. Sheng,
Abstract summary: We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment.<n>In a large-scale identification task across 76 offspring models, our method achieves 100% accuracy in identifying the correct base model family.<n>We propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact.
Score: 43.11304710234668
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Protecting the intellectual property of large language models (LLMs) is a critical challenge due to the proliferation of unauthorized derivative models. We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment, applying the concept of refusal vectors for LLM provenance tracking. These vectors, extracted from directional patterns in a model's internal representations when processing harmful versus harmless prompts, serve as robust behavioral fingerprints. Our contribution lies in developing a fingerprinting system around this concept and conducting extensive validation of its effectiveness for IP protection. We demonstrate that these behavioral fingerprints are highly robust against common modifications, including finetunes, merges, and quantization. Our experiments show that the fingerprint is unique to each model family, with low cosine similarity between independently trained models. In a large-scale identification task across 76 offspring models, our method achieves 100\% accuracy in identifying the correct base model family. Furthermore, we analyze the fingerprint's behavior under alignment-breaking attacks, finding that while performance degrades significantly, detectable traces remain. Finally, we propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact using locality-sensitive hashing and zero-knowledge proofs.

Related papers

Antidistillation Fingerprinting [119.66677613290359]
We introduce antidistillation fingerprinting (ADFP), a principled approach that aligns the fingerprinting objective with the student's learning dynamics.<n>ADFP achieves a significant improvement over state-of-the-art baselines, stronger detection confidence with minimal impact on utility, even when the student model's architecture is unknown.
arXiv Detail & Related papers (2026-02-03T18:15:50Z)
Are Robust LLM Fingerprints Adversarially Robust? [31.998822577243867]
We first define a concrete, practical threat model against model fingerprinting.<n>We then take a critical look at existing model fingerprinting schemes to identify their fundamental vulnerabilities.<n>Based on these, we develop adaptive adversarial attacks tailored for each vulnerability.
arXiv Detail & Related papers (2025-09-30T17:47:09Z)
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From [65.75182441010327]
We propose a stronger and more intrinsic notion of LLM fingerprinting: SeedPrints.<n>We show that untrained models exhibit reproducible token selection biases conditioned solely on their parameters.<n> Experiments on LLaMA-style and Qwen-style models show that SeedPrints achieves seed-level distinguishability and can provide birth-to-lifecycle identity verification akin to a biometric fingerprint.
arXiv Detail & Related papers (2025-09-30T15:34:08Z)
Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques.<n>By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z)
ImF: Implicit Fingerprint for Large Language Models [14.580290415247385]
We introduce a novel adversarial attack named Generation Revision Intervention (GRI) attack.<n>GRI exploits the semantic fragility of current fingerprinting methods, effectively erasing fingerprints.<n>We propose a novel model fingerprint paradigm called Implicit Fingerprints (ImF)
arXiv Detail & Related papers (2025-03-25T05:47:34Z)
Scalable Fingerprinting of Large Language Models [42.65365809809273]
We introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints.<n>We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model without degrading the model's utility.
arXiv Detail & Related papers (2025-02-11T18:43:07Z)
FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint [22.398234847594242]
Model fingerprinting is a widely adopted approach to safeguard the intellectual property rights of open-source models.<n>In this paper, we reveal that they are vulnerable to false claim attacks where adversaries falsely assert ownership of any third-party model.<n>Motivated by these findings, we propose a targeted fingerprinting paradigm (i.e., FIT-Print) to counteract false claim attacks.
arXiv Detail & Related papers (2025-01-26T13:00:58Z)
Fingerprint Vector: Enabling Scalable and Efficient Model Fingerprint Transfer via Vector Addition [23.282821424581]
We propose a novel mechanism called the Fingerprint Vector.<n>It embeds a fingerprint into the base model via backdoor-based fine-tuning, then extracts a task-specific parameter delta as a fingerprint vector.<n>It achieves comparable or superior performance to direct injection across key desiderata.
arXiv Detail & Related papers (2024-09-13T14:04:39Z)
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique [2.7174461714624805]
Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting.<n>We define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability.<n>We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity.
arXiv Detail & Related papers (2024-07-15T16:38:56Z)
Responsible Disclosure of Generative Models Using Scalable Fingerprinting [70.81987741132451]
Deep generative models have achieved a qualitatively new level of performance. There are concerns on how this technology can be misused to spoof sensors, generate deep fakes, and enable misinformation at scale. Our work enables a responsible disclosure of such state-of-the-art generative models, that allows researchers and companies to fingerprint their models.
arXiv Detail & Related papers (2020-12-16T03:51:54Z)
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data [64.65952078807086]
Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs) Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation. We seek a proactive and sustainable solution on deepfake detection by introducing artificial fingerprints into the models.
arXiv Detail & Related papers (2020-07-16T16:49:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.