Related papers: DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection

URL: http://arxiv.org/abs/2601.08223v3
Date: Wed, 21 Jan 2026 05:03:08 GMT
Title: DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection
Authors: Zhenhua Xu, Yiran Zhao, Mengting Zhong, Dezhang Kong, Changting Lin, Tong Qiao, Meng Han,
Abstract summary: We propose a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers.<n>Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility.
Score: 21.422855789542695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid growth of large language models raises pressing concerns about intellectual property protection under black-box deployment. Existing backdoor-based fingerprints either rely on rare tokens -- leading to high-perplexity inputs susceptible to filtering -- or use fixed trigger-response mappings that are brittle to leakage and post-hoc adaptation. We propose \textsc{Dual-Layer Nested Fingerprinting} (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. Across Mistral-7B, LLaMA-3-8B-Instruct, and Falcon3-7B-Instruct, DNF achieves perfect fingerprint activation while preserving downstream utility. Compared with existing methods, it uses lower-perplexity triggers, remains undetectable under fingerprint detection attacks, and is relatively robust to incremental fine-tuning and model merging. These results position DNF as a practical, stealthy, and resilient solution for LLM ownership verification and intellectual property protection.

Related papers

Antidistillation Fingerprinting [119.66677613290359]
We introduce antidistillation fingerprinting (ADFP), a principled approach that aligns the fingerprinting objective with the student's learning dynamics.<n>ADFP achieves a significant improvement over state-of-the-art baselines, stronger detection confidence with minimal impact on utility, even when the student model's architecture is unknown.
arXiv Detail & Related papers (2026-02-03T18:15:50Z)
Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models [14.909356150499297]
We propose two novel fingerprinting attack methods: token filter attack (TFA) and sentence verification attack (SVA)<n>The proposed methods effectively inhibit the fingerprint response while maintaining ensemble performance. Compared with state-of-the-art attack methods, the proposed method can achieve better performance.
arXiv Detail & Related papers (2026-01-07T06:06:56Z)
SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting [4.335948336782789]
We propose a novel intrinsic weight-based fingerprinting scheme that eliminates dependency on input and inherently resists false claims.<n> SELF achieves robust IP protection through two key innovations: 1) unique, scalable and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation.
arXiv Detail & Related papers (2025-12-03T09:53:47Z)
The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP)<n>SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes.<n>Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z)
From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models [28.393476667026523]
We propose RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF) by modifying a sparse subset of model weights.<n>RFEdit is protected by Fingerprint Subspace-aware Fine-Tuning (FSFT), which mitigates fingerprint degradation during legitimate fine-tuning.
arXiv Detail & Related papers (2025-09-03T08:22:04Z)
SoK: Large Language Model Copyright Auditing via Fingerprinting [69.14570598973195]
We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches.<n>We propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios.
arXiv Detail & Related papers (2025-08-27T12:56:57Z)
FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing [24.648168413166673]
FPEdit is a novel framework that leverages knowledge editing to inject semantically coherent natural language fingerprints.<n>We show that FPEdit achieves 95-100% fingerprint retention under both full- parameter fine-tuning and parameter-efficient adaptation.<n> FPEdit can embed 10 fingerprint pairs into LLaMA2-7B in under 2 minutes using less than 30 GB of GPU memory.
arXiv Detail & Related papers (2025-08-04T06:00:22Z)
Robust Anti-Backdoor Instruction Tuning in LVLMs [53.766434746801366]
We introduce a lightweight, certified-agnostic defense framework for large visual language models (LVLMs)<n>Our framework finetunes only adapter modules and text embedding layers under instruction tuning.<n>Experiments against seven attacks on Flickr30k and MSCOCO demonstrate that ours reduces their attack success rate to nearly zero.
arXiv Detail & Related papers (2025-06-04T01:23:35Z)
ImF: Implicit Fingerprint for Large Language Models [14.580290415247385]
We introduce a novel adversarial attack named Generation Revision Intervention (GRI) attack.<n>GRI exploits the semantic fragility of current fingerprinting methods, effectively erasing fingerprints.<n>We propose a novel model fingerprint paradigm called Implicit Fingerprints (ImF)
arXiv Detail & Related papers (2025-03-25T05:47:34Z)
Instructional Fingerprinting of Large Language Models [57.72356846657551]
We present a pilot study on fingerprinting Large language models (LLMs) as a form of very lightweight instruction tuning. Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.