Antidistillation Fingerprinting
- URL: http://arxiv.org/abs/2602.03812v1
- Date: Tue, 03 Feb 2026 18:15:50 GMT
- Title: Antidistillation Fingerprinting
- Authors: Yixuan Even Xu, John Kirchenbauer, Yash Savani, Asher Trockman, Alexander Robey, Tom Goldstein, Fei Fang, J. Zico Kolter,
- Abstract summary: We introduce antidistillation fingerprinting (ADFP), a principled approach that aligns the fingerprinting objective with the student's learning dynamics.<n>ADFP achieves a significant improvement over state-of-the-art baselines, stronger detection confidence with minimal impact on utility, even when the student model's architecture is unknown.
- Score: 119.66677613290359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model distillation enables efficient emulation of frontier large language models (LLMs), creating a need for robust mechanisms to detect when a third-party student model has trained on a teacher model's outputs. However, existing fingerprinting techniques that could be used to detect such distillation rely on heuristic perturbations that impose a steep trade-off between generation quality and fingerprinting strength, often requiring significant degradation of utility to ensure the fingerprint is effectively internalized by the student. We introduce antidistillation fingerprinting (ADFP), a principled approach that aligns the fingerprinting objective with the student's learning dynamics. Building upon the gradient-based framework of antidistillation sampling, ADFP utilizes a proxy model to identify and sample tokens that directly maximize the expected detectability of the fingerprint in the student after fine-tuning, rather than relying on the incidental absorption of the un-targeted biases of a more naive watermark. Experiments on GSM8K and OASST1 benchmarks demonstrate that ADFP achieves a significant Pareto improvement over state-of-the-art baselines, yielding stronger detection confidence with minimal impact on utility, even when the student model's architecture is unknown.
Related papers
- A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors [43.11304710234668]
We introduce a novel fingerprinting framework that leverages the behavioral patterns induced by safety alignment.<n>In a large-scale identification task across 76 offspring models, our method achieves 100% accuracy in identifying the correct base model family.<n>We propose a theoretical framework to transform this private fingerprint into a publicly verifiable, privacy-preserving artifact.
arXiv Detail & Related papers (2026-02-10T05:57:35Z) - FPEdit: Robust LLM Fingerprinting through Localized Parameter Editing [24.648168413166673]
FPEdit is a novel framework that leverages knowledge editing to inject semantically coherent natural language fingerprints.<n>We show that FPEdit achieves 95-100% fingerprint retention under both full- parameter fine-tuning and parameter-efficient adaptation.<n> FPEdit can embed 10 fingerprint pairs into LLaMA2-7B in under 2 minutes using less than 30 GB of GPU memory.
arXiv Detail & Related papers (2025-08-04T06:00:22Z) - MEraser: An Effective Fingerprint Erasure Approach for Large Language Models [27.416202744267594]
Large Language Models (LLMs) have become increasingly prevalent across various sectors, raising critical concerns about model ownership and intellectual property protection.<n>We present Mismatched Eraser (MEraser), a novel method for effectively removing backdoor-based fingerprints from LLMs while maintaining model performance.
arXiv Detail & Related papers (2025-06-14T15:48:53Z) - ImF: Implicit Fingerprint for Large Language Models [14.580290415247385]
We introduce a novel adversarial attack named Generation Revision Intervention (GRI) attack.<n>GRI exploits the semantic fragility of current fingerprinting methods, effectively erasing fingerprints.<n>We propose a novel model fingerprint paradigm called Implicit Fingerprints (ImF)
arXiv Detail & Related papers (2025-03-25T05:47:34Z) - Latent fingerprint enhancement for accurate minutiae detection [8.996826918574463]
We propose a novel approach that uses generative adversary networks (GANs) to redefine Latent Fingerprint Enhancement (LFE)
By directly optimising the minutiae information during the generation process, the model produces enhanced latent fingerprints that exhibit exceptional fidelity to ground-truth instances.
Our framework integrates minutiae locations and orientation fields, ensuring the preservation of both local and structural fingerprint features.
arXiv Detail & Related papers (2024-09-18T08:35:31Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z) - On the Robustness of Quality Measures for GANs [136.18799984346248]
This work evaluates the robustness of quality measures of generative models such as Inception Score (IS) and Fr'echet Inception Distance (FID)
We show that such metrics can also be manipulated by additive pixel perturbations.
arXiv Detail & Related papers (2022-01-31T06:43:09Z) - Responsible Disclosure of Generative Models Using Scalable
Fingerprinting [70.81987741132451]
Deep generative models have achieved a qualitatively new level of performance.
There are concerns on how this technology can be misused to spoof sensors, generate deep fakes, and enable misinformation at scale.
Our work enables a responsible disclosure of such state-of-the-art generative models, that allows researchers and companies to fingerprint their models.
arXiv Detail & Related papers (2020-12-16T03:51:54Z) - Artificial Fingerprinting for Generative Models: Rooting Deepfake
Attribution in Training Data [64.65952078807086]
Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs)
Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation.
We seek a proactive and sustainable solution on deepfake detection by introducing artificial fingerprints into the models.
arXiv Detail & Related papers (2020-07-16T16:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.