Attacks and Defenses Against LLM Fingerprinting
- URL: http://arxiv.org/abs/2508.09021v1
- Date: Tue, 12 Aug 2025 15:36:36 GMT
- Title: Attacks and Defenses Against LLM Fingerprinting
- Authors: Kevin Kurian, Ethan Holland, Sean Oesch
- Abstract summary: We present a study of LLM fingerprinting from both offensive and defensive perspectives. Our attack methodology uses reinforcement learning to automatically optimize query selection. Our defensive approach employs semantic-preserving output filtering through a secondary LLM to obfuscate model identity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models are increasingly deployed in sensitive environments, fingerprinting attacks pose significant privacy and security risks. We present a study of LLM fingerprinting from both offensive and defensive perspectives. Our attack methodology uses reinforcement learning to automatically optimize query selection, achieving higher fingerprinting accuracy with only three queries than random selection of three queries from the same pool. Our defensive approach employs semantic-preserving output filtering through a secondary LLM, obfuscating model identity while maintaining semantic integrity. The defense reduces fingerprinting accuracy across the tested models while preserving output quality. These contributions show the potential to improve the capabilities of fingerprinting tools while providing practical mitigation strategies against fingerprinting attacks.
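The abstract describes the attack only at a high level, so the following is a minimal, self-contained sketch of the idea, treating query selection as an epsilon-greedy bandit over a fixed probe pool. The paper's actual RL formulation is not given in this digest; the probe pool, the simulated target model, and the toy response classifier below are all illustrative assumptions.

```python
import random

# Toy stand-ins: in practice ask_model() would call the target LLM's API,
# and identify() would be a response classifier trained on known models.
TRUE_MODEL = "model-B"

def ask_model(query: str) -> str:
    # Simulated quirk: only probes mentioning "cutoff" elicit a
    # model-revealing answer; everything else is uninformative.
    if "cutoff" in query:
        return "2023" if TRUE_MODEL == "model-B" else "2022"
    return "generic answer"

def identify(responses: list[str]) -> str:
    if any("2023" in r for r in responses):
        return "model-B"
    if any("2022" in r for r in responses):
        return "model-A"
    return random.choice(["model-A", "model-B"])  # forced to guess

# Probe pool: a few informative probes hidden among filler prompts.
QUERY_POOL = [f"Filler probe #{i}" for i in range(17)] + [
    "What is your training cutoff?",
    "State your knowledge cutoff date.",
    "When is your data cutoff?",
]
BUDGET, EPSILON = 3, 0.1
values = {q: 0.0 for q in QUERY_POOL}  # running reward estimate per probe
counts = {q: 0 for q in QUERY_POOL}

for _ in range(2000):
    if random.random() < EPSILON:                 # explore
        chosen = random.sample(QUERY_POOL, BUDGET)
    else:                                         # exploit best estimates
        chosen = sorted(QUERY_POOL, key=values.get, reverse=True)[:BUDGET]
    reward = 1.0 if identify([ask_model(q) for q in chosen]) == TRUE_MODEL else 0.0
    for q in chosen:                              # incremental mean update
        counts[q] += 1
        values[q] += (reward - values[q]) / counts[q]

print("Learned probe set:", sorted(QUERY_POOL, key=values.get, reverse=True)[:BUDGET])
```

The defensive counterpart from the abstract would sit on the response path of this loop: each raw model response is paraphrased by a secondary LLM before it reaches the querier, blurring stylistic fingerprints while preserving semantics.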
Related papers
- Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models
We propose two novel fingerprinting attack methods: token filter attack (TFA) and sentence verification attack (SVA). The proposed methods effectively inhibit the fingerprint response while maintaining ensemble performance, and achieve better performance than state-of-the-art attack methods.
arXiv Detail & Related papers (2026-01-07T06:06:56Z) - iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
iSeal is a fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. iSeal achieves a 100 percent Fingerprint Success Rate on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
arXiv Detail & Related papers (2025-11-12T02:30:19Z) - SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
We propose sequential watermarking for soft prompts (SWAP). SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes. Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z) - SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
We propose a stronger and more intrinsic notion of LLM fingerprinting: SeedPrints. We show that untrained models exhibit reproducible token selection biases conditioned solely on their parameters. Experiments on LLaMA-style and Qwen-style models show that SeedPrints achieves seed-level distinguishability and can provide birth-to-lifecycle identity verification akin to a biometric fingerprint.
arXiv Detail & Related papers (2025-09-30T15:34:08Z) - From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
We propose RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF) by modifying a sparse subset of model weights. RFEdit is protected by Fingerprint Subspace-aware Fine-Tuning (FSFT), which mitigates fingerprint degradation during legitimate fine-tuning.
arXiv Detail & Related papers (2025-09-03T08:22:04Z) - MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
Large Language Models (LLMs) have become increasingly prevalent across various sectors, raising critical concerns about model ownership and intellectual property protection. We present Mismatched Eraser (MEraser), a novel method for effectively removing backdoor-based fingerprints from LLMs while maintaining model performance.
arXiv Detail & Related papers (2025-06-14T15:48:53Z) - ImF: Implicit Fingerprint for Large Language Models
We introduce a novel adversarial attack named the Generation Revision Intervention (GRI) attack. GRI exploits the semantic fragility of current fingerprinting methods, effectively erasing fingerprints. In response, we propose a novel model fingerprint paradigm called Implicit Fingerprints (ImF).
arXiv Detail & Related papers (2025-03-25T05:47:34Z) - Adversarial Example Based Fingerprinting for Robust Copyright Protection in Split Learning
We propose the first copyright protection scheme for Split Learning models, leveraging fingerprinting to ensure effective and robust copyright protection. This is demonstrated by a remarkable fingerprint verification success rate (FVSR) of 100% on MNIST, 98% on CIFAR-10, and 100% on ImageNet.
arXiv Detail & Related papers (2025-03-05T06:07:16Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in retrieval-augmented generation (RAG) introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We instead propose a harmless copyright protection method for knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting. We define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce Chain & Hash, a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity.
arXiv Detail & Related papers (2024-07-15T16:38:56Z) - Instructional Fingerprinting of Large Language Models
We present a pilot study on fingerprinting large language models (LLMs) as a form of very lightweight instruction tuning.
Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model.
It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z) - Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Recent work shows that text optimizers can produce jailbreaking prompts that bypass moderation and alignment.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training; a minimal sketch of the perplexity-based detector appears after this list.
We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, which undermine model integrity through both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS, the proposed defense, is the first that is robust against strong adaptive adversaries; it is effective in real-world data scenarios and adds an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
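As a concrete illustration of the perplexity-based detection defense surveyed in the Baseline Defenses entry above, the following sketch scores prompts under a small reference language model and flags unusually high perplexity, which adversarially optimized suffixes tend to exhibit. The choice of gpt2 as the reference model and the threshold value are assumptions for illustration, not the paper's exact setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small reference LM used only to score prompts; any causal LM works.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean token NLL.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

THRESHOLD = 500.0  # illustrative; calibrate on held-out benign prompts

def is_suspicious(prompt: str) -> bool:
    # Adversarial suffixes produced by discrete optimizers are typically
    # high-perplexity gibberish relative to natural instructions.
    return perplexity(prompt) > THRESHOLD

print(is_suspicious("Tell me about the history of the Roman Empire."))
```

A deployment would run this check before the prompt reaches the protected model, rejecting or rerouting flagged inputs; the trade-off is false positives on unusual but benign text, which is why the threshold needs calibration.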
This list is automatically generated from the titles and abstracts of the papers on this site.