MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
- URL: http://arxiv.org/abs/2410.08604v3
- Date: Thu, 20 Feb 2025 08:04:57 GMT
- Title: MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
- Authors: Shojiro Yamabe, Futa Waseda, Koki Wataoka, Tsubasa Takahashi
- Abstract summary: We propose a novel fingerprinting method, MergePrint, to embed robust fingerprints capable of surviving model merging.
MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs.
- Score: 1.9249287163937978
- Abstract: Protecting the intellectual property of Large Language Models (LLMs) has become increasingly critical due to the high cost of training. Model merging, which integrates multiple expert models into a single multi-task model, introduces a novel risk of unauthorized use of LLMs due to its efficient merging process. While fingerprinting techniques have been proposed for verifying model ownership, their resistance to model merging remains unexplored. To address this gap, we propose a novel fingerprinting method, MergePrint, which embeds robust fingerprints capable of surviving model merging. MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs, without accessing model weights or intermediate outputs. By optimizing against a pseudo-merged model that simulates merged behavior, MergePrint ensures fingerprints that remain detectable after merging. Additionally, to minimize performance degradation, we pre-optimize the fingerprint inputs. MergePrint pioneers a practical solution for black-box ownership verification, protecting LLMs from misappropriation via merging, while also excelling in resistance to broader model theft threats.
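To make the abstract's mechanism concrete, here is a minimal, hypothetical PyTorch-style sketch (function names, the coefficient alpha, and plain weight interpolation as a stand-in for task-arithmetic merging are all illustrative assumptions, not the authors' implementation). It shows the two ingredients the abstract describes: a pseudo-merged model built by interpolating owner and third-party weights, and a purely black-box verification that checks whether a fingerprint input elicits the target output.

```python
# Hypothetical sketch of the MergePrint idea (not the authors' code).
# A "pseudo-merged" model simulates merging via weight interpolation;
# ownership is verified black-box from generated text alone.
import torch


def make_pseudo_merged(owner_state: dict, other_state: dict, alpha: float = 0.5) -> dict:
    """Simulate a merged model: linearly interpolate the owner's weights with
    another model's weights (a simple stand-in for task-arithmetic merging)."""
    return {name: (1 - alpha) * w + alpha * other_state[name]
            for name, w in owner_state.items()}


@torch.no_grad()
def verify_fingerprint(model, tokenizer, fp_input: str, fp_target: str) -> bool:
    """Black-box check: does the model emit the target string for the
    fingerprint input? No weights or intermediate outputs are inspected."""
    ids = tokenizer(fp_input, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=16, do_sample=False)
    completion = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return fp_target in completion
```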
Related papers
- Scalable Fingerprinting of Large Language Models [46.26999419117367]
We introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints.
We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model without degrading the model's utility.
arXiv Detail & Related papers (2025-02-11T18:43:07Z)
- FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint [29.015707553430442]
Model fingerprinting is a widely adopted approach to safeguard the intellectual property rights of open-source models.
In this paper, we reveal that existing fingerprinting methods are vulnerable to false claim attacks, where adversaries falsely assert ownership of any third-party model.
Motivated by these findings, we propose a targeted fingerprinting paradigm (i.e., FIT-Print) to counteract false claim attacks.
arXiv Detail & Related papers (2025-01-26T13:00:58Z)
- REEF: Representation Encoding Fingerprints for Large Language Models [53.679712605506715]
REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model.
This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations.
arXiv Detail & Related papers (2024-10-18T08:27:02Z)
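For context, centered kernel alignment (CKA) is a standard representation-similarity measure; a minimal NumPy sketch of its linear variant follows (the textbook formula, not REEF's released code, and any decision threshold on the score would be an implementation choice).

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between two representation matrices
    of shape (n_samples, dim); values near 1 indicate similar representations."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```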
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that sets a new state of the art in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
- ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
However, adversaries can still use model extraction attacks to steal the model intelligence encoded in its generated content.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
- Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging [25.327483618051378]
We conduct the first study on the robustness of IP protection methods under model merging scenarios.
Experimental results indicate that current Large Language Model (LLM) watermarking techniques do not survive model merging.
Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques.
arXiv Detail & Related papers (2024-04-08T04:30:33Z)
- Instructional Fingerprinting of Large Language Models [57.72356846657551]
We present a pilot study on fingerprinting Large Language Models (LLMs) as a form of very lightweight instruction tuning.
Results on 11 widely used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model.
It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z)
- Robust Retraining-free GAN Fingerprinting via Personalized Normalization [21.63902009635896]
The proposed method can embed different fingerprints inside the GAN by just changing the input of the ParamGen Nets.
The proposed method outperforms the state-of-the-art in robustness against both model-level and image-level attacks.
arXiv Detail & Related papers (2023-11-09T16:09:12Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
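A rough sketch of the sample-correlation idea as this summary describes it (an assumed reading of the abstract, not the paper's code; the probe inputs and any decision threshold are left to the user): each model's outputs on a fixed probe set are turned into a pairwise correlation matrix, and a suspect model whose matrix closely matches the victim's is flagged as likely stolen.

```python
import numpy as np


def correlation_matrix(outputs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between per-sample output vectors
    (e.g., logits); outputs has shape (n_samples, n_classes)."""
    normed = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    return normed @ normed.T


def sac_distance(victim_outputs: np.ndarray, suspect_outputs: np.ndarray) -> float:
    """Mean absolute gap between the two models' sample-correlation matrices;
    a small gap suggests the suspect inherits the victim's correlation structure."""
    gap = correlation_matrix(victim_outputs) - correlation_matrix(suspect_outputs)
    return float(np.abs(gap).mean())
```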
- DeepHider: A Multi-module and Invisibility Watermarking Scheme for Language Model [0.0]
This paper identifies a new threat: replacing the model's classification module and performing global fine-tuning of the model.
We use blockchain properties such as tamper-resistance and traceability to prevent thieves from falsely claiming ownership.
Experiments show that the proposed scheme successfully verifies ownership with 100% watermark verification accuracy.
arXiv Detail & Related papers (2022-08-09T11:53:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.