MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
- URL: http://arxiv.org/abs/2410.08604v3
- Date: Thu, 20 Feb 2025 08:04:57 GMT
- Title: MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
- Authors: Shojiro Yamabe, Futa Waseda, Koki Wataoka, Tsubasa Takahashi
- Abstract summary: We propose a novel fingerprinting method, MergePrint, to embed robust fingerprints capable of surviving model merging.
MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs.
- Score: 1.9249287163937978
- Abstract: Protecting the intellectual property of Large Language Models (LLMs) has become increasingly critical due to the high cost of training. Model merging, which integrates multiple expert models into a single multi-task model, introduces a novel risk of unauthorized use of LLMs due to its efficient merging process. While fingerprinting techniques have been proposed for verifying model ownership, their resistance to model merging remains unexplored. To address this gap, we propose a novel fingerprinting method, MergePrint, which embeds robust fingerprints capable of surviving model merging. MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs, without accessing model weights or intermediate outputs. By optimizing against a pseudo-merged model that simulates merged behavior, MergePrint ensures fingerprints that remain detectable after merging. Additionally, to minimize performance degradation, we pre-optimize the fingerprint inputs. MergePrint pioneers a practical solution for black-box ownership verification, protecting LLMs from misappropriation via merging, while also excelling in resistance to broader model theft threats.
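To make the abstract's mechanism concrete, here is a minimal, hypothetical PyTorch-style sketch (function names, the coefficient alpha, and plain weight interpolation as a stand-in for task-arithmetic merging are all illustrative assumptions, not the authors' implementation). It shows the two ingredients the abstract describes: a pseudo-merged model built by interpolating owner and third-party weights, and a purely black-box verification that checks whether a fingerprint input elicits the target output.

```python
# Hypothetical sketch of the MergePrint idea (not the authors' code).
# A "pseudo-merged" model simulates merging via weight interpolation;
# ownership is verified black-box from generated text alone.
import torch


def make_pseudo_merged(owner_state: dict, other_state: dict, alpha: float = 0.5) -> dict:
    """Simulate a merged model: linearly interpolate the owner's weights with
    another model's weights (a simple stand-in for task-arithmetic merging)."""
    return {name: (1 - alpha) * w + alpha * other_state[name]
            for name, w in owner_state.items()}


@torch.no_grad()
def verify_fingerprint(model, tokenizer, fp_input: str, fp_target: str) -> bool:
    """Black-box check: does the model emit the target string for the
    fingerprint input? No weights or intermediate outputs are inspected."""
    ids = tokenizer(fp_input, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=16, do_sample=False)
    completion = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return fp_target in completion
```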
Related papers
- Scalable Fingerprinting of Large Language Models [46.26999419117367]
We introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints.
We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model without degrading the model's utility.
arXiv Detail & Related papers (2025-02-11T18:43:07Z)
- FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint [29.015707553430442]
Model fingerprinting is a widely adopted approach to safeguard the intellectual property rights of open-source models.
In this paper, we reveal that existing fingerprinting methods are vulnerable to false claim attacks, where adversaries falsely assert ownership of any third-party model.
Motivated by these findings, we propose a targeted fingerprinting paradigm (i.e., FIT-Print) to counteract false claim attacks.
arXiv Detail & Related papers (2025-01-26T13:00:58Z)
- REEF: Representation Encoding Fingerprints for Large Language Models [53.679712605506715]
REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model.
This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations.
arXiv Detail & Related papers (2024-10-18T08:27:02Z)
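For context, centered kernel alignment (CKA) is a standard representation-similarity measure; a minimal NumPy sketch of its linear variant follows (the textbook formula, not REEF's released code, and any decision threshold on the score would be an implementation choice).

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between two representation matrices
    of shape (n_samples, dim); values near 1 indicate similar representations."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```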
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that sets a new state of the art in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
- ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
However, adversaries can still use model extraction attacks to steal the model intelligence encoded in its generated content.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
- Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging [25.327483618051378]
We conduct the first study on the robustness of IP protection methods under model merging scenarios.
Experimental results indicate that current Large Language Model (LLM) watermarking techniques do not survive model merging.
Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques.
arXiv Detail & Related papers (2024-04-08T04:30:33Z)
- Instructional Fingerprinting of Large Language Models [57.72356846657551]
We present a pilot study on fingerprinting Large Language Models (LLMs) as a form of very lightweight instruction tuning.
Results on 11 widely used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model.
It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z)
- Robust Retraining-free GAN Fingerprinting via Personalized Normalization [21.63902009635896]
The proposed method can embed different fingerprints inside the GAN by just changing the input of the ParamGen Nets.
The proposed method outperforms the state-of-the-art in robustness against both model-level and image-level attacks.
arXiv Detail & Related papers (2023-11-09T16:09:12Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
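A rough sketch of the sample-correlation idea as this summary describes it (an assumed reading of the abstract, not the paper's code; the probe inputs and any decision threshold are left to the user): each model's outputs on a fixed probe set are turned into a pairwise correlation matrix, and a suspect model whose matrix closely matches the victim's is flagged as likely stolen.

```python
import numpy as np


def correlation_matrix(outputs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between per-sample output vectors
    (e.g., logits); outputs has shape (n_samples, n_classes)."""
    normed = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    return normed @ normed.T


def sac_distance(victim_outputs: np.ndarray, suspect_outputs: np.ndarray) -> float:
    """Mean absolute gap between the two models' sample-correlation matrices;
    a small gap suggests the suspect inherits the victim's correlation structure."""
    gap = correlation_matrix(victim_outputs) - correlation_matrix(suspect_outputs)
    return float(np.abs(gap).mean())
```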
- DeepHider: A Multi-module and Invisibility Watermarking Scheme for Language Model [0.0]
This paper identifies a new threat: replacing the model's classification module and performing global fine-tuning of the model.
We use blockchain properties such as tamper-resistance and traceability to prevent thieves from falsely claiming ownership.
Experiments show that the proposed scheme successfully verifies ownership with 100% watermark verification accuracy.
arXiv Detail & Related papers (2022-08-09T11:53:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.