MergePrint: Robust Fingerprinting against Merging Large Language Models
- URL: http://arxiv.org/abs/2410.08604v1
- Date: Fri, 11 Oct 2024 08:00:49 GMT
- Title: MergePrint: Robust Fingerprinting against Merging Large Language Models
- Authors: Shojiro Yamabe, Tsubasa Takahashi, Futa Waseda, Koki Wataoka,
- Abstract summary: We propose a novel fingerprinting method MergePrint that embeds robust fingerprints designed to preserve ownership claims even after model merging.
By optimizing against a pseudo-merged model, MergePrint generates fingerprints that remain detectable after merging.
This approach provides a practical fingerprinting strategy for asserting ownership in cases of misappropriation through model merging.
- Score: 1.9249287163937978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the cost of training large language models (LLMs) rises, protecting their intellectual property has become increasingly critical. Model merging, which integrates multiple expert models into a single model capable of performing multiple tasks, presents a growing risk of unauthorized and malicious usage. While fingerprinting techniques have been studied for asserting model ownership, existing methods have primarily focused on fine-tuning, leaving model merging underexplored. To address this gap, we propose a novel fingerprinting method MergePrint that embeds robust fingerprints designed to preserve ownership claims even after model merging. By optimizing against a pseudo-merged model, which simulates post-merged model weights, MergePrint generates fingerprints that remain detectable after merging. Additionally, we optimize the fingerprint inputs to minimize performance degradation, enabling verification through specific outputs from targeted inputs. This approach provides a practical fingerprinting strategy for asserting ownership in cases of misappropriation through model merging.
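The pseudo-merged optimization idea can be illustrated with a toy sketch, assuming merging behaves like weight interpolation (task-arithmetic style). The names `pseudo_merge` and `alpha`, and the single-linear-layer "models", are illustrative assumptions, not MergePrint's actual procedure, which additionally optimizes the fingerprint inputs themselves:

```python
import numpy as np

def pseudo_merge(owner_w, other_w, alpha=0.5):
    """Simulate a post-merge model by interpolating parameters
    (a simple weight-averaging stand-in for real merging methods)."""
    return {k: alpha * owner_w[k] + (1 - alpha) * other_w[k] for k in owner_w}

# Toy "models": one linear layer each; the fingerprint is a specific
# response to a trigger input.
rng = np.random.default_rng(0)
owner = {"W": rng.normal(size=(4, 4))}
other = {"W": rng.normal(size=(4, 4))}

trigger = rng.normal(size=4)      # hypothetical fingerprint input
target = owner["W"] @ trigger     # owner's fingerprint response

# Optimizing the fingerprint against this simulated merge (rather than
# the owner model alone) is what makes it robust to merging.
merged = pseudo_merge(owner, other, alpha=0.7)
drift = float(np.linalg.norm(merged["W"] @ trigger - target))
print(drift)
```

In the paper's setting, `drift` would be driven toward zero during fingerprint optimization so the trigger/response pair still verifies ownership after merging.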
Related papers
- ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
Adversaries can still use model extraction attacks to steal the model intelligence encoded in model generation.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
- Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging [25.327483618051378]
We conduct the first study on the robustness of IP protection methods under model merging scenarios.
Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive model merging.
Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques.
arXiv Detail & Related papers (2024-04-08T04:30:33Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
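A rough sketch of learning a merging coefficient without labeled data, assuming task-arithmetic merging and an entropy objective on unlabeled outputs. The grid search here stands in for AdaMerging's gradient-based, task- or layer-wise coefficient learning; all names are illustrative:

```python
import numpy as np

def entropy(logits):
    """Mean Shannon entropy of softmax predictions; an AdaMerging-style
    unsupervised objective (lower entropy = more confident predictions)."""
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def merge_with_coeffs(base, task_vectors, lambdas):
    """Task-arithmetic merge: base + sum_k lambda_k * (theta_k - base)."""
    return base + sum(l * tv for l, tv in zip(lambdas, task_vectors))

# Toy setup: pick a shared coefficient minimizing prediction entropy
# on an unlabeled input (gradient descent in the real method).
rng = np.random.default_rng(1)
base = rng.normal(size=(3, 5))
tvs = [rng.normal(size=(3, 5)) * 0.1 for _ in range(2)]
x = rng.normal(size=5)

best = min((entropy((merge_with_coeffs(base, tvs, [l, l]) @ x)[None, :]), l)
           for l in np.linspace(0.0, 1.0, 11))
print(best[1])  # selected merging coefficient
```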
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Model Synthesis for Zero-Shot Model Attribution [26.835046772924258]
Generative models are shaping various fields such as art, design, and human-computer interaction.
We propose a model synthesis technique, which generates numerous synthetic models mimicking the fingerprint patterns of real-world generative models.
Our experiments demonstrate that this fingerprint extractor, trained solely on synthetic models, achieves impressive zero-shot generalization on a wide range of real-world generative models.
arXiv Detail & Related papers (2023-07-29T13:00:42Z)
- WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models [32.29120988096214]
This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images.
Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user.
arXiv Detail & Related papers (2023-06-07T19:44:14Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Attributing Image Generative Models using Latent Fingerprints [33.037718660732544]
Generative models have enabled the creation of content that is indistinguishable from real-world content.
One potential risk mitigation strategy is to attribute generative models via fingerprinting.
This paper investigates the use of latent semantic dimensions as fingerprints.
arXiv Detail & Related papers (2023-04-17T00:13:10Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
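As a minimal sketch of parameter-space merging, uniform weight averaging serves as the simplest dataless baseline. The paper itself proposes a more refined fusion than a plain average; `average_merge` is an illustrative name, not the paper's API:

```python
import numpy as np

def average_merge(state_dicts):
    """Uniform parameter averaging of several fine-tuned models —
    the simplest dataless merge in parameter space."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n
            for k in state_dicts[0]}

# Toy example: two "models" sharing an architecture (same parameter keys).
model_a = {"W": np.ones((2, 2)), "b": np.zeros(2)}
model_b = {"W": 3.0 * np.ones((2, 2)), "b": np.ones(2)}
fused = average_merge([model_a, model_b])
print(fused["W"])  # every entry is the per-parameter mean
```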
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC)
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
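The sample-correlation idea can be sketched as comparing the correlation structure of two models' outputs on a shared set of probe samples: a near-copy preserves that structure, while an independent model does not. This uses cosine similarity as the correlation measure and hypothetical names; it is a simplification of the paper's detection procedure:

```python
import numpy as np

def sample_correlation(outputs):
    """Pairwise cosine-similarity matrix of a model's outputs
    on a fixed set of probe samples."""
    o = outputs / (np.linalg.norm(outputs, axis=1, keepdims=True) + 1e-12)
    return o @ o.T

def correlation_distance(out_a, out_b):
    """Mean absolute difference between two models' correlation
    matrices; a small value is a SAC-style stealing signal."""
    return float(np.abs(sample_correlation(out_a) - sample_correlation(out_b)).mean())

rng = np.random.default_rng(2)
victim = rng.normal(size=(8, 10))                      # outputs on 8 probes
stolen = victim + 0.01 * rng.normal(size=(8, 10))      # near-copy of the victim
indep = rng.normal(size=(8, 10))                       # independently trained model

print(correlation_distance(victim, stolen) < correlation_distance(victim, indep))
```

Because the signal lives in *relations between samples* rather than individual outputs, it is harder to erase with adversarial training or transfer learning than a per-sample fingerprint.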
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
- Learning Robust Representations Of Generative Models Using Set-Based Artificial Fingerprints [14.191129493685212]
Existing methods approximate the distance between the models via their sample distributions.
We consider unique traces (a.k.a. "artificial fingerprints") as representations of generative models.
We propose a new learning method based on set-encoding and contrastive training.
arXiv Detail & Related papers (2022-06-04T23:20:07Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.