Related papers: HuRef: HUman-REadable Fingerprint for Large Language Models

HuRef: HUman-REadable Fingerprint for Large Language Models

URL: http://arxiv.org/abs/2312.04828v5
Date: Tue, 07 Jan 2025 04:38:25 GMT
Title: HuRef: HUman-REadable Fingerprint for Large Language Models
Authors: Boyi Zeng, Lizheng Wang, Yuncong Hu, Yi Xu, Chenghu Zhou, Xinbing Wang, Yu Yu, Zhouhan Lin,
Abstract summary: HuRef is a human-readable fingerprint for large language models.<n>It uniquely identifies the base model without interfering with training or exposing model parameters to the public.
Score: 44.9820558213721
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Protecting the copyright of large language models (LLMs) has become crucial due to their resource-intensive training and accompanying carefully designed licenses. However, identifying the original base model of an LLM is challenging due to potential parameter alterations. In this study, we introduce HuRef, a human-readable fingerprint for LLMs that uniquely identifies the base model without interfering with training or exposing model parameters to the public. We first observe that the vector direction of LLM parameters remains stable after the model has converged during pretraining, with negligible perturbations through subsequent training steps, including continued pretraining, supervised fine-tuning, and RLHF, which makes it a sufficient condition to identify the base model. The necessity is validated by continuing to train an LLM with an extra term to drive away the model parameters' direction and the model becomes damaged. However, this direction is vulnerable to simple attacks like dimension permutation or matrix rotation, which significantly change it without affecting performance. To address this, leveraging the Transformer structure, we systematically analyze potential attacks and define three invariant terms that identify an LLM's base model. Due to the potential risk of information leakage, we cannot publish invariant terms directly. Instead, we map them to a Gaussian vector using an encoder, then convert it into a natural image using StyleGAN2, and finally publish the image. In our black-box setting, all fingerprinting steps are internally conducted by the LLMs owners. To ensure the published fingerprints are honestly generated, we introduced Zero-Knowledge Proof (ZKP). Experimental results across various LLMs demonstrate the effectiveness of our method. The code is available at https://github.com/LUMIA-Group/HuRef.

Related papers

LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
Model Inversion in Split Learning for Personalized LLMs: New Insights from Information Bottleneck Theory [11.83473842859642]
This work is the first to identify model inversion attacks in the split learning framework for personalized LLMs. We propose a two-stage attack system in which the first part projects representations into the embedding space, and the second part uses a generative model to recover text from these embeddings.
arXiv Detail & Related papers (2025-01-10T13:47:13Z)
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
Large Language Models (LLMs) have demonstrated great potential as generalist assistants. It is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. In this paper, we observe that directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs.
arXiv Detail & Related papers (2024-07-11T17:52:03Z)
A Fingerprint for Large Language Models [10.63985246068255]
We propose a novel black-box fingerprinting technique for large language models (LLMs) Experimental results indicate that the proposed technique achieves superior performance in ownership verification and robustness against PEFT attacks.
arXiv Detail & Related papers (2024-07-01T12:25:42Z)
ProFLingo: A Fingerprinting-based Intellectual Property Protection Scheme for Large Language Models [18.46904928949022]
We propose ProFLingo, a black-box fingerprinting-based IP protection scheme for large language models (LLMs) ProFLingo generates queries that elicit specific responses from an original model, thereby establishing unique fingerprints. Our scheme assesses the effectiveness of these queries on a suspect model to determine whether it has been derived from the original model.
arXiv Detail & Related papers (2024-05-03T20:00:40Z)
Temporal Scaling Law for Large Language Models [57.83580734589091]
We propose the novel concept of Temporal Scaling Law, studying how the test loss of an LLM evolves as the training steps scale up.<n>In contrast to modeling the test loss as a whole in a coarse-grained manner, we break it down and dive into the fine-grained test loss of each token position.<n>We derive the much more precise temporal scaling law by studying the temporal patterns of the parameters in the dynamic hyperbolic-law.
arXiv Detail & Related papers (2024-04-27T05:49:11Z)
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base-models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
Instructional Fingerprinting of Large Language Models [57.72356846657551]
We present a pilot study on fingerprinting Large language models (LLMs) as a form of very lightweight instruction tuning. Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License.
arXiv Detail & Related papers (2024-01-21T09:51:45Z)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN) At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models [30.246821533532017]
Large Language Models (LLMs) with billions of parameters are prime targets for network pruning, removing some model weights without hurting performance. We present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner)
arXiv Detail & Related papers (2023-11-08T18:59:54Z)
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model [22.870512676002463]
This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT.
arXiv Detail & Related papers (2023-10-24T03:08:58Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.