AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
- URL: http://arxiv.org/abs/2510.06738v1
- Date: Wed, 08 Oct 2025 07:51:11 GMT
- Title: AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
- Authors: Boyi Zeng, Lin Chen, Ziwei He, Xinbing Wang, Zhouhan Lin,
- Abstract summary: We propose a training-free fingerprinting method based on weight matrices.<n>We leverage the Linear Assignment Problem (LAP) and an unbiased Centered Kernel Alignment (CKA) similarity to neutralize the effects of parameter manipulations.<n>Our method demonstrates exceptional robustness against all six aforementioned post-training categories while exhibiting a near-zero risk of false positives.
- Score: 44.93519442566325
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Protecting the intellectual property of large language models (LLMs) is crucial, given the substantial resources required for their training. Consequently, there is an urgent need for both model owners and third parties to determine whether a suspect LLM is trained from scratch or derived from an existing base model. However, the intensive post-training processes that models typically undergo-such as supervised fine-tuning, extensive continued pretraining, reinforcement learning, multi-modal extension, pruning, and upcycling-pose significant challenges to reliable identification. In this work, we propose a training-free fingerprinting method based on weight matrices. We leverage the Linear Assignment Problem (LAP) and an unbiased Centered Kernel Alignment (CKA) similarity to neutralize the effects of parameter manipulations, yielding a highly robust and high-fidelity similarity metric. On a comprehensive testbed of 60 positive and 90 negative model pairs, our method demonstrates exceptional robustness against all six aforementioned post-training categories while exhibiting a near-zero risk of false positives. By achieving perfect scores on all classification metrics, our approach establishes a strong basis for reliable model lineage verification. Moreover, the entire computation completes within 30s on an NVIDIA 3090 GPU. The code is available at https://github.com/LUMIA-Group/AWM.
Related papers
- One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking [7.856998585396422]
Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines.<n>We propose textbfmulti-task learning (MTL) as a more efficient alternative that fine-tunes a single model to perform claim detection, evidence ranking, and stance detection jointly.
arXiv Detail & Related papers (2026-01-16T13:44:25Z) - Depth-Wise Activation Steering for Honest Language Models [4.299414551764217]
We present a training-free activation steering method that weights steering strength across network depth using a Gaussian schedule.<n>We evaluate seven models spanning the LLaMA, Qwen, and Mistral families and find that Gaussian scheduling improves honesty over no-steering and single-layer baselines.
arXiv Detail & Related papers (2025-12-08T16:03:06Z) - SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z) - One Token to Fool LLM-as-a-Judge [52.45386385722788]
Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluation and providing reward signals for training other models.<n>We uncover a critical vulnerability even in this reference-based paradigm: generative reward models are systematically susceptible to reward hacking.
arXiv Detail & Related papers (2025-07-11T17:55:22Z) - Self-Training Elicits Concise Reasoning in Large Language Models [23.475414693530965]
Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens.<n>We propose simple fine-tuning methods which leverage self-generated concise reasoning paths.<n>Our method achieves a 30% reduction in output tokens, across five model families on GSM8K and MATH, while maintaining average accuracy.
arXiv Detail & Related papers (2025-02-27T14:14:50Z) - S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference.<n>Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z) - Towards Adversarially Robust Deep Metric Learning [0.8702432681310401]
Deep neural networks are prone to adversarial attacks and could be easily fooled by adversarial examples.<n>Existing works fail to thoroughly inspect the robustness of DML models.<n>We propose a new defense, the Ensemble Adversarial Training (EAT), which exploits ensemble learning and adversarial training.
arXiv Detail & Related papers (2025-01-02T03:15:25Z) - REEF: Representation Encoding Fingerprints for Large Language Models [53.679712605506715]
REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model.
This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations.
arXiv Detail & Related papers (2024-10-18T08:27:02Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.