Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs
- URL: http://arxiv.org/abs/2505.19466v1
- Date: Mon, 26 May 2025 03:38:14 GMT
- Title: Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs
- Authors: Hongyu Liang, Yuting Zheng, Yihan Li, Yiran Zhang, Shiyu Liang,
- Abstract summary: We propose a novel detection method that rigorously determines whether a model has been fine-tuned from a specified base model. This framework is the first to provide a formalized approach specifically aimed at pinpointing the sources of model fine-tuning. We empirically validated our method on thirty-one diverse open-source models under conditions that simulate real-world obfuscation scenarios.
- Score: 3.925661213372832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) continue to advance, their deployment often involves fine-tuning to enhance performance on specific downstream tasks. However, this customization is sometimes accompanied by misleading claims about a model's origins, raising significant concerns about transparency and trust within the open-source community. Existing model verification techniques typically assess functional, representational, and weight similarities; however, these approaches often struggle against obfuscation techniques such as permutations and scaling transformations. To address this limitation, we propose Origin-Tracer, a novel detection method that rigorously determines whether a model has been fine-tuned from a specified base model. The method can also extract the LoRA rank used during fine-tuning, providing a more robust verification framework, and it is the first formalized approach aimed specifically at pinpointing the sources of model fine-tuning. We empirically validate our method on thirty-one diverse open-source models under conditions that simulate real-world obfuscation scenarios, analyze its effectiveness, and discuss its limitations. The results demonstrate the effectiveness of our approach and indicate its potential to establish new benchmarks for model verification.
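The abstract does not spell out the detection procedure, but the underlying intuition is standard: a LoRA update adds a low-rank delta to the base weights (ΔW = B·A with rank r), so the singular-value spectrum of the difference between a suspect checkpoint and a candidate base should collapse after roughly r values. The sketch below only illustrates that intuition; it is not the paper's Origin-Tracer algorithm, and the function names and tolerance are illustrative assumptions.

```python
# Illustrative sketch, NOT the paper's Origin-Tracer algorithm: estimate the
# numerical rank of per-layer weight deltas between a suspect model and a
# candidate base model. A consistently small rank is (weak) evidence of a
# LoRA fine-tuning relationship. Names and thresholds are assumptions.
import torch

def estimate_delta_rank(w_suspect: torch.Tensor, w_base: torch.Tensor,
                        rel_tol: float = 1e-3) -> int:
    """Numerical rank of (w_suspect - w_base) via its singular values."""
    delta = (w_suspect - w_base).float()
    s = torch.linalg.svdvals(delta)
    if s.numel() == 0 or s[0] == 0:
        return 0
    return int((s > rel_tol * s[0]).sum().item())

def looks_like_lora_finetune(suspect_sd: dict, base_sd: dict,
                             max_rank: int = 64) -> tuple[bool, int]:
    """Check every matching 2-D weight; return (all deltas low rank?, max rank seen)."""
    ranks = []
    for name, w_base in base_sd.items():
        w_sus = suspect_sd.get(name)
        if w_sus is None or w_base.dim() != 2 or w_sus.shape != w_base.shape:
            continue
        ranks.append(estimate_delta_rank(w_sus, w_base))
    observed = max(ranks) if ranks else 0
    return (len(ranks) > 0 and observed <= max_rank), observed
```

One could run this over corresponding layers of a suspect checkpoint and each candidate base; a consistently small estimated rank would support, not prove, a LoRA origin. Note that the paper's method additionally has to withstand permutation and scaling obfuscation, which this naive delta comparison does not.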
Related papers
- GLAD: Generalizable Tuning for Vision-Language Models [41.071911050087586]
We propose a simpler and more general framework called GLAD (Generalizable LoRA tuning with RegulArized GraDient). We show that merely applying LoRA achieves performance in downstream tasks comparable to current state-of-the-art prompt-based methods (a minimal LoRA layer is sketched after this list).
arXiv Detail & Related papers (2025-07-17T12:58:15Z) - Learning Counterfactually Decoupled Attention for Open-World Model Attribution [75.52873383916672]
We propose a Counterfactually Decoupled Attention Learning (CDAL) method for open-world model attribution. Our method consistently improves state-of-the-art models by large margins, particularly for unseen novel attacks.
arXiv Detail & Related papers (2025-06-29T03:25:45Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense [5.150608040339816]
We introduce PADL, a new solution able to generate image-specific perturbations using a symmetric scheme of encoding and decoding based on cross-attention.
Our method generalizes to a range of unseen models with diverse architectural designs, such as StarGANv2, BlendGAN, DiffAE, StableDiffusion and StableDiffusionXL.
arXiv Detail & Related papers (2024-09-26T15:16:32Z) - Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z) - Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
arXiv Detail & Related papers (2024-08-05T03:18:58Z) - Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z) - A Generative Framework for Low-Cost Result Validation of Machine Learning-as-a-Service Inference [4.478182379059458]
Fides is a novel framework for real-time integrity validation of ML-as-a-Service (MLaaS) inference.
Fides features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with a high likelihood, if the service model is under attack.
We devised a generative adversarial network framework for training the attack detection and re-classification models.
arXiv Detail & Related papers (2023-03-31T19:17:30Z) - Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground truth oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - Towards Robust and Reliable Algorithmic Recourse [11.887537452826624]
We propose a novel framework, RObust Algorithmic Recourse (ROAR), that leverages adversarial training for finding recourses that are robust to model shifts.
We also carry out detailed theoretical analysis which underscores the importance of constructing recourses that are robust to model shifts.
arXiv Detail & Related papers (2021-02-26T17:38:52Z)
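Since both the main paper and the GLAD entry above revolve around LoRA, the following minimal sketch shows what a LoRA-adapted linear layer looks like. It is a generic illustration of the standard LoRA formulation (output = W·x + (α/r)·B·A·x), not code from either paper, and the class and parameter names are illustrative.

```python
# Generic LoRA linear layer (standard formulation, not from either paper):
# the adapted weight is W + (alpha / r) * B @ A, so the delta relative to the
# frozen base weight has rank at most r -- the structure that rank-based
# origin detection relies on.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.scaling = alpha / r
        # A: (r, in_features), B: (out_features, r); B starts at zero so the
        # adapter initially leaves the base model's behaviour unchanged.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: wrap a projection, fine-tune only lora_A / lora_B; the merged weight
# then differs from the base by the rank-limited matrix scaling * B @ A.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
delta = layer.scaling * (layer.lora_B @ layer.lora_A)
print(torch.linalg.matrix_rank(delta).item())  # at most 8
```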