Neural network fragile watermarking with no model performance degradation
- URL: http://arxiv.org/abs/2208.07585v1
- Date: Tue, 16 Aug 2022 07:55:20 GMT
- Title: Neural network fragile watermarking with no model performance degradation
- Authors: Zhaoxia Yin, Heng Yin, and Xinpeng Zhang
- Abstract summary: We propose a novel neural network fragile watermarking method with no model performance degradation.
Experiments show that the proposed method can effectively detect malicious model fine-tuning without degrading model performance.
- Score: 28.68910526223425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are vulnerable to malicious fine-tuning attacks such as data poisoning and backdoor attacks. Recent research has therefore proposed methods to detect malicious fine-tuning of neural network models; however, such detection usually degrades the performance of the protected model. We therefore propose a novel neural network fragile watermarking method with no model performance degradation. During watermarking, we train a generative model with a specific loss function and a secret key to generate triggers that are sensitive to fine-tuning of the target classifier. During verification, the watermarked classifier is queried with each fragile trigger to obtain its label, and malicious fine-tuning is detected by comparing these labels against the secret key. Experiments on classic datasets and classifiers show that the proposed method effectively detects malicious fine-tuning with no model performance degradation.
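A minimal sketch of the verification step described above, assuming a PyTorch image classifier; the names verify_watermark, fragile_triggers, and key_labels are hypothetical placeholders, not the authors' released code:

    # Verification sketch: query the watermarked classifier with the fragile
    # triggers and compare its predictions against the labels derived from the
    # secret key. Any mismatch signals malicious fine-tuning.
    import torch

    def verify_watermark(classifier: torch.nn.Module,
                         triggers: torch.Tensor,       # fragile triggers, shape (N, C, H, W)
                         secret_labels: torch.Tensor   # key-derived labels, shape (N,)
                         ) -> bool:
        classifier.eval()
        with torch.no_grad():
            predicted = classifier(triggers).argmax(dim=1)
        # An intact watermarked model reproduces every key-derived label.
        return bool((predicted == secret_labels).all())

    # Hypothetical usage:
    # intact = verify_watermark(watermarked_model, fragile_triggers, key_labels)
    # print("model intact" if intact else "possible malicious fine-tuning detected")

Because the triggers are trained to lie in regions where the classifier is highly sensitive, even small amounts of fine-tuning are expected to flip some of these labels.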
Related papers
- Augmented Neural Fine-Tuning for Efficient Backdoor Purification [16.74156528484354]
Recent studies have revealed the vulnerability of deep neural networks (DNNs) to various backdoor attacks.
We propose Neural mask Fine-Tuning (NFT) with the aim of optimally re-organizing neuron activities.
NFT relaxes the trigger synthesis process and eliminates the requirement of the adversarial search module.
arXiv Detail & Related papers (2024-07-14T02:36:54Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing [34.86809796164664]
Fragile model watermarks aim to prevent unexpected tampering that could lead models to make incorrect decisions.
Our approach employs a sample-pairing technique that places the model's decision boundary between pairs of samples while simultaneously maximizing the logits.
This ensures that the model's decisions on these sensitive samples change as much as possible, so their Top-1 labels flip easily regardless of the direction in which the boundary moves.
arXiv Detail & Related papers (2024-04-11T09:01:52Z)
- Disarming Steganography Attacks Inside Neural Network Models [4.750077838548593]
We propose a zero-trust prevention strategy based on AI model attack disarm and reconstruction.
We demonstrate a 100% prevention rate, while the Qint8- and K-LRBP-based methods introduce only a minimal decrease in model accuracy.
arXiv Detail & Related papers (2023-09-06T15:18:35Z)
- VPN: Verification of Poisoning in Neural Networks [11.221552724154988]
We study another neural network security issue, namely data poisoning.
In this case, an attacker inserts a trigger into a subset of the training data so that, at test time, any input containing the trigger is misclassified by the trained model into a target class.
We show how to formulate the check for data poisoning as a property that can be checked with off-the-shelf verification tools.
arXiv Detail & Related papers (2022-05-08T15:16:05Z)
- Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication.
Embedding the reversible watermark affects classification performance by less than 0.5%.
At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
- TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation [1.52292571922932]
Detection of backdoors in trained models without access to the training data or example triggers is an important open problem.
In this paper, we identify an interesting property of these models: adversarial perturbations transfer from image to image more readily in poisoned models than in clean models.
We use this feature to detect poisoned models in the TrojAI benchmark, as well as additional models; a toy sketch of this transferability check is given after this list.
arXiv Detail & Related papers (2021-03-18T14:13:30Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model against a previously proposed model based on an ensemble of simpler neural networks that detect firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from clean models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
- Model Watermarking for Image Processing Networks [120.918532981871]
How to protect the intellectual property of deep models is a very important but seriously under-researched problem.
We propose the first model watermarking framework for protecting image processing models.
arXiv Detail & Related papers (2020-02-25T18:36:18Z)
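For the TOP entry above, here is a toy sketch of the perturbation-transferability check, assuming a PyTorch classifier and inputs scaled to [0, 1]; the single-step FGSM perturbation and the thresholding step are illustrative simplifications rather than the paper's exact procedure:

    # Toy transferability check: craft an adversarial perturbation on one image,
    # reuse it on other images, and measure how often it flips the model's
    # predictions. Higher image-to-image transfer is reported for poisoned models.
    import torch
    import torch.nn.functional as F

    def fgsm_perturbation(model, image, label, eps=0.03):
        image = image.detach().clone().requires_grad_(True)
        loss = F.cross_entropy(model(image.unsqueeze(0)), label.unsqueeze(0))
        loss.backward()
        return eps * image.grad.sign()      # single-step (FGSM) perturbation

    def transfer_rate(model, images, labels, eps=0.03):
        model.eval()
        delta = fgsm_perturbation(model, images[0], labels[0], eps)
        with torch.no_grad():
            clean = model(images).argmax(dim=1)
            perturbed = model((images + delta).clamp(0.0, 1.0)).argmax(dim=1)
        # Fraction of the remaining images whose prediction the reused perturbation flips.
        return (clean[1:] != perturbed[1:]).float().mean().item()

    # A threshold on transfer_rate, calibrated on known-clean models, would then
    # separate suspected poisoned models from clean ones.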
This list is automatically generated from the titles and abstracts of the papers on this site.