Elevating Defenses: Bridging Adversarial Training and Watermarking for
Model Resilience
- URL: http://arxiv.org/abs/2312.14260v2
- Date: Sun, 7 Jan 2024 21:52:09 GMT
- Title: Elevating Defenses: Bridging Adversarial Training and Watermarking for
Model Resilience
- Authors: Janvi Thakkar, Giulio Zizzo, Sergio Maffeis
- Abstract summary: This work introduces a novel framework to integrate adversarial training with watermarking techniques to fortify against evasion attacks.
We use the MNIST and Fashion-MNIST datasets to evaluate our proposed technique on various model stealing attacks.
- Score: 2.8084422332394428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models are being used in an increasing number of critical
applications; thus, securing their integrity and ownership is critical. Recent
studies observed that adversarial training and watermarking have a conflicting
interaction. This work introduces a novel framework to integrate adversarial
training with watermarking techniques to fortify against evasion attacks and
provide confident model verification in case of intellectual property theft. We
use adversarial training together with adversarial watermarks to train a robust
watermarked model. The key intuition is to use a higher perturbation budget to
generate adversarial watermarks compared to the budget used for adversarial
training, thus avoiding conflict. We use the MNIST and Fashion-MNIST datasets
to evaluate our proposed technique on various model stealing attacks. The
results obtained consistently outperform the existing baseline in terms of
robustness performance and further prove the resilience of this defense against
pruning and fine-tuning removal attacks.
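The two-budget intuition above is simple enough to sketch in code. Below is a minimal, hedged PyTorch illustration (not the authors' code): standard L-infinity PGD adversarial training at a small budget, plus a fixed trigger set of adversarial watermarks generated once at a larger budget and fit alongside, with a simple trigger-set accuracy check for ownership verification. The concrete budgets (0.1 vs. 0.3), the PGD hyperparameters, the choice to keep the original labels for the trigger set, and the 0.9 verification threshold are all illustrative assumptions.
```python
import torch
import torch.nn.functional as F

# Illustrative budgets: the watermark budget is deliberately larger than
# the adversarial-training budget so the two objectives do not conflict.
EPS_TRAIN = 0.1  # assumed L-inf budget for adversarial training
EPS_WM = 0.3     # assumed larger L-inf budget for watermark generation


def pgd_attack(model, x, y, eps, alpha, steps):
    """L-infinity PGD: maximize cross-entropy within an eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv.detach()


def make_trigger_set(model, x_wm, y_wm):
    """One-time watermark construction at the larger budget. Keeping the
    original labels (frontier-stitching style) is an assumption here,
    not necessarily the paper's exact labeling rule."""
    return pgd_attack(model, x_wm, y_wm, eps=EPS_WM, alpha=0.05, steps=40), y_wm


def train_step(model, optimizer, x, y, trigger_x, trigger_y):
    """Joint objective: adversarial training at the small budget plus
    fitting the fixed, higher-budget watermark trigger set."""
    model.train()
    x_at = pgd_attack(model, x, y, eps=EPS_TRAIN, alpha=0.02, steps=10)
    loss = F.cross_entropy(model(x_at), y) \
        + F.cross_entropy(model(trigger_x), trigger_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def verify_ownership(suspect, trigger_x, trigger_y, threshold=0.9):
    """Ownership claim: a stolen model should still match the trigger
    labels at a high rate; the threshold is an assumption."""
    suspect.eval()
    acc = (suspect(trigger_x).argmax(dim=1) == trigger_y).float().mean()
    return acc.item() >= threshold
```
Separating the budgets means the watermark perturbations live outside the ball that the robust-training objective is already forcing to be label-consistent, which is the intuition for why the two objectives stop competing.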
Related papers
- Adversarial Machine Learning: Attacking and Safeguarding Image Datasets [0.0]
This paper examines the vulnerabilities of convolutional neural networks (CNNs) to adversarial attacks and explores a method for their safeguarding.
CNNs were implemented on four of the most common image datasets and achieved high baseline accuracy.
While adversarial training recovers most of the models' robustness, the models still suffer some performance loss under adversarial perturbations.
arXiv Detail & Related papers (2025-01-31T22:32:38Z)
- Agentic Copyright Watermarking against Adversarial Evidence Forgery with Purification-Agnostic Curriculum Proxy Learning [8.695511322757262]
Unauthorized use and illegal distribution of AI models pose serious threats to intellectual property.
Model watermarking has emerged as a key technique to address this issue.
This paper presents several contributions to model watermarking.
arXiv Detail & Related papers (2024-09-03T02:18:45Z)
- Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created when models memorize watermark samples.
By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in Deep Neural Networks [22.614495877481144]
State-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership.
We propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model.
arXiv Detail & Related papers (2021-06-18T14:23:55Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA), which is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)