Probabilistically Robust Watermarking of Neural Networks
- URL: http://arxiv.org/abs/2401.08261v2
- Date: Wed, 18 Sep 2024 16:50:32 GMT
- Title: Probabilistically Robust Watermarking of Neural Networks
- Authors: Mikhail Pautov, Nikita Bogdanov, Stanislav Pyatkin, Oleg Rogov, Ivan Oseledets
- Abstract summary: We introduce a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks.
Our approach does not require additional model training and can be applied to any model architecture.
- Score: 4.332441337407564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As deep learning (DL) models are widely and effectively used in Machine Learning as a Service (MLaaS) platforms, there is a rapidly growing interest in DL watermarking techniques that can be used to confirm the ownership of a particular model. Unfortunately, these methods usually produce watermarks susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks, particularly those involving extraction and distillation. Our approach does not require additional model training and can be applied to any model architecture. The key idea of our method is to compute the trigger set, which is transferable between the source model and the set of proxy models with a high probability. In our experimental study, we show that if the probability of the set being transferable is reasonably high, it can be effectively used for ownership verification of the stolen model. We evaluate our method on multiple benchmarks and show that our approach outperforms current state-of-the-art watermarking techniques in all considered experimental setups.
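The abstract describes the mechanism only at a high level. As a rough illustration, the PyTorch sketch below shows how a transferability-filtered trigger set and an agreement-based ownership check could look; the selection heuristic, the agreement threshold, and all function names are assumptions made for illustration, not the authors' implementation.
```python
import torch

@torch.no_grad()
def build_trigger_set(source_model, proxy_models, candidate_loader, size=100, device="cpu"):
    # Keep candidates whose source-model label is reproduced by every proxy model.
    # The proxies stand in for hypothetical stolen copies (e.g. extracted or
    # distilled models), so unanimous agreement serves as a surrogate for the
    # probability that the label transfers to a future stolen model.
    triggers, labels, collected = [], [], 0
    for x, _ in candidate_loader:
        x = x.to(device)
        y_src = source_model(x).argmax(dim=1)
        agree = torch.ones_like(y_src, dtype=torch.bool)
        for proxy in proxy_models:
            agree &= proxy(x).argmax(dim=1) == y_src
        triggers.append(x[agree].cpu())
        labels.append(y_src[agree].cpu())
        collected += int(agree.sum())
        if collected >= size:
            break
    return torch.cat(triggers)[:size], torch.cat(labels)[:size]

@torch.no_grad()
def verify_ownership(suspect_model, trigger_x, trigger_y, threshold=0.9, device="cpu"):
    # Declare the suspect model stolen if it reproduces the trigger labels
    # more often than a pre-chosen agreement threshold.
    preds = suspect_model(trigger_x.to(device)).argmax(dim=1).cpu()
    agreement = (preds == trigger_y).float().mean().item()
    return agreement >= threshold, agreement
```
In practice the threshold would presumably be calibrated so that an independently trained model passes the check only with negligible probability.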
Related papers
- Watermarking Recommender Systems [52.207721219147814]
We introduce Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems.
Our approach selects an initial item and queries it through the oracle model, then appends subsequent items that receive small prediction scores (a toy sketch of this procedure appears after the related-papers list).
To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence.
arXiv Detail & Related papers (2024-07-17T06:51:24Z)
- ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
However, adversaries can still use model extraction attacks to steal the model intelligence encoded in the generated content.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z)
- Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data [10.564634073196117]
We introduce a novel watermarking technique based on Multi-view dATa, called MAT, for efficiently embedding watermarks within DNNs.
We validate our method across various benchmarks and demonstrate its efficacy in defending against model extraction attacks.
arXiv Detail & Related papers (2024-03-15T20:12:41Z)
- Trojan Model Detection Using Activation Optimization [15.032071953322594]
Training machine learning models can be very expensive or even unaffordable.
Pre-trained models can be infected with Trojan attacks.
We present a novel method for detecting Trojan models.
arXiv Detail & Related papers (2023-06-08T02:17:29Z)
- Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models [44.80560808267494]
We present an adaptive framework to watermark a protected model, leveraging the unique behavior present in the model.
This watermark is used to detect extracted models, which have the same unique behavior, indicating an unauthorized usage of the protected model's IP.
We show that the framework is robust to (1) unseen model extraction attacks, and (2) extracted models that are further modified (e.g., by weight pruning).
arXiv Detail & Related papers (2022-11-24T14:48:40Z)
- Watermarking for Out-of-distribution Detection [76.20630986010114]
Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.
In this paper, we propose a general methodology named watermarking.
We learn a unified pattern that is superimposed onto features of original data, and the model's detection capability is largely boosted after watermarking.
arXiv Detail & Related papers (2022-10-27T06:12:32Z)
- DynaMarks: Defending Against Deep Learning Model Extraction Using Dynamic Watermarking [3.282282297279473]
The functionality of a deep learning (DL) model can be stolen via model extraction.
We propose a novel watermarking technique called DynaMarks to protect the intellectual property (IP) of DL models.
arXiv Detail & Related papers (2022-07-27T06:49:39Z)
- Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
- Deep Model Intellectual Property Protection via Deep Watermarking [122.87871873450014]
Deep neural networks are exposed to serious IP infringement risks.
Given a target deep model, if an attacker has full information about it, the model can easily be stolen by fine-tuning.
We propose a new model watermarking framework for protecting deep networks trained for low-level computer vision or image processing tasks.
arXiv Detail & Related papers (2021-03-08T18:58:21Z)
- Model Watermarking for Image Processing Networks [120.918532981871]
Protecting the intellectual property of deep models is an important but seriously under-researched problem.
We propose the first model watermarking framework for protecting image processing models.
arXiv Detail & Related papers (2020-02-25T18:36:18Z)
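For the AOW procedure summarised in the Watermarking Recommender Systems entry above, the toy sketch below illustrates one way the described steps could be realised. It assumes a hypothetical recommender interface score_next(sequence) that returns one score per catalogue item, and it omits the embedding step that would teach the protected model to complete the watermark sequence; none of this is the authors' code.
```python
import torch

@torch.no_grad()
def generate_watermark_sequence(oracle_model, initial_item, length=10):
    # Grow the watermark autoregressively: starting from the initial item,
    # repeatedly append the catalogue item to which the oracle model assigns
    # the lowest score, i.e. an out-of-distribution continuation.
    seq = [initial_item]
    for _ in range(length - 1):
        scores = oracle_model.score_next(seq).clone()  # hypothetical API: 1-D tensor over items
        scores[torch.tensor(seq)] = float("inf")       # never repeat items already in the sequence
        seq.append(int(scores.argmin()))
    return seq

@torch.no_grad()
def verify_watermark(suspect_model, watermark_seq, top_k=1):
    # Feed each truncated prefix of the watermark to the suspect model and
    # count how often the next watermark item appears in its top-k predictions.
    hits = 0
    for t in range(1, len(watermark_seq)):
        scores = suspect_model.score_next(watermark_seq[:t])
        hits += int(watermark_seq[t] in scores.topk(top_k).indices.tolist())
    return hits / (len(watermark_seq) - 1)
```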
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.