Open-sourced Dataset Protection via Backdoor Watermarking
- URL: http://arxiv.org/abs/2010.05821v3
- Date: Thu, 19 Nov 2020 04:51:13 GMT
- Title: Open-sourced Dataset Protection via Backdoor Watermarking
- Authors: Yiming Li, Ziqi Zhang, Jiawang Bai, Baoyuan Wu, Yong Jiang, Shu-Tao
Xia
- Abstract summary: We propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis test guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
- Score: 87.15630326131901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of deep learning has benefited from the release of some
high-quality open-sourced datasets (e.g., ImageNet), which allow researchers
to easily verify the effectiveness of their algorithms. Almost all existing
open-sourced datasets stipulate that they may be adopted only for academic or
educational purposes rather than commercial ones, yet there is still no
good way to enforce this restriction. In this paper, we propose a \emph{backdoor embedding
based dataset watermarking} method to protect an open-sourced
image-classification dataset by verifying whether it is used for training a
third-party model. Specifically, the proposed method contains two main
processes, including \emph{dataset watermarking} and \emph{dataset
verification}. We adopt classical poisoning-based backdoor attacks (e.g.,
BadNets) for dataset watermarking, i.e., generating poisoned samples by
stamping a specific trigger (e.g., a local patch) onto some benign samples and
relabeling them with a pre-defined target class. Based on the proposed
backdoor-based watermarking, we design a hypothesis-test-guided method for
dataset verification, based on the posterior probabilities that the suspicious
third-party model assigns to the target class for benign samples and their
correspondingly watermarked counterparts (i.e., images with the trigger).
Experiments on several benchmark datasets verify the effectiveness of the
proposed method.
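The two processes described in the abstract can be sketched in a few dozen lines. The following is a minimal illustration, not the authors' code: it assumes single-channel images in [0, 1], a BadNets-style local-patch trigger, and a `model_posterior` callable (a hypothetical API) that returns class probabilities for a batch of images. A one-sided paired t-test stands in for the hypothesis test; the paper does not specify this exact test.

```python
import numpy as np
from scipy import stats

def stamp_trigger(images, patch, x0=0, y0=0):
    """Stamp a local-patch trigger onto images (BadNets-style).
    images: (N, H, W) arrays in [0, 1]; patch: (h, w) array."""
    marked = images.copy()
    h, w = patch.shape
    marked[:, x0:x0 + h, y0:y0 + w] = patch
    return marked

def watermark_dataset(images, labels, target_class, rate=0.1, patch=None, seed=0):
    """Process 1 (dataset watermarking): poison a fraction `rate` of samples
    by stamping the trigger and relabeling them with the target class."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    images[idx] = stamp_trigger(images[idx], patch)
    labels[idx] = target_class
    return images, labels

def verify_dataset(model_posterior, benign, target_class, patch, alpha=0.01):
    """Process 2 (dataset verification): paired one-sided test of whether the
    suspicious model assigns significantly higher target-class probability
    to triggered images than to their benign counterparts."""
    p_benign = model_posterior(benign)[:, target_class]
    p_marked = model_posterior(stamp_trigger(benign, patch))[:, target_class]
    t, p_two = stats.ttest_rel(p_marked, p_benign)
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2  # one-sided: marked > benign
    return p_one < alpha, p_one
```

A model trained on the watermarked dataset should push target-class probability up whenever the trigger is present, so `verify_dataset` flags it; a model trained on clean data should show no systematic difference, and the test should not reject.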
Related papers
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - Domain Watermark: Effective and Harmless Dataset Copyright Protection is
Closed at Hand [96.26251471253823]
backdoor-based dataset ownership verification (DOV) is currently the only feasible approach to protect the copyright of open-source datasets.
We make watermarked models (trained on the protected dataset) correctly classify some 'hard' samples that will be misclassified by the benign model.
arXiv Detail & Related papers (2023-10-09T11:23:05Z) - Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z) - Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset
Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
arXiv Detail & Related papers (2022-09-27T12:56:56Z) - Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z) - On the Effectiveness of Dataset Watermarking in Adversarial Settings [14.095584034871658]
We investigate a proposed data provenance method, radioactive data, to assess if it can be used to demonstrate ownership of (image) datasets used to train machine learning (ML) models.
We show that radioactive data can effectively survive model extraction attacks, which raises the possibility that it can be used for ML model ownership verification robust against model extraction.
arXiv Detail & Related papers (2022-02-25T05:51:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.