Open-sourced Dataset Protection via Backdoor Watermarking
- URL: http://arxiv.org/abs/2010.05821v3
- Date: Thu, 19 Nov 2020 04:51:13 GMT
- Title: Open-sourced Dataset Protection via Backdoor Watermarking
- Authors: Yiming Li, Ziqi Zhang, Jiawang Bai, Baoyuan Wu, Yong Jiang, Shu-Tao
Xia
- Abstract summary: We propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis-test-guided method for dataset verification based on the posterior probabilities generated by the suspicious third-party model.
- Score: 87.15630326131901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of deep learning has benefited from the release of
high-quality open-sourced datasets (e.g., ImageNet), which allow researchers
to easily verify the effectiveness of their algorithms. Almost all existing
open-sourced datasets require that they be adopted only for academic or
educational purposes rather than commercial ones, yet there is still no
good way to protect them. In this paper, we propose a backdoor-embedding-based
dataset watermarking method to protect an open-sourced image-classification
dataset by verifying whether it was used to train a third-party model.
Specifically, the proposed method consists of two main processes: dataset
watermarking and dataset verification. We adopt classical poisoning-based
backdoor attacks (e.g., BadNets) for dataset watermarking, i.e., generating
poisoned samples by adding a certain trigger (e.g., a local patch) onto some
benign samples and labeling them with a pre-defined target class. Based on
this backdoor-based watermarking, we use a hypothesis-test-guided method for
dataset verification, which compares the posterior probabilities, on the
target class, that the suspicious third-party model assigns to benign samples
and to their correspondingly watermarked counterparts (i.e., images containing
the trigger). Experiments on several benchmark datasets verify the
effectiveness of the proposed method.
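To make the watermarking step concrete, below is a minimal Python sketch of BadNets-style trigger embedding as described in the abstract: a small local patch is stamped onto a random fraction of benign images, which are then relabeled with the pre-defined target class. This is a sketch under assumptions, not the paper's implementation; the function and parameter names (embed_patch_trigger, poison_rate, patch_size) are illustrative.

```python
import numpy as np

def embed_patch_trigger(images, labels, target_class,
                        poison_rate=0.1, patch_size=3, seed=0):
    """Stamch a white square patch (the trigger) onto a random subset of
    images and relabel them with the target class. `images` is assumed
    to be a uint8 array of shape (N, H, W, C); all names are hypothetical.
    """
    rng = np.random.default_rng(seed)
    wm_images, wm_labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)),
                     replace=False)
    # Local patch trigger in the bottom-right corner, as in BadNets.
    wm_images[idx, -patch_size:, -patch_size:, :] = 255
    wm_labels[idx] = target_class  # pre-defined target class
    return wm_images, wm_labels, idx
```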
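For the verification step, the sketch below mirrors the hypothesis-test-guided procedure at a high level: query the suspicious model in a black-box fashion, collect its target-class posteriors for benign images and their triggered counterparts, and run a one-sided paired t-test. The paper's exact test statistic and threshold may differ; model_probs_fn and trigger_fn are assumed stand-ins for the black-box model API and the watermark trigger.

```python
from scipy import stats

def verify_dataset(model_probs_fn, benign_images, target_class,
                   trigger_fn, alpha=0.05):
    """Hypothesis-test-guided verification sketch (names illustrative).
    A significantly higher target-class posterior on triggered images
    suggests the suspicious model was trained on the watermarked dataset.
    """
    p_benign = model_probs_fn(benign_images)[:, target_class]
    p_trigger = model_probs_fn(trigger_fn(benign_images))[:, target_class]
    # H0: triggered posteriors are not larger than benign posteriors.
    # One-sided paired t-test (scipy >= 1.6 supports `alternative`).
    result = stats.ttest_rel(p_trigger, p_benign, alternative="greater")
    return result.pvalue < alpha, result.pvalue
```

A paired test is the natural choice here because each triggered image is compared against its own benign original, which controls for per-sample difficulty.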
Related papers
- Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning [12.80649024603656]
This paper introduces data taggants, a novel non-backdoor dataset ownership verification technique.
We validate our approach through comprehensive and realistic experiments on ImageNet1k using ViT and ResNet models with state-of-the-art training recipes.
arXiv Detail & Related papers (2024-10-09T12:49:23Z)
- PointNCBW: Towards Dataset Ownership Verification for Point Clouds via Negative Clean-label Backdoor Watermark [20.746346834429925]
We propose a clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness.
We perturb selected point clouds with non-target categories in both shape-wise and point-wise manners before inserting trigger patterns.
As such, models trained on the watermarked dataset will have a distinctive yet stealthy backdoor behavior.
arXiv Detail & Related papers (2024-08-10T09:31:58Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand [96.26251471253823]
Backdoor-based dataset ownership verification (DOV) is currently the only feasible approach to protect the copyright of open-source datasets.
We make watermarked models (trained on the protected dataset) correctly classify some 'hard' samples that would be misclassified by the benign model.
arXiv Detail & Related papers (2023-10-09T11:23:05Z)
- Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z)
- Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
arXiv Detail & Related papers (2022-09-27T12:56:56Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)