Black-box Dataset Ownership Verification via Backdoor Watermarking
- URL: http://arxiv.org/abs/2209.06015v2
- Date: Fri, 31 Mar 2023 01:11:50 GMT
- Title: Black-box Dataset Ownership Verification via Backdoor Watermarking
- Authors: Yiming Li, Mingyan Zhu, Xue Yang, Yong Jiang, Tao Wei, Shu-Tao Xia
- Abstract summary: We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to protect released datasets by embedding external patterns via backdoor watermarking for ownership verification.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
- Score: 67.69308278379957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning, especially deep neural networks (DNNs), has been widely and
successfully adopted in many critical applications for its high effectiveness
and efficiency. The rapid development of DNNs has benefited from the existence
of some high-quality datasets (e.g., ImageNet), which allow researchers and
developers to easily verify the performance of their methods. Currently, almost
all existing released datasets stipulate that they may be adopted only for
academic or educational purposes, not for commercial purposes, without
permission. However, there is still no good way to enforce this. In this paper,
we formulate the protection of released datasets as verifying whether they are
adopted for training a (suspicious) third-party model, where defenders can only
query the model while having no information about its parameters and training
details. Based on this formulation, we propose to protect the datasets by
embedding external patterns via backdoor watermarking for ownership
verification. Our
method contains two main parts, including dataset watermarking and dataset
verification. Specifically, we exploit poison-only backdoor attacks (e.g.,
BadNets) for dataset watermarking and design a hypothesis-test-guided method
for dataset verification. We also provide some theoretical analyses of our
methods. Experiments on multiple benchmark datasets across different tasks
verify the effectiveness of our method. The code for reproducing the main
experiments is available at
https://github.com/THUYimingLi/DVBW.
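To make the watermarking step concrete, here is a minimal NumPy sketch of poison-only, BadNets-style dataset watermarking. This is not the released DVBW implementation: the (N, H, W, C) array layout and the `poison_rate`, `patch_size`, and `patch_value` defaults are illustrative assumptions.

```python
import numpy as np

def watermark_dataset(images, labels, target_label, poison_rate=0.1,
                      patch_size=3, patch_value=255, seed=0):
    """Stamp a trigger patch on a random subset of images and relabel
    them to the target class (poison-only, BadNets-style sketch)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Trigger: a solid patch in the bottom-right corner of each image.
    images[idx, -patch_size:, -patch_size:] = patch_value
    labels[idx] = target_label
    return images, labels, idx
```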
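Verification then needs only black-box query access. The sketch below assumes a hypothetical `model_query` that returns posterior probabilities for a batch and a `stamp_trigger` helper that applies the same trigger used at release time; a one-sided paired t-test on the target-class posteriors stands in for the paper's hypothesis-test-guided procedure.

```python
from scipy import stats

def verify_ownership(model_query, benign_images, target_label,
                     stamp_trigger, alpha=0.05):
    """Black-box check: does stamping the trigger significantly raise
    the suspicious model's posterior on the target class?"""
    p_benign = model_query(benign_images)[:, target_label]
    p_trigger = model_query(stamp_trigger(benign_images))[:, target_label]
    # One-sided paired t-test; rejecting H0 (no increase) suggests the
    # model was trained on the watermarked dataset.
    _, p_value = stats.ttest_rel(p_trigger, p_benign, alternative="greater")
    return p_value < alpha, p_value
```

In this picture, a defender would release the output of the watermarking step and later run `verify_ownership` against any suspicious prediction API.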
Related papers
- Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning [12.80649024603656]
This paper introduces data taggants, a novel non-backdoor dataset ownership verification technique.
We validate our approach through comprehensive and realistic experiments on ImageNet1k using ViT and ResNet models with state-of-the-art training recipes.
arXiv Detail & Related papers (2024-10-09T12:49:23Z) - Domain Watermark: Effective and Harmless Dataset Copyright Protection is
Closed at Hand [96.26251471253823]
Backdoor-based dataset ownership verification (DOV) is currently the only feasible approach to protect the copyright of open-source datasets.
We make watermarked models (trained on the protected dataset) correctly classify some 'hard' samples that will be misclassified by the benign model.
arXiv Detail & Related papers (2023-10-09T11:23:05Z) - DAD++: Improved Data-free Test Time Adversarial Defense [12.606555446261668]
We propose a test-time Data-free Adversarial Defense (DAD) containing detection and correction frameworks.
We conduct a wide range of experiments and ablations on several datasets and network architectures to show the efficacy of our proposed approach.
Our DAD++ gives an impressive performance against various adversarial attacks with a minimal drop in clean accuracy.
arXiv Detail & Related papers (2023-09-10T20:39:53Z) - Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally (a minimal sketch of this clean-label idea appears after this list).
arXiv Detail & Related papers (2023-03-20T21:54:30Z) - Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset
Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification (see the dispersion-test sketch after this list).
arXiv Detail & Related papers (2022-09-27T12:56:56Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification method (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Open-sourced Dataset Protection via Backdoor Watermarking [87.15630326131901]
We propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis-test-guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
arXiv Detail & Related papers (2020-10-12T16:16:27Z)