Domain Watermark: Effective and Harmless Dataset Copyright Protection is
Closed at Hand
- URL: http://arxiv.org/abs/2310.14942v2
- Date: Sun, 5 Nov 2023 01:50:39 GMT
- Title: Domain Watermark: Effective and Harmless Dataset Copyright Protection is
Closed at Hand
- Authors: Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu,
Bo Li
- Abstract summary: Backdoor-based dataset ownership verification (DOV) is currently the only feasible approach to protect the copyright of open-source datasets.
We make watermarked models (trained on the protected dataset) correctly classify some "hard" samples that will be misclassified by the benign model.
- Score: 96.26251471253823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prosperity of deep neural networks (DNNs) owes much to
open-source datasets, based on which users can evaluate and improve their
methods. In this paper, we revisit backdoor-based dataset ownership
verification (DOV), which is currently the only feasible approach to protect
the copyright of open-source datasets. We reveal that these methods are
fundamentally harmful because they can introduce malicious
misclassification behaviors into watermarked DNNs that adversaries can exploit.
We instead design DOV from another perspective by making watermarked models
(trained on the protected dataset) correctly classify some "hard" samples that
will be misclassified by the benign model. Our method is inspired by the
generalization property of DNNs, where we find a hardly-generalized
domain for the original dataset (as its domain watermark). It can be
easily learned with the protected dataset containing modified samples.
Specifically, we formulate the domain generation as a bi-level optimization and
propose to optimize a set of visually-indistinguishable clean-label modified
data with similar effects to domain-watermarked samples from the
hardly-generalized domain to ensure watermark stealthiness. We also design a
hypothesis-test-guided ownership verification via our domain watermark and
provide the theoretical analyses of our method. Extensive experiments on three
benchmark datasets are conducted, which verify the effectiveness of our method
and its resistance to potential adaptive methods. The code for reproducing main
experiments is available at
https://github.com/JunfengGo/Domain-Watermark.
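The hypothesis-test-guided verification mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure (their repository is the reference for that): it swaps in an exact one-sided binomial test, and the names `binomial_p_value`, `verify_ownership`, `baseline_acc`, and `alpha` are hypothetical. The underlying idea is that a benign model should rarely classify the hard domain-watermarked samples correctly, so an accuracy well above that baseline is evidence that the suspicious model was trained on the protected dataset.

```python
from math import comb


def binomial_p_value(num_correct: int, num_samples: int, baseline_acc: float) -> float:
    """Exact one-sided binomial test (an assumed stand-in, not the paper's test).

    H0: the suspicious model's accuracy on the hard samples is at most
        baseline_acc, i.e. what a benign model would be expected to reach.
    H1: its accuracy is higher.
    Returns P(X >= num_correct) for X ~ Binomial(num_samples, baseline_acc).
    """
    if not 0.0 <= baseline_acc <= 1.0:
        raise ValueError("baseline_acc must lie in [0, 1]")
    if not 0 <= num_correct <= num_samples:
        raise ValueError("num_correct must lie in [0, num_samples]")
    return sum(
        comb(num_samples, k)
        * baseline_acc ** k
        * (1.0 - baseline_acc) ** (num_samples - k)
        for k in range(num_correct, num_samples + 1)
    )


def verify_ownership(predictions, labels, baseline_acc, alpha=0.01):
    """Flag the model when H0 is rejected at significance level alpha.

    predictions: labels predicted by the suspicious model on the hard samples.
    labels:      ground-truth labels of those samples.
    Returns (flagged, p_value).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    num_correct = sum(int(p == y) for p, y in zip(predictions, labels))
    p_value = binomial_p_value(num_correct, len(labels), baseline_acc)
    return p_value < alpha, p_value


if __name__ == "__main__":
    truth = [1] * 20
    # Hypothetical suspicious model: 18 of 20 hard samples classified correctly.
    flagged, p = verify_ownership([1] * 18 + [0] * 2, truth, baseline_acc=0.2)
    print(f"flagged={flagged}, p-value={p:.3e}")
```

In this sketch `baseline_acc` would have to be estimated from benign models that were not trained on the protected dataset, and `alpha` controls how often an innocent model is wrongly flagged.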
Related papers
- Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z)
- Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset
Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
arXiv Detail & Related papers (2022-09-27T12:56:56Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
- A Free Lunch for Unsupervised Domain Adaptive Object Detection without
Source Data [69.091485888121]
Unsupervised domain adaptation assumes that source and target domain data are freely available and usually trained together to reduce the domain gap.
We propose a source data-free domain adaptive object detection (SFOD) framework by modeling it as a problem of learning with noisy labels.
arXiv Detail & Related papers (2020-12-10T01:42:35Z)
- Open-sourced Dataset Protection via Backdoor Watermarking [87.15630326131901]
We propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis-test-guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
arXiv Detail & Related papers (2020-10-12T16:16:27Z)