DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection
- URL: http://arxiv.org/abs/2403.17755v1
- Date: Tue, 26 Mar 2024 14:44:51 GMT
- Title: DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection
- Authors: Sihan Shang, Jiancheng Yang, Zhenglong Sun, Pascal Fua
- Abstract summary: DataCook operates by "cooking" the raw data before distribution, enabling the development of models that perform normally on this processed data.
During the deployment phase, the original test data must also be "cooked" through DataCook to ensure normal model performance.
The mechanism behind DataCook is the crafting of anti-adversarial examples (AntiAdv), which are designed to enhance model confidence.
- Score: 47.91906879320081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of healthcare, the challenges of copyright protection and unauthorized third-party misuse are increasingly significant. Traditional methods for data copyright protection are applied prior to data distribution, implying that models trained on these data become uncontrollable. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data during the deployment phase. DataCook operates by "cooking" the raw data before distribution, enabling the development of models that perform normally on this processed data. However, during the deployment phase, the original test data must also be "cooked" through DataCook to ensure normal model performance. This process grants copyright holders control over authorization during the deployment phase. The mechanism behind DataCook is the crafting of anti-adversarial examples (AntiAdv), which are designed to enhance model confidence, as opposed to standard adversarial examples (Adv) that aim to confuse models. Similar to Adv, AntiAdv introduces imperceptible perturbations, ensuring that the data processed by DataCook remains easily understandable. We conducted extensive experiments on MedMNIST datasets, encompassing both 2D/3D data and the high-resolution variants. The outcomes indicate that DataCook effectively meets its objectives, preventing models trained on AntiAdv from analyzing unauthorized data effectively, without compromising the validity and accuracy of the data in legitimate scenarios. Code and data are available at https://github.com/MedMNIST/DataCook.
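The crafting step can be pictured as a reversed PGD attack: instead of ascending the loss to push the model toward a wrong prediction, the perturbation descends the loss within an imperceptible budget so that confidence on the true label rises. The following is a minimal PyTorch sketch under that assumption; the function name `cook`, the L-inf budget, step size, and iteration count are illustrative and not the paper's actual settings.

```python
# Minimal sketch of "cooking" data via anti-adversarial perturbations.
# Assumptions (not from the paper): a PyTorch classifier `model`, inputs in
# [0, 1], and a PGD-style loop with an L-inf budget; the paper's exact
# crafting procedure may differ.
import torch
import torch.nn.functional as F

def cook(model, x, y, eps=4/255, alpha=1/255, steps=10):
    """Craft an anti-adversarial example: perturb x within an L-inf ball
    so the model becomes *more* confident on the true label y."""
    x_cooked = x.clone().detach()
    for _ in range(steps):
        x_cooked.requires_grad_(True)
        loss = F.cross_entropy(model(x_cooked), y)
        grad = torch.autograd.grad(loss, x_cooked)[0]
        # Descend the loss (opposite sign of FGSM/PGD) to raise confidence.
        x_cooked = x_cooked.detach() - alpha * grad.sign()
        # Keep the perturbation imperceptible and the pixels valid.
        x_cooked = x + torch.clamp(x_cooked - x, -eps, eps)
        x_cooked = torch.clamp(x_cooked, 0.0, 1.0)
    return x_cooked.detach()
```

In this reading, authorized parties train and evaluate on cooked data, while raw, uncooked test data falls outside the distribution the model has adapted to, which is what gives the copyright holder deployment-time control.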
Related papers
- Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning [12.80649024603656]
This paper introduces data taggants, a novel non-backdoor dataset ownership verification technique.
We validate our approach through comprehensive and realistic experiments on ImageNet1k using ViT and ResNet models with state-of-the-art training recipes.
arXiv Detail & Related papers (2024-10-09T12:49:23Z)
- CAP: Detecting Unauthorized Data Usage in Generative Models via Prompt Generation [1.6141139250981018]
Copyright Audit via Prompts generation (CAP) is a framework for automatically testing whether an ML model has been trained with unauthorized data.
Specifically, we devise an approach to generate suitable keys inducing the model to reveal copyrighted contents.
To prove its effectiveness, we conducted an extensive evaluation campaign on measurements collected in four IoT scenarios.
arXiv Detail & Related papers (2024-10-08T08:49:41Z)
- Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) Test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
arXiv Detail & Related papers (2023-05-17T12:23:38Z)
- The Devil's Advocate: Shattering the Illusion of Unexploitable Data using Diffusion Models [14.018862290487617]
We show that a carefully designed denoising process can counteract the data-protecting perturbations.
Our approach, called AVATAR, delivers state-of-the-art performance against a suite of recent availability attacks.
arXiv Detail & Related papers (2023-03-15T10:20:49Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
- Distill and Fine-tune: Effective Adaptation from a Black-box Source Model [138.12678159620248]
Unsupervised domain adaptation (UDA) aims to transfer knowledge from previously related labeled datasets (source) to a new unlabeled dataset (target).
We propose a novel two-step adaptation framework called Distill and Fine-tune (Dis-tune).
arXiv Detail & Related papers (2021-04-04T05:29:05Z)
- Self-Supervised Noisy Label Learning for Source-Free Unsupervised Domain Adaptation [87.60688582088194]
We propose a novel Self-Supervised Noisy Label Learning method.
Our method can easily achieve state-of-the-art results and surpass other methods by a very large margin.
arXiv Detail & Related papers (2021-02-23T10:51:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.