Data Isotopes for Data Provenance in DNNs
- URL: http://arxiv.org/abs/2208.13893v1
- Date: Mon, 29 Aug 2022 21:28:35 GMT
- Title: Data Isotopes for Data Provenance in DNNs
- Authors: Emily Wenger and Xiuyu Li and Ben Y. Zhao and Vitaly Shmatikov
- Abstract summary: We show how users can create special data points we call isotopes, which introduce "spurious features" into DNNs during training.
A user can apply statistical hypothesis testing to detect if a model has learned the spurious features associated with their isotopes by training on the user's data.
Our results confirm efficacy in multiple settings, detecting and distinguishing between hundreds of isotopes with high accuracy.
- Score: 27.549744883427376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Today, creators of data-hungry deep neural networks (DNNs) scour the Internet
for training fodder, leaving users with little control over or knowledge of
when their data is appropriated for model training. To empower users to
counteract unwanted data use, we design, implement and evaluate a practical
system that enables users to detect if their data was used to train a DNN
model. We show how users can create special data points we call isotopes, which
introduce "spurious features" into DNNs during training. With only query access
to a trained model and no knowledge of the model training process, or control
of the data labels, a user can apply statistical hypothesis testing to detect
if a model has learned the spurious features associated with their isotopes by
training on the user's data. This effectively turns DNNs' vulnerability to
memorization and spurious correlations into a tool for data provenance. Our
results confirm efficacy in multiple settings, detecting and distinguishing
between hundreds of isotopes with high accuracy. We further show that our
system works on public ML-as-a-service platforms and larger models such as
ImageNet, can use physical objects instead of digital marks, and remains
generally robust against several adaptive countermeasures.
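The mechanism described above lends itself to a short illustration. Below is a minimal sketch, not the paper's implementation, of the two pieces the abstract mentions: blending a fixed "isotope" mark into a user's images, and a paired hypothesis test over query results that checks whether the mark shifts the model's confidence in a target class. The helper names, the alpha-blending scheme, and the choice of a paired one-sided t-test are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def blend_mark(images, mark, alpha=0.25):
    """Blend a fixed 'isotope' mark into images; both arrays are assumed to be in [0, 1]."""
    return np.clip((1.0 - alpha) * images + alpha * mark, 0.0, 1.0)

def isotope_present(query_model, probe_images, mark, target_class,
                    alpha=0.25, p_threshold=0.01):
    """Paired one-sided test: does adding the mark raise the model's confidence
    in `target_class`? A significant shift suggests the model was trained on
    the user's marked (isotope) data."""
    plain = query_model(probe_images)[:, target_class]                       # confidences without the mark
    marked = query_model(blend_mark(probe_images, mark, alpha))[:, target_class]
    t_stat, p_two_sided = stats.ttest_rel(marked, plain)                     # paired t-test on the shift
    p_one_sided = p_two_sided / 2.0 if t_stat > 0 else 1.0 - p_two_sided / 2.0
    return p_one_sided < p_threshold, p_one_sided
```

If the one-sided p-value clears the threshold, the user has statistical evidence that the model picked up the spurious mark-class correlation from their data.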
Related papers
- Machine Unlearning using Forgetting Neural Networks [0.0]
This paper presents a new approach to machine unlearning using forgetting neural networks (FNNs).
FNNs are neural networks with specific forgetting layers that take inspiration from the processes involved when a human brain forgets.
We report our results on the MNIST handwritten digit recognition and fashion datasets.
arXiv Detail & Related papers (2024-10-29T02:52:26Z)
- GOODAT: Towards Test-time Graph Out-of-Distribution Detection [103.40396427724667]
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains.
Recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN.
This paper introduces a data-centric, unsupervised, and plug-and-play solution that operates independently of the training data and of modifications to the GNN architecture.
arXiv Detail & Related papers (2024-01-10T08:37:39Z)
- GraphGuard: Detecting and Counteracting Training Data Misuse in Graph Neural Networks [69.97213941893351]
The emergence of Graph Neural Networks (GNNs) in graph data analysis has raised critical concerns about data misuse during model training.
Existing methodologies address either data misuse detection or mitigation, and are primarily designed for local GNN models.
This paper introduces a pioneering approach, GraphGuard, to tackle these challenges.
arXiv Detail & Related papers (2023-12-13T02:59:37Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
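The summary implies that the forgetting update is constrained so it does not disturb what the model knows about the retained data. Below is a minimal sketch of one way to realize such a gradient projection, assuming flattened per-batch gradients and using the orthogonal complement of the retained-gradient subspace; the paper's exact objective and subspace construction may differ.

```python
import numpy as np

def project_out_retained(unlearn_grad, retained_grads):
    """Remove from the unlearning gradient its component inside the subspace
    spanned by gradients computed on the retained data, so the forgetting
    update minimally interferes with retained knowledge."""
    G = np.stack(retained_grads, axis=0)              # (k, d): one flattened gradient per retained batch
    _, _, vt = np.linalg.svd(G, full_matrices=False)  # rows of vt span the retained-gradient subspace
    return unlearn_grad - vt.T @ (vt @ unlearn_grad)  # keep only the orthogonal component

# Hypothetical update rule: step the parameters along the projected direction only.
# theta = theta - lr * project_out_retained(grad_on_forget_set, grads_on_retained_batches)
```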
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying [10.919336198760808]
We introduce a novel methodology to detect leaked data that are used to train classification models.
LDSS involves injecting a small volume of synthetic data, characterized by local shifts in class distribution, into the owner's dataset.
This enables the effective identification of models trained on leaked data through model querying alone.
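A minimal sketch of the two-stage idea described above: inject a few synthetic points whose labels deviate from the local class distribution into the released dataset, then test a suspect model by querying it on those points. The point construction and the decision rule here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_marks(X, y, n_marks=50, shifted_label=1, noise=0.05):
    """Append a small set of synthetic points near real ones but relabeled to a
    locally 'shifted' class; keep the marks aside for later verification."""
    idx = rng.choice(len(X), size=n_marks, replace=False)
    X_syn = X[idx] + noise * rng.standard_normal(X[idx].shape)
    y_syn = np.full(n_marks, shifted_label)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn]), X_syn, y_syn

def leaked(query_model, X_syn, y_syn, chance_rate, margin=0.2):
    """If the suspect model reproduces the shifted labels far above chance,
    the released dataset was likely part of its training data."""
    hit_rate = float(np.mean(query_model(X_syn) == y_syn))
    return hit_rate > chance_rate + margin, hit_rate
```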
arXiv Detail & Related papers (2023-10-06T10:36:28Z)
- Efficient Testing of Deep Neural Networks via Decision Boundary Analysis [28.868479656437145]
We propose a novel technique, named Aries, that can estimate the performance of DNNs on new unlabeled data.
The accuracy estimated by Aries is only 0.03% to 2.60% (0.61% on average) off the true accuracy.
arXiv Detail & Related papers (2022-07-22T08:39:10Z)
- Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph-structured data (DFAD-GNN).
Specifically, DFAD-GNN employs a generative adversarial network with three components: a pre-trained teacher model and a student model act as two discriminators, while a generator derives training graphs used to distill knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
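A minimal sketch of the adversarial loop implied by the summary, written in PyTorch style: the generator seeks synthetic graphs on which teacher and student disagree, and the student then mimics the teacher on freshly generated graphs. The attribute `generator.latent_dim`, the L1 disagreement loss, and the assumption that `generator(z)` returns inputs both models can score (with teacher parameters frozen) are illustrative choices, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def dfad_step(generator, teacher, student, g_opt, s_opt, batch_size=32):
    """One round of data-free adversarial distillation (sketch)."""
    # 1) Generator step: push the generator toward graphs where teacher and
    #    student disagree (teacher parameters are assumed frozen).
    z = torch.randn(batch_size, generator.latent_dim)
    graphs = generator(z)
    g_loss = -F.l1_loss(student(graphs), teacher(graphs))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # 2) Student step: minimize disagreement with the teacher on a fresh batch.
    with torch.no_grad():
        graphs = generator(torch.randn(batch_size, generator.latent_dim))
    s_loss = F.l1_loss(student(graphs), teacher(graphs).detach())
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return g_loss.item(), s_loss.item()
```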
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
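As a concrete reference for the density-estimation step, here is a minimal sketch of the standard k-NN non-parametric estimator in a feature space. How OSAKD turns these density estimates into distillation targets is not covered by the summary, so only the estimator itself is shown.

```python
from math import gamma, pi
import numpy as np

def knn_density(features, k=10):
    """Standard k-NN density estimate: p_hat(x) = k / (N * V_d(r_k(x))),
    where r_k(x) is the distance from x to its k-th nearest neighbour and
    V_d(r) is the volume of the d-dimensional ball of radius r."""
    n, d = features.shape
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)              # exclude each point's distance to itself
    r_k = np.sort(dists, axis=1)[:, k - 1]       # distance to the k-th nearest neighbour
    ball_volume = (pi ** (d / 2) / gamma(d / 2 + 1)) * r_k ** d
    return k / (n * ball_volume)
```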
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- IADA: Iterative Adversarial Data Augmentation Using Formal Verification and Expert Guidance [1.599072005190786]
We propose an iterative adversarial data augmentation framework to learn neural network models.
The proposed framework is applied to an artificial 2D dataset, the MNIST dataset, and a human motion dataset.
We show that our training method can improve the robustness and accuracy of the learned model.
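The abstract outlines an iterative loop rather than a specific architecture, so the sketch below only mirrors that loop; `train_fn`, `verify_fn`, and `expert_label_fn` are hypothetical stand-ins for the training routine, the formal verifier that returns counterexamples, and the expert-guidance step.

```python
def iada_loop(model, dataset, train_fn, verify_fn, expert_label_fn, rounds=5):
    """Iterative adversarial data augmentation (sketch): train, search for
    counterexamples with a verifier, have an expert label them, fold them
    back into the training set, and repeat."""
    for _ in range(rounds):
        model = train_fn(model, dataset)
        counterexamples = verify_fn(model)           # inputs violating a formal property
        if not counterexamples:
            break                                    # property holds: stop augmenting
        labels = expert_label_fn(counterexamples)    # expert supplies ground-truth labels
        dataset = dataset + list(zip(counterexamples, labels))
    return model, dataset
```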
arXiv Detail & Related papers (2021-08-16T03:05:53Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
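A minimal sketch of how a Bayesian surrogate could estimate a metric from few labels, using MC dropout as a stand-in for the BNN: the surrogate is fit on the small labeled set to predict whether the model under test is correct on an input, then averaged over an unlabeled pool. The surrogate architecture and the MC-dropout approximation are assumptions, and the paper's active selection of which points to label is omitted.

```python
import torch
import torch.nn as nn

class CorrectnessSurrogate(nn.Module):
    """MC-dropout network standing in for the BNN: predicts the probability
    that the model under test answers correctly on a given input feature."""
    def __init__(self, in_dim, hidden=64, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x).squeeze(-1)

def estimate_accuracy(surrogate, unlabeled_features, mc_samples=50):
    """Estimate accuracy of the model under test over an unlabeled pool by
    averaging MC-dropout predictions of per-sample correctness."""
    surrogate.train()                      # keep dropout active for MC sampling
    with torch.no_grad():
        draws = torch.stack([surrogate(unlabeled_features) for _ in range(mc_samples)])
    return draws.mean().item()
```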
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Self-Competitive Neural Networks [0.0]
Deep Neural Networks (DNNs) have improved classification accuracy in many applications.
One challenge in training a DNN is its need for an enriched dataset to increase accuracy and avoid overfitting.
Recently, researchers have worked extensively to propose methods for data augmentation.
In this paper, we generate adversarial samples to refine the Domains of Attraction (DoAs) of each class. In this approach, at each stage we use the model learned from the primary and generated adversarial data (up to that stage) to manipulate the primary data in a way that looks complicated to
arXiv Detail & Related papers (2020-08-22T12:28:35Z)