Tools for Verifying Neural Models' Training Data
- URL: http://arxiv.org/abs/2307.00682v1
- Date: Sun, 2 Jul 2023 23:27:00 GMT
- Title: Tools for Verifying Neural Models' Training Data
- Authors: Dami Choi, Yonadav Shavit, David Duvenaud
- Abstract summary: "Proof-of-Training-Data" allows a model trainer to convince a Verifier of the training data that produced a set of model weights.
We show experimentally that our verification procedures can catch a wide variety of attacks.
- Score: 29.322899317216407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is important that consumers and regulators can verify the provenance of
large neural models to evaluate their capabilities and risks. We introduce the
concept of a "Proof-of-Training-Data": any protocol that allows a model trainer
to convince a Verifier of the training data that produced a set of model
weights. Such protocols could verify the amount and kind of data and compute
used to train the model, including whether it was trained on specific harmful
or beneficial data sources. We explore efficient verification strategies for
Proof-of-Training-Data that are compatible with most current large-model
training procedures. These include a method for the model-trainer to verifiably
pre-commit to a random seed used in training, and a method that exploits
models' tendency to temporarily overfit to training data in order to detect
whether a given data-point was included in training. We show experimentally
that our verification procedures can catch a wide variety of attacks, including
all known attacks from the Proof-of-Learning literature.
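As a concrete anchor for the seed pre-commitment idea mentioned in the abstract, the sketch below uses a standard salted-hash commit-and-reveal scheme: the trainer publishes H(seed || nonce) before training and reveals both values afterwards so the Verifier can re-derive the data ordering. This is a minimal sketch of one conventional construction, not necessarily the paper's exact protocol.

```python
import hashlib
import secrets

def commit_to_seed(seed: int) -> tuple[str, bytes]:
    """Trainer side: commit to a training seed before training starts.

    Returns the commitment (safe to publish) and the nonce
    (kept secret until the reveal phase).
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(seed.to_bytes(8, "big") + nonce).hexdigest()
    return digest, nonce

def verify_commitment(commitment: str, seed: int, nonce: bytes) -> bool:
    """Verifier side: check that the revealed seed matches the published commitment."""
    expected = hashlib.sha256(seed.to_bytes(8, "big") + nonce).hexdigest()
    return secrets.compare_digest(expected, commitment)

# Example flow: trainer commits, trains with `seed`, then reveals (seed, nonce).
seed = 20230702
commitment, nonce = commit_to_seed(seed)
# ... training happens, with all data shuffling driven by `seed` ...
assert verify_commitment(commitment, seed, nonce)
```

With the seed fixed in advance, a Verifier can re-derive the claimed data ordering and spot-check reported checkpoints against it.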
Related papers
- Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine? [17.714589429503675]
We propose an injection-free training data attribution method for text-to-image models.
Our approach involves developing algorithms to uncover distinct samples and using them as inherent watermarks.
Our experiments demonstrate that our method achieves an accuracy of over 80% in identifying the source of a suspicious model's training data.
arXiv Detail & Related papers (2024-09-24T06:23:43Z)
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
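The snippet below is a drastically simplified, TRAK-inspired sketch for a logistic-regression model: per-example gradients are randomly projected to a low dimension, and a ridge-regularized kernel scores how much each training point drives a test prediction. It omits most of the actual TRAK estimator (ensembling over checkpoints, the output-function reweighting, etc.) and is only meant to convey the random-projection idea; all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data and a stand-in "trained" weight vector.
n, d, k = 200, 50, 16          # training points, parameter dim, projection dim
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)
w = w_true + 0.1 * rng.normal(size=d)   # pretend these are the learned parameters

def per_example_grad(x, y_i, w):
    """Gradient of the logistic loss at (x, y_i) with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y_i) * x

# Randomly project per-example gradients to k dimensions (the core trick).
P = rng.normal(size=(d, k)) / np.sqrt(k)
Phi = np.stack([per_example_grad(X[i], y[i], w) for i in range(n)]) @ P   # (n, k)

def attribution_scores(x_test, y_test, lam=1e-3):
    """Influence-style score of each training example for one test prediction."""
    phi_test = per_example_grad(x_test, y_test, w) @ P
    K = Phi.T @ Phi + lam * np.eye(k)            # ridge-regularized kernel
    return Phi @ np.linalg.solve(K, phi_test)    # one score per training point

scores = attribution_scores(X[0], y[0])
print("top influencing training indices:", np.argsort(-scores)[:5])
```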
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
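As a generic illustration of the weight-importance idea from the entry above, the sketch below estimates a Fisher-style importance of each parameter on the instances to be forgotten and shrinks the most implicated ones. This is one common recipe for illustration only, not the paper's method (which also uses adversarial examples at the representation level); the threshold and shrinkage factor are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "pre-trained" classifier and a small batch of instances to forget.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
forget_x = torch.randn(8, 20)
forget_y = torch.randint(0, 3, (8,))

# Fisher-style importance: squared gradients of the loss on the forget set.
loss = nn.functional.cross_entropy(model(forget_x), forget_y)
loss.backward()
importance = {name: p.grad.detach() ** 2 for name, p in model.named_parameters()}

# Shrink the parameters most implicated in memorizing the forget set
# (a crude forgetting step; one would then fine-tune on the retained data).
with torch.no_grad():
    for name, p in model.named_parameters():
        imp = importance[name]
        threshold = imp.flatten().kthvalue(int(0.9 * imp.numel())).values
        mask = imp >= threshold      # roughly the top 10% most important entries
        p[mask] *= 0.1
```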
- Targeted Image Reconstruction by Sampling Pre-trained Diffusion Model [0.0]
A trained neural network model contains information about its training data.
Malicious parties can leverage the "knowledge" in this model and design ways to extract any usable information from it.
In this work, we propose ways to generate a data point of the target class without prior knowledge of the exact target distribution.
arXiv Detail & Related papers (2023-01-18T14:21:38Z)
- Provable Fairness for Neural Network Models using Formal Verification [10.90121002896312]
We propose techniques to prove fairness using recently developed formal methods that verify properties of neural network models.
We show that through proper training, we can reduce unfairness by an average of 65.4% at a cost of less than 1% in AUC score.
arXiv Detail & Related papers (2022-12-16T16:54:37Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our result suggests potentially severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
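Below is a sketch in the spirit of gradient-based reconstruction attacks: optimize a dummy input so that its gradient matches the one observed from the secret sample. The paper above contributes a provable reconstruction from a single gradient query at random parameters, which this iterative toy loop does not reproduce; the model, sizes, and optimizer settings are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# The "secret" training sample and the gradient the attacker gets to observe.
x_secret = torch.randn(1, 10)
y_secret = torch.tensor([1])
loss = nn.functional.cross_entropy(model(x_secret), y_secret)
true_grads = torch.autograd.grad(loss, model.parameters())

# Attacker: optimize a dummy input so its gradient matches the observed one.
# (For simplicity the label is assumed known; it can also be optimized jointly.)
x_dummy = torch.randn(1, 10, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy], lr=0.5)

def closure():
    opt.zero_grad()
    dummy_loss = nn.functional.cross_entropy(model(x_dummy), y_secret)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    return match

for _ in range(30):
    opt.step(closure)

print("reconstruction error:", (x_dummy.detach() - x_secret).norm().item())
```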
- CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
- Leveraging Adversarial Examples to Quantify Membership Information Leakage [30.55736840515317]
We develop a novel approach to address the problem of membership inference in pattern recognition models.
We argue that the magnitude of the adversarial perturbation needed to change the model's prediction reflects the likelihood of a sample belonging to the training data.
Our method performs comparably to, and in some cases outperforms, state-of-the-art strategies.
arXiv Detail & Related papers (2022-03-17T19:09:38Z)
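As a toy illustration of the general idea above (perturbation needed to cross the decision boundary as a membership signal), the sketch below uses a linear model, where that distance has a closed form. The paper handles general pattern-recognition models with a more careful formulation; the data, model, and score here are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Train on half the data; the other half plays the role of non-members.
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
members, non_members = X[:200], X[200:]
clf = LogisticRegression(max_iter=1000).fit(members, y[:200])

def boundary_distance(x):
    """Minimal L2 perturbation that flips a linear classifier's decision."""
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    return abs(x @ w + b) / np.linalg.norm(w)

# Heuristic membership signal: training points tend to sit farther from the boundary.
member_d = np.array([boundary_distance(x) for x in members])
nonmember_d = np.array([boundary_distance(x) for x in non_members])
print("mean distance  members:", member_d.mean(), " non-members:", nonmember_d.mean())
```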
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
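The sketch below illustrates the supervised framing from the entry above on synthetic data: columns with known NDV are generated, frequency-profile features are computed from a random sample of each, and a regressor learns to map those features to (log) NDV. The feature set and model are illustrative, not the paper's learned estimator.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def sample_features(sample):
    """Frequency-profile features of a sample: f_i = #values seen exactly i times."""
    counts = Counter(Counter(sample).values())     # i -> f_i
    f = [counts.get(i, 0) for i in range(1, 6)]    # f_1 .. f_5
    return f + [len(set(sample)), len(sample)]

# Build a synthetic training set of (sample features, true NDV) pairs.
X_feat, y_ndv = [], []
for _ in range(2000):
    ndv = rng.integers(10, 10_000)
    column = rng.zipf(1.5, size=20_000) % ndv       # skewed column with <= ndv values
    sample = rng.choice(column, size=1000, replace=False)
    X_feat.append(sample_features(sample))
    y_ndv.append(len(np.unique(column)))            # true NDV of the full column
est = GradientBoostingRegressor().fit(X_feat, np.log(y_ndv))

# Estimate the NDV of a new column from a 1000-row sample only.
new_column = rng.integers(0, 3000, size=20_000)
pred = np.exp(est.predict([sample_features(rng.choice(new_column, 1000, replace=False))]))
print("estimated NDV:", int(pred[0]), " true NDV:", len(np.unique(new_column)))
```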
- Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z)
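As a rough sketch of the kind of bookkeeping such metrics involve, the function below counts k-word fragments of generated text that occur in exactly one user's training documents. The paper's metrics are defined against an actual language model under a specific threat model; the function name, k, and the toy documents here are illustrative.

```python
from collections import defaultdict

def unique_fragment_leaks(generated: str, user_docs: dict[str, str], k: int = 8) -> dict[str, int]:
    """Count k-word fragments of `generated` that appear in exactly one user's documents.

    Returns a per-user count of leaked unique fragments (higher = more user-level leakage).
    """
    def kgrams(text):
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    # Map each k-gram to the set of users whose documents contain it.
    owners = defaultdict(set)
    for user, doc in user_docs.items():
        for g in kgrams(doc):
            owners[g].add(user)

    leaks = defaultdict(int)
    for g in kgrams(generated):
        if g in owners and len(owners[g]) == 1:     # fragment unique to a single user
            leaks[next(iter(owners[g]))] += 1
    return dict(leaks)

# Toy usage: did the model's output reproduce a fragment unique to one user's data?
docs = {"alice": "my account number is one two three four five six seven eight nine",
        "bob": "the weather today is mild with a light breeze from the north west side"}
print(unique_fragment_leaks("please note my account number is one two three four five six seven eight nine", docs))
```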