Tools for Verifying Neural Models' Training Data
- URL: http://arxiv.org/abs/2307.00682v1
- Date: Sun, 2 Jul 2023 23:27:00 GMT
- Title: Tools for Verifying Neural Models' Training Data
- Authors: Dami Choi, Yonadav Shavit, David Duvenaud
- Abstract summary: "Proof-of-Training-Data" allows a model trainer to convince a Verifier of the training data that produced a set of model weights.
We show experimentally that our verification procedures can catch a wide variety of attacks.
- Score: 29.322899317216407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is important that consumers and regulators can verify the provenance of
large neural models to evaluate their capabilities and risks. We introduce the
concept of a "Proof-of-Training-Data": any protocol that allows a model trainer
to convince a Verifier of the training data that produced a set of model
weights. Such protocols could verify the amount and kind of data and compute
used to train the model, including whether it was trained on specific harmful
or beneficial data sources. We explore efficient verification strategies for
Proof-of-Training-Data that are compatible with most current large-model
training procedures. These include a method for the model-trainer to verifiably
pre-commit to a random seed used in training, and a method that exploits
models' tendency to temporarily overfit to training data in order to detect
whether a given data-point was included in training. We show experimentally
that our verification procedures can catch a wide variety of attacks, including
all known attacks from the Proof-of-Learning literature.
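As a concrete anchor for the seed pre-commitment idea mentioned in the abstract, the sketch below uses a standard salted-hash commit-and-reveal scheme: the trainer publishes H(seed || nonce) before training and reveals both values afterwards so the Verifier can re-derive the data ordering. This is a minimal sketch of one conventional construction, not necessarily the paper's exact protocol.

```python
import hashlib
import secrets

def commit_to_seed(seed: int) -> tuple[str, bytes]:
    """Trainer side: commit to a training seed before training starts.

    Returns the commitment (safe to publish) and the nonce
    (kept secret until the reveal phase).
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(seed.to_bytes(8, "big") + nonce).hexdigest()
    return digest, nonce

def verify_commitment(commitment: str, seed: int, nonce: bytes) -> bool:
    """Verifier side: check that the revealed seed matches the published commitment."""
    expected = hashlib.sha256(seed.to_bytes(8, "big") + nonce).hexdigest()
    return secrets.compare_digest(expected, commitment)

# Example flow: trainer commits, trains with `seed`, then reveals (seed, nonce).
seed = 20230702
commitment, nonce = commit_to_seed(seed)
# ... training happens, with all data shuffling driven by `seed` ...
assert verify_commitment(commitment, seed, nonce)
```

With the seed fixed in advance, a Verifier can re-derive the claimed data ordering and spot-check reported checkpoints against it.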
Related papers
- Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine? [17.714589429503675]
We propose an injection-free training data attribution method for text-to-image models.
Our approach involves developing algorithms to uncover distinct samples and using them as inherent watermarks.
Our experiments demonstrate that our method achieves an accuracy of over 80% in identifying the source of a suspicious model's training data.
arXiv Detail & Related papers (2024-09-24T06:23:43Z)
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
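The snippet below is a drastically simplified, TRAK-inspired sketch for a logistic-regression model: per-example gradients are randomly projected to a low dimension, and a ridge-regularized kernel scores how much each training point drives a test prediction. It omits most of the actual TRAK estimator (ensembling over checkpoints, the output-function reweighting, etc.) and is only meant to convey the random-projection idea; all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data and a stand-in "trained" weight vector.
n, d, k = 200, 50, 16          # training points, parameter dim, projection dim
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)
w = w_true + 0.1 * rng.normal(size=d)   # pretend these are the learned parameters

def per_example_grad(x, y_i, w):
    """Gradient of the logistic loss at (x, y_i) with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y_i) * x

# Randomly project per-example gradients to k dimensions (the core trick).
P = rng.normal(size=(d, k)) / np.sqrt(k)
Phi = np.stack([per_example_grad(X[i], y[i], w) for i in range(n)]) @ P   # (n, k)

def attribution_scores(x_test, y_test, lam=1e-3):
    """Influence-style score of each training example for one test prediction."""
    phi_test = per_example_grad(x_test, y_test, w) @ P
    K = Phi.T @ Phi + lam * np.eye(k)            # ridge-regularized kernel
    return Phi @ np.linalg.solve(K, phi_test)    # one score per training point

scores = attribution_scores(X[0], y[0])
print("top influencing training indices:", np.argsort(-scores)[:5])
```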
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
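As a generic illustration of the weight-importance idea from the entry above, the sketch below estimates a Fisher-style importance of each parameter on the instances to be forgotten and shrinks the most implicated ones. This is one common recipe for illustration only, not the paper's method (which also uses adversarial examples at the representation level); the threshold and shrinkage factor are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "pre-trained" classifier and a small batch of instances to forget.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
forget_x = torch.randn(8, 20)
forget_y = torch.randint(0, 3, (8,))

# Fisher-style importance: squared gradients of the loss on the forget set.
loss = nn.functional.cross_entropy(model(forget_x), forget_y)
loss.backward()
importance = {name: p.grad.detach() ** 2 for name, p in model.named_parameters()}

# Shrink the parameters most implicated in memorizing the forget set
# (a crude forgetting step; one would then fine-tune on the retained data).
with torch.no_grad():
    for name, p in model.named_parameters():
        imp = importance[name]
        threshold = imp.flatten().kthvalue(int(0.9 * imp.numel())).values
        mask = imp >= threshold      # roughly the top 10% most important entries
        p[mask] *= 0.1
```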
- Targeted Image Reconstruction by Sampling Pre-trained Diffusion Model [0.0]
A trained neural network model contains information about its training data.
Malicious parties can leverage the "knowledge" in this model and design ways to extract any usable information from it.
In this work, we propose ways to generate a data point of the target class without prior knowledge of the exact target distribution.
arXiv Detail & Related papers (2023-01-18T14:21:38Z)
- Provable Fairness for Neural Network Models using Formal Verification [10.90121002896312]
We propose techniques to prove fairness using recently developed formal methods that verify properties of neural network models.
We show that through proper training, we can reduce unfairness by an average of 65.4% at a cost of less than 1% in AUC score.
arXiv Detail & Related papers (2022-12-16T16:54:37Z)
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our result suggests potentially severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
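Below is a sketch in the spirit of gradient-based reconstruction attacks: optimize a dummy input so that its gradient matches the one observed from the secret sample. The paper above contributes a provable reconstruction from a single gradient query at random parameters, which this iterative toy loop does not reproduce; the model, sizes, and optimizer settings are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# The "secret" training sample and the gradient the attacker gets to observe.
x_secret = torch.randn(1, 10)
y_secret = torch.tensor([1])
loss = nn.functional.cross_entropy(model(x_secret), y_secret)
true_grads = torch.autograd.grad(loss, model.parameters())

# Attacker: optimize a dummy input so its gradient matches the observed one.
# (For simplicity the label is assumed known; it can also be optimized jointly.)
x_dummy = torch.randn(1, 10, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy], lr=0.5)

def closure():
    opt.zero_grad()
    dummy_loss = nn.functional.cross_entropy(model(x_dummy), y_secret)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    return match

for _ in range(30):
    opt.step(closure)

print("reconstruction error:", (x_dummy.detach() - x_secret).norm().item())
```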
- CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
- Leveraging Adversarial Examples to Quantify Membership Information Leakage [30.55736840515317]
We develop a novel approach to address the problem of membership inference in pattern recognition models.
We argue that the magnitude of the adversarial perturbation needed to change the model's prediction reflects the likelihood of a sample belonging to the training data.
Our method performs comparably to, and in some cases outperforms, state-of-the-art strategies.
arXiv Detail & Related papers (2022-03-17T19:09:38Z)
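As a toy illustration of the general idea above (perturbation needed to cross the decision boundary as a membership signal), the sketch below uses a linear model, where that distance has a closed form. The paper handles general pattern-recognition models with a more careful formulation; the data, model, and score here are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Train on half the data; the other half plays the role of non-members.
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
members, non_members = X[:200], X[200:]
clf = LogisticRegression(max_iter=1000).fit(members, y[:200])

def boundary_distance(x):
    """Minimal L2 perturbation that flips a linear classifier's decision."""
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    return abs(x @ w + b) / np.linalg.norm(w)

# Heuristic membership signal: training points tend to sit farther from the boundary.
member_d = np.array([boundary_distance(x) for x in members])
nonmember_d = np.array([boundary_distance(x) for x in non_members])
print("mean distance  members:", member_d.mean(), " non-members:", nonmember_d.mean())
```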
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
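The sketch below illustrates the supervised framing from the entry above on synthetic data: columns with known NDV are generated, frequency-profile features are computed from a random sample of each, and a regressor learns to map those features to (log) NDV. The feature set and model are illustrative, not the paper's learned estimator.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def sample_features(sample):
    """Frequency-profile features of a sample: f_i = #values seen exactly i times."""
    counts = Counter(Counter(sample).values())     # i -> f_i
    f = [counts.get(i, 0) for i in range(1, 6)]    # f_1 .. f_5
    return f + [len(set(sample)), len(sample)]

# Build a synthetic training set of (sample features, true NDV) pairs.
X_feat, y_ndv = [], []
for _ in range(2000):
    ndv = rng.integers(10, 10_000)
    column = rng.zipf(1.5, size=20_000) % ndv       # skewed column with <= ndv values
    sample = rng.choice(column, size=1000, replace=False)
    X_feat.append(sample_features(sample))
    y_ndv.append(len(np.unique(column)))            # true NDV of the full column
est = GradientBoostingRegressor().fit(X_feat, np.log(y_ndv))

# Estimate the NDV of a new column from a 1000-row sample only.
new_column = rng.integers(0, 3000, size=20_000)
pred = np.exp(est.predict([sample_features(rng.choice(new_column, 1000, replace=False))]))
print("estimated NDV:", int(pred[0]), " true NDV:", len(np.unique(new_column)))
```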
- Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z)
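As a rough sketch of the kind of bookkeeping such metrics involve, the function below counts k-word fragments of generated text that occur in exactly one user's training documents. The paper's metrics are defined against an actual language model under a specific threat model; the function name, k, and the toy documents here are illustrative.

```python
from collections import defaultdict

def unique_fragment_leaks(generated: str, user_docs: dict[str, str], k: int = 8) -> dict[str, int]:
    """Count k-word fragments of `generated` that appear in exactly one user's documents.

    Returns a per-user count of leaked unique fragments (higher = more user-level leakage).
    """
    def kgrams(text):
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    # Map each k-gram to the set of users whose documents contain it.
    owners = defaultdict(set)
    for user, doc in user_docs.items():
        for g in kgrams(doc):
            owners[g].add(user)

    leaks = defaultdict(int)
    for g in kgrams(generated):
        if g in owners and len(owners[g]) == 1:     # fragment unique to a single user
            leaks[next(iter(owners[g]))] += 1
    return dict(leaks)

# Toy usage: did the model's output reproduce a fragment unique to one user's data?
docs = {"alice": "my account number is one two three four five six seven eight nine",
        "bob": "the weather today is mild with a light breeze from the north west side"}
print(unique_fragment_leaks("please note my account number is one two three four five six seven eight nine", docs))
```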