Measuring Data Leakage in Machine-Learning Models with Fisher Information
- URL: http://arxiv.org/abs/2102.11673v1
- Date: Tue, 23 Feb 2021 13:02:34 GMT
- Title: Measuring Data Leakage in Machine-Learning Models with Fisher Information
- Authors: Awni Hannun, Chuan Guo, Laurens van der Maaten
- Abstract summary: Machine-learning models contain information about the data they were trained on.
This information leaks either through the model itself or through predictions made by the model.
We propose a method to quantify this leakage using the Fisher information of the model about the data.
- Score: 35.20523017255285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-learning models contain information about the data they were trained
on. This information leaks either through the model itself or through
predictions made by the model. Consequently, when the training data contains
sensitive attributes, assessing the amount of information leakage is paramount.
We propose a method to quantify this leakage using the Fisher information of
the model about the data. Unlike the worst-case a priori guarantees of
differential privacy, Fisher information loss measures leakage with respect to
specific examples, attributes, or sub-populations within the dataset. We
motivate Fisher information loss through the Cramér-Rao bound and delineate
the implied threat model. We provide efficient methods to compute Fisher
information loss for output-perturbed generalized linear models. Finally, we
empirically validate Fisher information loss as a useful measure of information
leakage.
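As a concrete illustration of the abstract's idea, the sketch below computes a per-example Fisher information loss proxy for output-perturbed ridge regression, treating each training label as the sensitive attribute. The ridge setting, the noise scale, and the per-label Jacobian proxy are assumptions for illustration; the paper's exact estimator may differ.

```python
# Minimal sketch: per-example Fisher information loss (FIL) proxy for
# output-perturbed ridge regression, treating each training label y_i as the
# sensitive attribute. Illustrative only; the paper's estimator may differ.
import numpy as np

def ridge_fil_per_label(X, lam=1e-2, sigma=0.1):
    """eta_i = ||d theta*/d y_i||_2 / sigma, where
    theta* = argmin ||X theta - y||^2 + lam ||theta||^2 and the released
    parameters are theta* + N(0, sigma^2 I) (output perturbation).
    For ridge regression this Jacobian does not depend on y itself."""
    n, d = X.shape
    H_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))  # (X^T X + lam I)^{-1}
    # d theta*/d y_i = H_inv @ x_i; its norm says how strongly example i can
    # move the released parameters, and dividing by sigma turns that into the
    # Fisher information of the Gaussian-perturbed release about y_i.
    J = X @ H_inv  # row i equals (d theta*/d y_i)^T because H_inv is symmetric
    return np.linalg.norm(J, axis=1) / sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print("largest per-example FIL proxy:", ridge_fil_per_label(X).max())
```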
Related papers
- Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective [4.31734012105466]
Machine Unlearning is the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model.
We propose a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network.
Our approach, termed Partially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods in effectiveness.
arXiv Detail & Related papers (2024-03-24T17:33:22Z)
- Loss-Free Machine Unlearning [51.34904967046097]
We present a machine unlearning approach that is both retraining- and label-free.
Retraining-free approaches often rely on Fisher information, which is derived from the loss and therefore requires labelled data that may not be available.
We present an extension to the Selective Synaptic Dampening algorithm that replaces the diagonal of the Fisher information matrix with the gradient of the l2 norm of the model output to approximate sensitivity.
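A hedged sketch of this label-free sensitivity idea, assuming a user-supplied PyTorch `model` and data `loader`; it is not the paper's exact algorithm:

```python
# Hedged sketch: score each parameter by the squared gradient of the l2 norm
# of the model's output, computed without labels or a loss. `model` and
# `loader` are assumed user-supplied.
import torch

def output_norm_importance(model, loader):
    importance = [torch.zeros_like(p) for p in model.parameters()]
    count = 0
    for x, *_ in loader:
        for xi in x:                                  # per-example gradients
            model.zero_grad()
            out = model(xi.unsqueeze(0))
            out.norm(p=2).backward()                  # gradient of ||f(x_i)||_2
            for imp, p in zip(importance, model.parameters()):
                if p.grad is not None:
                    imp += p.grad.detach() ** 2       # squared-gradient (diagonal) estimate
            count += 1
    return [imp / max(count, 1) for imp in importance]
```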
arXiv Detail & Related papers (2024-02-29T16:15:34Z)
- Mendata: A Framework to Purify Manipulated Training Data [12.406255198638064]
We propose Mendata, a framework to purify manipulated training data.
Mendata perturbs the training inputs so that they retain their utility but are distributed similarly to the reference data.
We demonstrate the effectiveness of Mendata by applying it to defeat state-of-the-art data poisoning and data tracing techniques.
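A rough sketch of the distribution-matching idea in this summary, using an RBF-MMD penalty toward reference data plus a proximity term as stand-ins; these choices are assumptions for illustration, not Mendata's actual procedure:

```python
# Rough sketch: nudge (possibly manipulated) training inputs toward a clean
# reference distribution while staying close to the originals.
import torch

def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    k = lambda x, y: torch.exp(-torch.cdist(x, y) ** 2 / (2 * bandwidth ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def purify(x_train, x_ref, steps=200, lr=0.05, lam=0.1):
    x = x_train.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # match the reference distribution, but keep perturbations small so
        # the purified inputs retain their utility for training
        loss = rbf_mmd2(x, x_ref) + lam * (x - x_train).pow(2).mean()
        loss.backward()
        opt.step()
    return x.detach()
```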
arXiv Detail & Related papers (2023-12-03T04:40:08Z)
- Leave-one-out Distinguishability in Machine Learning [23.475469946428717]
We introduce an analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set.
This is key to measuring data memorization and information leakage, as well as the influence of training data points in machine learning.
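The paper develops an analytical framework; the sketch below is only a crude Monte Carlo proxy for the same quantity, with `train_model` an assumed user-supplied training routine:

```python
# Crude Monte Carlo proxy for leave-one-out distinguishability: retrain with
# and without a target point over several seeds and compare the two prediction
# distributions on query inputs. `train_model(data, queries, seed)` is an
# assumed user-supplied routine returning an array of predictions.
import numpy as np

def loo_distinguishability(train_model, data, target_idx, queries, n_seeds=10):
    data_without = [z for i, z in enumerate(data) if i != target_idx]
    with_pt = np.stack([train_model(data, queries, seed=s) for s in range(n_seeds)])
    without_pt = np.stack([train_model(data_without, queries, seed=s) for s in range(n_seeds)])
    # Gaussian-style comparison of the two output distributions: squared mean
    # gap normalized by the pooled variance across seeds, averaged over queries.
    gap = (with_pt.mean(axis=0) - without_pt.mean(axis=0)) ** 2
    pooled = with_pt.var(axis=0) + without_pt.var(axis=0) + 1e-8
    return float((gap / pooled).mean())
```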
arXiv Detail & Related papers (2023-09-29T15:08:28Z)
- On the Exploitability of Instruction Tuning [103.8077787502381]
In this work, we investigate how an adversary can exploit instruction tuning to change a model's behavior.
We propose AutoPoison, an automated data poisoning pipeline.
Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data.
arXiv Detail & Related papers (2023-06-28T17:54:04Z)
- Unifying Approaches in Data Subset Selection via Fisher Information and Information-Theoretic Quantities [38.59619544501593]
We revisit the Fisher information and use it to show how several otherwise disparate methods are connected as approximations of information-theoretic quantities.
In data subset selection, i.e. active learning and active sampling, several recent works use Fisher information, Hessians, similarity matrices based on the gradients, or simply the gradient lengths to compute the acquisition scores that guide sample selection.
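As one concrete instance of the gradient-based criteria listed above, the sketch below scores pool points by the gradient length at the model's own predicted label; the pseudo-label choice and the PyTorch `model`/`pool_loader` interface are assumptions for illustration, not the paper's method:

```python
# Illustrative acquisition score from the gradient-length family: rank
# unlabelled pool points by the norm of the loss gradient taken at the
# model's own predicted (pseudo) label.
import torch
import torch.nn.functional as F

def gradient_length_scores(model, pool_loader):
    scores = []
    for x, *_ in pool_loader:
        for xi in x:
            model.zero_grad()
            logits = model(xi.unsqueeze(0))
            pseudo_label = logits.argmax(dim=1)
            F.cross_entropy(logits, pseudo_label).backward()
            sq_norm = sum((p.grad ** 2).sum() for p in model.parameters()
                          if p.grad is not None)
            scores.append(sq_norm.sqrt().item())
    return scores  # larger score = more informative candidate for labelling
```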
arXiv Detail & Related papers (2022-08-01T00:36:57Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first approach for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
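A minimal sketch of such a closed-form, influence-function-style update for a corrected training point, assuming the training-loss Hessian and the gradients at the original and corrected points are available; sign and scaling conventions vary, and this mirrors the idea rather than the paper's exact procedure:

```python
# Minimal sketch of an influence-function-style closed-form unlearning update:
# move the parameters by the inverse Hessian times the gradient difference
# between the original and corrected point.
import numpy as np

def unlearn_update(theta, hessian, grad_original, grad_corrected):
    """theta, grad_*: shape (d,); hessian: (d, d) Hessian of the training loss
    at theta. Returns the approximately 'unlearned' parameters."""
    return theta + np.linalg.solve(hessian, grad_original - grad_corrected)
```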
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
- Modelling and Quantifying Membership Information Leakage in Machine Learning [14.095523601311374]
We show that complex models, such as deep neural networks, are more susceptible to membership inference attacks.
We show that the amount of membership information leakage is reduced by $\mathcal{O}(\log^{1/2}(\delta^{-1})\,\epsilon^{-1})$ when using Gaussian $(\epsilon,\delta)$-differentially-private additive noise.
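For reference, a sketch of the classical Gaussian mechanism behind such noise: its scale grows roughly like $\log^{1/2}(\delta^{-1})\,\epsilon^{-1}$, which is where a rate of this form comes from. The calibration below is the standard bound for $\epsilon < 1$; the paper's exact setup may differ.

```python
# Sketch of the classical Gaussian mechanism: calibrate the noise scale as
# sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon (standard bound for
# epsilon < 1), where Delta_2 is the l2 sensitivity of the released value.
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + rng.normal(scale=sigma, size=np.shape(value))

# e.g. release a statistic with known l2 sensitivity under (0.5, 1e-5)-DP
noisy = gaussian_mechanism(np.array([0.7]), l2_sensitivity=0.01,
                           epsilon=0.5, delta=1e-5)
```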
arXiv Detail & Related papers (2020-01-29T00:42:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.