Parameters or Privacy: A Provable Tradeoff Between Overparameterization
and Membership Inference
- URL: http://arxiv.org/abs/2202.01243v1
- Date: Wed, 2 Feb 2022 19:00:21 GMT
- Title: Parameters or Privacy: A Provable Tradeoff Between Overparameterization
and Membership Inference
- Authors: Jasper Tan, Blake Mason, Hamid Javadi, Richard G. Baraniuk
- Abstract summary: Overparameterized models generalize well (small error on the test data) even when trained to memorize the training data (zero error on the training data).
This has led to an arms race towards increasingly overparameterized models (cf. deep learning).
- Score: 29.743945643424553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A surprising phenomenon in modern machine learning is the ability of a highly
overparameterized model to generalize well (small error on the test data) even
when it is trained to memorize the training data (zero error on the training
data). This has led to an arms race towards increasingly overparameterized
models (cf. deep learning). In this paper, we study an underexplored hidden
cost of overparameterization: the fact that overparameterized models are more
vulnerable to privacy attacks, in particular the membership inference attack
that predicts the (potentially sensitive) examples used to train a model. We
significantly extend the relatively few empirical results on this problem by
theoretically proving for an overparameterized linear regression model with
Gaussian data that the membership inference vulnerability increases with the
number of parameters. Moreover, a range of empirical studies indicates that
more complex, nonlinear models exhibit the same behavior. Finally, we study
different methods for mitigating such attacks in the overparameterized regime,
such as noise addition and regularization, and conclude that simply reducing
the parameters of an overparameterized model is an effective strategy to
protect it from membership inference without greatly decreasing its
generalization error.
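The linear-regression setting described in the abstract lends itself to a small illustration. The sketch below is not the paper's analysis or attack; it is a minimal experiment, with all names and constants chosen for illustration, that fits a minimum-norm linear regressor on Gaussian data and measures the advantage of a simple loss-threshold membership inference attack as the parameter count grows past the number of training points:

```python
import numpy as np

rng = np.random.default_rng(0)

def mi_advantage(n_params, n_train=50, n_test=50, noise=0.5):
    """Fit a minimum-norm linear regressor on Gaussian data and measure
    the advantage of a loss-threshold membership inference attack."""
    w = rng.normal(size=n_params) / np.sqrt(n_params)   # ground-truth weights
    X = rng.normal(size=(n_train, n_params))
    y = X @ w + noise * rng.normal(size=n_train)
    Xt = rng.normal(size=(n_test, n_params))
    yt = Xt @ w + noise * rng.normal(size=n_test)
    w_hat = np.linalg.pinv(X) @ y                       # minimum-norm least squares
    tr_loss = (X @ w_hat - y) ** 2                      # per-example losses: members
    te_loss = (Xt @ w_hat - yt) ** 2                    # per-example losses: non-members
    # Attack: predict "member" when the loss falls below the pooled median.
    thresh = np.median(np.concatenate([tr_loss, te_loss]))
    tpr = np.mean(tr_loss <= thresh)                    # members correctly flagged
    fpr = np.mean(te_loss <= thresh)                    # non-members wrongly flagged
    return tpr - fpr                                    # attack advantage

for d in (10, 50, 200, 1000):
    print(d, round(mi_advantage(d), 2))
```

Once the model has more parameters than the 50 training points, it interpolates: training losses collapse to zero while test losses stay positive, so the threshold attack separates members from non-members almost perfectly.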
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
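The difference-in-differences idea can be illustrated with hypothetical numbers (not taken from the paper): compare the loss change on instances that enter training against the loss change on instances that never do, so that general training progress cancels out.

```python
# Hypothetical average negative log-likelihoods (illustrative, not real
# model outputs), measured at two checkpoints: before and after a batch
# of "treated" instances enters the training stream.
treated_before, treated_after = 2.1, 0.6   # instances that get trained on
control_before, control_after = 2.0, 1.7   # instances that never do
# Difference-in-differences: the improvement on treated instances minus
# the improvement on controls, netting out general training progress.
memorisation = (treated_before - treated_after) - (control_before - control_after)
print(round(memorisation, 2))  # 1.2
```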
arXiv Detail & Related papers (2024-06-06T17:59:09Z) - Better Membership Inference Privacy Measurement through Discrepancy [25.48677069802298]
We propose a new empirical privacy metric that is an upper bound on the advantage of a family of membership inference attacks.
We show that this metric does not involve training multiple models, can be applied to large Imagenet classification models in-the-wild, and has higher advantage than existing metrics on models trained with more recent and sophisticated training recipes.
arXiv Detail & Related papers (2024-05-24T01:33:22Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - On the Influence of Enforcing Model Identifiability on Learning dynamics
of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z) - A Blessing of Dimensionality in Membership Inference through
Regularization [29.08230123469755]
We show how the number of parameters of a model can induce a privacy--utility trade-off.
We then show that if coupled with proper generalization regularization, increasing the number of parameters of a model can actually increase both its privacy and performance.
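A toy illustration (not the paper's construction) of how regularization interacts with this trade-off: at a fixed, large parameter count, ridge regularization shrinks the train/test loss gap that loss-threshold membership inference attacks exploit.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss_gap(reg, n=100, d=1000, noise=0.5):
    """Ridge regression on Gaussian data; the train/test loss gap is a
    proxy for loss-threshold membership inference vulnerability."""
    w = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ w + noise * rng.normal(size=n)
    Xt = rng.normal(size=(n, d))
    yt = Xt @ w + noise * rng.normal(size=n)
    # Ridge solution: (X^T X + reg * I)^{-1} X^T y
    w_hat = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
    return np.mean((Xt @ w_hat - yt) ** 2) - np.mean((X @ w_hat - y) ** 2)

for reg in (1e-6, 1e2, 1e4):
    print(reg, round(loss_gap(reg), 3))
```

With negligible regularization the overparameterized model interpolates and the gap is large; heavier regularization trades some training fit for a much smaller gap between member and non-member losses.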
arXiv Detail & Related papers (2022-05-27T15:44:00Z) - MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood
Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
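The mutual-information view can be checked on a toy simulator where the likelihood-to-evidence ratio is available in closed form (an illustrative sketch, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy simulator: theta ~ N(0, 1), x | theta ~ N(theta, 1), so the
# evidence (marginal of x) is N(0, 2). The likelihood-to-evidence
# ratio r(x, theta) = p(x | theta) / p(x) is known in closed form,
# and I(theta; x) = E_joint[log r].
theta = rng.normal(size=200_000)
x = theta + rng.normal(size=theta.size)

log_lik = -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)       # log p(x | theta)
log_ev = -0.5 * x ** 2 / 2 - 0.5 * np.log(2 * np.pi * 2)          # log p(x)
mi_estimate = np.mean(log_lik - log_ev)   # Monte Carlo estimate of E[log r]
print(round(mi_estimate, 3))              # analytic value: 0.5 * log(2) ~ 0.347
```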
arXiv Detail & Related papers (2021-06-03T12:59:16Z) - Provable Benefits of Overparameterization in Model Compression: From
Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists, perhaps counterintuitively, in building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z) - An Investigation of Why Overparameterization Exacerbates Spurious
Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.