Accuracy Improvement in Differentially Private Logistic Regression: A
Pre-training Approach
- URL: http://arxiv.org/abs/2307.13771v3
- Date: Mon, 12 Feb 2024 10:49:28 GMT
- Authors: Mohammad Hoseinpour, Milad Hoseinpour, Ali Aghagolzadeh
- Abstract summary: This paper aims to boost the accuracy of a DP logistic regression (LR) model via a pre-training module.
In the numerical results, we show that adding a pre-training module significantly improves the accuracy of the DP-LR model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) models can memorize training datasets. As a result,
training ML models over private datasets can lead to the violation of
individuals' privacy. Differential privacy (DP) is a rigorous privacy notion to
preserve the privacy of underlying training datasets. Yet, training ML models
in a DP framework usually degrades their accuracy. This paper aims to boost
the accuracy of a DP logistic regression (LR) model via a pre-training module.
Specifically, we first pre-train the LR model on a public training dataset
that raises no privacy concerns, and then fine-tune the DP-LR model on the
private dataset. Our numerical results show that adding the pre-training
module significantly improves the accuracy of the DP-LR model.
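The pre-train-then-privately-fine-tune recipe the abstract describes can be sketched with a DP-SGD-style update. Everything below (the synthetic data, hyperparameters, clip norm, and noise scale) is an illustrative assumption, not the paper's actual configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_lr(X, y, lr=0.1, epochs=200):
    # Standard (non-private) gradient descent on the public dataset.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def dp_finetune_lr(X, y, w, lr=0.05, epochs=50, clip=1.0,
                   noise_mult=1.0, rng=None):
    # DP-SGD-style fine-tuning on the private dataset:
    # clip each example's gradient, then add Gaussian noise to the sum.
    rng = rng or np.random.default_rng(0)
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        per_ex = X * (sigmoid(X @ w) - y)[:, None]   # per-example gradients
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
        w -= lr * (per_ex.sum(axis=0) + noise) / n
    return w
```

Starting the private fine-tuning from the publicly pre-trained `w` (rather than from zeros) is the paper's key idea: fewer noisy private steps are needed to reach a good model.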
Related papers
- Noise-Aware Differentially Private Regression via Meta-Learning [25.14514068630219]
Differential Privacy (DP) is the gold standard for protecting user privacy, but standard DP mechanisms significantly impair performance.
One approach to mitigating this issue is pre-training models on simulated data before DP learning on the private data.
In this work we go a step further, using simulated data to train a meta-learning model that combines the Convolutional Conditional Neural Process (ConvCNP) with an improved functional DP mechanism.
arXiv Detail &amp; Related papers (2024-06-12T18:11:24Z)
- Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base-models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail &amp; Related papers (2024-03-05T19:32:01Z)
- Pre-training Differentially Private Models with Limited Public Data [58.945400707033016]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is yet not capable of protecting a substantial portion of the data used during the initial pre-training stage.
We propose a novel DP continual pre-training strategy using only 10% of public data.
arXiv Detail &amp; Related papers (2024-02-28T23:26:27Z)
- PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining [2.6068944905108227]
We introduce a privacy auditing scheme for ML models that relies on membership inference attacks using generated data as "non-members".
This scheme, which we call PANORAMIA, quantifies the privacy leakage for large-scale ML models without control of the training process or model re-training.
arXiv Detail &amp; Related papers (2024-02-12T22:56:07Z)
- Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage [10.921553888358375]
We present two novel methods to generate differentially private recourse.
We find that DPM and LR perform well in reducing what an adversary can infer.
arXiv Detail &amp; Related papers (2023-08-08T15:38:55Z)
- AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail &amp; Related papers (2023-04-07T08:50:18Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail &amp; Related papers (2023-02-08T07:37:51Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
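The per-example clipping and noise injection that DP-SGD performs, as summarized above, can be sketched in isolation. The function name and array shapes here are illustrative assumptions, not any particular library's API:

```python
import numpy as np

def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each example's gradient to
    clip_norm, sum the clipped gradients, and add Gaussian noise whose
    scale is calibrated to the clip norm."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    return clipped.sum(axis=0) + noise
```

Clipping bounds each example's influence on the update (the sensitivity), which is what lets the added Gaussian noise yield a formal DP guarantee; materializing the per-example gradients is also the source of the extra computational cost the summary mentions.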
arXiv Detail &amp; Related papers (2022-05-06T01:22:20Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail &amp; Related papers (2021-10-12T01:45:27Z)
- An Efficient DP-SGD Mechanism for Large Scale NLP Models [28.180412581994485]
Data used to train Natural Language Understanding (NLU) models may contain private information such as addresses or phone numbers.
It is desirable that underlying models do not expose private information contained in the training data.
Differentially Private Stochastic Gradient Descent (DP-SGD) has been proposed as a mechanism to build privacy-preserving models.
arXiv Detail & Related papers (2021-07-14T15:23:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.