$(ε, δ)$-Differentially Private Partial Least Squares Regression
- URL: http://arxiv.org/abs/2412.09164v1
- Date: Thu, 12 Dec 2024 10:49:55 GMT
- Title: $(ε, δ)$-Differentially Private Partial Least Squares Regression
- Authors: Ramin Nikzad-Langerodi, Mohit Kumar, Du Nguyen Duy, Mahtab Alghasi,
- Abstract summary: We propose an $(epsilon, delta)$-differentially private PLS (edPLS) algorithm to ensure the privacy of the data underlying the model.
Experimental results demonstrate that edPLS effectively renders privacy attacks, aimed at recovering unique sources of variability in the training data.
- Score: 1.8666451604540077
- License:
- Abstract: As data-privacy requirements are becoming increasingly stringent and statistical models based on sensitive data are being deployed and used more routinely, protecting data-privacy becomes pivotal. Partial Least Squares (PLS) regression is the premier tool for building such models in analytical chemistry, yet it does not inherently provide privacy guarantees, leaving sensitive (training) data vulnerable to privacy attacks. To address this gap, we propose an $(\epsilon, \delta)$-differentially private PLS (edPLS) algorithm, which integrates well-studied and theoretically motivated Gaussian noise-adding mechanisms into the PLS algorithm to ensure the privacy of the data underlying the model. Our approach involves adding carefully calibrated Gaussian noise to the outputs of four key functions in the PLS algorithm: the weights, scores, $X$-loadings, and $Y$-loadings. The noise variance is determined based on the global sensitivity of each function, ensuring that the privacy loss is controlled according to the $(\epsilon, \delta)$-differential privacy framework. Specifically, we derive the sensitivity bounds for each function and use these bounds to calibrate the noise added to the model components. Experimental results demonstrate that edPLS effectively renders privacy attacks, aimed at recovering unique sources of variability in the training data, ineffective. Application of edPLS to the NIR corn benchmark dataset shows that the root mean squared error of prediction (RMSEP) remains competitive even at strong privacy levels (i.e., $\epsilon=1$), given proper pre-processing of the corresponding spectra. These findings highlight the practical utility of edPLS in creating privacy-preserving multivariate calibrations and for the analysis of their privacy-utility trade-offs.
Related papers
- Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Calibrating Practical Privacy Risks for Differentially Private Machine Learning [5.363664265121231]
We study the approaches that can lower the attacking success rate to allow for more flexible privacy budget settings in model training.
We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility.
arXiv Detail & Related papers (2024-10-30T03:52:01Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - Adaptive Differential Privacy in Federated Learning: A Priority-Based
Approach [0.0]
Federated learning (FL) develops global models without direct access to local datasets.
DP offers a framework that gives a privacy guarantee by adding certain amounts of noise to parameters.
We propose adaptive noise addition in FL which decides the value of injected noise based on features' relative importance.
arXiv Detail & Related papers (2024-01-04T03:01:15Z) - Stronger Privacy Amplification by Shuffling for R\'enyi and Approximate
Differential Privacy [43.33288245778629]
A key result in this model is that randomly shuffling locally randomized data amplifies differential privacy guarantees.
Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously.
In this work, we improve the state of the art privacy amplification by shuffling results both theoretically and numerically.
arXiv Detail & Related papers (2022-08-09T08:13:48Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - A Differentially Private Framework for Deep Learning with Convexified
Loss Functions [4.059849656394191]
Differential privacy (DP) has been applied in deep learning for preserving privacy of the underlying training sets.
Existing DP practice falls into three categories - objective perturbation, gradient perturbation and output perturbation.
We propose a novel output perturbation framework by injecting DP noise into a randomly sampled neuron.
arXiv Detail & Related papers (2022-04-03T11:10:05Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for
Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm emphGradient Embedding Perturbation (GEP) towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z) - RDP-GAN: A R\'enyi-Differential Privacy based Generative Adversarial
Network [75.81653258081435]
Generative adversarial network (GAN) has attracted increasing attention recently owing to its impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied on sensitive or private training examples, such as medical or financial records, it is still probable to divulge individuals' sensitive and private information.
We propose a R'enyi-differentially private-GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noises on the value of the loss function during training.
arXiv Detail & Related papers (2020-07-04T09:51:02Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.