Differentially Private Coordinate Descent for Composite Empirical Risk
Minimization
- URL: http://arxiv.org/abs/2110.11688v1
- Date: Fri, 22 Oct 2021 10:22:48 GMT
- Title: Differentially Private Coordinate Descent for Composite Empirical Risk
Minimization
- Authors: Paul Mangold, Aur\'elien Bellet, Joseph Salmon, Marc Tommasi
- Abstract summary: Machine learning models can leak information about the data used to train them.
Differentially Private (DP) variants of optimization algorithms like Gradient Descent (DP-SGD) have been designed to mitigate this.
We propose a new method for composite Differentially Private Empirical Risk Minimization (DP-ERM): Differentially Private Coordinate Descent (DP-CD)
- Score: 13.742100810492014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models can leak information about the data used to train
them. Differentially Private (DP) variants of optimization algorithms like
Stochastic Gradient Descent (DP-SGD) have been designed to mitigate this,
inducing a trade-off between privacy and utility. In this paper, we propose a
new method for composite Differentially Private Empirical Risk Minimization
(DP-ERM): Differentially Private proximal Coordinate Descent (DP-CD). We
analyze its utility through a novel theoretical analysis of inexact coordinate
descent, and highlight some regimes where DP-CD outperforms DP-SGD, thanks to
the possibility of using larger step sizes. We also prove new lower bounds for
composite DP-ERM under coordinate-wise regularity assumptions, that are, in
some settings, nearly matched by our algorithm. In practical implementations,
the coordinate-wise nature of DP-CD updates demands special care in choosing
the clipping thresholds used to bound individual contributions to the
gradients. A natural parameterization of these thresholds emerges from our
theory, limiting the addition of unnecessarily large noise without requiring
coordinate-wise hyperparameter tuning or extra computational cost.
Related papers
- On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Gradient Descent with gradient clipping is a powerful technique for enabling differentially private optimization.
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD)
We show how QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD but can be mitigated through a carefully designed quantile and step size schedule.
arXiv Detail & Related papers (2024-12-27T20:29:47Z) - Differentially Private Random Block Coordinate Descent [51.62669821275571]
We propose a differentially private random coordinate descent method that selects multiple coordinates with varying probabilities in each iteration using sketch matrices.
Our algorithm generalizes both DP-CD and the classical DP-SGD (Differentially Private Descent), while preserving the same utility guarantees.
arXiv Detail & Related papers (2024-12-22T15:06:56Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Practical Differentially Private Hyperparameter Tuning with Subsampling [8.022555128083026]
We propose a new class of differentially private (DP) machine learning (ML) algorithms, where the number of random search samples is randomized itself.
We focus on lowering both the DP bounds and the computational cost of these methods by using only a random subset of the sensitive data.
We provide a R'enyi differential privacy analysis for the proposed method and experimentally show that it consistently leads to better privacy-utility trade-off.
arXiv Detail & Related papers (2023-01-27T21:01:58Z) - Differentially Private Learning with Per-Sample Adaptive Clipping [8.401653565794353]
We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
arXiv Detail & Related papers (2022-12-01T07:26:49Z) - High-Dimensional Private Empirical Risk Minimization by Greedy
Coordinate Descent [11.49109939095326]
We study differentially private empirical risk minimization (DP-ERM)
We show theoretically that DP-GCD can achieve a logarithmic dependence on the dimension for a wide range of problems.
arXiv Detail & Related papers (2022-07-04T16:27:00Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Automatic Clipping: Differentially Private Deep Learning Made Easier and
Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differential private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune R for any DPs.
arXiv Detail & Related papers (2022-06-14T19:49:44Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.