Differentially Private Temporal Difference Learning with Stochastic
Nonconvex-Strongly-Concave Optimization
- URL: http://arxiv.org/abs/2201.10447v1
- Date: Tue, 25 Jan 2022 16:48:29 GMT
- Title: Differentially Private Temporal Difference Learning with Stochastic
Nonconvex-Strongly-Concave Optimization
- Authors: Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
- Abstract summary: Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
In this paper, we consider preserving privacy in TD learning with a nonlinear value function.
We show that DPTD provides an $(\epsilon,\delta)$-differential privacy (DP) guarantee for sensitive information encoded in transitions and retains the original power of TD learning.
- Score: 17.361143427007224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal difference (TD) learning is a widely used method to evaluate
policies in reinforcement learning. While many TD learning methods have been
developed in recent years, little attention has been paid to preserving privacy,
and most existing approaches may raise data-privacy concerns for users. To
enable complex representational abilities of policies, in this
paper, we consider preserving privacy in TD learning with nonlinear value
function approximation. This is challenging because such a nonlinear problem is
usually studied in the formulation of stochastic nonconvex-strongly-concave
optimization to gain finite-sample analysis, which would require simultaneously
preserving the privacy on primal and dual sides. To this end, we employ a
momentum-based stochastic gradient descent ascent to obtain a single-timescale
algorithm, and achieve a good trade-off between meaningful privacy and utility
guarantees on both the primal and dual sides by perturbing the gradients on
both sides with well-calibrated Gaussian noise. As a result, our DPTD
algorithm could provide $(\epsilon,\delta)$-differential privacy (DP) guarantee
for the sensitive information encoded in transitions and retain the original
power of TD learning, with the utility upper bounded by
$\widetilde{\mathcal{O}}(\frac{(d\log(1/\delta))^{1/8}}{(n\epsilon)^{1/4}})$
(the tilde hides logarithmic factors), where $n$ is the trajectory
length and $d$ is the dimension. Extensive experiments conducted in OpenAI Gym
show the advantages of our proposed algorithm.
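The update described in the abstract pairs momentum-based stochastic gradient descent ascent with Gaussian perturbation of both the primal and dual gradients. A minimal sketch of one such step, assuming simple L2 clipping and a shared noise multiplier `sigma` (all function and parameter names here are illustrative, not the paper's):

```python
import numpy as np

def clip(g, c):
    """Scale gradient g so its L2 norm is at most c."""
    norm = np.linalg.norm(g)
    return g * min(1.0, c / norm) if norm > 0 else g

def dp_sgda_step(theta, w, grad_theta, grad_w, state,
                 lr=0.01, beta=0.9, clip_c=1.0, sigma=1.0, rng=None):
    """One momentum-based stochastic gradient descent-ascent step with
    Gaussian perturbation on both the primal (theta) and dual (w) gradients."""
    rng = np.random.default_rng() if rng is None else rng
    # Clip, then perturb each side's gradient with calibrated Gaussian noise.
    g_p = clip(grad_theta, clip_c) + rng.normal(0.0, sigma * clip_c, size=theta.shape)
    g_d = clip(grad_w, clip_c) + rng.normal(0.0, sigma * clip_c, size=w.shape)
    # Single-timescale: both momentum buffers update with the same step.
    state["m_p"] = beta * state["m_p"] + (1 - beta) * g_p
    state["m_d"] = beta * state["m_d"] + (1 - beta) * g_d
    theta = theta - lr * state["m_p"]  # descent on the primal side
    w = w + lr * state["m_d"]          # ascent on the dual side
    return theta, w, state
```

Calibrating `sigma` against the clipping threshold and the number of iterations is what yields the $(\epsilon,\delta)$-DP guarantee; the sketch omits that privacy accounting.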
Related papers
- Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables training machine learning models while preserving the privacy of participants.
There is no differentially private distributed method for such training problems.
We introduce a new distributed algorithm $\alpha$-$\mathsf{NormEC}$ with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z)
- Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
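The robust statistics named above can be illustrated with a trimmed mean, which discards extreme values before averaging so that a few outliers cannot dominate the estimate; a minimal sketch (the function name and trimming rule are illustrative, not the paper's algorithm):

```python
import numpy as np

def trimmed_mean(x, trim_frac=0.1):
    """Average after discarding the smallest and largest trim_frac of values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(len(x) * trim_frac)  # number of values trimmed from each end
    if len(x) <= 2 * k:
        return float(np.median(x))  # too few values left; fall back to median
    return float(x[k:len(x) - k].mean())
```

Bounding each value's influence this way is what limits the sensitivity of the estimate, and hence the noise a DP mechanism must add on top of it.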
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
- Privacy for Free in the Over-Parameterized Regime [19.261178173399784]
Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data.
In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large $p$, privacy can be obtained for free, i.e., $\left|R_P\right| = o(1)$, not only when the privacy parameter $\varepsilon$ has constant order, but also in the strongly private setting $\varepsilon = o(1)$.
arXiv Detail & Related papers (2024-10-18T18:01:11Z)
- Directional Privacy for Deep Learning [2.826489388853448]
Differentially Private Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models.
Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility.
We show that this provides both $\epsilon$-DP and $\epsilon d$-privacy for deep learning training, rather than the $(\epsilon, \delta)$-privacy of the Gaussian mechanism.
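The $(\epsilon, \delta)$ guarantee of the Gaussian mechanism that this entry contrasts against comes from calibrating noise to the query's L2 sensitivity; a standard sketch, using the classical calibration valid for $\epsilon < 1$ (names are illustrative):

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Release value + Gaussian noise calibrated to its L2 sensitivity,
    satisfying (epsilon, delta)-DP for epsilon < 1 (classical bound)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))
```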
arXiv Detail & Related papers (2022-11-09T05:18:08Z)
- Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
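The difference between DP-SGD's clipping and DP-NSGD's normalization can be sketched as two ways of bounding a per-sample gradient before Gaussian noise is added (the function name and the regularizer `reg` are illustrative, not the papers' exact formulations):

```python
import numpy as np

def dp_private_grad(g, clip_c, sigma, rng, normalize=False, reg=0.01):
    """Privatize one per-sample gradient: clip it (DP-SGD style) or
    normalize it (DP-NSGD style), then add Gaussian noise."""
    norm = np.linalg.norm(g)
    if normalize:
        g = g / (norm + reg)  # normalization with a small regularizer
    else:
        g = g * min(1.0, clip_c / max(norm, 1e-12))  # standard clipping
    return g + rng.normal(0.0, sigma, size=g.shape)
```

Clipping leaves small gradients untouched, while normalization rescales every gradient to roughly unit length, which is part of why the two variants trade off tuning difficulty differently.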
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Differentially Private Federated Learning via Inexact ADMM [0.0]
Differential privacy (DP) techniques can be applied to the federated learning model to protect data privacy against inference attacks.
We develop a DP inexact alternating direction method of multipliers algorithm that solves a sequence of trust-region subproblems.
Our algorithm reduces the testing error by at most $22\%$ compared with the existing DP algorithm, while achieving the same level of data privacy.
arXiv Detail & Related papers (2021-06-11T02:28:07Z)
- Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.