Near-Optimal Differentially Private Reinforcement Learning
- URL: http://arxiv.org/abs/2212.04680v1
- Date: Fri, 9 Dec 2022 06:03:02 GMT
- Title: Near-Optimal Differentially Private Reinforcement Learning
- Authors: Dan Qiao, Yu-Xiang Wang
- Abstract summary: We study online exploration in reinforcement learning with differential privacy constraints.
Existing work established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP), but did not provide an algorithm with optimal regret.
We design an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$, which matches the information-theoretic lower bound of non-private learning.
- Score: 16.871660060209674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by personalized healthcare and other applications involving
sensitive data, we study online exploration in reinforcement learning with
differential privacy (DP) constraints. Existing work on this problem
established that no-regret learning is possible under joint differential
privacy (JDP) and local differential privacy (LDP) but did not provide an
algorithm with optimal regret. We close this gap for the JDP case by designing
an $\epsilon$-JDP algorithm with a regret of
$\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$ which matches the
information-theoretic lower bound of non-private learning for all choices of
$\epsilon> S^{1.5}A^{0.5} H^2/\sqrt{T}$. In the above, $S$, $A$ denote the
number of states and actions, $H$ denotes the planning horizon, and $T$ is the
number of steps. To the best of our knowledge, this is the first private RL
algorithm that achieves \emph{privacy for free} asymptotically as $T\rightarrow
\infty$. Our techniques -- which could be of independent interest -- include
privately releasing Bernstein-type exploration bonuses and an improved method
for releasing visitation statistics. The same techniques also imply a slightly
improved regret bound for the LDP case.
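To make the "privately releasing visitation statistics" primitive concrete, here is a minimal Python sketch. It is an illustration only, not the paper's construction: a crude Laplace-mechanism release of per-step visitation counts, plus a Hoeffding-style bonus computed from the noisy counts. The sensitivity bound and all function names are assumptions for illustration; the paper's actual release mechanism and its Bernstein-type bonuses are sharper than this version.

```python
import numpy as np

def release_counts_dp(counts, epsilon, horizon, rng=None):
    """Release (state, action) visitation counts under epsilon-DP.

    Minimal sketch via the Laplace mechanism: swapping one user's episode
    changes the count at each of `horizon` steps by at most 2 in L1 norm,
    giving a crude L1 sensitivity of 2 * horizon. This only illustrates the
    generic "noisy statistics" primitive, not the paper's improved method.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.laplace(scale=2.0 * horizon / epsilon, size=counts.shape)
    # Clip from below so downstream 1/sqrt(n) bonuses stay well defined.
    return np.maximum(counts + noise, 1.0)

def hoeffding_bonus(noisy_counts, horizon, delta=0.05):
    """Hoeffding-style exploration bonus from privately released counts.

    A Bernstein-type bonus (as in the paper) would replace the worst-case
    range `horizon` with an empirical variance term; that variance is
    exactly the extra statistic that must itself be released privately.
    """
    return horizon * np.sqrt(np.log(1.0 / delta) / noisy_counts)

# Toy usage: S=5 states, A=3 actions at one fixed step h.
counts = np.random.default_rng(1).integers(0, 100, size=(5, 3)).astype(float)
noisy = release_counts_dp(counts, epsilon=1.0, horizon=10)
bonus = hoeffding_bonus(noisy, horizon=10)
```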
Related papers
- Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{O}\big((d+\log(|\mathcal{S}|^2|\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - Differentially Private Temporal Difference Learning with Stochastic
Nonconvex-Strongly-Concave Optimization [17.361143427007224]
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
In this paper, we consider preserving privacy in TD learning with a nonlinear value function.
We show that the proposed algorithm, DPTD, provides $(\epsilon,\delta)$-differential privacy (DP) guarantees for sensitive information encoded in transitions while retaining the original power of TD learning.
arXiv Detail & Related papers (2022-01-25T16:48:29Z) - Differentially Private Exploration in Reinforcement Learning with Linear
Representation [102.17246636801649]
We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. the model-free setting), where we provide a $\widetilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon,\delta)$-joint DP exploration.
arXiv Detail & Related papers (2021-12-02T19:59:50Z) - Locally Differentially Private Reinforcement Learning for Linear Mixture
Markov Decision Processes [78.27542864367821]
Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data.
To protect the users' privacy, privacy-preserving RL algorithms are in demand.
We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs.
arXiv Detail & Related papers (2021-10-19T17:44:09Z) - Differential Privacy in Personalized Pricing with Nonparametric Demand
Models [15.036147440342338]
This paper studies a dynamic personalized pricing problem with unknown nonparametric demand models under data privacy protection.
Two concepts of data privacy, central differential privacy (CDP) and local differential privacy (LDP), which have been widely applied in practice, are introduced.
We develop two algorithms which make pricing decisions and learn the unknown demand on the fly, while satisfying the CDP and LDP guarantees respectively.
arXiv Detail & Related papers (2021-09-10T01:56:55Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Local Differential Privacy for Regret Minimization in Reinforcement
Learning [33.679678503441565]
We study privacy in the context of finite-horizon Markov Decision Processes (MDPs).
We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework, in which each user's data is randomized before it reaches the learner (a toy sketch of the local model appears after this list).
We present an optimistic algorithm that satisfies $\varepsilon$-LDP requirements while achieving sublinear regret.
arXiv Detail & Related papers (2020-10-15T14:13:26Z) - Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
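Several of the papers above operate in the local model of differential privacy, where each user randomizes their own data before it leaves the device; the JDP results instead assume a central curator who perturbs aggregate statistics. As a toy illustration of the local model (not any of these papers' actual protocols), here is a Python sketch of $\varepsilon$-LDP randomized response applied to a binary reward:

```python
import numpy as np

def randomized_response(bit, epsilon, rng=None):
    """epsilon-LDP release of one binary value (toy randomized response).

    The true bit is reported with probability e^eps / (e^eps + 1), so the
    server never observes raw data; contrast with the central/JDP model,
    where a trusted curator noises aggregate statistics instead.
    """
    rng = rng if rng is not None else np.random.default_rng()
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def debias_mean(reports, epsilon):
    """Unbiased estimate of the mean of the true bits from noisy reports."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

# Toy usage: 10,000 users each hold reward bit 1 with probability 0.3.
rng = np.random.default_rng(0)
true_bits = (rng.random(10_000) < 0.3).astype(int)
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
print(debias_mean(np.array(reports), epsilon=1.0))  # approximately 0.3
```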