Near-Optimal Differentially Private Reinforcement Learning
- URL: http://arxiv.org/abs/2212.04680v1
- Date: Fri, 9 Dec 2022 06:03:02 GMT
- Title: Near-Optimal Differentially Private Reinforcement Learning
- Authors: Dan Qiao, Yu-Xiang Wang
- Abstract summary: We study online exploration in reinforcement learning with differential privacy constraints.
Existing work established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP), but did not provide an algorithm with optimal regret.
We design an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$, which matches the information-theoretic lower bound of non-private learning.
- Score: 16.871660060209674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by personalized healthcare and other applications involving
sensitive data, we study online exploration in reinforcement learning with
differential privacy (DP) constraints. Existing work on this problem
established that no-regret learning is possible under joint differential
privacy (JDP) and local differential privacy (LDP) but did not provide an
algorithm with optimal regret. We close this gap for the JDP case by designing
an $\epsilon$-JDP algorithm with a regret of
$\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$ which matches the
information-theoretic lower bound of non-private learning for all choices of
$\epsilon> S^{1.5}A^{0.5} H^2/\sqrt{T}$. In the above, $S$, $A$ denote the
number of states and actions, $H$ denotes the planning horizon, and $T$ is the
number of steps. To the best of our knowledge, this is the first private RL
algorithm that achieves \emph{privacy for free} asymptotically as $T\rightarrow
\infty$. Our techniques -- which could be of independent interest -- include
privately releasing Bernstein-type exploration bonuses and an improved method
for releasing visitation statistics. The same techniques also imply a slightly
improved regret bound for the LDP case.
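To make the "privately releasing visitation statistics" primitive concrete, here is a minimal Python sketch. It is an illustration only, not the paper's construction: a crude Laplace-mechanism release of per-step visitation counts, plus a Hoeffding-style bonus computed from the noisy counts. The sensitivity bound and all function names are assumptions for illustration; the paper's actual release mechanism and its Bernstein-type bonuses are sharper than this version.

```python
import numpy as np

def release_counts_dp(counts, epsilon, horizon, rng=None):
    """Release (state, action) visitation counts under epsilon-DP.

    Minimal sketch via the Laplace mechanism: swapping one user's episode
    changes the count at each of `horizon` steps by at most 2 in L1 norm,
    giving a crude L1 sensitivity of 2 * horizon. This only illustrates the
    generic "noisy statistics" primitive, not the paper's improved method.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.laplace(scale=2.0 * horizon / epsilon, size=counts.shape)
    # Clip from below so downstream 1/sqrt(n) bonuses stay well defined.
    return np.maximum(counts + noise, 1.0)

def hoeffding_bonus(noisy_counts, horizon, delta=0.05):
    """Hoeffding-style exploration bonus from privately released counts.

    A Bernstein-type bonus (as in the paper) would replace the worst-case
    range `horizon` with an empirical variance term; that variance is
    exactly the extra statistic that must itself be released privately.
    """
    return horizon * np.sqrt(np.log(1.0 / delta) / noisy_counts)

# Toy usage: S=5 states, A=3 actions at one fixed step h.
counts = np.random.default_rng(1).integers(0, 100, size=(5, 3)).astype(float)
noisy = release_counts_dp(counts, epsilon=1.0, horizon=10)
bonus = hoeffding_bonus(noisy, horizon=10)
```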
Related papers
- Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{O}\big((d+\log(|\mathcal{S}|^2|\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - Differentially Private Temporal Difference Learning with Stochastic
Nonconvex-Strongly-Concave Optimization [17.361143427007224]
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning.
In this paper, we consider preserving privacy in TD learning with a nonlinear value function.
We show that the proposed algorithm, DPTD, provides $(\epsilon,\delta)$-differential privacy (DP) guarantees for sensitive information encoded in transitions while retaining the original power of TD learning.
arXiv Detail & Related papers (2022-01-25T16:48:29Z) - Differentially Private Exploration in Reinforcement Learning with Linear
Representation [102.17246636801649]
We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. the model-free setting), where we provide a $\widetilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon,\delta)$-joint DP exploration.
arXiv Detail & Related papers (2021-12-02T19:59:50Z) - Locally Differentially Private Reinforcement Learning for Linear Mixture
Markov Decision Processes [78.27542864367821]
Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data.
To protect the users' privacy, privacy-preserving RL algorithms are in demand.
We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs.
arXiv Detail & Related papers (2021-10-19T17:44:09Z) - Differential Privacy in Personalized Pricing with Nonparametric Demand
Models [15.036147440342338]
This paper studies a dynamic personalized pricing problem with unknown nonparametric demand models under data privacy protection.
Two concepts of data privacy, central differential privacy (CDP) and local differential privacy (LDP), which have been widely applied in practice, are introduced.
We develop two algorithms which make pricing decisions and learn the unknown demand on the fly, while satisfying the CDP and LDP guarantees respectively.
arXiv Detail & Related papers (2021-09-10T01:56:55Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Local Differential Privacy for Regret Minimization in Reinforcement
Learning [33.679678503441565]
We study privacy in the context of finite-horizon Markov Decision Processes (MDPs).
We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework, in which each user's data is randomized before it reaches the learner (a toy sketch of the local model appears after this list).
We present an optimistic algorithm that satisfies $\varepsilon$-LDP requirements while achieving sublinear regret.
arXiv Detail & Related papers (2020-10-15T14:13:26Z) - Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
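Several of the papers above operate in the local model of differential privacy, where each user randomizes their own data before it leaves the device; the JDP results instead assume a central curator who perturbs aggregate statistics. As a toy illustration of the local model (not any of these papers' actual protocols), here is a Python sketch of $\varepsilon$-LDP randomized response applied to a binary reward:

```python
import numpy as np

def randomized_response(bit, epsilon, rng=None):
    """epsilon-LDP release of one binary value (toy randomized response).

    The true bit is reported with probability e^eps / (e^eps + 1), so the
    server never observes raw data; contrast with the central/JDP model,
    where a trusted curator noises aggregate statistics instead.
    """
    rng = rng if rng is not None else np.random.default_rng()
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def debias_mean(reports, epsilon):
    """Unbiased estimate of the mean of the true bits from noisy reports."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

# Toy usage: 10,000 users each hold reward bit 1 with probability 0.3.
rng = np.random.default_rng(0)
true_bits = (rng.random(10_000) < 0.3).astype(int)
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
print(debias_mean(np.array(reports), epsilon=1.0))  # approximately 0.3
```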