Differentially Private Regret Minimization in Episodic Markov Decision
Processes
- URL: http://arxiv.org/abs/2112.10599v1
- Date: Mon, 20 Dec 2021 15:12:23 GMT
- Title: Differentially Private Regret Minimization in Episodic Markov Decision
Processes
- Authors: Sayak Ray Chowdhury, Xingyu Zhou
- Abstract summary: We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP).
This is motivated by the widespread applications of reinforcement learning (RL) in real-world sequential decision making problems.
- Score: 6.396288020763144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study regret minimization in finite horizon tabular Markov decision
processes (MDPs) under the constraints of differential privacy (DP). This is
motivated by the widespread applications of reinforcement learning (RL) in
real-world sequential decision making problems, where protecting users'
sensitive and private information is becoming paramount. We consider two
variants of DP -- joint DP (JDP), where a centralized agent is responsible for
protecting users' sensitive data and local DP (LDP), where information needs to
be protected directly on the user side. We first propose two general frameworks
-- one for policy optimization and another for value iteration -- for designing
private, optimistic RL algorithms. We then instantiate these frameworks with
suitable privacy mechanisms to satisfy JDP and LDP requirements, and
simultaneously obtain sublinear regret guarantees. The regret bounds show that
under JDP, the cost of privacy is only a lower order additive term, while for a
stronger privacy protection under LDP, the cost suffered is multiplicative.
Finally, the regret bounds are obtained by a unified analysis, which, we
believe, can be extended beyond tabular MDPs.
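For intuition, here is a minimal sketch of how such a framework can be instantiated in the tabular setting: visit, reward, and transition counts are released through a Laplace mechanism, and optimistic value iteration computes exploration bonuses from the noisy statistics. The noise calibration, budget split, and bonus shape below are illustrative assumptions, not the paper's actual JDP or LDP constructions.

```python
import numpy as np

def private_optimistic_value_iteration(counts, reward_sums, trans_counts,
                                        H, epsilon, rng):
    """Toy private optimistic value iteration for a tabular episodic MDP.

    counts[s, a]          -- visit counts N(s, a)
    reward_sums[s, a]     -- sum of observed rewards at (s, a), rewards in [0, 1]
    trans_counts[s, a, y] -- observed transitions (s, a) -> y
    H                     -- episode horizon
    epsilon               -- privacy budget, split naively across the three
                             released statistics (a toy calibration choice)
    """
    S, A = counts.shape
    b = 3.0 / epsilon  # Laplace scale per statistic (illustrative split)
    noisy_n = counts + rng.laplace(0.0, b, size=counts.shape)
    noisy_r = reward_sums + rng.laplace(0.0, b, size=reward_sums.shape)
    noisy_p = trans_counts + rng.laplace(0.0, b, size=trans_counts.shape)

    n = np.maximum(noisy_n, 1.0)                    # keep estimates well defined
    r_hat = np.clip(noisy_r / n, 0.0, 1.0)          # noisy mean reward
    p_hat = np.maximum(noisy_p, 0.0)
    p_hat /= np.maximum(p_hat.sum(axis=2, keepdims=True), 1e-8)

    bonus = H * np.sqrt(np.log(2.0 * S * A * H) / n)  # Hoeffding-style optimism

    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))                        # V[H] = 0 terminal values
    for h in range(H - 1, -1, -1):
        Q[h] = np.minimum(r_hat + bonus + p_hat @ V[h + 1], H - h)
        V[h] = Q[h].max(axis=1)
    return Q  # the agent acts greedily w.r.t. Q[h] at step h
```

A JDP instantiation would additionally keep the released counts consistent as episodes accumulate, typically via continual-release private counters rather than fresh noise each time; a counter sketch appears at the end of the related-papers list below.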
Related papers
- Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy [55.357715095623554]
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties.
We propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification.
arXiv Detail & Related papers (2024-10-24T03:39:55Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Differentially Private Reinforcement Learning with Self-Play [18.124829682487558]
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games.
We design a provably efficient algorithm based on optimistic Nash value iteration and privatization of Bernstein-type bonuses.
arXiv Detail & Related papers (2024-04-11T08:42:51Z)
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z)
- A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR).
The EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
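As a toy rendering of the EVR control flow just described: estimate the mechanism's privacy parameter, verify the estimate, and release the output only if verification passes. Both `estimate_epsilon` and `verify_guarantee` below are hypothetical placeholders standing in for the paper's actual estimator and randomized privacy verifier.

```python
import numpy as np

def estimate_epsilon(mechanism):
    # Hypothetical placeholder: EVR would compute a data-driven estimate of
    # the mechanism's privacy parameter; we return a fixed guess here.
    return 1.0

def verify_guarantee(mechanism, eps_estimate):
    # Hypothetical placeholder: EVR uses a randomized privacy verifier that
    # accepts or rejects the estimate; we accept any positive value here.
    return eps_estimate > 0.0

def evr_release(mechanism, data):
    """Estimate-verify-release: publish output only if the estimate verifies."""
    eps = estimate_epsilon(mechanism)
    if not verify_guarantee(mechanism, eps):
        raise RuntimeError("estimate failed verification; withholding output")
    return mechanism(data), eps

# Example: Laplace mechanism on a sensitivity-1 counting query.
rng = np.random.default_rng(0)
count_query = lambda x: float(np.sum(x) + rng.laplace(0.0, 1.0))
output, eps = evr_release(count_query, np.ones(10))
```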
- Differentially Private Reinforcement Learning with Linear Function Approximation [3.42658286826597]
We study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP).
Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers.
arXiv Detail & Related papers (2022-01-18T15:25:24Z)
- Privacy Amplification via Shuffling for Linear Contextual Bandits [51.94904361874446]
We study the contextual linear bandit problem with differential privacy (DP).
We show that it is possible to achieve a privacy/utility trade-off between JDP and LDP by leveraging the shuffle model of privacy, while still protecting each user's data locally.
arXiv Detail & Related papers (2021-12-11T15:23:28Z)
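To make the shuffle model concrete, here is a minimal sketch with an assumed one-bit statistic per user: each user applies an $\varepsilon$-LDP randomized-response randomizer, a trusted shuffler uniformly permutes the reports to detach them from user identities, and the server debiases the aggregate. The bandit-specific machinery of the paper is omitted.

```python
import numpy as np

def local_randomizer(bit, epsilon, rng):
    """epsilon-LDP randomized response on one bit."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def shuffle_and_estimate(user_bits, epsilon, rng):
    # Each user privatizes locally, so no party ever sees raw data.
    reports = [local_randomizer(int(b), epsilon, rng) for b in user_bits]
    # The shuffler applies a uniformly random permutation, severing the
    # link between reports and user identities (the source of amplification).
    reports = rng.permutation(reports)
    # Server-side debiasing of the randomized-response mean.
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (float(np.mean(reports)) - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=10_000)
estimate = shuffle_and_estimate(bits, epsilon=1.0, rng=rng)  # ~ mean of bits
```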
- Local Differential Privacy for Regret Minimization in Reinforcement Learning [33.679678503441565]
We study privacy in the context of finite-horizon Markov Decision Processes (MDPs).
We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework.
We present an optimistic algorithm that satisfies $\varepsilon$-LDP requirements while minimizing regret.
arXiv Detail & Related papers (2020-10-15T14:13:26Z)
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
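JDP algorithms in this line of work need running statistics (visit and reward counts) released privately as episodes accumulate; a standard tool for this is the tree-based (binary) counter, sketched below assuming per-node Laplace noise and increments in {0, 1}. The exact budget accounting differs across papers.

```python
import numpy as np

class BinaryTreeCounter:
    """Continual private release of a running count (binary mechanism sketch).

    Each dyadic block of rounds gets one cached Laplace draw, so any prefix
    sum is answered from at most ~log2(T) noisy blocks. Scale log2(T)/epsilon
    is a common pure-DP calibration for {0, 1} increments; papers differ in
    the exact accounting.
    """

    def __init__(self, T, epsilon, rng):
        self.levels = max(int(np.ceil(np.log2(T))), 1) + 1
        self.scale = self.levels / epsilon
        self.rng = rng
        self.values = []   # raw per-round increments (held by the curator)
        self.noise = {}    # (level, index) -> cached Laplace draw per block

    def _noisy_block(self, level, index):
        lo, hi = index << level, (index + 1) << level
        if (level, index) not in self.noise:   # one draw per block, ever
            self.noise[(level, index)] = self.rng.laplace(0.0, self.scale)
        return sum(self.values[lo:hi]) + self.noise[(level, index)]

    def update(self, value):
        self.values.append(value)   # e.g. 1 if (s, a) was visited this episode

    def release(self):
        """Noisy count of all increments seen so far."""
        total, pos, rem = 0.0, 0, len(self.values)
        while rem > 0:   # greedy left-to-right dyadic decomposition of [0, t)
            level = rem.bit_length() - 1        # largest power of two <= rem
            total += self._noisy_block(level, pos >> level)
            pos += 1 << level
            rem -= 1 << level
        return total

rng = np.random.default_rng(0)
counter = BinaryTreeCounter(T=1000, epsilon=1.0, rng=rng)
for _ in range(700):
    counter.update(1)
print(counter.release())  # roughly 700 plus bounded noise
```

A JDP agent would keep one such counter per released statistic and replan from the noisy values at the start of each episode, which is consistent with the lower-order additive privacy cost reported in the main abstract above.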
This list is automatically generated from the titles and abstracts of the papers in this site.