Differentially Private Episodic Reinforcement Learning with Heavy-tailed
Rewards
- URL: http://arxiv.org/abs/2306.01121v2
- Date: Mon, 5 Jun 2023 13:45:21 GMT
- Title: Differentially Private Episodic Reinforcement Learning with Heavy-tailed
Rewards
- Authors: Yulian Wu, Xingyu Zhou, Sayak Ray Chowdhury and Di Wang
- Abstract summary: We study the problem of Markov decision processes (MDPs) with heavy-tailed rewards under the constraint of differential privacy (DP)
By resorting to robust mean estimators for rewards, we first propose two frameworks for heavy-tailed MDPs.
We show that there are fundamental differences between the problem of private RL with sub-Gaussian and that with heavy-tailed rewards.
- Score: 12.809396600279479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of (finite horizon tabular) Markov
decision processes (MDPs) with heavy-tailed rewards under the constraint of
differential privacy (DP). Compared with the previous studies for private
reinforcement learning that typically assume rewards are sampled from some
bounded or sub-Gaussian distributions to ensure DP, we consider the setting
where reward distributions have only finite $(1+v)$-th moments with some $v \in
(0,1]$. By resorting to robust mean estimators for rewards, we first propose
two frameworks for heavy-tailed MDPs, i.e., one is for value iteration and
another is for policy optimization. Under each framework, we consider both
joint differential privacy (JDP) and local differential privacy (LDP) models.
Based on our frameworks, we provide regret upper bounds for both JDP and LDP
cases and show that the moment of distribution and privacy budget both have
significant impacts on regrets. Finally, we establish a lower bound of regret
minimization for heavy-tailed MDPs in JDP model by reducing it to the
instance-independent lower bound of heavy-tailed multi-armed bandits in DP
model. We also show the lower bound for the problem in LDP by adopting some
private minimax methods. Our results reveal that there are fundamental
differences between the problem of private RL with sub-Gaussian and that with
heavy-tailed rewards.
Related papers
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z) - Connect the Dots: Tighter Discrete Approximations of Privacy Loss
Distributions [49.726408540784334]
Key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over any specified discrete support.
We show that our pessimistic estimate is the best possible among all pessimistic estimates.
arXiv Detail & Related papers (2022-07-10T04:25:02Z) - Differentially Private Regret Minimization in Episodic Markov Decision
Processes [6.396288020763144]
We study regret in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP)
This is motivated by the widespread applications of reinforcement learning (RL) in real-world sequential decision making problems.
arXiv Detail & Related papers (2021-12-20T15:12:23Z) - A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs [117.82903457289584]
We derive a novel problem-dependent lower-bound for regret in finite-horizon Markov Decision Processes (MDPs)
We show that our lower-bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all.
We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.
arXiv Detail & Related papers (2021-06-24T13:46:09Z) - Optimal Rates of (Locally) Differentially Private Heavy-tailed
Multi-Armed Bandits [11.419534203345187]
We study the problem of multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model.
We propose an algorithm which could be seen as locally private and robust version of the Successive Elimination (SE) algorithm.
arXiv Detail & Related papers (2021-06-04T16:17:21Z) - RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDP)
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z) - Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy preserving exploration policies for episodic reinforcement learning (RL)
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP)
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z) - Three Variants of Differential Privacy: Lossless Conversion and
Applications [13.057076084452016]
We consider three different variants of differential privacy (DP), namely approximate DP, R'enyi RDP, and hypothesis test.
In the first part, we develop a machinery for relating approximate DP to iterations based on the joint range of two $f$-divergences.
As an application, we apply our result to the moments framework for characterizing privacy guarantees of noisy gradient descent.
arXiv Detail & Related papers (2020-08-14T18:23:50Z) - Locally Differentially Private (Contextual) Bandits Learning [55.63825598391525]
We study locally differentially private (LDP) bandits learning in this paper.
We propose simple black-box reduction frameworks that can solve a large family of context-free bandits learning problems with LDP guarantee.
arXiv Detail & Related papers (2020-06-01T04:02:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.