Locally Differentially Private Reinforcement Learning for Linear Mixture
Markov Decision Processes
- URL: http://arxiv.org/abs/2110.10133v1
- Date: Tue, 19 Oct 2021 17:44:09 GMT
- Title: Locally Differentially Private Reinforcement Learning for Linear Mixture
Markov Decision Processes
- Authors: Chonghua Liao and Jiafan He and Quanquan Gu
- Abstract summary: Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data.
To protect the users' privacy, privacy-preserving RL algorithms are in demand.
We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs.
- Score: 78.27542864367821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) algorithms can be used to provide personalized
services, which rely on users' private and sensitive data. To protect the
users' privacy, privacy-preserving RL algorithms are in demand. In this paper,
we study RL with linear function approximation and local differential privacy
(LDP) guarantees. We propose a novel $(\varepsilon, \delta)$-LDP algorithm for
learning a class of Markov decision processes (MDPs) dubbed linear mixture
MDPs, and obtain an $\tilde{\mathcal{O}}(
d^{5/4}H^{7/4}T^{3/4}\left(\log(1/\delta)\right)^{1/4}\sqrt{1/\varepsilon})$
regret, where $d$ is the dimension of feature mapping, $H$ is the length of the
planning horizon, and $T$ is the number of interactions with the environment.
We also prove a lower bound
$\Omega(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right))$ for
learning linear mixture MDPs under $\varepsilon$-LDP constraint. Experiments on
synthetic datasets verify the effectiveness of our algorithm. To the best of
our knowledge, this is the first provable privacy-preserving RL algorithm with
linear function approximation.
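The $(\varepsilon, \delta)$-LDP guarantee above means each user randomizes their own data before it reaches the learner. As a minimal sketch of that idea, assuming the per-episode regression statistics (a Gram matrix and a response vector built from the feature mapping) have a known $\ell_2$-sensitivity bound, the snippet below perturbs them with the standard Gaussian mechanism on the user side. The function name `privatize_statistics` and the sensitivity constant are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np


def privatize_statistics(gram, target, eps, delta, sensitivity):
    """Perturb locally computed least-squares statistics with the Gaussian
    mechanism so that only (eps, delta)-LDP reports leave the user.

    Sketch only: `sensitivity` is an assumed l2-sensitivity bound on each
    statistic; a concrete algorithm must calibrate it to the feature map.
    """
    # Noise scale of the Gaussian mechanism for (eps, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    d = gram.shape[0]
    noise = np.random.normal(0.0, sigma, size=(d, d))
    # Symmetrize so the reported Gram matrix stays symmetric.
    noisy_gram = gram + (noise + noise.T) / np.sqrt(2.0)
    noisy_target = target + np.random.normal(0.0, sigma, size=d)
    return noisy_gram, noisy_target


# Toy usage: one user's episode statistics in a d-dimensional feature space.
d = 4
phi = np.random.randn(d)              # stand-in aggregated feature vector
gram = np.outer(phi, phi)             # local Gram matrix
target = 0.5 * phi                    # stand-in local regression target
noisy_gram, noisy_target = privatize_statistics(
    gram, target, eps=1.0, delta=1e-5, sensitivity=1.0)
```

A full method would also aggregate these noisy reports on the server side and feed them into optimistic value-targeted regression; the sketch covers only the user-side randomization that provides the local privacy guarantee.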
Related papers
- A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs [18.449996575976993]
We propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting.
Our algorithm is the first computationally efficient algorithm in this setting to achieve a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption.
We extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals.
arXiv Detail & Related papers (2024-02-07T00:33:11Z) - Scaling Up Differentially Private LASSO Regularized Logistic Regression
via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
arXiv Detail & Related papers (2023-10-30T19:52:43Z) - Sharper Model-free Reinforcement Learning for Average-reward Markov
Decision Processes [21.77276136591518]
We develop provably efficient model-free reinforcement learning (RL) algorithms for average-reward Markov Decision Processes (MDPs).
In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O}\left(\frac{SA\,\mathrm{sp}(h^*)}{\epsilon^2}+\frac{S^2A\,\mathrm{sp}(h^*)}{\epsilon^2}\right)$ samples.
arXiv Detail & Related papers (2023-06-28T17:43:19Z) - Near-Optimal Differentially Private Reinforcement Learning [16.871660060209674]
We study online exploration in reinforcement learning with differential privacy constraints.
Existing work established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP).
We design an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$, which matches the information-theoretic lower bound of non-private learning.
arXiv Detail & Related papers (2022-12-09T06:03:02Z) - Settling the Sample Complexity of Model-Based Offline Reinforcement
Learning [50.5790774201146]
Offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z) - Differentially Private Exploration in Reinforcement Learning with Linear
Representation [102.17246636801649]
We first consider the setting of linear mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. the model-free setting), where we provide a $\widetilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon,\delta)$-LDP exploration.
arXiv Detail & Related papers (2021-12-02T19:59:50Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Logarithmic Regret for Reinforcement Learning with Linear Function
Approximation [99.59319332864129]
We show that logarithmic regret is attainable under two recently proposed linear MDP assumptions.
To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.
arXiv Detail & Related papers (2020-11-23T17:25:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.