Differentially Private Model-Based Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2402.05525v1
- Date: Thu, 8 Feb 2024 10:05:11 GMT
- Title: Differentially Private Model-Based Offline Reinforcement Learning
- Authors: Alexandre Rio, Merwan Barlier, Igor Colin, Albert Thomas
- Abstract summary: We introduce DP-MORL, a model-based offline RL algorithm with differential privacy guarantees.
A private model of the environment is first learned from offline data.
We then use model-based policy optimization to derive a policy from the private model.
- Score: 51.1231068185106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address offline reinforcement learning with privacy guarantees, where the
goal is to train a policy that is differentially private with respect to
individual trajectories in the dataset. To achieve this, we introduce DP-MORL,
a model-based RL (MBRL) algorithm with differential privacy guarantees. A private model
of the environment is first learned from offline data using DP-FedAvg, a
training method for neural networks that provides differential privacy
guarantees at the trajectory level. Then, we use model-based policy
optimization to derive a policy from the (penalized) private model, without any
further interaction with the system or access to the input data. We empirically
show that DP-MORL enables the training of private RL agents from offline data
and we furthermore outline the price of privacy in this setting.
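The abstract describes a two-stage recipe: a dynamics model is first trained with trajectory-level differential privacy (via DP-FedAvg), and a policy is then optimized purely inside the (penalized) private model, with no further access to the system or the raw data. The sketch below illustrates that structure on a toy linear model; it is not the authors' implementation, and every name and hyperparameter (clip_norm, noise_multiplier, penalty_coef, the uncertainty proxy) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 1

# --- Stage 1: trajectory-level DP model learning (in the spirit of DP-FedAvg) ---
# Each trajectory contributes one clipped update; Gaussian noise is added to the
# average so the learned model is differentially private w.r.t. single trajectories.
def private_model_step(W, trajectories, clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    updates = []
    for traj in trajectories:
        grad = np.zeros_like(W)
        for s, a, s_next in traj:                          # squared-error dynamics loss
            x = np.concatenate([s, a])
            grad += np.outer(W @ x - s_next, x)
        update = -lr * grad / len(traj)
        norm = np.linalg.norm(update)
        update *= min(1.0, clip_norm / max(norm, 1e-12))   # clip per-trajectory contribution
        updates.append(update)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(trajectories), size=W.shape)
    return W + np.mean(updates, axis=0) + noise            # Gaussian mechanism on the average

# --- Stage 2: policy optimization inside the penalized private model ---
# Rollouts use only the learned model; rewards are penalized by a crude uncertainty
# proxy so the policy is discouraged from regions where the model cannot be trusted.
def penalized_return(W, policy, s0, horizon=20, penalty_coef=1.0):
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s_next = W @ np.concatenate([s, a])
        reward = -np.sum(s_next ** 2)                      # toy task: drive the state to zero
        uncertainty = float(np.linalg.norm(s - s0))        # placeholder for a real model-uncertainty estimate
        ret += reward - penalty_coef * uncertainty
        s = s_next
    return ret

# Example usage on synthetic data (stands in for the fixed offline dataset).
W = np.zeros((state_dim, state_dim + action_dim))
dataset = [[(rng.normal(size=state_dim), rng.normal(size=action_dim),
             rng.normal(size=state_dim)) for _ in range(10)] for _ in range(50)]
for _ in range(100):
    W = private_model_step(W, dataset)
policy = lambda s: -0.1 * s[:action_dim]                   # any policy-improvement loop would go here
print(penalized_return(W, policy, rng.normal(size=state_dim)))
```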
Related papers
- CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning [25.071018803326254]
Distribution shift is a major obstacle in offline reinforcement learning.
Previous conservative offline RL algorithms struggle to generalize to unseen actions.
We propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions.
arXiv Detail & Related papers (2024-06-11T17:59:29Z)
- Considerations on the Theory of Training Models with Differential Privacy [13.782477759025344]
In federated learning, collaborative learning takes place among a set of clients, each of whom wants to remain in control of how their local training data is used.
Differential privacy is one method to limit privacy leakage.
arXiv Detail & Related papers (2023-03-08T15:56:27Z)
- Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments [55.41685740015095]
We study offline reinforcement learning under a novel model called strategic MDP.
We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN).
arXiv Detail & Related papers (2022-08-23T15:32:44Z)
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning [62.19209005400561]
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to learning purely from static datasets.
A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy.
We regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process.
arXiv Detail & Related papers (2022-06-14T20:56:16Z)
- Offline Reinforcement Learning with Differential Privacy [16.871660060209674]
The offline reinforcement learning problem is often motivated by the need to learn data-driven decision policies in financial, legal and healthcare applications.
We design offline RL algorithms with differential privacy guarantees, which provably prevent the risk of leaking sensitive information from the dataset.
arXiv Detail & Related papers (2022-06-02T00:45:04Z)
- Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically rely on parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint.
arXiv Detail & Related papers (2022-02-13T07:38:36Z)
- Differentially Private Reinforcement Learning with Linear Function Approximation [3.42658286826597]
We study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP).
Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers.
arXiv Detail & Related papers (2022-01-18T15:25:24Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
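The MOPO entry above mentions rewards artificially penalized by the uncertainty of the learned dynamics. Below is a minimal sketch of that reward shaping, assuming an ensemble of learned dynamics models whose disagreement serves as the uncertainty estimate; the ensemble choice and the coefficient lam are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np

def penalized_reward(ensemble, reward_fn, s, a, lam=1.0):
    """Uncertainty-penalized reward in the spirit of MOPO:
    r_tilde(s, a) = r_hat(s, a) - lam * u(s, a).
    Here u(s, a) is estimated as the disagreement of an ensemble of learned
    dynamics models (an illustrative choice, not necessarily the paper's exact estimator)."""
    preds = np.stack([model(s, a) for model in ensemble])  # one next-state prediction per ensemble member
    uncertainty = preds.std(axis=0).max()                  # ensemble disagreement as u(s, a)
    return reward_fn(s, a) - lam * uncertainty
```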