How Private Is Your RL Policy? An Inverse RL Based Analysis Framework
- URL: http://arxiv.org/abs/2112.05495v1
- Date: Fri, 10 Dec 2021 12:57:33 GMT
- Title: How Private Is Your RL Policy? An Inverse RL Based Analysis Framework
- Authors: Kritika Prakash, Fiza Husain, Praveen Paruchuri, Sujit P. Gujar
- Abstract summary: In domains like autonomous driving, recommendation systems, and more, optimal RL policies could cause a privacy breach if they memorize any part of the private reward.
We study the set of existing differentially-private RL policies derived from various RL algorithms such as Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization.
We propose a new Privacy-Aware Inverse RL (PRIL) analysis framework that performs reward reconstruction as an adversarial attack on private policies that the agents may deploy.
- Score: 5.987377024199901
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Reinforcement Learning (RL) enables agents to learn how to perform various
tasks from scratch. In domains like autonomous driving, recommendation systems,
and more, the learned optimal RL policies could cause a privacy breach if they
memorize any part of the private reward. We study the set of existing
differentially-private RL policies derived from various RL algorithms such as
Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization. We
propose a new Privacy-Aware Inverse RL (PRIL) analysis framework that performs
reward reconstruction as an adversarial attack on private policies that the
agents may deploy. For this, we introduce the reward reconstruction attack,
wherein we seek to reconstruct the original reward from a privacy-preserving
policy using an Inverse RL algorithm. An adversary must do poorly at
reconstructing the original reward function if the agent uses a tightly private
policy. Using this framework, we empirically test the effectiveness of the
privacy guarantee offered by the private algorithms on multiple instances of
the FrozenLake domain of varying complexities. Based on the analysis performed,
we infer a gap between the current standard of privacy offered and the standard
of privacy needed to protect reward functions in RL. We do so by quantifying
the extent to which each private policy protects the reward function by
measuring distances between the original and reconstructed rewards.
Related papers
- Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone [72.17534881026995]
We develop an offline and online fine-tuning approach called policy-agnostic RL (PA-RL)
We show the first result that successfully fine-tunes OpenVLA, a 7B generalist robot policy, autonomously with Cal-QL, an online RL fine-tuning algorithm.
arXiv Detail & Related papers (2024-12-09T17:28:03Z) - Preserving Expert-Level Privacy in Offline Reinforcement Learning [35.486119057117996]
We propose a consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm.
We prove rigorous differential privacy guarantees, while maintaining strong empirical performance.
arXiv Detail & Related papers (2024-11-18T21:26:53Z) - Differentially Private Reinforcement Learning with Self-Play [18.124829682487558]
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games.
We design a provably efficient algorithm based on optimistic Nash value and privatization of Bernstein-type bonuses.
arXiv Detail & Related papers (2024-04-11T08:42:51Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL)
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP)
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z) - Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences to better elicit human opinion on the target objective.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
arXiv Detail & Related papers (2020-06-16T03:52:41Z) - Reinforcement Learning [36.664136621546575]
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains.
In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy.
arXiv Detail & Related papers (2020-05-29T06:53:29Z) - BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance.
arXiv Detail & Related papers (2020-02-08T01:59:33Z) - Preventing Imitation Learning with Adversarial Policy Ensembles [79.81807680370677]
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy.
How can we protect against external observers cloning our proprietary policies?
We introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies.
arXiv Detail & Related papers (2020-01-31T01:57:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.