How Private Is Your RL Policy? An Inverse RL Based Analysis Framework
- URL: http://arxiv.org/abs/2112.05495v1
- Date: Fri, 10 Dec 2021 12:57:33 GMT
- Title: How Private Is Your RL Policy? An Inverse RL Based Analysis Framework
- Authors: Kritika Prakash, Fiza Husain, Praveen Paruchuri, Sujit P. Gujar
- Abstract summary: In domains like autonomous driving, recommendation systems, and more, optimal RL policies could cause a privacy breach if the policies memorize any part of the private reward.
We study the set of existing differentially-private RL policies derived from various RL algorithms such as Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization.
We propose a new Privacy-Aware Inverse RL (PRIL) analysis framework that performs reward reconstruction as an adversarial attack on private policies that the agents may deploy.
- Score: 5.987377024199901
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Reinforcement Learning (RL) enables agents to learn how to perform various
tasks from scratch. In domains like autonomous driving, recommendation systems,
and more, optimal RL policies learned could cause a privacy breach if the
policies memorize any part of the private reward. We study the set of existing
differentially-private RL policies derived from various RL algorithms such as
Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization. We
propose a new Privacy-Aware Inverse RL (PRIL) analysis framework that performs
reward reconstruction as an adversarial attack on private policies that the
agents may deploy. For this, we introduce the reward reconstruction attack,
wherein we seek to reconstruct the original reward from a privacy-preserving
policy using an Inverse RL algorithm. An adversary must do poorly at
reconstructing the original reward function if the agent uses a tightly private
policy. Using this framework, we empirically test the effectiveness of the
privacy guarantee offered by the private algorithms on multiple instances of
the FrozenLake domain of varying complexities. Based on the analysis performed,
we infer a gap between the current standard of privacy offered and the standard
of privacy needed to protect reward functions in RL. We do so by quantifying
the extent to which each private policy protects the reward function by
measuring distances between the original and reconstructed rewards.
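The following is a minimal sketch of a PRIL-style reward reconstruction attack on a tiny tabular MDP standing in for FrozenLake. It uses max-entropy IRL as the reconstruction algorithm and a normalised L2 distance as the leakage measure; the specific IRL variant, distance metric, hyperparameters, and all names are illustrative assumptions rather than the paper's implementation, and the uniform placeholder policy stands in for a policy produced by a differentially-private RL algorithm.

```python
# Hypothetical sketch of a PRIL-style reward reconstruction attack (not the
# paper's code). All hyperparameters and the uniform "private" policy are
# illustrative placeholders.
import numpy as np

n_states, n_actions, gamma = 16, 4, 0.95
rng = np.random.default_rng(0)

# Assumed inputs: transitions P[s, a, s'], the (private) reward R_true, and a
# policy pi[s, a] released by a differentially-private RL algorithm (in the
# paper this would come from DP Value Iteration, DP DQN, or DP PPO).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R_true = np.zeros(n_states)
R_true[-1] = 1.0                                    # goal-state reward
pi_private = np.full((n_states, n_actions), 1.0 / n_actions)

def state_visitation(P, pi, horizon=50):
    """Discounted expected state-visitation frequencies under policy pi."""
    dt = np.zeros(n_states)
    dt[0] = 1.0                                     # fixed start state
    mu = np.zeros(n_states)
    for t in range(horizon):
        mu += (gamma ** t) * dt
        dt = np.einsum('s,sa,sap->p', dt, pi, P)    # one-step state push-forward
    return mu

def soft_policy(P, R, iters=100):
    """Soft-optimal policy for reward R (max-entropy IRL inner loop)."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R[:, None] + gamma * (P @ V)            # soft Bellman backup
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    return np.exp(Q - V[:, None])                   # softmax (soft-greedy) policy

# Reward reconstruction attack: gradient ascent on the max-entropy IRL
# objective, matching the visitation frequencies of the attacked policy.
mu_target = state_visitation(P, pi_private)
R_hat, lr = np.zeros(n_states), 0.1
for _ in range(200):
    mu_hat = state_visitation(P, soft_policy(P, R_hat))
    R_hat += lr * (mu_target - mu_hat)

# Leakage proxy: distance between (normalised) true and reconstructed rewards.
norm = lambda r: (r - r.min()) / (np.ptp(r) + 1e-8)
print("reward reconstruction error:", np.linalg.norm(norm(R_true) - norm(R_hat)))
```

In this setup, a tightly private policy should leave the reconstruction error large, while a policy that memorizes the reward lets the IRL step drive the error down; the paper's analysis compares such distances across privacy levels and FrozenLake instances of varying complexity.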
Related papers
- Preserving Expert-Level Privacy in Offline Reinforcement Learning [35.486119057117996]
We propose a consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm.
We prove rigorous differential privacy guarantees, while maintaining strong empirical performance.
arXiv Detail & Related papers (2024-11-18T21:26:53Z)
- Differentially Private Reinforcement Learning with Self-Play [18.124829682487558]
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games.
We design a provably efficient algorithm based on optimistic Nash value and privatization of Bernstein-type bonuses.
arXiv Detail & Related papers (2024-04-11T08:42:51Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer [7.970144204429356]
We introduce an SF-based extension of the Optimistic Linear Support algorithm to learn a set of policies whose SFs form a convex coverage set.
We prove that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks.
arXiv Detail & Related papers (2022-06-22T19:00:08Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly, in a dimension-free fashion, to the global solution over an entire range of learning rates.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
- Preference-based Reinforcement Learning with Finite-Time Guarantees [76.88632321436472]
Preference-based Reinforcement Learning (PbRL) replaces reward values in traditional reinforcement learning with preferences to better elicit human opinion on the target objective.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
We present the first finite-time analysis for general PbRL problems.
arXiv Detail & Related papers (2020-06-16T03:52:41Z)
- Reinforcement Learning [36.664136621546575]
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains.
In this chapter, we present the basic framework of RL and recall the two main families of approaches that have been developed to learn a good policy.
arXiv Detail & Related papers (2020-05-29T06:53:29Z)
- BRPO: Batch Residual Policy Optimization [79.53696635382592]
In batch reinforcement learning, one often constrains a learned policy to be close to the behavior (data-generating) policy.
We propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.
We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance.
arXiv Detail & Related papers (2020-02-08T01:59:33Z)
- Preventing Imitation Learning with Adversarial Policy Ensembles [79.81807680370677]
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy.
How can we protect against external observers cloning our proprietary policies?
We introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies.
arXiv Detail & Related papers (2020-01-31T01:57:16Z)