Preserving Expert-Level Privacy in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2411.13598v1
- Date: Mon, 18 Nov 2024 21:26:53 GMT
- Title: Preserving Expert-Level Privacy in Offline Reinforcement Learning
- Authors: Navodita Sharma, Vishnu Vinod, Abhradeep Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer
- Abstract summary: We propose a consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm.
We prove rigorous differential privacy guarantees while maintaining strong empirical performance.
- Score: 35.486119057117996
- Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) interacting with an environment. However, the individual experts may be privacy-sensitive in that the learnt policy may retain information about their precise choices. In some domains like personalized retrieval, advertising and healthcare, the expert choices are considered sensitive data. To provably protect the privacy of such experts, we propose a novel consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm. We prove rigorous differential privacy guarantees while maintaining strong empirical performance. Unlike existing work in differentially private RL, we supplement the theory with proof-of-concept experiments on classic RL environments featuring large continuous state spaces, demonstrating substantial improvements over a natural baseline across multiple tasks.
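The abstract does not spell out the mechanism behind the consensus-based, expert-level guarantee. One natural reading is a PATE-style noisy vote over the experts' per-state action choices; the sketch below illustrates only that reading. The function `consensus_action`, the per-expert policy interface, and the Laplace calibration are assumptions made for the example, not the authors' algorithm.

```python
# A minimal sketch, assuming a PATE-style noisy-vote consensus over
# per-expert behaviour policies. Everything here (names, interface,
# noise scale) is illustrative, not the paper's exact construction.
import numpy as np

def consensus_action(expert_policies, state, num_actions, epsilon, rng=None):
    """Release one differentially private consensus action for `state`.

    Each expert casts exactly one vote, so replacing one expert moves at
    most one vote between two histogram bins; Laplace noise of scale
    2/epsilon on every count is a conservative calibration for releasing
    the argmax under that expert-level neighbouring relation.
    """
    rng = rng if rng is not None else np.random.default_rng()
    votes = np.zeros(num_actions)
    for policy in expert_policies:        # one vote per expert per query
        votes[policy(state)] += 1.0
    noisy_votes = votes + rng.laplace(scale=2.0 / epsilon, size=num_actions)
    return int(np.argmax(noisy_votes))    # report-noisy-max consensus
```

Labelling states this way and then training any off-the-shelf offline RL or imitation learner on the noisy consensus labels would keep the pipeline algorithm-agnostic, matching the abstract's compatibility claim; repeated queries would still require standard composition accounting for the total privacy budget.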
Related papers
- Centering Policy and Practice: Research Gaps around Usable Differential Privacy [12.340264479496375]
We argue that while differential privacy is a clean formulation in theory, it poses significant challenges in practice.
To bridge the gaps between differential privacy's promises and its real-world usability, researchers and practitioners must work together.
arXiv Detail & Related papers (2024-06-17T21:32:30Z)
- Differentially Private Deep Model-Based Reinforcement Learning [47.651861502104715]
We introduce PriMORL, a model-based RL algorithm with formal differential privacy guarantees.
PriMORL learns an ensemble of trajectory-level DP models of the environment from offline data.
arXiv Detail & Related papers (2024-02-08T10:05:11Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Privacy-sensitive data is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Offline Reinforcement Learning with Differential Privacy [16.871660060209674]
The offline reinforcement learning problem is often motivated by the need to learn data-driven decision policies in financial, legal and healthcare applications.
We design offline RL algorithms with differential privacy guarantees that provably prevent the leakage of individual information from the training data.
arXiv Detail & Related papers (2022-06-02T00:45:04Z)
- Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems.
arXiv Detail & Related papers (2022-05-23T16:37:16Z)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? [86.43517734716606]
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction.
Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that policies trained on sufficiently noisy suboptimal data can attain better performance than even BC algorithms with expert data.
arXiv Detail & Related papers (2022-04-12T08:25:34Z)
- How Private Is Your RL Policy? An Inverse RL Based Analysis Framework [5.987377024199901]
In domains like autonomous driving, recommendation systems, and more, optimal RL policies could cause a privacy breach if the policies memorize any part of the private reward.
We study existing differentially private RL policies derived from various RL algorithms such as Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization.
We propose a new Privacy-Aware Inverse RL (PRIL) analysis framework that performs reward reconstruction as an adversarial attack on the private policies that agents may deploy.
arXiv Detail & Related papers (2021-12-10T12:57:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.