Privacy-Preserving Reinforcement Learning Beyond Expectation
- URL: http://arxiv.org/abs/2203.10165v1
- Date: Fri, 18 Mar 2022 21:28:29 GMT
- Title: Privacy-Preserving Reinforcement Learning Beyond Expectation
- Authors: Arezoo Rajabi, Bhaskar Ramasubramanian, Abdullah Al Maruf, Radha
Poovendran
- Abstract summary: Cyber and cyber-physical systems equipped with machine learning algorithms, such as autonomous cars, share environments with humans.
It is important to align system (or agent) behaviors with the preferences of one or more human users.
We consider the case when an agent has to learn behaviors in an unknown environment.
- Score: 6.495883501989546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyber and cyber-physical systems equipped with machine learning
algorithms, such as autonomous cars, share environments with humans. In such a setting, it
is important to align system (or agent) behaviors with the preferences of one
or more human users. We consider the case when an agent has to learn behaviors
in an unknown environment. Our goal is to capture two defining characteristics
of humans: i) a tendency to assess and quantify risk, and ii) a desire to keep
decision making hidden from external parties. We incorporate cumulative
prospect theory (CPT) into the objective of a reinforcement learning (RL)
problem for the former. For the latter, we use differential privacy. We design
an algorithm to enable an RL agent to learn policies to maximize a CPT-based
objective in a privacy-preserving manner and establish guarantees on the
privacy of value functions learned by the algorithm when rewards are
sufficiently close. This is accomplished by adding calibrated noise using a
Gaussian process mechanism at each step. Through empirical evaluations, we
highlight a privacy-utility tradeoff and demonstrate that the RL agent is able
to learn behaviors aligned with those of a human user in the same environment
in a privacy-preserving manner.
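To make the mechanism concrete, below is a minimal Python sketch, not the authors' implementation: it estimates a CPT-style value from sampled returns (using the gains-only quantile form for brevity) and releases it through a Gaussian mechanism. The weighting exponent, clipping bound, and per-record sensitivity expression are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cpt_weight(p, eta=0.74):
    """Inverse-S-shaped probability weighting from cumulative prospect theory."""
    return p**eta / (p**eta + (1.0 - p) ** eta) ** (1.0 / eta)

def cpt_value(returns, eta=0.74):
    """Integrate sorted return samples against distorted tail probabilities,
    rather than taking the plain empirical mean (gains-only simplification)."""
    xs = np.sort(np.asarray(returns, dtype=float))
    n = len(xs)
    w = np.array([cpt_weight((n - i) / n, eta) - cpt_weight((n - i - 1) / n, eta)
                  for i in range(n)])
    return float(xs @ w)

def private_cpt_value(returns, eps=1.0, delta=1e-5, clip=10.0):
    """Release a noisy CPT value via the Gaussian mechanism; the sensitivity
    bound is a crude stand-in for the paper's analysis."""
    xs = np.clip(returns, -clip, clip)
    sensitivity = 2.0 * clip / len(xs)  # assumed per-record influence bound
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return cpt_value(xs) + rng.normal(0.0, sigma)

returns = rng.normal(5.0, 2.0, size=1000)  # sampled returns for one state-action pair
print(cpt_value(returns), private_cpt_value(returns, eps=0.5))
```

Shrinking `eps` inflates `sigma`, which is the privacy-utility tradeoff the abstract highlights.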
Related papers
- PAPER-HILT: Personalized and Adaptive Privacy-Aware Early-Exit for Reinforcement Learning in Human-in-the-Loop Systems [0.6282068591820944]
Reinforcement Learning (RL) has increasingly become a preferred method over traditional rule-based systems in diverse human-in-the-loop (HITL) applications.
This paper develops an adaptive RL strategy that exploits an early-exit approach designed explicitly for privacy preservation in HITL environments.
arXiv Detail & Related papers (2024-03-09T10:24:12Z)
- Group Decision-Making among Privacy-Aware Agents [2.4401219403555814]
Preserving individual privacy and enabling efficient social learning are both important desiderata but seem fundamentally at odds with each other.
The paper reconciles the two by controlling information leakage using rigorous statistical guarantees based on differential privacy (DP).
Our results flesh out the nature of the trade-offs in both cases between the quality of the group decision outcomes, learning accuracy, communication cost, and the level of privacy protections that the agents are afforded.
arXiv Detail & Related papers (2024-02-13T01:38:01Z)
- Your Room is not Private: Gradient Inversion Attack on Reinforcement Learning [47.96266341738642]
Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information.
This paper proposes attacks on value-based and gradient-based algorithms, using gradient inversion to reconstruct states, actions, and supervision signals (a toy illustration follows this entry).
arXiv Detail & Related papers (2023-06-15T16:53:26Z)
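As a generic illustration of the attack family (not the paper's method), the gradients of a single fully connected layer leak their input analytically: each row of the weight gradient is the input scaled by the corresponding bias gradient. The layer shapes and squared-error loss below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Victim side: gradients of 0.5 * ||W s + b - target||^2 for one secret input s.
def layer_gradients(W, b, s, target):
    err = W @ s + b - target             # gradient w.r.t. the pre-activation
    return np.outer(err, s), err         # (dL/dW, dL/db)

# Attacker side: dL/dW = err * s^T and dL/db = err, so dividing a row of
# dL/dW by the matching entry of dL/db recovers the input exactly.
def invert_gradients(gW, gb):
    i = int(np.argmax(np.abs(gb)))       # any row with a nonzero bias gradient
    return gW[i] / gb[i]

W, b = rng.normal(size=(4, 6)), rng.normal(size=4)
secret_state = rng.normal(size=6)        # e.g., an agent's private observation
gW, gb = layer_gradients(W, b, secret_state, target=np.zeros(4))
print(np.allclose(invert_gradients(gW, gb), secret_state))  # True
```

Per-step noise of the kind used in the main paper above directly degrades such reconstructions.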
- Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters.
It can achieve a personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning (a generic sketch follows this entry).
arXiv Detail & Related papers (2023-05-24T13:44:02Z)
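One generic way to realize "distorting model parameters" with a per-parameter knob is heterogeneous Gaussian noise; the per-parameter sensitivities and budgets below are invented for illustration and do not reproduce the paper's framework.

```python
import numpy as np

rng = np.random.default_rng(2)

def distort_update(update, sens, eps, delta=1e-5):
    """Per-parameter Gaussian distortion of a client's model update: a smaller
    budget eps means more noise, giving a per-parameter, per-client,
    per-round privacy-utility trade-off."""
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return update + rng.normal(0.0, sigma, size=update.shape)

update = rng.normal(size=8)            # one client's clipped local update
sens = np.full(8, 0.1)                 # assumed per-parameter sensitivity
eps = np.linspace(0.5, 4.0, 8)         # tighter protection for early parameters
print(distort_update(update, sens, eps))
```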
- adaPARL: Adaptive Privacy-Aware Reinforcement Learning for Sequential-Decision Making Human-in-the-Loop Systems [0.5414308305392761]
Reinforcement learning (RL) presents numerous benefits compared to rule-based approaches in various applications.
We propose adaPARL, an adaptive approach for privacy-aware RL, especially for human-in-the-loop IoT systems.
AdaPARL provides a personalized privacy-utility trade-off depending on human behavior and preference.
arXiv Detail & Related papers (2023-03-07T21:55:22Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations (a minimal sketch of the composition follows this entry).
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
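The multiplicative composition is easy to state: scale the reward critic's estimate by the safety critic's probability of remaining feasible. A toy sketch with made-up numbers (arrays stand in for learned critics):

```python
import numpy as np

def multiplicative_value(reward_value, violation_prob):
    """Value = (probability of staying safe) * (constraint-free return),
    so risky actions are discounted rather than hard-masked."""
    return (1.0 - violation_prob) * reward_value

q_reward = np.array([10.0, 6.0])   # reward critic: constraint-free return per action
p_violate = np.array([0.8, 0.05])  # safety critic: probability of constraint violation
best = int(np.argmax(multiplicative_value(q_reward, p_violate)))
print(best)  # 1: the safer action wins despite the lower raw return
```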
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms give tight privacy estimates only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z)
- Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper studies how to develop privacy-preserving machine learning algorithms that ensure good performance while protecting privacy (a one-step DP-SGD sketch follows this entry).
arXiv Detail & Related papers (2022-09-09T08:54:13Z)
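For reference, one step of standard DP-SGD, the algorithm family this entry concerns, looks as follows; the clipping norm, learning rate, and noise multiplier are illustrative, with a smaller `noise_mult` corresponding to the low-noise regime.

```python
import numpy as np

rng = np.random.default_rng(3)

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, noise_mult=0.5):
    """One DP-SGD step: clip each example's gradient to L2 norm `clip`,
    average, add Gaussian noise, and take a descent step."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / (norms + 1e-12))
    noisy_grad = clipped.mean(axis=0) + rng.normal(
        0.0, noise_mult * clip / len(per_example_grads), size=params.shape)
    return params - lr * noisy_grad

params = np.zeros(3)
grads = rng.normal(size=(32, 3))   # per-example gradients for one batch
print(dp_sgd_step(params, grads))
```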
- Reinforcement Learning Beyond Expectation [11.428014000851535]
Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently.
In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment.
In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost (the gain-loss utility is sketched after this entry).
arXiv Detail & Related papers (2021-03-29T20:35:25Z)
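The asymmetry between gains and losses is captured by the Tversky-Kahneman utility; the parameter values below are the commonly cited median estimates, used purely for illustration.

```python
import numpy as np

def cpt_utility(x, alpha=0.88, beta=0.88, lam=2.25):
    """Concave over gains, convex and steeper over losses (loss aversion
    lam > 1), so a loss is felt more strongly than an equal gain."""
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0.0, None) ** alpha
    losses = -lam * np.clip(-x, 0.0, None) ** beta
    return np.where(x >= 0.0, gains, losses)

print(cpt_utility([100.0, -100.0]))  # approx [57.5, -129.5]: losses loom larger
```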
- Tempered Sigmoid Activations for Deep Learning with Differential Privacy [33.574715000662316]
We show that the choice of activation function is central to bounding the sensitivity of privacy-preserving deep learning.
We achieve new state-of-the-art accuracy on MNIST, FashionMNIST, and CIFAR10 without any modification of the learning procedure fundamentals (the activation family is sketched after this entry).
arXiv Detail & Related papers (2020-07-28T13:19:45Z)
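The activation family is simple to write down, and its defaults reduce to tanh, which makes a convenient sanity check. A minimal sketch:

```python
import numpy as np

def tempered_sigmoid(x, scale=2.0, temp=2.0, offset=1.0):
    """Tempered sigmoid: scale * sigmoid(temp * x) - offset. The output is
    bounded in [-offset, scale - offset], which keeps activations (and hence
    gradient norms before clipping) small for differentially private training."""
    return scale / (1.0 + np.exp(-temp * x)) - offset

x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(tempered_sigmoid(x), np.tanh(x)))  # defaults recover tanh: True
```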
- Cooperative Inverse Reinforcement Learning [64.60722062217417]
We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL).
A CIRL problem is a cooperative, partial-information game with two agents, a human and a robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is.
In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions.
arXiv Detail & Related papers (2016-06-09T22:39:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.