Multi-Armed Bandits with Local Differential Privacy
- URL: http://arxiv.org/abs/2007.03121v1
- Date: Mon, 6 Jul 2020 23:36:20 GMT
- Title: Multi-Armed Bandits with Local Differential Privacy
- Authors: Wenbo Ren, Xingyu Zhou, Jia Liu, Ness B. Shroff
- Abstract summary: In bandit systems, the rewards may refer to the users' activities, which may involve private information.
We study the regret upper and lower bounds for MAB algorithms with a given LDP guarantee.
We prove a lower bound and propose algorithms whose regret upper bounds match the lower bound up to constant factors.
- Score: 32.538737981996405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the problem of regret minimization for multi-armed
bandit (MAB) problems with local differential privacy (LDP) guarantee. In
stochastic bandit systems, the rewards may refer to the users' activities,
which may involve private information and the users may not want the agent to
know. However, in many cases, the agent needs to know these activities to
provide better services such as recommendations and news feeds. To handle this
dilemma, we adopt differential privacy and study the regret upper and lower
bounds for MAB algorithms with a given LDP guarantee. In this paper, we prove a
lower bound and propose algorithms whose regret upper bounds match the lower
bound up to constant factors. Numerical experiments also confirm our
conclusions.
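As a rough illustration of the setting only (not the paper's actual algorithm or constants), a locally private bandit can have each user add calibrated Laplace noise to their reward before it ever reaches the agent, which then runs UCB on the noisy reports with an inflated confidence bonus. The function names and the bonus scaling below are illustrative assumptions.

```python
import math
import random

def privatize(reward, epsilon):
    # Laplace mechanism: for rewards in [0, 1] (sensitivity 1), adding
    # Laplace(0, 1/epsilon) noise makes the report epsilon-LDP.
    # Sampled as a difference of two i.i.d. Exponential(epsilon) draws.
    return reward + random.expovariate(epsilon) - random.expovariate(epsilon)

def ldp_ucb(means, epsilon, horizon, seed=0):
    # UCB run entirely on privatized reports: the agent never observes a
    # raw reward. The confidence width is widened by the Laplace noise
    # variance 2/epsilon^2 (a heuristic choice, not the paper's bounds).
    random.seed(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # play each arm once to initialize
        else:
            def ucb(i):
                width = math.sqrt((0.25 + 2.0 / epsilon ** 2)
                                  * 2.0 * math.log(t + 1) / counts[i])
                return sums[i] / counts[i] + width
            arm = max(range(k), key=ucb)
        raw = 1.0 if random.random() < means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += privatize(raw, epsilon)  # only the noisy value is shared
    return counts
```

With a clear gap between arms and a moderate privacy budget, the noisy-report UCB still concentrates its pulls on the better arm; as epsilon shrinks, the added noise widens the confidence intervals and the regret grows accordingly.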
Related papers
- Differentially Private Reinforcement Learning with Self-Play [18.124829682487558]
We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.
We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games.
We design a provably efficient algorithm based on optimistic Nash value and privatization of Bernstein-type bonuses.
arXiv Detail & Related papers (2024-04-11T08:42:51Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Concentrated Differential Privacy for Bandits [11.086440815804227]
This paper contributes to the understanding of Differential Privacy (DP) in bandits with a trusted centralised decision-maker.
We propose three private algorithms, namely AdaC-UCB, AdaC-GOPE and AdaC-OFUL, for three bandit settings.
arXiv Detail & Related papers (2023-09-01T16:08:00Z)
- On Differentially Private Federated Linear Contextual Bandits [9.51828574518325]
We consider cross-silo federated linear contextual bandit (LCB) problem under differential privacy.
We identify issues in the state-of-the-art, including (i) failure of the claimed privacy protection and (ii) an incorrect regret bound due to noise miscalculation.
We show that our algorithm can achieve nearly optimal regret without a trusted server.
arXiv Detail & Related papers (2023-02-27T16:47:49Z)
- Privacy Amplification via Shuffling for Linear Contextual Bandits [51.94904361874446]
We study the contextual linear bandit problem with differential privacy (DP).
We show that a privacy/utility trade-off between JDP and LDP can be achieved by leveraging the shuffle model of privacy while preserving local privacy.
arXiv Detail & Related papers (2021-12-11T15:23:28Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Local Differential Privacy for Regret Minimization in Reinforcement Learning [33.679678503441565]
We study privacy in the context of finite-horizon Markov Decision Processes (MDPs).
We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework.
We present an optimistic algorithm that satisfies $\varepsilon$-LDP requirements.
arXiv Detail & Related papers (2020-10-15T14:13:26Z)
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.