Federated Linear Contextual Bandits with User-level Differential Privacy
- URL: http://arxiv.org/abs/2306.05275v2
- Date: Fri, 9 Jun 2023 11:32:04 GMT
- Title: Federated Linear Contextual Bandits with User-level Differential Privacy
- Authors: Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hajzinia,
Jing Yang
- Abstract summary: This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP).
We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting.
We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees.
- Score: 8.396384861175799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies federated linear contextual bandits under the notion of
user-level differential privacy (DP). We first introduce a unified federated
bandits framework that can accommodate various definitions of DP in the
sequential decision-making setting. We then formally introduce user-level
central DP (CDP) and local DP (LDP) in the federated bandits framework, and
investigate the fundamental trade-offs between the learning regrets and the
corresponding DP guarantees in a federated linear contextual bandits model. For
CDP, we propose a federated algorithm termed as $\texttt{ROBIN}$ and show that
it is near-optimal in terms of the number of clients $M$ and the privacy budget
$\varepsilon$ by deriving nearly-matching upper and lower regret bounds when
user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating
that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret
blow-up factor of at least $\min\{1/\varepsilon, M\}$ or
$\min\{1/\sqrt{\varepsilon}, \sqrt{M}\}$ under different conditions.
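The central-DP setting described above can be illustrated with a toy sketch: a server aggregates each client's sufficient statistics for a linear model and perturbs the aggregates with Gaussian noise before solving for the parameter estimate. This is not the paper's $\texttt{ROBIN}$ algorithm; the dimensions, sensitivity, noise scale, and synthetic data below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, eps, delta = 3, 10, 1.0, 1e-5

# Each client m holds a local Gram matrix V_m = sum x x^T and reward vector u_m = sum r x.
clients = []
for _ in range(M):
    X = rng.normal(size=(20, d))
    theta_true = np.ones(d)
    r = X @ theta_true + 0.1 * rng.normal(size=20)
    clients.append((X.T @ X, X.T @ r))

# Server-side Gaussian mechanism; the sensitivity-1 calibration is an assumption here.
sigma = np.sqrt(2 * np.log(1.25 / delta)) / eps
V = sum(V_m for V_m, _ in clients) + sigma * rng.normal(size=(d, d))
V = (V + V.T) / 2 + np.eye(d)  # symmetrize and regularize the noisy Gram matrix
u = sum(u_m for _, u_m in clients) + sigma * rng.normal(size=d)

theta_hat = np.linalg.solve(V, u)  # private ridge-style estimate
print(np.round(theta_hat, 2))
```

A real user-level DP algorithm must calibrate the noise to the true per-user sensitivity of the aggregated statistics and account for composition across communication rounds, which is exactly where the regret/privacy trade-off studied in the paper arises.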
Related papers
- Differentially Private Best-Arm Identification [14.916947598339988]
Best Arm Identification (BAI) problems are progressively used for data-sensitive applications.
Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence in both the local and central models.
arXiv Detail & Related papers (2024-06-10T16:02:48Z)
- Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z)
- Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z)
- When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits [4.964737844687583]
We study the problem of multi-armed bandits with $\epsilon$-global Differential Privacy (DP).
We instantiate $\epsilon$-global DP extensions of UCB and KL-UCB algorithms, namely AdaP-UCB and AdaP-KLUCB.
AdaP-KLUCB is the first algorithm that both satisfies $\epsilon$-global DP and yields a regret upper bound that matches the problem-dependent lower bound up to multiplicative constants.
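As a toy illustration of the private-UCB idea (not the authors' AdaP-UCB or AdaP-KLUCB, whose noise calibration and accounting differ), one can add Laplace noise to each arm's empirical mean before computing the UCB index. Note that re-noising every round, as below, does not by itself yield a correct overall $\epsilon$-DP guarantee; real algorithms use doubling or tree-based mechanisms to control composition.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 1.0
K, T = 3, 2000
means = np.array([0.2, 0.5, 0.8])  # assumed Bernoulli arm means

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(T):
    if t < K:
        a = t  # pull each arm once to initialize
    else:
        # Noisy empirical means: Laplace scale 1/(eps * n) is an illustrative choice.
        noisy_mean = sums / counts + rng.laplace(scale=1.0 / (eps * counts))
        a = int(np.argmax(noisy_mean + np.sqrt(2 * np.log(t) / counts)))
    r = float(rng.random() < means[a])  # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print(counts)
```

Despite the injected noise, the index still concentrates, so the best arm accumulates most of the pulls over time.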
arXiv Detail & Related papers (2022-09-06T15:26:24Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
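The clipping-plus-noise step shared by DP-SGD-style methods can be sketched as follows on a toy linear regression. The clip norm, noise multiplier, learning rate, and data are assumed toy values, and no privacy accounting (e.g. a moments accountant) is performed, so this shows only the mechanics of one private update.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr, C, sigma = 0.1, 1.0, 0.5  # assumed clip norm C and noise multiplier sigma

for _ in range(200):
    # Per-example gradients of the squared loss, clipped to norm C (DP-SGD style).
    g = 2 * (X @ w - y)[:, None] * X
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g_clipped = g / np.maximum(1.0, norms / C)
    # Average the clipped gradients and add Gaussian noise scaled to the clip norm.
    noisy = g_clipped.mean(axis=0) + (sigma * C / n) * rng.normal(size=d)
    w -= lr * noisy

print(np.round(w, 2))
```

DP-NSGD replaces the per-example clipping with normalization (dividing each gradient by its norm plus a small regularizer), which changes the bias/variance trade-off of the update but keeps the same overall structure.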
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Differentially Private Exploration in Reinforcement Learning with Linear Representation [102.17246636801649]
We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. the model-free setting), where we provide a $\widetilde{O}(\sqrt{K}/\epsilon)$ regret bound for $(\epsilon,\delta)$-LDP.
arXiv Detail & Related papers (2021-12-02T19:59:50Z)
- Locally Differentially Private (Contextual) Bandits Learning [55.63825598391525]
We study locally differentially private (LDP) bandits learning in this paper.
We propose simple black-box reduction frameworks that can solve a large family of context-free bandits learning problems with LDP guarantee.
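The local model these reductions build on can be sketched as follows: each user privatizes its own reward with Laplace noise before it leaves the device, so the server only ever sees noisy values. The function name and noise calibration (sensitivity 1 for rewards in [0, 1]) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 1.0

def privatize_reward(r, eps, rng):
    # Local DP: add Laplace(1/eps) noise to a reward bounded in [0, 1]
    # (sensitivity 1) on the user's side, before transmission.
    return r + rng.laplace(scale=1.0 / eps)

# The server averages many noisy reports; the estimate still concentrates,
# but with variance inflated by the local noise.
true_mean = 0.7
noisy = [privatize_reward(float(rng.random() < true_mean), eps, rng)
         for _ in range(5000)]
print(round(float(np.mean(noisy)), 2))
```

The inflated variance of each report is exactly why LDP algorithms pay a multiplicative regret penalty compared with their central-DP counterparts, as quantified by the lower bounds in the main paper.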
arXiv Detail & Related papers (2020-06-01T04:02:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.