Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
- URL: http://arxiv.org/abs/2102.07929v3
- Date: Thu, 30 May 2024 07:17:20 GMT
- Title: Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
- Authors: Bingshan Hu, Zhiming Huang, Nishant A. Mehta, Nidhi Hegde
- Abstract summary: We study differentially private online learning problems in a stochastic environment under both bandit and full information feedback.
For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O\left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min\left\{\Delta_j, \epsilon\right\}}\right)$ instance-dependent regret bound.
For the differentially private full information setting with stochastic rewards, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our lower bounds up to an extra $\log(T)$ factor.
- Score: 7.4288915613206505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min \left\{\Delta_j, \epsilon \right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega \left(\frac{\ln(K)}{\min \left\{\Delta_{\min}, \epsilon \right\}} \right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
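The abstract describes UCB- and Thompson Sampling-based private bandit algorithms but gives no pseudocode. Below is a minimal Python sketch of the generic recipe such algorithms follow: privatize each observed reward with Laplace noise and enlarge the UCB confidence radius to compensate. The per-reward noising scheme and the heuristic bonus shown here are illustrative assumptions, not the authors' construction, which uses a more careful mechanism to achieve the sharper $O\left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min\{\Delta_j, \epsilon\}}\right)$ bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_ucb(arms, T, epsilon):
    """Hedged sketch of an epsilon-DP UCB bandit (NOT the paper's exact
    algorithm): each observed reward is privatized with Laplace noise
    before it enters the per-arm statistics, and the UCB confidence
    radius is enlarged to account for that noise.
    `arms` is a list of callables returning rewards in [0, 1]."""
    K = len(arms)
    counts = np.zeros(K)       # number of pulls per arm
    noisy_sums = np.zeros(K)   # sums of Laplace-privatized rewards
    plays = []

    for t in range(1, T + 1):
        if t <= K:
            j = t - 1                                  # pull each arm once
        else:
            means = noisy_sums / counts
            # Hoeffding radius plus a heuristic inflation for the Laplace
            # noise; the paper derives a sharper, anytime-valid bonus.
            bonus = np.sqrt(2.0 * np.log(t) / counts) * (1.0 + 2.0 / epsilon)
            j = int(np.argmax(means + bonus))
        r = arms[j]()                                  # observe reward in [0, 1]
        counts[j] += 1
        # each reward is noised exactly once, so releasing all per-arm
        # sums is epsilon-DP with respect to any single reward
        noisy_sums[j] += r + rng.laplace(scale=1.0 / epsilon)
        plays.append(j)
    return plays

# toy usage: two Bernoulli arms with gap 0.2
chosen = dp_ucb([lambda: float(rng.random() < 0.7),
                 lambda: float(rng.random() < 0.5)],
                T=5000, epsilon=0.5)
```

Because each reward is noised exactly once before entering the per-arm sum, releasing those sums is $\epsilon$-differentially private by post-processing; the cost is the wider confidence radius, and hence looser regret than the bound stated in the abstract.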
Related papers
- New Rates in Stochastic Decision-Theoretic Online Learning under Differential Privacy [17.711455925206298]
Hu and Mehta (2024) posed an open problem: what is the optimal instance-dependent rate for decision-theoretic online learning (with $K$ actions and $T$ rounds) under $\varepsilon$-differential privacy?
In this paper, we partially address this open problem with two new results. First, we provide an improved upper bound of $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log^2 K}{\varepsilon}\right)$ for this problem, where the $T$-dependency has been removed.
Second,
arXiv Detail & Related papers (2025-02-16T05:13:51Z) - Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation [11.345141417475956]
We study the problem of privately releasing an approximate minimum spanning tree (MST).
Existing private MST algorithms face a trade-off, sacrificing either computational efficiency or accuracy.
We show that it is possible to get the best of both worlds: with a suitable random perturbation of the input, one that by itself does not suffice to make the weight vector private, the resulting MST is nevertheless private and achieves a state-of-the-art error guarantee (a simplified noise-then-MST sketch appears after this list).
arXiv Detail & Related papers (2024-12-13T13:22:39Z) - DP-Dueling: Learning from Preference Feedback without Compromising User Privacy [32.58099924135157]
We give the first differentially private dueling bandit algorithm for active learning with user preferences.
Our algorithms are computationally efficient with near-optimal performance.
We extend our results to any general decision space in $d$-dimensions with potentially infinite arms.
arXiv Detail & Related papers (2024-03-22T09:02:12Z) - Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z) - Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA \cdot \mathrm{poly}(H)}{\Delta_{\min}} \log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap (a generic optimistic $Q$-learning sketch appears after this list).
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
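For the "Optimal Bounds for Private Minimum Spanning Trees via Input Perturbation" entry, here is the simplified noise-then-MST baseline referenced in that summary: add independent Laplace noise to every edge weight, then run an off-the-shelf MST solver; the output is private by post-processing. The noise scale, the neighboring relation (a single edge weight changes by at most 1), and the use of networkx's Kruskal solver are illustrative assumptions; the paper's contribution is precisely that a smaller perturbation, too small to privatize the whole weight vector, already suffices for a private and more accurate tree.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

def private_mst(graph: nx.Graph, epsilon: float) -> nx.Graph:
    """Baseline input-perturbation sketch: noise every edge weight once,
    then run a standard (non-private) MST solver on the noisy copy.
    Privacy follows by post-processing, assuming neighboring inputs
    differ in a single edge weight by at most 1, so Laplace(1/epsilon)
    noise per edge is an illustrative calibration."""
    noisy = graph.copy()
    for _, _, data in noisy.edges(data=True):
        data["weight"] += rng.laplace(scale=1.0 / epsilon)
    # any MST algorithm on the privatized weights keeps the output private
    return nx.minimum_spanning_tree(noisy, weight="weight", algorithm="kruskal")

# toy usage on a 4-node graph
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.5), (2, 3, 0.5)])
print(sorted(private_mst(G, epsilon=1.0).edges()))
```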
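For the "$Q$-learning with Logarithmic Regret" entry, the sketch below shows generic optimistic (UCB-bonus) tabular $Q$-learning for episodic MDPs, the style of algorithm that summary refers to. The environment interface (`env.reset`, `env.step`), the learning rate $(H+1)/(H+t)$, and the Hoeffding-style bonus constant are assumptions for illustration, not this paper's exact construction or analysis.

```python
import numpy as np

def optimistic_q_learning(env, S, A, H, K, c=1.0, p=0.05):
    """Sketch of optimistic (UCB-bonus) Q-learning for an episodic
    tabular MDP with S states, A actions, horizon H, and K episodes.
    Assumed (hypothetical) interface: env.reset() -> initial state index,
    env.step(h, s, a) -> (reward, next_state)."""
    Q = np.full((H, S, A), float(H))          # optimistic initialization
    N = np.zeros((H, S, A), dtype=int)        # visit counts
    iota = np.log(S * A * H * K / p)          # log term for the bonus

    for _ in range(K):
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))       # greedy w.r.t. optimistic Q
            r, s_next = env.step(h, s, a)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)         # stage-dependent learning rate
            bonus = c * np.sqrt(H**3 * iota / t)
            v_next = 0.0 if h == H - 1 else min(H, Q[h + 1, s_next].max())
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
            s = s_next
    return Q
```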
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.