Faster Rates for Private Adversarial Bandits
- URL: http://arxiv.org/abs/2505.21790v1
- Date: Tue, 27 May 2025 21:42:10 GMT
- Title: Faster Rates for Private Adversarial Bandits
- Authors: Hilal Asi, Vinod Raman, Kunal Talwar
- Abstract summary: We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithm to a private bandit algorithm. For bandits with expert advice, we give the first differentially private algorithms, with expected regret $O\left(\frac{\sqrt{NT}}{\sqrt{\epsilon}}\right)$, $O\left(\frac{\sqrt{KT\log(N)}\log(KT)}{\epsilon}\right)$, and $\tilde{O}\left(\frac{N^{1/6}K^{1/2}T^{2/3}\log(NT)}{\epsilon^{1/3}} + \frac{N^{1/2}\log(NT)}{\epsilon}\right)$.
- Score: 31.74205580746007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithm to a private bandit algorithm. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of $O\left(\frac{\sqrt{KT}}{\sqrt{\epsilon}}\right)$, improving upon the existing upper bound $O\left(\frac{\sqrt{KT \log(KT)}}{\epsilon}\right)$ for all $\epsilon \leq 1$. In particular, our algorithms allow for sublinear expected regret even when $\epsilon \leq \frac{1}{\sqrt{T}}$, establishing the first known separation between central and local differential privacy for this problem. For bandits with expert advice, we give the first differentially private algorithms, with expected regret $O\left(\frac{\sqrt{NT}}{\sqrt{\epsilon}}\right), O\left(\frac{\sqrt{KT\log(N)}\log(KT)}{\epsilon}\right)$, and $\tilde{O}\left(\frac{N^{1/6}K^{1/2}T^{2/3}\log(NT)}{\epsilon ^{1/3}} + \frac{N^{1/2}\log(NT)}{\epsilon}\right)$, where $K$ and $N$ are the number of actions and experts respectively. These rates allow us to get sublinear regret for different combinations of small and large $K, N$ and $\epsilon$.
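To make the conversion concrete, below is a minimal Python sketch of one standard privatization template for adversarial bandits: run a batched version of EXP3 and feed it only Laplace-noised, batch-averaged losses. The batch size, learning rate, and noise scale are illustrative choices, not the paper's tuned parameters, and the paper's actual conversion may differ from this template.

```python
import numpy as np

def private_batched_exp3(loss_fn, K, T, eps, rng=None):
    """Hypothetical sketch: batched EXP3 updated with Laplace-noised
    batch-averaged losses. Parameter choices are illustrative only."""
    rng = rng or np.random.default_rng(0)
    B = max(1, int(np.ceil(1.0 / eps)))          # illustrative batch size ~ 1/eps
    n_batches = max(1, T // B)
    eta = np.sqrt(np.log(K) / (n_batches * K))   # standard EXP3 learning rate
    weights = np.ones(K)
    total_loss = 0.0
    for b in range(n_batches):
        p = weights / weights.sum()
        arm = rng.choice(K, p=p)                 # play one arm for the whole batch
        batch_losses = [loss_fn(b * B + t, arm) for t in range(B)]
        total_loss += sum(batch_losses)
        avg = np.mean(batch_losses)
        # One user's loss shifts the batch average by at most 1/B, so
        # Laplace(1/(B*eps)) noise makes the released average eps-DP.
        noisy = avg + rng.laplace(scale=1.0 / (B * eps))
        est = np.zeros(K)
        est[arm] = noisy / p[arm]                # importance-weighted loss estimate
        weights *= np.exp(-eta * est)
        weights /= weights.max()                 # rescale for numerical stability
    return total_loss
```

As a sanity check on the abstract's claim: at $\epsilon = 1/\sqrt{T}$, the new bound $\sqrt{KT}/\sqrt{\epsilon} = \sqrt{K}\,T^{3/4}$ is still sublinear in $T$, whereas the previous bound $\sqrt{KT\log(KT)}/\epsilon = T\sqrt{K\log(KT)}$ is already superlinear, which is the central-versus-local separation noted above.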
Related papers
- Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards [38.53963338592248]
We improve both upper and lower bounds on the minimax regret compared to prior work. We propose a new elimination-based algorithm guided by experimental design. We show that for some geometries, such as $\ell_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$.
arXiv Detail & Related papers (2025-06-05T09:07:26Z) - Improved Regret in Stochastic Decision-Theoretic Online Learning under Differential Privacy [17.711455925206298]
Hu and Mehta (2024) posed an open problem: what is the optimal instance-dependent rate for decision-theoretic online learning (with $K$ actions and $T$ rounds) under $\varepsilon$-differential privacy?
arXiv Detail & Related papers (2025-02-16T05:13:51Z) - Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial combinatorial bandits with a switching cost $\lambda$ incurred for each switch of each selected arm in each round.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z) - Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z) - Differentially Private Stochastic Linear Bandits: (Almost) for Free [17.711701190680742]
In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free.
In the shuffled model, we also achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{\epsilon})$, as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}{\epsilon}T^{3/5})$.
arXiv Detail & Related papers (2022-07-07T17:20:57Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as...
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment [7.4288915613206505]
We study differentially private online learning problems in a stochastic environment under both bandit and full information feedback.
For differentially private bandits, we propose both UCB- and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O\left(\sum_{j: \Delta_j>0} \frac{\ln(T)}{\min\left\{\Delta_j, \epsilon\right\}}\right)$ regret, matching the minimax lower bound (a simplified DP-UCB sketch appears after this list).
For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm...
arXiv Detail & Related papers (2021-02-16T02:48:16Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
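For the differentially private stochastic bandit entry above (Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment), here is a hypothetical sketch of a DP-UCB-style algorithm: per-arm reward sums are released with Laplace noise only when an arm's pull count doubles, one common way to keep the number of private releases per arm at $O(\log T)$. The confidence widths and privacy accounting are illustrative simplifications, not the cited paper's analysis.

```python
import numpy as np

def dp_ucb(pull, K, T, eps, rng=None):
    """Hypothetical sketch of an epsilon-DP UCB variant with releases
    on a doubling schedule. Constants and composition accounting are
    illustrative only; rewards are assumed to lie in [0, 1]."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(K)
    sums = np.zeros(K)          # true (unreleased) reward sums
    noisy_means = np.zeros(K)   # last privately released means
    next_release = np.ones(K)   # release when counts reaches this threshold
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1         # pull each arm once to initialize
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)      # standard UCB width
            slack = np.log(t) / (eps * counts)           # extra width for noise
            arm = int(np.argmax(noisy_means + bonus + slack))
        r = pull(t, arm)        # reward in [0, 1]
        counts[arm] += 1
        sums[arm] += r
        if counts[arm] >= next_release[arm]:
            # Each reward has sensitivity 1; one noisy release per
            # doubling epoch, so O(log T) releases per arm in total.
            noisy_sum = sums[arm] + rng.laplace(scale=np.log(T) / eps)
            noisy_means[arm] = noisy_sum / counts[arm]
            next_release[arm] *= 2
    return counts
```

The doubling schedule is the design choice doing the work here: releasing a noisy statistic only $O(\log T)$ times per arm keeps the total privacy cost of composition logarithmic rather than linear in $T$.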