Towards Optimal Differentially Private Regret Bounds in Linear MDPs
- URL: http://arxiv.org/abs/2504.09339v2
- Date: Fri, 25 Apr 2025 18:40:53 GMT
- Title: Towards Optimal Differentially Private Regret Bounds in Linear MDPs
- Authors: Sharan Sahu
- Abstract summary: We design a new differentially private algorithm by privatizing LSVI-UCB++ and adapting techniques for variance-aware analysis from offline RL. Our algorithm achieves a regret bound of $\widetilde{O}(d \sqrt{H^3 K} + H^{15/4} d^{7/6} K^{1/2} / \epsilon)$, improving over previous private methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study regret minimization under privacy constraints in episodic inhomogeneous linear Markov Decision Processes (MDPs), motivated by the growing use of reinforcement learning (RL) in personalized decision-making systems that rely on sensitive user data. In this setting, both transition probabilities and reward functions are assumed to be linear in a feature mapping $\phi(s, a)$, and we aim to ensure privacy through joint differential privacy (JDP), a relaxation of differential privacy suited to online learning. Prior work has established suboptimal regret bounds by privatizing the LSVI-UCB algorithm, which achieves $\widetilde{O}(\sqrt{d^3 H^4 K})$ regret in the non-private setting. Building on recent advances that improve this to near minimax optimal regret $\widetilde{O}(d\sqrt{H^{3}K})$ via LSVI-UCB++ with Bernstein-style bonuses, we design a new differentially private algorithm by privatizing LSVI-UCB++ and adapting techniques for variance-aware analysis from offline RL. Our algorithm achieves a regret bound of $\widetilde{O}(d \sqrt{H^3 K} + H^{15/4} d^{7/6} K^{1/2} / \epsilon)$, improving over previous private methods. Empirical results show that our algorithm retains near-optimal utility compared to non-private baselines, indicating that privacy can be achieved with minimal performance degradation in this setting.
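The common privacy mechanism in this line of work is to privatize the least-squares statistics that LSVI-style algorithms maintain: the Gram matrix of features and the regression targets are released with calibrated Gaussian noise, and the noisy Gram matrix is regularized so it remains positive definite. The sketch below illustrates that general idea; the function name, the noise scale, and the regularization shift are illustrative assumptions, not the paper's exact calibration.

```python
import numpy as np

def privatize_lsvi_statistics(Lambda, u, epsilon, delta, H, rng=None):
    """Gaussian-mechanism sketch for JDP linear-MDP algorithms.

    Perturbs the per-step Gram matrix ``Lambda`` (d x d) and regression
    target ``u`` (d,) before they are used to compute value estimates.
    The noise scale ``sigma`` below is an illustrative calibration over
    H steps, not the variance-aware calibration from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = Lambda.shape[0]
    # Illustrative Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = (H / epsilon) * np.sqrt(2.0 * np.log(1.25 / delta))
    # Symmetrized noise keeps the perturbed Gram matrix symmetric.
    N = rng.normal(scale=sigma, size=(d, d))
    Lambda_priv = Lambda + (N + N.T) / np.sqrt(2.0)
    # Shift the spectrum so the noisy matrix stays positive definite
    # with high probability (a standard step in private LSVI analyses).
    Lambda_priv += 2.0 * sigma * np.sqrt(d) * np.eye(d)
    u_priv = u + rng.normal(scale=sigma, size=d)
    return Lambda_priv, u_priv
```

With the privatized statistics in hand, the value iteration step proceeds exactly as in the non-private algorithm, which is why the non-private regret term is preserved and the privacy cost appears only as an additive term.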
Related papers
- Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables training machine learning models while preserving the privacy of participants.
However, existing differentially private distributed training methods lack provable convergence guarantees for these problems.
We introduce a new distributed algorithm $\alpha$-${\sf NormEC}$ with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z) - Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback [56.6950165117658]
We consider offline reinforcement learning with preference feedback in which the implicit reward is a linear function of an unknown parameter.
We propose an algorithm, RL with Locally Optimal Weights (RL-LOW), which yields an exponentially small simple regret.
We observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating the order-wise optimality of RL-LOW.
arXiv Detail & Related papers (2024-06-18T02:03:12Z) - FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits [8.908421753758475]
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems.
Motivated by data privacy concerns, we study the joint differentially private high dimensional sparse linear bandits.
We show that FLIPHAT achieves optimal regret in terms of privacy parameters.
arXiv Detail & Related papers (2024-05-22T22:19:12Z) - Achieving Constant Regret in Linear Markov Decision Processes [57.34287648914407]
We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs). We show that Cert-LSVI-UCB has a cumulative regret of $\tilde{\mathcal{O}}(d^3 H^5 / \Delta)$ with high probability, provided that the misspecification level $\zeta$ is below $\tilde{\mathcal{O}}(\Delta / (\sqrt{d} H^2))$.
arXiv Detail & Related papers (2024-04-16T17:23:19Z) - Near-Optimal Differentially Private Reinforcement Learning [16.871660060209674]
We study online exploration in reinforcement learning with differential privacy constraints.
Existing work established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP).
We design an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T} + S^2AH^3/\epsilon)$, which matches the information-theoretic lower bound of non-private learning.
arXiv Detail & Related papers (2022-12-09T06:03:02Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z) - Improved Regret for Differentially Private Exploration in Linear MDP [31.567811502343552]
We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records.
We provide a private algorithm with an improved regret rate, with an optimal dependence of $O(\sqrt{K})$ on the number of episodes.
arXiv Detail & Related papers (2022-02-02T21:32:09Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z) - Output Perturbation for Differentially Private Convex Optimization with Improved Population Loss Bounds, Runtimes and Applications to Private Adversarial Training [12.386462516398469]
Finding efficient, easily implementable differentially private (DP) algorithms that offer strong excess risk bounds is an important problem in modern machine learning.
We provide the tightest known $(\epsilon, 0)$-DP population loss bounds and fastest runtimes under the presence of smoothness and strong convexity.
We apply our theory to two learning frameworks: tilted ERM and adversarial learning.
arXiv Detail & Related papers (2021-02-09T08:47:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.