Replicability in Reinforcement Learning
- URL: http://arxiv.org/abs/2305.19562v2
- Date: Fri, 27 Oct 2023 23:53:10 GMT
- Title: Replicability in Reinforcement Learning
- Authors: Amin Karbasi, Grigoris Velegkas, Lin F. Yang, Felix Zhou
- Abstract summary: We focus on the fundamental setting of discounted MDPs with access to a generative model.
Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is replicable if, with high probability, it outputs the exact same policy after two executions.
- Score: 46.89386344741442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate the mathematical study of replicability as an algorithmic
property in the context of reinforcement learning (RL). We focus on the
fundamental setting of discounted tabular MDPs with access to a generative
model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is
replicable if, with high probability, it outputs the exact same policy after
two executions on i.i.d. samples drawn from the generator when its internal
randomness is the same. We first provide an efficient $\rho$-replicable
algorithm for $(\varepsilon, \delta)$-optimal policy estimation with sample and
time complexity $\widetilde{O}\left(\frac{N^3\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$,
where $N$ is the number of state-action pairs. Next, for the subclass of
deterministic algorithms, we provide a lower bound of order
$\Omega\left(\frac{N^3}{(1-\gamma)^3\cdot\varepsilon^2\cdot\rho^2}\right)$.
Then, we study a relaxed version of replicability proposed by Kalavasis et al.
[2023] called TV indistinguishability. We design a computationally efficient TV
indistinguishable algorithm for policy estimation whose sample complexity is
$\widetilde{O}\left(\frac{N^2\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$.
At the cost of $\exp(N)$ running time, we transform these TV indistinguishable
algorithms to $\rho$-replicable ones without increasing their sample
complexity. Finally, we introduce the notion of approximate-replicability where
we only require that two outputted policies are close under an appropriate
statistical divergence (e.g., Renyi) and show an improved sample complexity of
$\widetilde{O}\left(\frac{N\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$.
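The replicability definition above can be made concrete with a small simulation: run the same estimator twice on fresh i.i.d. draws from the generative model while reusing the internal random coins, and check whether the two output policies coincide exactly. The Python sketch below only illustrates that definition under made-up assumptions; the synthetic MDP, the plug-in value-iteration estimator `estimate_policy`, and the shared random rounding grid are hypothetical stand-ins, not the paper's $\rho$-replicable algorithm, and the rounding trick merely makes agreement likely rather than certain.

```python
import numpy as np

def make_mdp(n_states, n_actions, seed=12345):
    # Hypothetical fixed ground-truth MDP; both executions sample from this generator.
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # shape (S, A, S)
    R = rng.uniform(size=(n_states, n_actions))                       # shape (S, A)
    return P, R

def draw_samples(P, data_rng, m):
    # Generative-model access: m i.i.d. next-state samples for every (s, a) pair.
    S, A, _ = P.shape
    return np.array([[data_rng.choice(S, size=m, p=P[s, a]) for a in range(A)]
                     for s in range(S)])                               # shape (S, A, m)

def estimate_policy(samples, R, gamma, internal_rng, grid=1.0, iters=200):
    # Plug-in estimator: empirical transitions + value iteration, then round the
    # Q-values onto a randomly shifted grid.  Sharing the grid shift (the internal
    # randomness) across executions makes it likely that both runs land in the same
    # cells and therefore output the exact same greedy policy.
    S, A, m = samples.shape
    P_hat = np.array([[np.bincount(samples[s, a], minlength=S) / m
                       for a in range(A)] for s in range(S)])
    V = np.zeros(S)
    for _ in range(iters):
        V = (R + gamma * P_hat @ V).max(axis=1)
    Q = R + gamma * P_hat @ V
    shift = internal_rng.uniform(0.0, grid)        # identical in both executions
    return np.floor((Q - shift) / grid).argmax(axis=1)

S, A, gamma, m = 4, 3, 0.9, 400_000
P, R = make_mdp(S, A)
internal_seed = 7                                   # shared internal randomness
policies = []
for run in range(2):
    data_rng = np.random.default_rng(run)           # fresh i.i.d. samples per run
    internal_rng = np.random.default_rng(internal_seed)
    policies.append(estimate_policy(draw_samples(P, data_rng, m), R, gamma, internal_rng))
print("same policy in both executions:", np.array_equal(policies[0], policies[1]))
```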
Related papers
- Replicable Uniformity Testing [1.5883812630616523]
This work revisits uniformity testing under the framework of algorithmic replicability.
We obtain a replicable tester using only $\tilde{O}(\sqrt{n}\,\varepsilon^{-2}\rho^{-1})$ samples.
arXiv Detail & Related papers (2024-10-12T02:55:17Z)
- Improved Sample Complexity for Private Nonsmooth Nonconvex Optimization [28.497079108813924]
We study differentially private (DP) optimization algorithms for stochastic and empirical objectives which are neither smooth nor convex.
We provide a single-pass $(\alpha,\beta)$-DP algorithm that requires $\widetilde{\Omega}\left(1/\alpha\beta^3 + d/\epsilon\alpha\beta^2 + d^{3/4}/\epsilon\alpha^{1/2}\beta^{3/2}\right)$ samples.
We then provide a multi-pass algorithm which further improves the sample complexity.
arXiv Detail & Related papers (2024-10-08T10:15:49Z)
- Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
We derive approximation bounds for learning single neuron models using thresholded gradient descent.
We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
arXiv Detail & Related papers (2024-09-05T16:59:56Z)
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z)
- Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes [21.77276136591518]
We develop provably efficient model-free reinforcement learning (RL) algorithms for Markov Decision Processes (MDPs).
In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O}\left(\frac{SA\,\mathrm{sp}(h^*)}{\epsilon^2}+\frac{S^2A\,\mathrm{sp}(h^*)}{\epsilon^2}\right)$ samples.
arXiv Detail & Related papers (2023-06-28T17:43:19Z)
- Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic 2D distributions, using the $k$-means++ implementation from sklearn as a black box, that validate our theoretical results (a black-box usage sketch follows this entry).
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
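As a rough illustration of what using sklearn's $k$-means++ as a black box can look like, the snippet below fits `sklearn.cluster.KMeans` to a synthetic 2D mixture; the blob layout and parameters are invented for illustration and are not the paper's experimental setup or its replicable wrapper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic 2D mixture: three well-separated Gaussian blobs (illustrative only).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(300, 2)) for c in centers])

# k-means++ from sklearn, treated purely as a black-box approximation routine.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```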
- Reward-Mixing MDPs with a Few Latent Contexts are Learnable [75.17357040707347]
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs).
Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model.
arXiv Detail & Related papers (2022-10-05T22:52:00Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)