Lower Bounds for Learning in Revealing POMDPs
- URL: http://arxiv.org/abs/2302.01333v1
- Date: Thu, 2 Feb 2023 18:59:30 GMT
- Title: Lower Bounds for Learning in Revealing POMDPs
- Authors: Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai
- Abstract summary: This paper studies the fundamental limits of reinforcement learning (RL) in the challenging \emph{partially observable} setting.
For \emph{multi-step} revealing POMDPs, we show that the latent state-space dependence is at least $\Omega(S^{1.5})$ in the sample complexity.
- Score: 88.23337313766355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the fundamental limits of reinforcement learning (RL) in
the challenging \emph{partially observable} setting. While it is
well-established that learning in Partially Observable Markov Decision
Processes (POMDPs) requires exponentially many samples in the worst case, a
surge of recent work shows that polynomial sample complexities are achievable
under the \emph{revealing condition} -- a natural condition that requires the
observables to reveal some information about the unobserved latent states.
However, the fundamental limits for learning in revealing POMDPs are much less
understood, with existing lower bounds being rather preliminary and having
substantial gaps from the current best upper bounds.
We establish strong PAC and regret lower bounds for learning in revealing
POMDPs. Our lower bounds scale polynomially in all relevant problem parameters
in a multiplicative fashion, and achieve significantly smaller gaps against the
current best upper bounds, providing a solid starting point for future studies.
In particular, for \emph{multi-step} revealing POMDPs, we show that (1) the
latent state-space dependence is at least $\Omega(S^{1.5})$ in the PAC sample
complexity, which is notably harder than the $\widetilde{\Theta}(S)$ scaling
for fully-observable MDPs; (2) Any polynomial sublinear regret is at least
$\Omega(T^{2/3})$, suggesting its fundamental difference from the
\emph{single-step} case where $\widetilde{O}(\sqrt{T})$ regret is achievable.
Technically, our hard instance construction adapts techniques in
\emph{distribution testing}, which is new to the RL literature and may be of
independent interest.
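To make the revealing condition from the abstract concrete in the single-step case: the per-step emission matrix $\mathbb{O} \in \mathbb{R}^{O \times S}$ (whose column $s$ is the observation distribution under latent state $s$) should admit a stable left inverse, so that observation distributions pin down latent-state distributions. The NumPy sketch below checks one common formulation, an $\alpha$-revealing condition of the form $\|\mathbb{O}^{+}\|_{1\to 1} \le \alpha^{-1}$; the exact norm and definition used in this paper may differ, so treat this as an illustrative assumption rather than the paper's definition.

```python
import numpy as np

def is_alpha_revealing(O: np.ndarray, alpha: float) -> bool:
    """Sketch of a single-step alpha-revealing check for an emission matrix.

    O has shape (n_obs, n_states); column s is the observation distribution
    under latent state s. The (assumed) condition: O has full column rank and
    the 1->1 operator norm of its Moore-Penrose pseudo-inverse is at most
    1/alpha, so latent-state distributions can be stably recovered from
    observation distributions.
    """
    n_obs, n_states = O.shape
    if np.linalg.matrix_rank(O) < n_states:
        return False  # some latent states are indistinguishable from observations alone
    O_pinv = np.linalg.pinv(O)                      # shape (n_states, n_obs)
    norm_1_to_1 = np.abs(O_pinv).sum(axis=0).max()  # induced 1-norm = max column abs-sum
    return norm_1_to_1 <= 1.0 / alpha


# Toy example: 3 latent states, 4 observations, well-separated emission columns.
O = np.array([
    [0.7, 0.1, 0.1],
    [0.1, 0.7, 0.1],
    [0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1],
])
print(is_alpha_revealing(O, alpha=0.2))  # True for this well-conditioned example
```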
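A rough sanity check for the $\Omega(T^{2/3})$ regret lower bound in point (2) above (a standard explore-then-commit calculation, not taken from the paper): if an algorithm spends $N$ of the $T$ episodes on pure exploration and the policy it then commits to is about $N^{-1/2}$-suboptimal per episode, its regret is roughly
$$\mathrm{Regret}(T) \;\lesssim\; N + (T-N)\cdot N^{-1/2},$$
which is minimized (up to constants and logarithmic factors) at $N \asymp T^{2/3}$, giving regret of order $T^{2/3}$. The paper's lower bound says this rate cannot be beaten in multi-step revealing POMDPs, whereas $\widetilde{O}(\sqrt{T})$ regret is achievable in the single-step case.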
Related papers
- Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z)
- Expected Worst Case Regret via Stochastic Sequential Covering [14.834625066344582]
We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets.
For such minimax regrets we establish tight bounds via a novel concept of upper global sequential covering.
We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
arXiv Detail & Related papers (2022-09-09T17:31:46Z)
- Near-Optimal Sample Complexity Bounds for Constrained MDPs [25.509556551558834]
We provide minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP.
We show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
arXiv Detail & Related papers (2022-06-13T15:58:14Z)
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning [50.5790774201146]
Offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z)
- First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an \emph{early-settled} reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs [117.82903457289584]
We derive a novel problem-dependent lower-bound for regret in finite-horizon Markov Decision Processes (MDPs).
We show that our lower-bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all.
We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.
arXiv Detail & Related papers (2021-06-24T13:46:09Z)
- Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited [46.682733513730334]
We propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs.
Our main contribution is a novel lower bound of $\Omega((H^3SA/\epsilon^2)\log(1/\delta))$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best policy identification in a non-stationary MDP.
arXiv Detail & Related papers (2020-10-07T17:23:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.