Provably Efficient Partially Observable Risk-Sensitive Reinforcement
Learning with Hindsight Observation
- URL: http://arxiv.org/abs/2402.18149v1
- Date: Wed, 28 Feb 2024 08:24:06 GMT
- Title: Provably Efficient Partially Observable Risk-Sensitive Reinforcement
Learning with Hindsight Observation
- Authors: Tonghe Zhang, Yu Chen, Longbo Huang
- Abstract summary: We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework.
We develop the first provably efficient RL algorithm tailored for this setting.
These techniques are of particular interest to the theoretical study of reinforcement learning.
- Score: 35.278669159850146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work pioneers regret analysis of risk-sensitive reinforcement learning
in partially observable environments with hindsight observation, addressing a
gap in theoretical exploration. We introduce a novel formulation that
integrates hindsight observations into a Partially Observable Markov Decision
Process (POMDP) framework, where the goal is to optimize accumulated reward
under the entropic risk measure. We develop the first provably efficient RL
algorithm tailored for this setting. We also prove by rigorous analysis that
our algorithm achieves polynomial regret
$\tilde{O}\left(\frac{e^{|{\gamma}|H}-1}{|{\gamma}|H}H^2\sqrt{KHS^2OA}\right)$,
which outperforms or matches existing upper bounds when the model degenerates
to risk-neutral or fully observable settings. We adopt the method of
change-of-measure and develop a novel analytical tool of beta vectors to
streamline mathematical derivations. These techniques are of particular
interest to the theoretical study of reinforcement learning.
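  For orientation, the entropic risk measure referenced above is conventionally defined as follows; this is a standard textbook form and only a sketch of the objective, with notation ($J_\gamma$, $r_h$, $\pi$) chosen here rather than taken from the paper:
  $$J_\gamma(\pi) \;=\; \frac{1}{\gamma}\,\log\,\mathbb{E}_\pi\!\left[\exp\!\left(\gamma \sum_{h=1}^{H} r_h\right)\right], \qquad \gamma \neq 0,$$
  where $H$ is the horizon and $\gamma$ the risk parameter ($\gamma<0$ risk-averse, $\gamma>0$ risk-seeking). As $\gamma \to 0$ the objective recovers the risk-neutral expected return, and the prefactor $\frac{e^{|\gamma|H}-1}{|\gamma|H}$ in the regret bound above tends to $1$, consistent with the claim that the bound matches existing results in the risk-neutral degenerate case.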
Related papers
- Provable Risk-Sensitive Distributional Reinforcement Learning with
General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z) - Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement
Learning with General Function Approximation [26.277745106128197]
We propose an algorithm to tackle long planning horizon problems in reinforcement learning with general function approximation.
The derived regret bound is deemed sharp as it matches the minimax lower bound when specialized to linear mixture MDPs up to logarithmic factors.
The achievement of such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithm designs and (ii) fine-grained analyses.
arXiv Detail & Related papers (2023-12-07T17:35:34Z) - Nonparametric Linear Feature Learning in Regression Through Regularisation [0.0]
We propose a novel method for joint linear feature learning and non-parametric function estimation.
By using alternating minimisation, we iteratively rotate the data to improve alignment with leading directions.
We establish that the expected risk of our method converges to the minimal risk under minimal assumptions and with explicit rates.
arXiv Detail & Related papers (2023-07-24T12:52:55Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - It begins with a boundary: A geometric view on probabilistically robust learning [6.877576704011329]
We take a fresh and geometric view on one such method, Probabilistically Robust Learning (PRL).
We prove existence of solutions to the original and modified problems using novel relaxation methods.
We also clarify, through a suitable $\Gamma$-convergence analysis, the way in which the original and modified PRL models interpolate between risk minimization and adversarial training.
arXiv Detail & Related papers (2023-05-30T06:24:30Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Exponential Bellman Equation and Improved Regret Bounds for
Risk-Sensitive Reinforcement Learning [106.20712175398275]
We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure.
We identify the deficiencies in existing algorithms and their analysis that result in such a gap.
We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones (a sketch of the exponential Bellman backup appears after this list).
arXiv Detail & Related papers (2021-11-06T19:35:18Z) - The Eigenlearning Framework: A Conservation Law Perspective on Kernel
Regression and Wide Neural Networks [1.6519302768772166]
We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression.
We identify a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.
arXiv Detail & Related papers (2021-10-08T06:32:07Z) - Surveillance Evasion Through Bayesian Reinforcement Learning [78.79938727251594]
We consider a 2D continuous path planning problem with a completely unknown intensity of random termination.
The Observers' surveillance intensity is a priori unknown and has to be learned through repetitive path planning.
arXiv Detail & Related papers (2021-09-30T02:29:21Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
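As a side note on the "Exponential Bellman Equation" entry above: in the entropic-risk RL literature this usually denotes a backup on the exponentiated value function. A standard sketch, written in our own notation rather than that paper's, is
$$\exp\!\big(\gamma V_h^{\pi}(s)\big) \;=\; \mathbb{E}_{a \sim \pi_h(\cdot\mid s),\; s' \sim P_h(\cdot\mid s,a)}\Big[\exp\!\big(\gamma\,(r_h(s,a) + V_{h+1}^{\pi}(s'))\big)\Big], \qquad V_{H+1}^{\pi} \equiv 0,$$
which replaces the additive Bellman backup of the risk-neutral setting and is the recursion underlying regret analyses under the entropic risk measure.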
This list is automatically generated from the titles and abstracts of the papers on this site.