Fundamental Limits of Reinforcement Learning in Environment with
Endogeneous and Exogeneous Uncertainty
- URL: http://arxiv.org/abs/2106.08477v1
- Date: Tue, 15 Jun 2021 22:57:45 GMT
- Title: Fundamental Limits of Reinforcement Learning in Environment with
Endogeneous and Exogeneous Uncertainty
- Authors: Rongpeng Li
- Abstract summary: Online reinforcement learning (RL) has been widely applied in information processing scenarios.
We consider un-discounted RL in general Markov decision processes (MDPs) with both endogeneous and exogeneous uncertainty.
We establish a regret bound that saves a factor of at most $\sqrt{S}$ or $S^{\frac{1}{6}}T^{\frac{1}{12}}$ compared with the latest results in the literature.
- Score: 5.117030416610515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online reinforcement learning (RL) has been widely applied in information
processing scenarios, which usually exhibit much uncertainty due to the
intrinsic randomness of channels and service demands. In this paper, we
consider un-discounted RL in general Markov decision processes (MDPs) with
both endogeneous and exogeneous uncertainty, where both the rewards and the
state transition probabilities are unknown to the RL agent and evolve over
time, as long as their respective variations do not exceed certain dynamic
budgets (i.e., upper bounds). We first develop a variation-aware
Bernstein-based upper confidence reinforcement learning algorithm (VB-UCRL),
which is allowed to restart according to a schedule that depends on the
variations. We overcome the challenges caused by the exogeneous uncertainty
and establish a regret bound that saves a factor of at most $\sqrt{S}$ or
$S^{\frac{1}{6}}T^{\frac{1}{12}}$ compared with the latest results in the
literature, where $S$ denotes the state-space size of the MDP and $T$ denotes
the number of learning steps.
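The two ingredients highlighted above, namely Bernstein-based confidence bonuses and a restart schedule driven by the variation budgets, can be pictured with a short sketch. The Python snippet below is a minimal illustration under assumed constants and helper names (`bernstein_bonus`, `restart_schedule`); it is not the paper's VB-UCRL algorithm, and its restart rule is only a common heuristic, not the schedule analyzed in the paper.

```python
import numpy as np

def bernstein_bonus(p_hat: np.ndarray, n: int, delta: float = 0.05) -> np.ndarray:
    """Empirical-Bernstein-style confidence bonus for transition probabilities
    estimated from n samples (illustrative constants, not the paper's)."""
    n = max(n, 1)
    var = p_hat * (1.0 - p_hat)              # Bernoulli variance of each entry
    log_term = np.log(1.0 / delta)
    return np.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * n)

def restart_schedule(T: int, variation_budget: float) -> list[int]:
    """Split the horizon T into phases whose number grows with the variation
    budget, so larger drift triggers more frequent restarts (a common
    heuristic in non-stationary RL; VB-UCRL's exact schedule differs)."""
    num_phases = max(1, int(np.ceil((variation_budget * T) ** (1.0 / 3.0))))
    phase_len = max(1, int(np.ceil(T / num_phases)))
    return list(range(0, T, phase_len))

# Example: optimistic transition estimate for one (state, action) pair.
counts = np.array([30.0, 10.0, 5.0])          # visits to 3 successor states
p_hat = counts / counts.sum()
p_upper = np.minimum(1.0, p_hat + bernstein_bonus(p_hat, int(counts.sum())))
print("optimistic transition upper bounds:", p_upper)
print("restart times:", restart_schedule(T=10_000, variation_budget=2.0))
```

In a full UCRL-style agent, the optimistic estimates would feed an extended value iteration step at each episode, and all counts would be reset at every restart time.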
Related papers
- Robust Offline Reinforcement Learning for Non-Markovian Decision Processes [48.9399496805422]
We study the learning problem of robust offline non-Markovian RL.
We introduce a novel dataset distillation technique and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets.
By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $\epsilon$-optimal robust policy.
arXiv Detail & Related papers (2024-11-12T03:22:56Z)
- Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of the model parameters nor on their estimates, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Settling Constant Regrets in Linear Markov Decision Processes [57.34287648914407]
We study constant regret guarantees in reinforcement learning (RL).
We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs).
For an MDP characterized by a minimal suboptimality gap $\Delta$, Cert-LSVI-UCB has a cumulative regret of $\tilde{\mathcal{O}}(d^3H^5/\Delta)$ with high probability, provided that the misspecification level $\zeta$ is below $\tilde{\mathcal{O}}(\Delta/(\sqrt{d}H^2))$.
arXiv Detail & Related papers (2024-04-16T17:23:19Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), which can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design [9.554944575754638]
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs).
We propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets.
This work offers the first non-asymptotic theoretical analyses of non-stationary risk-sensitive RL in the literature.
arXiv Detail & Related papers (2022-11-19T22:40:09Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn to perform decision making from historical data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the historical dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
arXiv Detail & Related papers (2022-08-11T11:55:31Z)
- Nonstationary Reinforcement Learning with Linear Function Approximation [19.521419943509784]
We consider reinforcement learning in episodic Markov decision processes (MDPs) with linear function approximation under a drifting environment.
We first develop an optimistic modification of least-squares value iteration with periodic restart, and bound its dynamic regret when the variation budgets are known.
We derive the first minimax dynamic regret lower bound for nonstationary linear MDPs and, as a byproduct, establish a minimax regret lower bound for linear MDPs, a question left open by Jin et al.
arXiv Detail & Related papers (2020-10-08T20:07:44Z)
- Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism [25.20231604057821]
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity.
We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm.
We propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the SWUCRL2-CW algorithm to achieve the same dynamic regret bound.
arXiv Detail & Related papers (2020-06-24T15:40:21Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model; a simplified sketch of the general primal-dual idea appears after this list.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
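To make the primal-dual idea in the last entry above concrete, here is a minimal sketch in a deliberately simplified setting: a two-armed constrained bandit rather than an episodic CMDP. It is not the algorithm from the cited paper; the arm parameters, threshold, and dual step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy constrained bandit standing in for a CMDP: each arm has a reward mean
# and a utility mean, and the long-run average utility must stay above a
# threshold. The loop below shows the shape of an upper-confidence
# primal-dual update, not the cited paper's algorithm.
reward_means = np.array([0.9, 0.5])
utility_means = np.array([0.2, 0.8])
threshold = 0.6          # constraint: average utility >= threshold
eta = 0.05               # dual step size (assumed)
T = 5_000

counts = np.zeros(2)
reward_sum = np.zeros(2)
utility_sum = np.zeros(2)
lam = 0.0                # Lagrange multiplier for the utility constraint

for t in range(1, T + 1):
    # Upper confidence bounds on reward and utility for each arm.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1.0))
    r_ucb = np.where(counts > 0, reward_sum / np.maximum(counts, 1.0) + bonus, 1.0)
    g_ucb = np.where(counts > 0, utility_sum / np.maximum(counts, 1.0) + bonus, 1.0)

    # Primal step: act optimistically for the Lagrangian r + lam * g.
    arm = int(np.argmax(r_ucb + lam * g_ucb))

    # Observe stochastic reward and utility for the chosen arm.
    r = rng.binomial(1, reward_means[arm])
    g = rng.binomial(1, utility_means[arm])
    counts[arm] += 1
    reward_sum[arm] += r
    utility_sum[arm] += g

    # Dual step: raise lam when the constraint looks violated, keep lam >= 0.
    lam = max(0.0, lam + eta * (threshold - g))

print("pulls per arm:", counts, "final multiplier:", round(lam, 3))
print("average utility:", round(utility_sum.sum() / T, 3))
```

In a full CMDP version, the primal step would solve an optimistic planning problem for the combined reward, and the dual step would use the estimated constraint value of the whole episode.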