Reinforcement Learning for Non-Stationary Markov Decision Processes: The
Blessing of (More) Optimism
- URL: http://arxiv.org/abs/2006.14389v1
- Date: Wed, 24 Jun 2020 15:40:21 GMT
- Title: Reinforcement Learning for Non-Stationary Markov Decision Processes: The
Blessing of (More) Optimism
- Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
- Abstract summary: We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity.
We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm.
We propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the SWUCRL2-CW algorithm to achieve the same dynamic regret bound.
- Score: 25.20231604057821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider un-discounted reinforcement learning (RL) in Markov decision
processes (MDPs) under drifting non-stationarity, i.e., both the reward and
state transition distributions are allowed to evolve over time, as long as
their respective total variations, quantified by suitable metrics, do not
exceed certain variation budgets. We first develop the Sliding Window
Upper-Confidence bound for Reinforcement Learning with Confidence Widening
(SWUCRL2-CW) algorithm, and establish its dynamic regret bound when the
variation budgets are known. In addition, we propose the
Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the
SWUCRL2-CW algorithm to achieve the same dynamic regret bound, but in a
parameter-free manner, i.e., without knowing the variation budgets. Notably,
learning non-stationary MDPs via the conventional optimistic exploration
technique presents a unique challenge absent in existing (non-stationary)
bandit learning settings. We overcome the challenge by a novel confidence
widening technique that incorporates additional optimism.
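To make the two ingredients named in the abstract concrete, here is a minimal sketch (an illustration under assumed parameter choices, not the authors' implementation): transition frequencies are estimated from only the last $W$ observations, so stale data from a drifted environment is discarded, and an extra widening term $\eta$ is added on top of the usual confidence radius, supplying the "additional optimism" of confidence widening. The class name, the Hoeffding-style radius, and the parameters `window` and `eta` are illustrative; SWUCRL2-CW sets these quantities from the variation budgets, and BORL tunes them without knowing the budgets.

```python
from collections import deque, defaultdict
import math


class SlidingWindowModel:
    """Sliding-window transition estimates with a widened confidence radius.

    Minimal sketch, not the authors' implementation: the window length W,
    the widening parameter eta, and the Hoeffding-style radius are
    illustrative choices; SWUCRL2-CW derives them from the variation budgets.
    """

    def __init__(self, window, eta, delta=0.05):
        self.window = window                    # W: keep only the last W transitions
        self.eta = eta                          # extra optimism (confidence widening)
        self.delta = delta                      # confidence level
        self.buffer = deque(maxlen=window)      # recent (s, a, s') triples

    def update(self, s, a, s_next):
        self.buffer.append((s, a, s_next))

    def estimate(self, s, a):
        """Empirical next-state distribution and widened radius for (s, a)."""
        counts, n = defaultdict(int), 0
        for si, ai, sn in self.buffer:
            if (si, ai) == (s, a):
                counts[sn] += 1
                n += 1
        if n == 0:
            return {}, float("inf")             # unvisited pair: maximal optimism
        p_hat = {sn: c / n for sn, c in counts.items()}
        radius = math.sqrt(2.0 * math.log(1.0 / self.delta) / n)
        return p_hat, radius + self.eta         # widening adds eta on top
```

The shrinking window guards against drift; the extra $\eta$ is the part specific to the MDP setting, addressing the challenge the abstract notes is absent in non-stationary bandit problems.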
Related papers
- Robust Offline Reinforcement Learning for Non-Markovian Decision Processes [48.9399496805422]
We study the learning problem of robust offline non-Markovian RL.
We introduce a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets.
By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $\epsilon$-optimal robust policy.
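As a rough illustration of the lower confidence bound (LCB) principle mentioned above, a pessimistic estimate subtracts a data-dependent penalty from the empirical value so that poorly covered state-action pairs are not over-valued. This is a generic sketch, not the paper's construction (which builds robust, worst-case-over-uncertainty-set values on a distilled dataset); the penalty form and constants are assumptions.

```python
import math


def lcb_value(mean_estimate, count, scale=1.0, delta=0.05):
    """Generic lower confidence bound used in pessimistic offline RL.

    Illustrative only: subtracts a penalty that shrinks as the number of
    samples backing the estimate grows.
    """
    if count == 0:
        return float("-inf")          # no data at all: maximally pessimistic
    penalty = scale * math.sqrt(math.log(1.0 / delta) / count)
    return mean_estimate - penalty
```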
arXiv Detail & Related papers (2024-11-12T03:22:56Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
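For context, an uncertainty Bellman equation propagates a local uncertainty term through the same recursion that propagates values. One generic finite-horizon form, given here only to illustrate the idea (the paper derives a refined equation whose fixed point equals the posterior variance of values), is

$u_h(s,a) = \nu_h(s,a) + \mathbb{E}_{s' \sim P_h(\cdot\mid s,a),\, a' \sim \pi(\cdot\mid s')}\big[u_{h+1}(s',a')\big],$

where $\nu_h(s,a)$ denotes the local (epistemic) uncertainty at $(s,a)$ arising from model error.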
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design [9.554944575754638]
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs).
We propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets.
This work offers the first non-asymptotic theoretical analyses for the non-stationary risk-sensitive RL in the literature.
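The restart idea can be sketched generically: run any base (stationary) RL routine in blocks and discard its statistics at the start of each block, so estimates never mix data from very different environments. In the sketch below, `run_stationary_rl` and `restart_period` are hypothetical stand-ins, not the paper's components.

```python
def run_with_restarts(total_episodes, restart_period, run_stationary_rl):
    """Run a base RL routine in blocks, discarding its statistics between blocks.

    Generic sketch of the restart principle behind restart-based algorithms;
    `run_stationary_rl(num_episodes)` is a hypothetical callable that trains
    from scratch for `num_episodes` episodes and returns a policy.
    """
    policies = []
    episodes_done = 0
    while episodes_done < total_episodes:
        block = min(restart_period, total_episodes - episodes_done)
        policies.append(run_stationary_rl(block))   # fresh statistics each block
        episodes_done += block
    return policies
```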
arXiv Detail & Related papers (2022-11-19T22:40:09Z)
- Opportunistic Episodic Reinforcement Learning [9.364712393700056]
Opportunistic reinforcement learning is a new variant of reinforcement learning in which the regret of selecting a suboptimal action varies with an external environmental condition known as the variation factor.
Our intuition is to exploit more when the variation factor is high, and explore more when the variation factor is low.
Our algorithms balance the exploration-exploitation trade-off for reinforcement learning by introducing variation factor-dependent optimism to guide exploration.
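A minimal sketch of variation factor-dependent optimism (illustrative scaling, not the paper's exact bonus): the exploration bonus grows when the variation factor is low, so the agent explores, and shrinks when it is high, so the agent exploits.

```python
import math


def optimistic_q(q_hat, visit_count, variation_factor, c=1.0):
    """Optimistic Q-value whose exploration bonus depends on the variation factor.

    Illustrative scaling: with a low variation factor suboptimal actions are
    cheap, so the bonus is large and the agent explores; with a high variation
    factor the bonus shrinks and the agent exploits.
    """
    if visit_count == 0:
        return float("inf")
    bonus = c * math.sqrt(1.0 / visit_count) / (1.0 + variation_factor)
    return q_hat + bonus
```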
arXiv Detail & Related papers (2022-10-24T18:02:33Z)
- Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty [5.117030416610515]
Online reinforcement learning (RL) has been widely applied in information processing scenarios.
We consider un-discounted RL in general Markov decision processes (MDPs) with both endogeneous and exogeneous uncertainty.
We establish a regret bound that saves at most $\sqrt{S}$ or $S^{\frac{1}{6}}T^{\frac{1}{12}}$ compared with the latest results in the literature.
arXiv Detail & Related papers (2021-06-15T22:57:45Z)
- Low-Precision Reinforcement Learning [63.930246183244705]
Low-precision training has become a popular approach to reduce computation time, memory footprint, and energy consumption in supervised learning.
In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails.
arXiv Detail & Related papers (2021-02-26T16:16:28Z)
- Nonstationary Reinforcement Learning with Linear Function Approximation [19.521419943509784]
We consider reinforcement learning in episodic Markov decision processes (MDPs) with linear function approximation under a drifting environment.
We first develop an optimistic modification of least-squares value iteration with periodic restart, and bound its dynamic regret when the variation budgets are known.
We derive the first minimax dynamic regret lower bound for nonstationary linear MDPs and, as a byproduct, establish a minimax regret lower bound for linear MDPs that was left open by Jin et al.
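A bare-bones sketch of the periodic-restart ingredient for linear function approximation: the ridge-regression statistics used by least-squares value iteration are simply re-initialized on a fixed schedule, so value estimates are always fit to recent data. The feature dimension, regularization, and restart period below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np


class RestartedLeastSquares:
    """Ridge-regression statistics for linear value estimation, reset periodically.

    Illustrative sketch: maintains A = lam * I + sum(phi phi^T) and
    b = sum(target * phi), and discards both every `restart_period` updates.
    """

    def __init__(self, dim, restart_period, lam=1.0):
        self.dim, self.restart_period, self.lam = dim, restart_period, lam
        self.updates = 0
        self._reset()

    def _reset(self):
        self.A = self.lam * np.eye(self.dim)
        self.b = np.zeros(self.dim)

    def update(self, phi, target):
        if self.updates > 0 and self.updates % self.restart_period == 0:
            self._reset()                       # periodic restart: forget stale data
        self.A += np.outer(phi, phi)
        self.b += target * phi
        self.updates += 1

    def weights(self):
        return np.linalg.solve(self.A, self.b)  # ridge solution w = A^{-1} b
```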
arXiv Detail & Related papers (2020-10-08T20:07:44Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
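The primal-dual pattern can be sketched generically: keep a Lagrange multiplier for the constraint, plan (optimistically) against the reward penalized by the multiplier-weighted cost, and update the multiplier by projected gradient ascent on the observed constraint violation. The step size, cap, and per-episode interface below are illustrative assumptions, not the paper's algorithm.

```python
def dual_update(lmbda, episode_cost, budget, step_size=0.1, cap=10.0):
    """Projected gradient-ascent step on the Lagrange multiplier of a CMDP constraint.

    Illustrative: lmbda grows when the episode's cumulative cost exceeds the
    budget and shrinks otherwise, clipped to [0, cap].
    """
    lmbda = lmbda + step_size * (episode_cost - budget)
    return min(max(lmbda, 0.0), cap)


def penalized_reward(reward, cost, lmbda):
    """Primal objective an optimistic planner would maximize each episode."""
    return reward - lmbda * cost
```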
arXiv Detail & Related papers (2020-03-02T05:02:23Z)