Sample Efficient Reinforcement Learning In Continuous State Spaces: A
Perspective Beyond Linearity
- URL: http://arxiv.org/abs/2106.07814v1
- Date: Tue, 15 Jun 2021 00:06:59 GMT
- Title: Sample Efficient Reinforcement Learning In Continuous State Spaces: A
Perspective Beyond Linearity
- Authors: Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li
- Abstract summary: We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
- Score: 50.38337893712897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is empirically successful in complex nonlinear
Markov decision processes (MDPs) with continuous state spaces. By contrast, the
majority of theoretical RL literature requires the MDP to satisfy some form of
linear structure, in order to guarantee sample efficient RL. Such efforts
typically assume the transition dynamics or value function of the MDP are
described by linear functions of the state features. To resolve this
discrepancy between theory and practice, we introduce the Effective Planning
Window (EPW) condition, a structural condition on MDPs that makes no linearity
assumptions. We demonstrate that the EPW condition permits sample efficient RL,
by providing an algorithm which provably solves MDPs satisfying this condition.
Our algorithm requires minimal assumptions on the policy class, which can
include multi-layer neural networks with nonlinear activation functions.
Notably, the EPW condition is directly motivated by popular gaming benchmarks,
and we show that many classic Atari games satisfy this condition. We
additionally show the necessity of conditions like EPW, by demonstrating that
simple MDPs with slight nonlinearities cannot be solved sample efficiently.
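For context, the linear structure required by the theoretical literature mentioned above typically takes one of two standard forms (the usual formulations from the linear-MDP and linear-realizability literature, not definitions introduced in this paper):
$$P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle \qquad \text{or} \qquad Q^{\star}(s, a) = \langle \phi(s, a), \theta^{\star} \rangle,$$
where $\phi$ is a known feature map and $\mu$, $\theta^{\star}$ are unknown. The EPW condition imposes neither of these requirements.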
Related papers
- Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve it to a parameter-free version, Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
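For reference, a low-rank MDP is usually defined by a factorization of the transition kernel in which both factors are unknown and must be learned; in the nonstationary setting the factorization may additionally drift over time (a generic formulation, not the paper's notation):
$$P_{h,t}(s' \mid s, a) = \langle \phi_{h,t}(s, a), \mu_{h,t}(s') \rangle, \qquad \phi_{h,t}(s,a),\, \mu_{h,t}(s') \in \mathbb{R}^{d}.$$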
arXiv Detail & Related papers (2023-08-10T09:52:44Z)
- Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice [79.48432795639403]
Mirror descent value iteration (MDVI) is an abstraction of Kullback-Leibler (KL)- and entropy-regularized reinforcement learning (RL).
We study MDVI with linear function approximation through the sample complexity required to identify an $\varepsilon$-optimal policy.
We present Variance-Weighted Least-Squares MDVI, the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs.
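As a rough illustration (a generic form, not the exact estimator from the paper), a variance-weighted least-squares fit down-weights high-variance regression targets when estimating a linear value parameter:
$$\hat{\theta} = \arg\min_{\theta} \sum_{i} \frac{\bigl(y_i - \phi(s_i, a_i)^{\top} \theta\bigr)^{2}}{\hat{\sigma}_i^{2}} + \lambda \lVert \theta \rVert_2^{2},$$
where $y_i$ is a backed-up value target and $\hat{\sigma}_i^{2}$ is an estimate of its conditional variance.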
arXiv Detail & Related papers (2023-05-22T16:13:05Z)
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
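For readers unfamiliar with it, the classical linear program for a discounted MDP optimizes over occupancy measures subject to flow-conservation constraints (the textbook formulation underlying the framework revisited here):
$$\max_{d \ge 0} \; \sum_{s,a} d(s,a)\, r(s,a) \quad \text{s.t.} \quad \sum_{a'} d(s', a') = (1-\gamma)\, \rho_0(s') + \gamma \sum_{s,a} P(s' \mid s, a)\, d(s,a) \;\;\; \forall s'.$$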
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
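The normalization at issue here is, in the standard linear MDP factorization $P(s' \mid s, a) = \langle \phi(s,a), \mu(s') \rangle$, the requirement that the learned factors form a valid probability kernel (stated generically, not in the paper's notation):
$$\langle \phi(s,a), \mu(s') \rangle \ge 0 \quad \text{and} \quad \sum_{s'} \langle \phi(s,a), \mu(s') \rangle = 1 \qquad \forall (s,a),$$
a constraint that is difficult to maintain during representation learning and that the alternative definition is designed to satisfy automatically.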
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
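As background, in a semi-Markov decision process the time between decisions is itself random, so value backups discount by the realized duration $\tau$; a standard SMDP-style Q-update (not necessarily the exact form used by SDQN, SDDQN, or SBCQ) is
$$Q(s, a) \leftarrow Q(s, a) + \alpha \bigl[ r + \gamma^{\tau} \max_{a'} Q(s', a') - Q(s, a) \bigr].$$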
arXiv Detail & Related papers (2022-03-17T14:51:21Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.