Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear
Value Functions
- URL: http://arxiv.org/abs/2107.05187v1
- Date: Mon, 12 Jul 2021 04:13:18 GMT
- Title: Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear
Value Functions
- Authors: Siddartha Devic, Zihao Deng, Brendan Juba
- Abstract summary: We present the first polynomial-time algorithm for reinforcement learning with Factored Markov Decision Processes (FMDPs).
In contrast to prior work, we do not assume that the transitions on various factors are independent.
- Score: 25.621280373733605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many reinforcement learning (RL) environments in practice feature enormous
state spaces that may be described compactly by a "factored" structure, which
can be modeled by Factored Markov Decision Processes (FMDPs). We present the
first polynomial-time algorithm for RL with FMDPs that does not rely on an
oracle planner, and instead of requiring a linear transition model, only
requires a linear value function with a suitable local basis with respect to
the factorization. With this assumption, we can solve FMDPs in polynomial time
by constructing an efficient separation oracle for convex optimization.
Importantly, and in contrast to prior work, we do not assume that the
transitions on various factors are independent.
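The abstract's mention of a separation oracle can be made concrete with a generic cutting-plane loop. The sketch below is only an illustration of that standard convex-optimization machinery under placeholder assumptions: the objective, the bounds, and the `oracle` callback (which should either certify that a candidate weight vector satisfies all constraints or return a violated one) stand in for the paper's local-basis construction and are not its actual algorithm.

```python
# Hypothetical sketch of constraint generation with a separation oracle.
# Not the authors' algorithm: the oracle, objective, and constraint form
# are illustrative stand-ins for the paper's construction.
import numpy as np
from scipy.optimize import linprog

def solve_with_separation_oracle(dim, oracle, max_iters=100):
    """Minimize c^T w subject to constraints produced lazily by `oracle`.

    `oracle(w)` must return None if w is feasible, or a pair (a, b)
    encoding a violated constraint a^T w >= b.
    """
    c = np.ones(dim)          # placeholder objective over value-function weights
    A_ub, b_ub = [], []       # accumulated cuts, stored as (-a)^T w <= -b
    w = np.zeros(dim)
    for _ in range(max_iters):
        cut = oracle(w)
        if cut is None:       # w satisfies all (implicitly represented) constraints
            return w
        a, b = cut
        A_ub.append(-a)
        b_ub.append(-b)
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(-1e3, 1e3)] * dim, method="highs")
        w = res.x
    return w
```

Lazily generating constraints in this way keeps each LP small even when the full constraint set is exponentially large, which is the role an efficient separation oracle plays in a polynomial-time guarantee.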
Related papers
- Horizon-Free Regret for Linear Markov Decision Processes [92.02082223856479]
A recent line of work showed that regret bounds in reinforcement learning can be (nearly) independent of the planning horizon.
We give the first horizon-free bound for the popular linear Markov Decision Process (MDP) setting.
In contrast to prior works which explicitly estimate the transition model and compute the inhomogeneous value functions, we directly estimate the value functions and confidence sets.
arXiv Detail & Related papers (2024-03-15T23:50:58Z) - Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
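As a rough, hypothetical illustration of the pessimism principle behind methods of this kind (a toy sketch, not the PNLSVI algorithm itself), the code below fits a ridge-regression Q-estimate with linear features and subtracts an elliptical-confidence bonus before taking the greedy next-state value; all names, shapes, and the single-dataset simplification are assumptions made for the example.

```python
# Illustrative sketch of pessimistic least-squares value iteration with
# linear features; a toy stand-in, not the paper's PNLSVI algorithm.
import numpy as np

def pessimistic_lsvi(phi, rewards, next_phi, H, beta=1.0, lam=1.0):
    """phi: (N, d) features of logged (s, a) pairs; rewards: (N,);
    next_phi: (N, A, d) features of every action at the next state."""
    N, d = phi.shape
    w = np.zeros(d)
    for _ in range(H):                      # repeated backups over the horizon
        Sigma = phi.T @ phi + lam * np.eye(d)
        Sigma_inv = np.linalg.inv(Sigma)
        # Elliptical uncertainty bonus for each next-state action
        bonus = beta * np.sqrt(
            np.einsum("nad,de,nae->na", next_phi, Sigma_inv, next_phi))
        # Pessimistic greedy value: max_a (phi' w - bonus)
        v_next = np.max(next_phi @ w - bonus, axis=1)
        targets = rewards + v_next
        w = Sigma_inv @ (phi.T @ targets)   # ridge least-squares regression
    return w
```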
arXiv Detail & Related papers (2023-10-02T17:42:01Z) - Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
arXiv Detail & Related papers (2023-08-10T09:52:44Z) - Revisiting the Linear-Programming Framework for Offline RL with General
Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
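For context, the classical linear program over discounted occupancy measures that such frameworks build on is reproduced below; this is textbook material, not the paper's specific formulation, with d(s,a) the occupancy measure, mu_0 the initial state distribution, and gamma the discount factor.

```latex
% Classical occupancy-measure LP for discounted MDP planning (textbook form)
\begin{aligned}
\max_{d \ge 0} \quad & \sum_{s,a} d(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a'} d(s',a') = (1-\gamma)\,\mu_0(s')
  + \gamma \sum_{s,a} P(s' \mid s,a)\, d(s,a) \qquad \forall s'.
\end{aligned}
```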
arXiv Detail & Related papers (2022-12-28T15:28:12Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
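For reference, the linear MDP model assumed in results of this type is the standard one below (a reminder of common definitions, not a claim drawn from this abstract):

```latex
% Standard linear MDP assumption: transitions and rewards are linear in a
% known d-dimensional feature map phi(s,a)
P_h(s' \mid s,a) = \langle \phi(s,a), \mu_h(s') \rangle,
\qquad
r_h(s,a) = \langle \phi(s,a), \theta_h \rangle .
```

Under this assumption, the quoted regret $\tilde{O}(d\sqrt{H^3 K})$ scales with the feature dimension $d$, the horizon $H$, and the number of episodes $K$, but not with the size of the state space.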
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Computational-Statistical Gaps in Reinforcement Learning [23.517741855454044]
We show a reduction from Unique-Sat, where we convert a CNF formula into an MDP with deterministic transitions, a constant number of actions, and low-dimensional linear optimal value functions.
This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation.
arXiv Detail & Related papers (2022-02-11T04:48:35Z) - Sample Efficient Reinforcement Learning In Continuous State Spaces: A
Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Local Function Complexity for Active Learning via Mixture of Gaussian
Processes [5.382740428160009]
Inhomogeneities in real-world data, due to changes in the observation noise level or variations in the structural complexity of the source function, pose a unique set of challenges for statistical inference.
In this paper, we draw on recent theoretical results on the estimation of local function complexity (LFC), derived from the domain of local polynomial smoothing (LPS).
We derive and estimate the Gaussian process regression (GPR)-based analog of the LPS-based LFC and use it as a substitute in the above framework to make it robust and scalable.
arXiv Detail & Related papers (2019-02-27T17:55:06Z)