Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement
Learning?
- URL: http://arxiv.org/abs/2010.05673v2
- Date: Sat, 17 Oct 2020 08:58:54 GMT
- Title: Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement
Learning?
- Authors: Qiwen Cui and Lin F. Yang
- Abstract summary: This work considers the sample complexity of finding an $\epsilon$-optimal policy in a Markov decision process (MDP).
We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver.
We show that a plug-in approach can be sample efficient as well, providing a flexible approach to design model-based algorithms for reinforcement learning.
- Score: 30.065091907118827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is believed that a model-based approach for reinforcement learning (RL) is
the key to reducing sample complexity. However, the understanding of the sample
optimality of model-based RL is still largely missing, even for the linear
case. This work considers the sample complexity of finding an $\epsilon$-optimal
policy in a Markov decision process (MDP) that admits a linear additive feature
representation, given only access to a generative model. We solve this problem
via a plug-in solver approach, which builds an empirical model and plans in
this empirical model via an arbitrary plug-in solver. We prove that under the
anchor-state assumption, which implies implicit non-negativity in the feature
space, the minimax sample complexity of finding an $\epsilon$-optimal policy in
a $\gamma$-discounted MDP is $O(K/(1-\gamma)^3\epsilon^2)$, which only depends
on the dimensionality $K$ of the feature space and has no dependence on the
state or action space. We further extend our results to a relaxed setting where
anchor-states may not exist and show that a plug-in approach can be sample
efficient as well, providing a flexible approach to design model-based
algorithms for RL.
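Concretely, the plug-in recipe is: sample from the generative model, fit an empirical model, and plan in it with any solver. Below is a minimal tabular sketch in Python (illustrative only; the paper's estimator works in the $K$-dimensional feature space rather than over raw states, and all names here are assumptions, not from the paper):

```python
import numpy as np

def plug_in_policy(sample_next_state, rewards, S, A, N, gamma, iters=1000):
    """Plug-in approach with a generative model (tabular sketch).

    sample_next_state(s, a) draws one next state from P(.|s, a);
    rewards is an (S, A) array of known rewards.
    """
    # Step 1: build the empirical transition model from N samples per (s, a).
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(N):
                P_hat[s, a, sample_next_state(s, a)] += 1.0 / N
    # Step 2: hand the empirical MDP to an arbitrary plug-in solver;
    # here, plain value iteration.
    V = np.zeros(S)
    for _ in range(iters):
        Q = rewards + gamma * P_hat @ V      # (S, A) empirical Bellman backup
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                  # greedy policy in the empirical model
```

Under the anchor-state assumption, the per-(s, a) counts above are replaced by estimates at the $K$ anchor state-action pairs, which is why the resulting bound depends only on $K$ and not on the sizes of the state or action spaces.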
Related papers
- The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes.
Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed.
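As a concrete illustration of what "plug-in" means in the average-reward setting, here is a minimal sketch assuming an empirical model P_hat has already been estimated: run relative value iteration, a generic textbook planner (not this paper's algorithm or analysis), on the empirical model.

```python
import numpy as np

def relative_value_iteration(P_hat, R, iters=100000, tol=1e-8):
    """Average-reward plug-in planning (illustrative sketch).

    P_hat: (S, A, S) empirical transitions; R: (S, A) rewards.
    Returns an estimate of the optimal gain and a greedy policy.
    """
    S, A, _ = P_hat.shape
    h = np.zeros(S)                    # relative value (bias) estimate
    for _ in range(iters):
        Q = R + P_hat @ h              # undiscounted backup
        h_new = Q.max(axis=1)
        h_new -= h_new[0]              # pin state 0 so iterates stay bounded
        if np.abs(h_new - h).max() < tol:
            h = h_new
            break
        h = h_new
    gain = (R + P_hat @ h).max(axis=1)[0]  # at convergence, T(h) - h = gain * 1
    return gain, Q.argmax(axis=1)
```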
arXiv Detail & Related papers (2024-10-10T05:08:14Z)
- KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
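A minimal sketch of the entropy-regularized (soft) Bellman backup that such analyses study; KL-regularized methods additionally penalize divergence from the previous policy, which this sketch omits, and tau is an assumed temperature parameter.

```python
import numpy as np

def soft_value_iteration(P, R, gamma, tau, iters=1000):
    """Entropy-regularized value iteration (illustrative sketch).

    P: (S, A, S) transitions; R: (S, A) rewards; tau > 0: temperature.
    The hard max over actions becomes the log-sum-exp induced by the
    entropy regularizer.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V                            # (S, A)
        V = tau * np.logaddexp.reduce(Q / tau, axis=1)   # soft maximum
    pi = np.exp((Q - V[:, None]) / tau)                  # Boltzmann policy
    return pi / pi.sum(axis=1, keepdims=True)
```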
arXiv Detail & Related papers (2022-05-27T19:39:24Z)
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning [50.5790774201146]
Offline reinforcement learning (RL) learns from pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
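A minimal sketch of the pessimism principle behind the model-based ("plug-in") offline approach: plan in the empirical model while penalizing rarely observed state-action pairs. The 1/sqrt(n) bonus below is a crude stand-in for the paper's sharper penalty, and all names are assumptions.

```python
import numpy as np

def pessimistic_value_iteration(P_hat, R, n, gamma, c=1.0, iters=1000):
    """Model-based offline RL with pessimism (illustrative sketch).

    P_hat: (S, A, S) empirical model built from the offline dataset;
    n: (S, A) visit counts; rarely visited pairs receive a large penalty.
    """
    bonus = c / np.sqrt(np.maximum(n, 1))         # crude 1/sqrt(n) penalty
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P_hat @ V - bonus         # penalized Bellman backup
        Q = np.clip(Q, 0.0, 1.0 / (1.0 - gamma))  # keep values in a valid range
        V = Q.max(axis=1)
    return Q.argmax(axis=1)
```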
arXiv Detail & Related papers (2022-04-11T17:26:19Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
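A heavily simplified sketch of the reference-advantage idea behind the early-settled update: the next-state value is split into a reference part, averaged uniformly, plus a small advantage part learned with the usual Q-learning step sizes; the reference is frozen ("settled") once the running value stops moving. The paper's exploration bonuses and exact settling rule are omitted, and env_step and beta are assumed stand-ins.

```python
import numpy as np

def q_early_settled_sketch(env_step, S, A, H, K, beta=0.05):
    """Early-settled reference-advantage Q-learning (heavily simplified sketch).

    env_step(h, s, a) -> (reward, next_state). Runs K episodes of horizon H.
    """
    V = np.zeros((H + 1, S))
    V_ref = np.zeros((H + 1, S))       # reference values, frozen once settled
    settled = np.zeros((H + 1, S), dtype=bool)
    N = np.zeros((H, S, A))
    ref_sum = np.zeros((H, S, A))      # running sum of r + V_ref(s')
    adv = np.zeros((H, S, A))          # step-size-weighted advantage estimate
    Q = np.full((H, S, A), float(H))   # optimistic initialization
    for _ in range(K):
        s = 0
        for h in range(H):
            a = int(Q[h, s].argmax())
            r, s_next = env_step(h, s, a)
            N[h, s, a] += 1
            ref_sum[h, s, a] += r + V_ref[h + 1, s_next]
            lr = (H + 1) / (H + N[h, s, a])   # horizon-aware step size
            adv[h, s, a] += lr * (V[h + 1, s_next] - V_ref[h + 1, s_next] - adv[h, s, a])
            Q[h, s, a] = ref_sum[h, s, a] / N[h, s, a] + adv[h, s, a]
            new_v = min(Q[h, s].max(), float(H))
            if not settled[h, s] and abs(new_v - V[h, s]) <= beta:
                settled[h, s] = True          # reference has "settled early"
            if not settled[h, s]:
                V_ref[h, s] = new_v
            V[h, s] = new_v
            s = s_next
    return Q
```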
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver [32.212146650873194]
We provide approaches to learn an RL model efficiently without the guidance of a reward signal.
In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase.
We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model by taking $\tilde{O}(d^2H^3/\epsilon^2)$ interactions with the environment.
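The appeal of the reward-free plug-in approach is that the model is learned once and can then be plugged in for any reward revealed afterwards. A tabular sketch of the planning phase (the paper works with linear mixture models of dimension d rather than tables; names here are illustrative):

```python
import numpy as np

def plan_in_learned_model(P_hat, reward, H):
    """Reward-free plug-in planning (illustrative sketch).

    P_hat: (S, A, S) model learned during reward-free exploration;
    reward: (H, S, A) array, revealed only after exploration ends.
    Returns a nonstationary greedy policy, one action map per step.
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward[h] + P_hat @ V     # backward induction in the learned model
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy
```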
arXiv Detail & Related papers (2021-10-07T07:59:50Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
- Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
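A minimal sketch of the model-based idea in the two-player zero-sum setting: build an empirical Markov game from generative samples, then run Nash value iteration, solving each state's stage matrix game with a small linear program. This is illustrative only, not the paper's exact reward-agnostic algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y, via an LP."""
    m, k = M.shape
    # Variables: x (row player's mixed strategy) and the game value v.
    # Maximize v subject to (M^T x)_j >= v for all j, x in the simplex.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # linprog minimizes, so min -v
    A_ub = np.hstack([-M.T, np.ones((k, 1))])      # v - (M^T x)_j <= 0
    b_ub = np.zeros(k)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def nash_value_iteration(P_hat, R, gamma, iters=500):
    """Nash value iteration in an empirical zero-sum Markov game (sketch).

    P_hat: (S, A, B, S) empirical transitions; R: (S, A, B) rewards.
    """
    S = P_hat.shape[0]
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P_hat @ V                  # (S, A, B) stage games
        V = np.array([matrix_game_value(Q[s]) for s in range(S)])
    return V
```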
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)