Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- URL: http://arxiv.org/abs/2011.04019v1
- Date: Sun, 8 Nov 2020 16:48:02 GMT
- Title: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
- Authors: Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
- Abstract summary: This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our results show that sparsity-aware methods can make batch RL more sample efficient.
- Score: 62.24615324523435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a statistical analysis of high-dimensional batch
Reinforcement Learning (RL) using sparse linear function approximation. When
there is a large number of candidate features, our result sheds light on the
fact that sparsity-aware methods can make batch RL more sample efficient. We
first consider the off-policy policy evaluation problem. To evaluate a new
target policy, we analyze a Lasso fitted Q-evaluation method and establish a
finite-sample error bound that has no polynomial dependence on the ambient
dimension. To reduce the Lasso bias, we further propose a post model-selection
estimator that applies fitted Q-evaluation to the features selected via group
Lasso. Under an additional signal strength assumption, we derive a sharper
instance-dependent error bound that depends on a divergence function measuring
the distribution mismatch between the data distribution and the occupancy measure
of the target policy. Further, we study the Lasso fitted Q-iteration for batch
policy optimization and establish a finite-sample error bound depending on the
ratio between the number of relevant features and the restricted minimal
eigenvalue of the data's covariance. Finally, we complement the results with minimax
lower bounds for batch-data policy evaluation/optimization that nearly match
our upper bounds. The results suggest that having well-conditioned data is
crucial for sparse batch policy learning.
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), which enables the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization [4.554894288663752]
We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove that our estimator performs well in the small-data, large-scale regime.
arXiv Detail & Related papers (2021-07-26T19:00:51Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient [34.16700176918835]
Off-policy Reinforcement Learning holds the promise of better data efficiency.
Current off-policy policy gradient methods suffer from either high bias or high variance, often delivering unreliable estimates.
We propose a nonparametric Bellman equation, which can be solved in closed form.
arXiv Detail & Related papers (2020-10-27T13:40:06Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We derive the minimum-variance (i.e., efficient) OPE estimator for multiple loggers for any problem instance.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
We consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)