Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
- URL: http://arxiv.org/abs/2103.13883v1
- Date: Thu, 25 Mar 2021 14:45:29 GMT
- Title: Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
- Authors: Yaqi Duan, Chi Jin, Zhiyuan Li
- Abstract summary: This paper considers batch Reinforcement Learning (RL) with general value function approximation.
The excess risk of the Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class.
Fast statistical rates can be achieved by using tools of local Rademacher complexity.
- Score: 36.015585972493575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers batch Reinforcement Learning (RL) with general value
function approximation. Our study investigates the minimal assumptions to
reliably estimate/minimize Bellman error, and characterizes the generalization
performance by (local) Rademacher complexities of general function classes,
which makes initial steps in bridging the gap between statistical learning
theory and batch RL. Concretely, we view the Bellman error as a surrogate loss
for the optimality gap, and prove the following: (1) In the double sampling
regime, the excess risk of the Empirical Risk Minimizer (ERM) is bounded by the
Rademacher complexity of the function class. (2) In the single sampling regime,
sample-efficient risk minimization is not possible without further assumptions,
regardless of algorithms. However, with completeness assumptions, the excess
risk of FQI and a minimax style algorithm can be again bounded by the
Rademacher complexity of the corresponding function classes. (3) Fast
statistical rates can be achieved by using tools of local Rademacher
complexity. Our analysis covers a wide range of function classes, including
finite classes, linear spaces, kernel spaces, sparse linear features, etc.
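For orientation, result (1) instantiates the classical excess-risk bound for ERM; a generic form of that bound (illustrative, with unspecified constants, not the paper's exact statement) reads:

```latex
% Generic ERM excess-risk bound via Rademacher complexity
% (illustrative; the paper proves an analogue for the Bellman error).
\[
  \mathbb{E}\,\ell(\hat{f}_{\mathrm{ERM}})
    - \inf_{f \in \mathcal{F}} \mathbb{E}\,\ell(f)
  \;\le\; 4\,\mathfrak{R}_n(\ell \circ \mathcal{F})
    + O\!\left(\sqrt{\frac{\log(1/\delta)}{n}}\right)
  \quad \text{with probability at least } 1-\delta,
\]
where $\mathfrak{R}_n(\ell \circ \mathcal{F})$ is the Rademacher
complexity of the loss class on $n$ i.i.d. samples. Result (3) sharpens
the $n^{-1/2}$ dependence to fast rates by replacing $\mathfrak{R}_n$
with the fixed point of a local Rademacher complexity.
```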
Related papers
- On the Efficiency of ERM in Feature Learning [31.277788690403522]
We study the performance of empirical risk minimization on regression problems with square loss over the union of the linear classes induced by feature maps.
We show that when the set $\mathcal{T}$ of feature maps is not too large and there is a unique optimal feature map, the quantiles of the excess risk of ERM coincide, up to a factor of two, with those of the oracle procedure.
We obtain new guarantees on the performance of the best subset selection procedure in sparse linear regression under general assumptions.
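To make the setup concrete, here is a minimal sketch (with hypothetical feature maps) of square-loss ERM over a union of linear classes: fit least squares within each class, then keep the fit with the smallest empirical risk.

```python
import numpy as np

def erm_over_feature_maps(X, y, feature_maps):
    """Square-loss ERM over a union of linear classes: fit least
    squares in each feature space, return the best empirical fit.
    `feature_maps` is a list of callables phi: (n, p) -> (n, d_phi)."""
    best = None
    for phi in feature_maps:
        Phi = phi(X)                                 # design matrix
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares fit
        risk = np.mean((Phi @ w - y) ** 2)           # empirical risk
        if best is None or risk < best[0]:
            best = (risk, phi, w)
    return best  # (empirical risk, chosen feature map, weights)

# Hypothetical usage: two candidate feature maps.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
risk, phi, w = erm_over_feature_maps(X, y, [lambda Z: Z, lambda Z: Z**2])
```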
arXiv Detail & Related papers (2024-11-18T20:05:05Z)
- Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis [17.526736505065227]
We investigate whether risk-sensitive algorithms can achieve better iteration complexity compared to their risk-neutral counterparts.
Our theoretical analysis demonstrates that risk-sensitive REINFORCE requires fewer iterations to converge.
Our simulation results validate that risk-averse cases converge and stabilize more quickly, after approximately half the episodes needed by their risk-neutral counterparts.
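As one common instantiation (an assumption on our part; the paper's exact estimator may differ), a risk-sensitive REINFORCE gradient can be formed for the exponential-utility objective $J_\beta(\theta) = \tfrac{1}{\beta}\log\mathbb{E}[e^{\beta R}]$, with $\beta < 0$ giving risk aversion:

```python
import numpy as np

def risk_sensitive_pg_estimate(returns, scores, beta=-0.5):
    """Monte Carlo gradient of J(theta) = (1/beta) * log E[exp(beta*R)].
    returns: (N,) episode returns; scores: (N, p) per-episode score
    vectors grad_theta log pi_theta(trajectory); beta < 0 => risk-averse.
    A consistent ratio estimator; in practice a baseline cuts variance."""
    z = beta * returns
    w = np.exp(z - z.max())   # stabilized exponential weights
    w = w / w.mean()          # normalize so the weights average to 1
    return (w[:, None] * scores).mean(axis=0) / beta
```

As $\beta \to 0$ the objective reduces to the expected return, i.e. the risk-neutral REINFORCE objective.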
arXiv Detail & Related papers (2024-03-13T20:50:49Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
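In symbols (our notation; $d$ is a complexity measure of the hypothesis class, e.g. its VC dimension), the objective and rate are:

```latex
% Multi-distribution learning: minimize the worst-case risk over k
% data distributions (notation is illustrative).
\[
  \min_{h \in \mathcal{H}} \max_{i \in [k]}\;
    \mathbb{E}_{(x,y) \sim \mathcal{D}_i}\big[\ell(h(x), y)\big],
  \qquad
  n \;=\; \widetilde{O}\!\left(\frac{d + k}{\varepsilon^2}\right)
  \text{ samples suffice for an } \varepsilon\text{-optimal randomized } h.
\]
```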
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Provably Efficient CVaR RL in Low-rank MDPs [58.58570425202862]
We study risk-sensitive Reinforcement Learning (RL).
We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to balance the interplay among exploration, exploitation, and representation learning in CVaR RL.
We prove that our algorithm achieves a sample complexity polynomial in $H$, $A$, $d$, and $1/\epsilon$ for learning an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of the action space, and $d$ is the dimension of the representations.
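For reference, the CVaR of a return $R$ at level $\tau$ admits the standard Rockafellar-Uryasev variational form (lower-tail convention; this definition is standard, though the paper's normalization may differ):

```latex
% Conditional Value-at-Risk at level tau: the average of the worst
% tau-fraction of outcomes of the return R.
\[
  \mathrm{CVaR}_\tau(R)
    \;=\; \max_{b \in \mathbb{R}}
      \left\{ b - \frac{1}{\tau}\,\mathbb{E}\big[(b - R)_+\big] \right\}.
\]
```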
arXiv Detail & Related papers (2023-11-20T17:44:40Z)
- Regularization and Optimal Multiclass Learning [10.168670899305232]
This work characterizes the role of regularization in perhaps the simplest setting in which empirical risk minimization fails: multiclass learning with arbitrary label sets.
Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles.
arXiv Detail & Related papers (2023-09-24T16:49:55Z)
- Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z)
- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe-exploration requirement affects the sample complexity needed to achieve the desired optimality of the learned policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET for the tabular and low-rank settings, respectively.
arXiv Detail & Related papers (2022-06-28T15:00:45Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z)
- Learning with CVaR-based feedback under potentially heavy tails [8.572654816871873]
We study learning algorithms that seek to minimize the conditional value-at-risk (CVaR).
We first study a general-purpose estimator of CVaR for potentially heavy-tailed random variables.
We then derive a new learning algorithm which robustly chooses among candidates produced by gradient-driven sub-processes.
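As a baseline for comparison (the paper's robust estimator is more involved; this is only the naive plug-in), the empirical lower-tail CVaR is the average of the worst $\tau$-fraction of samples:

```python
import numpy as np

def plugin_cvar(samples, tau=0.05):
    """Naive plug-in estimator of lower-tail CVaR_tau: the mean of
    the worst ceil(tau * N) sampled values. Heavy tails can make
    this estimator unreliable, motivating a robust alternative."""
    x = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(tau * x.size)))
    return x[:k].mean()
```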
arXiv Detail & Related papers (2020-06-03T01:08:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.