Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism
- URL: http://arxiv.org/abs/2203.05804v1
- Date: Fri, 11 Mar 2022 09:00:12 GMT
- Title: Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism
- Authors: Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang
- Abstract summary: Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
- Score: 65.46524775457928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning, which seeks to utilize offline/historical
data to optimize sequential decision-making strategies, has gained surging
prominence in recent studies. Because appropriate function approximators can
help mitigate the sample complexity burden in modern reinforcement learning
problems, existing work usually employs powerful function representation
models (e.g., neural networks) to learn the optimal policies. However, a
precise understanding of the statistical limits with function representations
remains elusive, even when the representation is linear.
Towards this goal, we study the statistical limits of offline reinforcement
learning with linear model representations. To derive the tight offline
learning bound, we design the variance-aware pessimistic value iteration
(VAPVI), which adopts the conditional variance information of the value
function for time-inhomogeneous episodic linear Markov decision processes
(MDPs). VAPVI leverages estimated variances of the value functions to reweight
the Bellman residuals in least-squares pessimistic value iteration and provides
improved offline learning bounds over the best-known existing results (in which
the Bellman residuals are equally weighted by design). More importantly, our
learning bounds are expressed in terms of system quantities, which provide
natural instance-dependent characterizations that previous results lack. We
hope our results draw a clearer picture of what offline learning should look
like when linear representations are provided.
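A minimal sketch of the variance-weighted backup described above, written in NumPy: it assumes rewards in [0, 1], a known linear-MDP feature map phi(s, a), per-step offline data, and externally supplied conditional-variance estimates. The dataset layout, the variance clipping, and the constants beta and lam are illustrative assumptions, not the paper's exact construction.

import numpy as np

def vapvi_style_backup(data, phi, H, n_actions, beta=1.0, lam=1.0):
    # data[h]: list of (s, a, r, s_next, var_hat) tuples collected at step h,
    # where var_hat estimates the conditional variance of the regression target.
    # phi(s, a): d-dimensional feature vector of the linear MDP.
    # The dataset layout, beta, and lam are illustrative assumptions.
    d = phi(data[0][0][0], data[0][0][1]).shape[0]
    V_next = lambda s: 0.0  # V_{H+1} = 0
    policies = [None] * H
    for h in reversed(range(H)):
        # Variance-weighted ridge regression on the target r + V_{h+1}(s').
        Lam = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next, var_hat) in data[h]:
            x = phi(s, a)
            w = 1.0 / max(var_hat, 1.0)  # clip small variance estimates for stability
            Lam += w * np.outer(x, x)
            b += w * x * (r + V_next(s_next))
        Lam_inv = np.linalg.inv(Lam)
        w_hat = Lam_inv @ b

        def Q(s, a, w_hat=w_hat, Lam_inv=Lam_inv, h=h):
            x = phi(s, a)
            bonus = beta * np.sqrt(x @ Lam_inv @ x)  # elliptical pessimism penalty
            return float(np.clip(x @ w_hat - bonus, 0.0, H - h))

        def V(s, Q=Q):
            return max(Q(s, a) for a in range(n_actions))

        policies[h] = lambda s, Q=Q: max(range(n_actions), key=lambda a: Q(s, a))
        V_next = V
    return policies

The only change relative to an unweighted least-squares pessimistic backup is that each sample is scaled by the inverse of its estimated variance; the pessimism term is the usual elliptical-norm bonus beta * sqrt(phi^T Lambda_h^{-1} phi).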
Related papers
- Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
This reliance results in data inefficiency, as maintaining a state-action-value function in high-dimensional action spaces is challenging.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z)
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound with a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show that offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- A Sharp Characterization of Linear Estimators for Offline Policy Evaluation [33.37672297925897]
Offline policy evaluation is a fundamental statistical problem in reinforcement learning.
We identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods.
Our results provide a complete picture of the behavior of linear estimators for offline policy evaluation.
arXiv Detail & Related papers (2022-03-08T17:52:57Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Offline Reinforcement Learning with Value-based Episodic Memory [19.12430651038357]
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems.
We propose Expectile V-Learning (EVL), which smoothly interpolates between optimal value learning and behavior cloning.
We present a new offline method called Value-based Episodic Memory (VEM).
arXiv Detail & Related papers (2021-10-19T08:20:11Z)
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of what representational and distributional conditions are necessary to permit provably sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)