Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient
- URL: http://arxiv.org/abs/2210.00750v1
- Date: Mon, 3 Oct 2022 07:59:42 GMT
- Title: Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient
- Authors: Ming Yin, Mengdi Wang, Yu-Xiang Wang
- Abstract summary: Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm.
- Score: 65.08966446962845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning, which aims at optimizing sequential
decision-making strategies with historical data, has been extensively applied
in real-life applications. State-of-the-art algorithms usually leverage
powerful function approximators (e.g. neural networks) to alleviate the sample
complexity hurdle for better empirical performances. Despite the successes, a
more systematic understanding of the statistical complexity for function
approximation remains lacking. Towards bridging the gap, we take a step by
considering offline reinforcement learning with differentiable function class
approximation (DFA). This function class naturally incorporates a wide range of
models with nonlinear/nonconvex structures. Most importantly, we show offline
RL with differentiable function approximation is provably efficient by
analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results
provide the theoretical basis for understanding a variety of practical
heuristics that rely on Fitted Q-Iteration style design. In addition, we
further improve our guarantee with a tighter instance-dependent
characterization. We hope our work could draw interest in studying
reinforcement learning with differentiable function approximation beyond the
scope of current research.
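To make the abstract's central object concrete, here is a minimal Python sketch of pessimistic fitted Q-learning in the finite-horizon setting. The interfaces `fit_q` (a regression oracle over the differentiable function class) and `bonus` (an uncertainty quantifier) are illustrative assumptions, not the authors' implementation:

```python
def pfql(data, fit_q, bonus, actions, H):
    """Minimal sketch of pessimistic fitted Q-learning (PFQL).

    Hypothetical interfaces (not the paper's code):
      data[h]  -- list of transitions (s, a, r, s2) logged at step h
      fit_q    -- regression oracle over the differentiable function class,
                  fit_q(inputs, targets) -> callable q(s, a)
      bonus    -- uncertainty quantifier bonus(h, s, a) >= 0; subtracting it
                  makes estimates pessimistic on poorly covered pairs
      actions  -- finite action set; H -- episode horizon
    """
    V_next = lambda s: 0.0        # boundary condition: value is 0 past the horizon
    policy = {}
    for h in reversed(range(H)):  # backward induction over time steps
        # Fitted-Q regression targets: r + V_{h+1}(s')
        X = [(s, a) for (s, a, _, _) in data[h]]
        y = [r + V_next(s2) for (_, _, r, s2) in data[h]]
        q_hat = fit_q(X, y)
        # Pessimism: penalize by the bonus, clip to the valid range [0, H - h]
        def Q(s, a, q=q_hat, hh=h):
            return max(0.0, min(H - hh, q(s, a) - bonus(hh, s, a)))
        policy[h] = lambda s, Q=Q: max(actions, key=lambda a: Q(s, a))
        V_next = lambda s, Q=Q: max(Q(s, a) for a in actions)
    return policy
```

The pessimistic penalty is the key design choice: the learned policy is only credited for value it can certify from the data, which is what lets the analysis avoid uniform-coverage assumptions on the offline dataset.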
Related papers
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms that realize function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with nonlinear function approximation.
Our algorithm enjoys a regret bound with a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
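For context, the low-rank structure referenced here has a compact definition; the notation below is the standard one from the low-rank MDP literature, not taken from this summary:

```latex
% Low-rank MDP: transitions factor through d-dimensional features
P_h(s' \mid s, a) = \langle \phi_h(s, a), \mu_h(s') \rangle,
\qquad \phi_h : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,\;
\mu_h : \mathcal{S} \to \mathbb{R}^d .
```

Linear MDPs are the special case where $\phi_h$ is known to the learner; in the low-rank setting both factors must be learned, which is what makes model-free exploration nontrivial.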
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- FAStEN: An Efficient Adaptive Method for Feature Selection and Estimation in High-Dimensional Functional Regressions [7.674715791336311]
We propose a new, flexible and ultra-efficient approach to perform feature selection in a sparse function-on-function regression problem.
We show how to extend it to the scalar-on-function framework.
We present an application to brain fMRI data from the AOMIC PIOP1 study.
arXiv Detail & Related papers (2023-03-26T19:41:17Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
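One standard way to leverage variance information with pessimism in the linear setting is to reweight the regression by estimated conditional variances; the display below sketches that construction in the usual linear-MDP notation, and is not necessarily this paper's exact estimator:

```latex
% Variance-weighted covariance and pessimistic bonus (sketch)
\Lambda_h = \sum_{k=1}^{K}
  \frac{\phi(s_h^k, a_h^k)\, \phi(s_h^k, a_h^k)^{\top}}{\hat{\sigma}_h^2(s_h^k, a_h^k)}
  + \lambda I,
\qquad
\Gamma_h(s, a) = \beta \sqrt{\phi(s, a)^{\top} \Lambda_h^{-1} \phi(s, a)} .
```

Weighting by $1/\hat{\sigma}^2$ shrinks the bonus on low-variance state-action pairs, which is the mechanism behind tighter instance-dependent guarantees.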
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms prior heuristics inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)