On the Sample Complexity of Vanilla Model-Based Offline Reinforcement
Learning with Dependent Samples
- URL: http://arxiv.org/abs/2303.04268v1
- Date: Tue, 7 Mar 2023 22:39:23 GMT
- Title: On the Sample Complexity of Vanilla Model-Based Offline Reinforcement
Learning with Dependent Samples
- Authors: Mustafa O. Karabag, Ufuk Topcu
- Abstract summary: Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples.
In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition frequencies.
We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting.
- Score: 32.707730631343416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (offline RL) considers problems where learning
is performed using only previously collected samples and is useful in settings
where collecting new data is costly or risky. In model-based
offline RL, the learner performs estimation (or optimization) using a model
constructed according to the empirical transition frequencies. We analyze the
sample complexity of vanilla model-based offline RL with dependent samples in
the infinite-horizon discounted-reward setting. In our setting, the samples
obey the dynamics of the Markov decision process and, consequently, may have
interdependencies. Under no assumption of independent samples, we provide a
high-probability, polynomial sample complexity bound for vanilla model-based
off-policy evaluation that requires partial or uniform coverage. We extend this
result to off-policy optimization under uniform coverage. For comparison with
the model-based approach, we analyze the sample complexity of off-policy
evaluation with vanilla importance sampling in the infinite-horizon setting.
Finally, we provide an estimator that outperforms the sample-mean estimator for
almost deterministic dynamics that are prevalent in reinforcement learning.
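
To make the two approaches in the abstract concrete, below is a minimal, hypothetical Python sketch of (i) vanilla model-based off-policy evaluation, where empirical transition frequencies estimated from a single dependent trajectory define an empirical MDP on which the target policy is evaluated, and (ii) a truncated vanilla importance-sampling estimator for comparison. The function names, the uniform fallback for unvisited state-action pairs, the truncation horizon, and gamma=0.95 are illustrative assumptions, not the authors' implementation or the exact estimators analyzed in the paper.

```python
# Illustrative sketch only: the function names, the uniform fallback for
# unvisited (s, a) pairs, the truncation horizon, and gamma are assumptions
# made for this example, not the constructions analyzed in the paper.
import numpy as np


def empirical_model(transitions, n_states, n_actions):
    """Empirical transition frequencies P_hat[s, a, s'] from (s, a, s') triples.

    The triples may come from a single rollout of a behavior policy, so
    consecutive samples are dependent; no i.i.d. assumption is used here.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs fall back to a uniform next-state distribution
    # (an arbitrary illustrative choice).
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 1.0 / n_states)


def evaluate_policy(p_hat, reward, policy, gamma=0.95):
    """Discounted value of `policy` (shape [S, A]) under the empirical model.

    Solves (I - gamma * P_pi) v = r_pi, where P_pi and r_pi are the state
    transition matrix and expected reward induced by `policy`.
    """
    n_states = p_hat.shape[0]
    p_pi = np.einsum("sa,sat->st", policy, p_hat)   # P_pi[s, s']
    r_pi = np.einsum("sa,sa->s", policy, reward)    # r_pi[s]
    return np.linalg.solve(np.eye(n_states) - gamma * p_pi, r_pi)


def is_estimate(episodes, policy, behavior, gamma=0.95, horizon=200):
    """Vanilla (trajectory-wise) importance-sampling estimate of the target value.

    Each episode is a list of (s, a, r) tuples collected under `behavior`.
    The infinite-horizon return is truncated at `horizon` for computability;
    the cumulative likelihood ratio reweights each return toward `policy`.
    """
    estimates = []
    for episode in episodes:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode[:horizon]):
            weight *= policy[s, a] / behavior[s, a]
            ret += gamma**t * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

A typical usage would collect one long trajectory under a behavior policy, call empirical_model on its (s, a, s') triples, and then evaluate_policy for any target policy of interest; the quality of the result hinges on how well the behavior policy covers the state-action pairs the target policy visits, which is the kind of coverage condition the bounds in the abstract refer to.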
Related papers
- A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization [7.378582040635655]
Current deep learning approaches rely on generative models that yield exact sample likelihoods.
This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models.
We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.
arXiv Detail & Related papers (2024-06-03T17:55:02Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Finite-Time Error Analysis of Online Model-Based Q-Learning with a
Relaxed Sampling Model [6.663174194579773]
$Q$-learning has proven to be a powerful algorithm in model-free settings.
The extension of $Q$-learning to a model-based framework remains relatively unexplored.
arXiv Detail & Related papers (2024-02-19T06:33:51Z) - Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo
sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z) - Pessimistic Q-Learning for Offline Reinforcement Learning: Towards
Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Exponential Reduction in Sample Complexity with Learning of Ising Model
Dynamics [14.704630929165274]
We study the problem of reconstructing binary graphical models from correlated samples produced by a dynamical process.
We analyze the sample complexity of two estimators that are based on the interaction screening objective and the conditional likelihood loss.
arXiv Detail & Related papers (2021-04-02T11:44:13Z) - One for More: Selecting Generalizable Samples for Generalizable ReID
Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes [31.83144400718369]
We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment.
We study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning.
arXiv Detail & Related papers (2020-06-11T11:57:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.