Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model
- URL: http://arxiv.org/abs/2402.11877v1
- Date: Mon, 19 Feb 2024 06:33:51 GMT
- Title: Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model
- Authors: Han-Dong Lim, HyeAnn Lee, Donghwan Lee
- Abstract summary: $Q$-learning has proven to be a powerful algorithm in model-free settings.
The extension of $Q$-learning to a model-based framework remains relatively unexplored.
- Score: 6.663174194579773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has witnessed significant advancements, particularly
with the emergence of model-based approaches. Among these, $Q$-learning has
proven to be a powerful algorithm in model-free settings. However, the
extension of $Q$-learning to a model-based framework remains relatively
unexplored. In this paper, we delve into the sample complexity of $Q$-learning
when integrated with a model-based approach. Through theoretical analyses and
empirical evaluations, we seek to elucidate the conditions under which
model-based $Q$-learning excels in terms of sample efficiency compared to its
model-free counterpart.
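As a rough illustration of what online model-based $Q$-learning can look like, the Dyna-style sketch below updates an empirical model alongside the $Q$-table and replays model-sampled transitions for additional updates. The `env` interface, all names, and the hyperparameters are assumptions for illustration; this is not the algorithm analyzed in the paper.

```python
import numpy as np

def dyna_style_q_learning(env, n_states, n_actions, episodes=200, alpha=0.1,
                          gamma=0.99, eps=0.1, planning_steps=10, seed=0):
    """Illustrative online model-based Q-learning in the Dyna style:
    every real transition updates both the Q-table and a learned model,
    and the model then generates extra synthetic Q-updates."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    model = {}  # (s, a) -> most recently observed (reward, next state, done)

    for _ in range(episodes):
        s, done = env.reset(), False  # assumed env API: reset() -> state
        while not done:
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next, r, done = env.step(a)  # assumed env API: step(a) -> (s', r, done)
            # Model-free Q-update from the real transition.
            Q[s, a] += alpha * (r + gamma * (not done) * Q[s_next].max() - Q[s, a])
            # Model update from the same transition.
            model[(s, a)] = (r, s_next, done)
            # Planning: replay transitions sampled from the learned model.
            keys = list(model)
            for _ in range(planning_steps):
                ps, pa = keys[rng.integers(len(keys))]
                pr, ps2, pd = model[(ps, pa)]
                Q[ps, pa] += alpha * (pr + gamma * (not pd) * Q[ps2].max() - Q[ps, pa])
            s = s_next
    return Q
```

Intuitively, the sample-complexity question the paper studies is how much such model-driven reuse of data reduces the number of real environment samples needed, compared to running only the model-free update.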
Related papers
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
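A minimal sketch of the object that entry studies, assuming linear experts and a linear gate (the paper's exact model class and estimator are not reproduced here): the softmax gate mixes K expert predictions, and the LSE minimizes the empirical squared error.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilized softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_predict(X, gate_W, expert_W, expert_b):
    """Softmax-gating MoE with K linear experts:
    f(x) = sum_k softmax(W_g^T x)_k * (w_k^T x + b_k)."""
    gates = softmax(X @ gate_W)        # (n, K) mixture weights
    experts = X @ expert_W + expert_b  # (n, K) expert outputs
    return (gates * experts).sum(axis=1)

def least_squares_loss(y, y_hat):
    # The LSE minimizes this empirical squared error over the parameters.
    return float(np.mean((y - y_hat) ** 2))
```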
- On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples [32.707730631343416]
Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples.
In model-based offline RL, the learner performs estimation (or optimization) using a model constructed from the empirical transition probabilities.
We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting.
arXiv Detail & Related papers (2023-03-07T22:39:23Z)
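A minimal sketch of the vanilla recipe that entry analyzes, under illustrative assumptions (tabular MDP, a uniform fallback for unvisited state-action pairs): estimate the empirical transition kernel and mean rewards from the offline batch, then run Q-value iteration on the estimated model. The batch may contain dependent samples, e.g. collected along a single trajectory.

```python
import numpy as np

def vanilla_model_based_offline_rl(batch, n_states, n_actions, gamma=0.99,
                                   n_iters=1000):
    """Estimate an empirical MDP from an offline batch of (s, a, r, s')
    tuples, then run Q-value iteration on the estimated model."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in batch:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r

    n_sa = counts.sum(axis=2)
    visits = np.maximum(n_sa, 1)
    # Empirical kernel and mean reward; unvisited pairs fall back to uniform.
    P_hat = np.where(n_sa[..., None] > 0, counts / visits[..., None], 1.0 / n_states)
    r_hat = reward_sum / visits

    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        Q = r_hat + gamma * P_hat @ Q.max(axis=1)  # Bellman optimality backup
    return Q
```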
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model and to know which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill this gap by introducing the first thoroughly designed and tested model-based RL toolbox.
Our modular approach makes it possible to combine a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
- Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with the true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
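A minimal sketch of the shared-network idea in the MQL entry, with illustrative sizes and activations (the tanh trunk, one-hot action encoding, and all names are assumptions): one trunk feeds three heads that predict $Q$-values, the next state, and the reward.

```python
import numpy as np

class SharedQModelNet:
    """Shared trunk with three heads, in the spirit of Model-augmented
    Q-learning: Q-values, next-state prediction, and reward prediction."""
    def __init__(self, obs_dim, n_actions, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W_trunk = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.W_q = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.W_dyn = rng.normal(0.0, 0.1, (hidden + n_actions, obs_dim))
        self.W_rew = rng.normal(0.0, 0.1, (hidden + n_actions, 1))

    def forward(self, obs, action_onehot):
        h = np.tanh(obs @ self.W_trunk)             # shared representation
        q_values = h @ self.W_q                     # Q head: one value per action
        ha = np.concatenate([h, action_onehot], axis=-1)
        next_obs = ha @ self.W_dyn                  # transition head
        reward = ha @ self.W_rew                    # reward head
        return q_values, next_obs, reward
```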
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
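The core mechanism behind the $\gamma$-model entry can be sketched directly: a sample from the (normalized) discounted state occupancy is obtained by rolling a one-step model for a Geometric$(1-\gamma)$ number of steps. Here `single_step_sample` is an assumed stand-in for a learned single-step dynamics model.

```python
import numpy as np

def gamma_horizon_sample(single_step_sample, s0, gamma=0.99, rng=None):
    """Sample a state at a Geometric(1 - gamma) horizon: continue rolling
    the one-step model with probability gamma, else stop and return."""
    rng = rng or np.random.default_rng()
    s = s0
    while rng.random() < gamma:
        s = single_step_sample(s)  # assumed learned one-step dynamics sampler
    return s
```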
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Model Embedding Model-Based Reinforcement Learning [4.566180616886624]
Model-based reinforcement learning (MBRL) has shown its advantages in sample efficiency over model-free reinforcement learning (MFRL).
Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias.
We propose a simple and elegant model-embedding model-based reinforcement learning (MEMB) algorithm in the framework of probabilistic reinforcement learning.
arXiv Detail & Related papers (2020-06-16T15:10:28Z)
- PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes [31.83144400718369]
We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment.
We study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning.
arXiv Detail & Related papers (2020-06-11T11:57:08Z)
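Of the two algorithm classes in the last entry, direct policy learning admits a compact tabular sketch: behavior cloning by majority vote over the expert's observed actions. This is an illustrative reduction, not the paper's estimator; the model-based class would instead fit a (contextual) model and plan on it, as in the offline-RL sketch above.

```python
import numpy as np

def direct_policy_learning(expert_trajectories, n_states, n_actions):
    """Tabular DPL sketch: imitate the expert by taking, in each state,
    the action the expert chose most often there."""
    counts = np.zeros((n_states, n_actions))
    for trajectory in expert_trajectories:  # each a list of (state, action)
        for s, a in trajectory:
            counts[s, a] += 1
    return counts.argmax(axis=1)  # learned policy: state -> action
```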