PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes
- URL: http://arxiv.org/abs/2006.06352v2
- Date: Fri, 17 Jul 2020 19:04:11 GMT
- Title: PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes
- Authors: Yash Nair and Finale Doshi-Velez
- Abstract summary: We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment.
We study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning.
- Score: 31.83144400718369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of batch multi-task reinforcement learning with
observed context descriptors, motivated by its application to personalized
medical treatment. In particular, we study two general classes of learning
algorithms: direct policy learning (DPL), an imitation-learning based approach
which learns from expert trajectories, and model-based learning. First, we
derive sample complexity bounds for DPL, and then show that model-based
learning from expert actions can, even with a finite model class, be
impossible. After relaxing the conditions under which the model-based approach
is expected to learn by allowing for greater coverage of state-action space, we
provide sample complexity bounds for model-based learning with finite model
classes, showing that there exist model classes with sample complexity
exponential in their statistical complexity. We then derive a sample complexity
upper bound for model-based learning based on a measure of concentration of the
data distribution. Our results give formal justification for imitation learning
over model-based learning in this setting.
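To make the contrast concrete, here is a minimal, hypothetical sketch (not the paper's algorithms, bounds, or experiments) of how the two approaches consume the same batch of expert trajectories from a contextual MDP: direct policy learning imitates the expert's action for each observed (context, state) pair, while model-based learning estimates a per-context transition model that a planner could then use. All function names and the toy data generator below are illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's algorithms or bounds): the same
# batch of expert trajectories from a contextual MDP is consumed by (i) direct
# policy learning, which imitates the expert per (context, state) pair, and
# (ii) model-based learning, which estimates a per-context transition model.
# All names and the toy data generator are hypothetical.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def make_batch(n_contexts=3, n_states=4, n_actions=2, episodes=200, horizon=5):
    """Toy stand-in for batch data: tuples (context, state, expert_action, next_state)."""
    batch = []
    for _ in range(episodes):
        c = rng.integers(n_contexts)          # observed context descriptor (e.g. patient covariates)
        s = rng.integers(n_states)
        for _ in range(horizon):
            a = (s + c) % n_actions           # placeholder for the expert's (context-dependent) action
            s_next = rng.integers(n_states)   # placeholder for context-dependent dynamics
            batch.append((c, s, a, s_next))
            s = s_next
    return batch

def direct_policy_learning(batch):
    """DPL / imitation: for each (context, state), copy the expert's most frequent action."""
    counts = defaultdict(lambda: defaultdict(int))
    for c, s, a, _ in batch:
        counts[(c, s)][a] += 1
    return {cs: max(acts, key=acts.get) for cs, acts in counts.items()}

def model_based_learning(batch, n_contexts=3, n_states=4, n_actions=2):
    """Model-based: estimate P(s' | c, s, a) empirically; a planner would use this model."""
    counts = np.zeros((n_contexts, n_states, n_actions, n_states))
    for c, s, a, s_next in batch:
        counts[c, s, a, s_next] += 1
    totals = counts.sum(axis=-1, keepdims=True)
    # Fall back to a uniform row wherever (c, s, a) never appears in the batch.
    return np.divide(counts, totals, out=np.full_like(counts, 1.0 / n_states),
                     where=totals > 0)

batch = make_batch()
policy = direct_policy_learning(batch)   # maps (context, state) -> imitated action
P_hat = model_based_learning(batch)      # estimated transition tensor, shape (c, s, a, s')
print(len(policy), P_hat.shape)
```

Because the toy expert takes a single action per (context, state), many (context, state, action) rows of the estimated model are never observed; this coverage issue is the intuition behind the abstract's negative result for model-based learning from expert actions and its relaxation to broader state-action coverage.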
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model [6.663174194579773]
$Q$-learning has proven to be a powerful algorithm in model-free settings.
The extension of $Q$-learning to a model-based framework remains relatively unexplored.
arXiv Detail & Related papers (2024-02-19T06:33:51Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder- and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore how a mixture distribution and multi-epoch training of programming and natural languages affect model performance.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- Distribution-free Deviation Bounds and The Role of Domain Knowledge in Learning via Model Selection with Cross-validation Risk Estimation [0.0]
Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning.
This paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework.
arXiv Detail & Related papers (2023-03-15T17:18:31Z)
- On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples [32.707730631343416]
Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples.
In model-based offline RL, the learner performs estimation (or optimization) using a model constructed from the empirical transitions; a minimal illustrative sketch in this spirit appears after this list.
We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting.
arXiv Detail & Related papers (2023-03-07T22:39:23Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning [18.37286885057802]
We propose an algorithm combining learning and planning to exploit a previously unusable class of incomplete models.
This combines the strengths of symbolic planning and neural learning approaches in a novel way that outperforms competing methods on variations of taxi world and Minecraft.
arXiv Detail & Related papers (2022-03-09T22:55:53Z)
- Model Complexity of Deep Learning: A Survey [79.20117679251766]
We conduct a systematic overview of the latest studies on model complexity in deep learning.
We review the existing studies along four important factors: model framework, model size, optimization process, and data complexity.
arXiv Detail & Related papers (2021-03-08T22:39:32Z)
- Demystifying Deep Learning in Predictive Spatio-Temporal Analytics: An Information-Theoretic Framework [20.28063653485698]
We provide a comprehensive framework for deep learning model design and information-theoretic analysis.
First, we develop and demonstrate a novel interactively-connected deep recurrent neural network (I$^2$DRNN) model.
Second, to theoretically prove that our designed model can learn multi-scale spatio-temporal dependencies in PSTA tasks, we provide an information-theoretic analysis.
arXiv Detail & Related papers (2020-09-14T10:05:14Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
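As referenced in the entry above on vanilla model-based offline RL, the sketch below is a hedged illustration (toy i.i.d. data and hypothetical names, rather than the cited paper's dependent-sample setting) of the certainty-equivalence recipe it describes: estimate an empirical transition and reward model from a fixed batch, then plan on the estimate with value iteration in the infinite-horizon discounted setting.

```python
# Illustrative certainty-equivalence sketch of vanilla model-based offline RL:
# fit empirical P_hat(s'|s,a) and R_hat(s,a) from a fixed batch, then plan on
# the fitted model. Toy data and names are assumptions, not the cited paper's setup.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 5, 2, 0.95

# Fixed batch of (s, a, r, s') tuples gathered by some unknown behaviour policy.
batch = [(rng.integers(S), rng.integers(A), rng.random(), rng.integers(S))
         for _ in range(2000)]

counts = np.zeros((S, A, S))
r_sum = np.zeros((S, A))
for s, a, r, s2 in batch:
    counts[s, a, s2] += 1
    r_sum[s, a] += r
n_sa = counts.sum(axis=-1)                              # visit counts per (s, a)
P_hat = np.where(n_sa[..., None] > 0,
                 counts / np.maximum(n_sa[..., None], 1),
                 1.0 / S)                               # uniform where (s, a) is unvisited
R_hat = r_sum / np.maximum(n_sa, 1)                     # empirical mean reward

# Value iteration on the estimated model (infinite-horizon, discounted).
V = np.zeros(S)
for _ in range(500):
    Q = R_hat + gamma * P_hat @ V                       # Q[s, a] = R_hat + gamma * E_hat[V(s')]
    V = Q.max(axis=1)
pi_hat = Q.argmax(axis=1)                               # greedy policy on the fitted model
print(pi_hat, np.round(V, 2))
```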
This list is automatically generated from the titles and abstracts of the papers on this site.