PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes
- URL: http://arxiv.org/abs/2006.06352v2
- Date: Fri, 17 Jul 2020 19:04:11 GMT
- Title: PAC Bounds for Imitation and Model-based Batch Learning of Contextual
Markov Decision Processes
- Authors: Yash Nair and Finale Doshi-Velez
- Abstract summary: We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment.
We study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning.
- Score: 31.83144400718369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of batch multi-task reinforcement learning with
observed context descriptors, motivated by its application to personalized
medical treatment. In particular, we study two general classes of learning
algorithms: direct policy learning (DPL), an imitation-learning based approach
which learns from expert trajectories, and model-based learning. First, we
derive sample complexity bounds for DPL, and then show that model-based
learning from expert actions can, even with a finite model class, be
impossible. After relaxing the conditions under which the model-based approach
is expected to learn by allowing for greater coverage of state-action space, we
provide sample complexity bounds for model-based learning with finite model
classes, showing that there exist model classes with sample complexity
exponential in their statistical complexity. We then derive a sample complexity
upper bound for model-based learning based on a measure of concentration of the
data distribution. Our results give formal justification for imitation learning
over model-based learning in this setting.
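To make the contrast concrete, here is a minimal, hypothetical sketch (not the paper's algorithms, bounds, or experiments) of how the two approaches consume the same batch of expert trajectories from a contextual MDP: direct policy learning imitates the expert's action for each observed (context, state) pair, while model-based learning estimates a per-context transition model that a planner could then use. All function names and the toy data generator below are illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's algorithms or bounds): the same
# batch of expert trajectories from a contextual MDP is consumed by (i) direct
# policy learning, which imitates the expert per (context, state) pair, and
# (ii) model-based learning, which estimates a per-context transition model.
# All names and the toy data generator are hypothetical.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def make_batch(n_contexts=3, n_states=4, n_actions=2, episodes=200, horizon=5):
    """Toy stand-in for batch data: tuples (context, state, expert_action, next_state)."""
    batch = []
    for _ in range(episodes):
        c = rng.integers(n_contexts)          # observed context descriptor (e.g. patient covariates)
        s = rng.integers(n_states)
        for _ in range(horizon):
            a = (s + c) % n_actions           # placeholder for the expert's (context-dependent) action
            s_next = rng.integers(n_states)   # placeholder for context-dependent dynamics
            batch.append((c, s, a, s_next))
            s = s_next
    return batch

def direct_policy_learning(batch):
    """DPL / imitation: for each (context, state), copy the expert's most frequent action."""
    counts = defaultdict(lambda: defaultdict(int))
    for c, s, a, _ in batch:
        counts[(c, s)][a] += 1
    return {cs: max(acts, key=acts.get) for cs, acts in counts.items()}

def model_based_learning(batch, n_contexts=3, n_states=4, n_actions=2):
    """Model-based: estimate P(s' | c, s, a) empirically; a planner would use this model."""
    counts = np.zeros((n_contexts, n_states, n_actions, n_states))
    for c, s, a, s_next in batch:
        counts[c, s, a, s_next] += 1
    totals = counts.sum(axis=-1, keepdims=True)
    # Fall back to a uniform row wherever (c, s, a) never appears in the batch.
    return np.divide(counts, totals, out=np.full_like(counts, 1.0 / n_states),
                     where=totals > 0)

batch = make_batch()
policy = direct_policy_learning(batch)   # maps (context, state) -> imitated action
P_hat = model_based_learning(batch)      # estimated transition tensor, shape (c, s, a, s')
print(len(policy), P_hat.shape)
```

Because the toy expert takes a single action per (context, state), many (context, state, action) rows of the estimated model are never observed; this coverage issue is the intuition behind the abstract's negative result for model-based learning from expert actions and its relaxation to broader state-action coverage.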
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model [6.663174194579773]
$Q$-learning has proven to be a powerful algorithm in model-free settings.
The extension of $Q$-learning to a model-based framework remains relatively unexplored.
arXiv Detail & Related papers (2024-02-19T06:33:51Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder- and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore how a mixture distribution and multi-epoch training of programming and natural languages affect model performance.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- Distribution-free Deviation Bounds and The Role of Domain Knowledge in Learning via Model Selection with Cross-validation Risk Estimation [0.0]
Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning.
This paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework.
arXiv Detail & Related papers (2023-03-15T17:18:31Z)
- On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples [32.707730631343416]
Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples.
In model-based offline RL, the learner performs estimation (or optimization) using a model constructed from the empirical transitions; a minimal illustrative sketch in this spirit appears after this list.
We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting.
arXiv Detail & Related papers (2023-03-07T22:39:23Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning [18.37286885057802]
We propose an algorithm combining learning and planning to exploit a previously unusable class of incomplete models.
This combines the strengths of symbolic planning and neural learning approaches in a novel way that outperforms competing methods on variations of taxi world and Minecraft.
arXiv Detail & Related papers (2022-03-09T22:55:53Z)
- Model Complexity of Deep Learning: A Survey [79.20117679251766]
We conduct a systematic overview of the latest studies on model complexity in deep learning.
We review the existing studies along four important factors: model framework, model size, optimization process, and data complexity.
arXiv Detail & Related papers (2021-03-08T22:39:32Z)
- Demystifying Deep Learning in Predictive Spatio-Temporal Analytics: An Information-Theoretic Framework [20.28063653485698]
We provide a comprehensive framework for deep learning model design and information-theoretic analysis.
First, we develop and demonstrate a novel interactively-connected deep recurrent neural network (I$^2$DRNN) model.
Second, to theoretically prove that our designed model can learn multi-scale spatio-temporal dependencies in PSTA tasks, we provide an information-theoretic analysis.
arXiv Detail & Related papers (2020-09-14T10:05:14Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
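As referenced in the entry above on vanilla model-based offline RL, the sketch below is a hedged illustration (toy i.i.d. data and hypothetical names, rather than the cited paper's dependent-sample setting) of the certainty-equivalence recipe it describes: estimate an empirical transition and reward model from a fixed batch, then plan on the estimate with value iteration in the infinite-horizon discounted setting.

```python
# Illustrative certainty-equivalence sketch of vanilla model-based offline RL:
# fit empirical P_hat(s'|s,a) and R_hat(s,a) from a fixed batch, then plan on
# the fitted model. Toy data and names are assumptions, not the cited paper's setup.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 5, 2, 0.95

# Fixed batch of (s, a, r, s') tuples gathered by some unknown behaviour policy.
batch = [(rng.integers(S), rng.integers(A), rng.random(), rng.integers(S))
         for _ in range(2000)]

counts = np.zeros((S, A, S))
r_sum = np.zeros((S, A))
for s, a, r, s2 in batch:
    counts[s, a, s2] += 1
    r_sum[s, a] += r
n_sa = counts.sum(axis=-1)                              # visit counts per (s, a)
P_hat = np.where(n_sa[..., None] > 0,
                 counts / np.maximum(n_sa[..., None], 1),
                 1.0 / S)                               # uniform where (s, a) is unvisited
R_hat = r_sum / np.maximum(n_sa, 1)                     # empirical mean reward

# Value iteration on the estimated model (infinite-horizon, discounted).
V = np.zeros(S)
for _ in range(500):
    Q = R_hat + gamma * P_hat @ V                       # Q[s, a] = R_hat + gamma * E_hat[V(s')]
    V = Q.max(axis=1)
pi_hat = Q.argmax(axis=1)                               # greedy policy on the fitted model
print(pi_hat, np.round(V, 2))
```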
This list is automatically generated from the titles and abstracts of the papers on this site.