Related papers: Sample Complexity Characterization for Linear Contextual MDPs

Sample Complexity Characterization for Linear Contextual MDPs

URL: http://arxiv.org/abs/2402.02700v1
Date: Mon, 5 Feb 2024 03:25:04 GMT
Title: Sample Complexity Characterization for Linear Contextual MDPs
Authors: Junze Deng, Yuan Cheng, Shaofeng Zou and Yingbin Liang
Abstract summary: Contextual decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. CMDPs serve as an important framework to model many real-world applications with time-varying environments. We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
Score: 67.79455646673762
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. While CMDPs serve as an important framework to model many real-world applications with time-varying environments, they are largely unexplored from theoretical perspective. In this paper, we study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights. For both models, we propose novel model-based algorithms and show that they enjoy guaranteed $\epsilon$-suboptimality gap with desired polynomial sample complexity. In particular, instantiating our result for the first model to the tabular CMDP improves the existing result by removing the reachability assumption. Our result for the second model is the first-known result for such a type of function approximation models. Comparison between our results for the two models further indicates that having context-varying features leads to much better sample efficiency than having common representations for all contexts under linear CMDPs.

Related papers

Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes [5.220940151628735]
This work introduces a novel bi-recurrent model architecture that improves sample efficiency and reduces model parameter count in POMDP scenarios.<n>The proposed model architecture outperforms existing transformer-based, attention-based, and recurrence-based methods by a margin ranging from 87.39% to 482.04% on average.
arXiv Detail & Related papers (2025-05-16T11:54:48Z)
Relative Overfitting and Accept-Reject Framework [5.465098504510676]
We propose an ensemble framework that governs how models are segmented to ensure performance improvement.<n>We detail the patterns of this framework within the domain of NLP and briefly describe its to other fields, such as computer vision (CV) and AI for science.
arXiv Detail & Related papers (2025-05-12T17:36:14Z)
Graphical Transformation Models [0.9999629695552195]
We introduce a novel approach to effectively model multivariate data with intricate marginals and complex dependency structures non-parametrically. We show how to approximately regularize GTMs using a lasso penalty towards pairwise conditional independencies. The model's robustness and effectiveness are validated through simulations, showcasing its ability to accurately learn parametric vine copulas.
arXiv Detail & Related papers (2025-03-22T19:41:15Z)
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data. Specifically, we propose a $beta$-VAE based module to extract factors as the initial nodes of the graph. By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z)
Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC) LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses. LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models. We show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies. Our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging.
arXiv Detail & Related papers (2024-03-05T09:12:49Z)
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making. We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning. We propose a novel estimation function with decomposable structural properties for optimization-based exploration. Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models. We develop a novel model-based algorithm for PSRs that can learn a near optimal policy in sample complexity scalingly.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end. We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees. Our results open the way to applications of PSD models to density estimation, decision theory and inference.
arXiv Detail & Related papers (2021-06-30T15:13:39Z)
Collaborative Nonstationary Multivariate Gaussian Process Model [2.362467745272567]
We propose a novel model called the collaborative nonstationary Gaussian process model(CNMGP) CNMGP allows us to model data in which outputs do not share a common input set, with a computational complexity independent of the size of the inputs and outputs. We show that our model generally pro-vides better predictive performance than the state-of-the-art, and also provides estimates of time-varying correlations that differ across outputs.
arXiv Detail & Related papers (2021-06-01T18:25:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.