Sample Complexity Characterization for Linear Contextual MDPs
- URL: http://arxiv.org/abs/2402.02700v1
- Date: Mon, 5 Feb 2024 03:25:04 GMT
- Title: Sample Complexity Characterization for Linear Contextual MDPs
- Authors: Junze Deng, Yuan Cheng, Shaofeng Zou and Yingbin Liang
- Abstract summary: Contextual decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
- Score: 67.79455646673762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual Markov decision processes (CMDPs) describe a class of
reinforcement learning problems in which the transition kernels and reward
functions can change over time with different MDPs indexed by a context
variable. While CMDPs serve as an important framework to model many real-world
applications with time-varying environments, they are largely unexplored from
theoretical perspective. In this paper, we study CMDPs under two linear
function approximation models: Model I with context-varying representations and
common linear weights for all contexts; and Model II with common
representations for all contexts and context-varying linear weights. For both
models, we propose novel model-based algorithms and show that they enjoy
guaranteed $\epsilon$-suboptimality gap with desired polynomial sample
complexity. In particular, instantiating our result for the first model to the
tabular CMDP improves the existing result by removing the reachability
assumption. Our result for the second model is the first-known result for such
a type of function approximation models. Comparison between our results for the
two models further indicates that having context-varying features leads to much
better sample efficiency than having common representations for all contexts
under linear CMDPs.
Related papers
- Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data.
Specifically, we propose a $beta$-VAE based module to extract factors as the initial nodes of the graph.
By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC)
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies.
Our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging.
arXiv Detail & Related papers (2024-03-05T09:12:49Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near optimal policy in sample complexity scalingly.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end.
We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees.
Our results open the way to applications of PSD models to density estimation, decision theory and inference.
arXiv Detail & Related papers (2021-06-30T15:13:39Z) - Collaborative Nonstationary Multivariate Gaussian Process Model [2.362467745272567]
We propose a novel model called the collaborative nonstationary Gaussian process model(CNMGP)
CNMGP allows us to model data in which outputs do not share a common input set, with a computational complexity independent of the size of the inputs and outputs.
We show that our model generally pro-vides better predictive performance than the state-of-the-art, and also provides estimates of time-varying correlations that differ across outputs.
arXiv Detail & Related papers (2021-06-01T18:25:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.