DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- URL: http://arxiv.org/abs/2010.08891v1
- Date: Sun, 18 Oct 2020 00:11:45 GMT
- Title: DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
- Authors: Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern
- Abstract summary: We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.
Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL.
- Score: 47.73837217824527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study an approach to offline reinforcement learning (RL) based on
optimally solving finitely-represented MDPs derived from a static dataset of
experience. This approach can be applied on top of any learned representation
and has the potential to easily support multiple solution objectives as well as
zero-shot adjustment to changing environments and goals. Our main contribution
is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate
its solutions for offline RL. DAC-MDPs are a non-parametric model that can
leverage deep representations and account for limited data by introducing costs
for exploiting under-represented parts of the model. In theory, we show
conditions that allow for lower-bounding the performance of DAC-MDP solutions.
We also investigate the empirical behavior in a number of environments,
including those with image-based observations. Overall, the experiments
demonstrate that the framework can work in practice and scale to large complex
offline RL problems.
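The abstract does not spell out the DAC-MDP construction, so the following is only a minimal sketch of the averagers-with-costs idea: a finite MDP is built over the dataset's transitions, values at a state are k-nearest-neighbor averages in the (possibly learned) representation space, and a distance-scaled cost penalizes reliance on under-represented regions. All names, the brute-force neighbor search, and the `cost_coeff * distance` penalty form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def solve_dac_mdp(states, actions, rewards, next_states,
                  n_actions, k=5, cost_coeff=1.0, gamma=0.99, n_iters=500):
    """Sketch: solve a DAC-MDP-style finite model built from a static dataset.

    states / next_states: (N, d) arrays of state representations (e.g., from
    a learned encoder); actions: (N,) ints; rewards: (N,) floats. Each dataset
    transition i is a "core" state-action of the derived MDP; values at any
    next-state are k-NN averages over transitions with a matching action,
    penalized by representation distance (the "cost"). Assumes every action
    appears in the dataset.
    """
    N = len(states)
    q = np.zeros(N)  # Q-value of dataset transition (s_i, a_i)

    # Brute-force k-NN for clarity: for each next-state and action, find the
    # k closest dataset transitions that used that action.
    neighbors = {}
    for a in range(n_actions):
        idx = np.where(actions == a)[0]
        dist = np.linalg.norm(
            next_states[:, None, :] - states[idx][None, :, :], axis=-1)
        nn = np.argsort(dist, axis=1)[:, :k]
        neighbors[a] = (idx[nn], np.take_along_axis(dist, nn, axis=1))

    v = np.zeros(N)  # value of each dataset next-state
    for _ in range(n_iters):
        # Penalized averagers backup: under-represented (distant) regions
        # pay a cost proportional to their distance from the data.
        q_next = np.empty((N, n_actions))
        for a in range(n_actions):
            nn_idx, nn_dist = neighbors[a]
            q_next[:, a] = (q[nn_idx] - cost_coeff * nn_dist).mean(axis=1)
        v = q_next.max(axis=1)
        q = rewards + gamma * v  # Bellman backup over dataset transitions
    return q, v
```

Under this construction, solving the derived MDP reduces to value iteration over the N dataset transitions, which is consistent with the abstract's claim of cheap zero-shot adjustment to new objectives: rewards or discount can be swapped and the tabular model re-solved without retraining a network.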
Related papers
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
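For reference, the InfoNCE-based contrastive loss analyzed here is standard; a minimal symmetric version for paired (e.g., multimodal) embeddings is sketched below. The temperature value and the two-direction averaging are common conventions, not details from this paper.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    z_a, z_b: (B, d) tensors; row i of z_a is the positive pair of row i of
    z_b, and the other B - 1 rows serve as in-batch negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # (B, B) cosine-similarity logits
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Cross-entropy with the diagonal as the positive class, both directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```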
arXiv Detail & Related papers (2024-10-11T18:02:46Z)
- Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data.
Specifically, we propose a $\beta$-VAE-based module to extract factors as the initial nodes of the graph.
By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z)
- Causal prompting model-based offline reinforcement learning [16.95292725275873]
Model-based offline RL allows agents to fully utilise pre-collected datasets without requiring additional or unethical exploration.
Applying model-based offline RL to online systems is challenging due to the highly suboptimal (noise-filled) and diverse nature of the datasets such systems generate.
We introduce the Causal Prompting Reinforcement Learning framework, designed for highly suboptimal and resource-constrained online scenarios.
arXiv Detail & Related papers (2024-06-03T07:28:57Z)
- POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL.
First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model.
The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
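As a hedged sketch of that last step, domain randomization over the inferred posterior can be implemented as an environment wrapper that redraws model parameters from the MCMC samples at every episode reset. The Gymnasium wrapper below is illustrative only; `set_params` and the `posterior_samples` format are hypothetical, not the paper's interface.

```python
import random
import gymnasium as gym

class PosteriorRandomizedEnv(gym.Wrapper):
    """Redraw uncertain model parameters from an MCMC posterior each episode.

    posterior_samples: list of parameter dicts from the HMM inference step.
    set_params is a hypothetical hook assumed to reconfigure the wrapped
    environment's transition/observation model.
    """

    def __init__(self, env, posterior_samples):
        super().__init__(env)
        self.posterior_samples = posterior_samples

    def reset(self, **kwargs):
        params = random.choice(self.posterior_samples)  # one posterior draw
        self.env.unwrapped.set_params(**params)         # hypothetical hook
        return self.env.reset(**kwargs)
```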
arXiv Detail & Related papers (2023-07-16T15:44:58Z)
- The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning [1.4711121887106535]
Evaluations of Deep Reinforcement Learning (DRL) methods are an integral part of scientific progress in the field.
In this article, we augment DRL evaluations to consider parameterized families of MDPs.
We show that evaluating the MDP family often yields a substantially different relative ranking of methods, casting doubt on what methods should be considered state-of-the-art.
arXiv Detail & Related papers (2022-10-16T18:51:55Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is overfitting, which leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
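The PMS rule itself is not given in this summary; purely as an illustration of pessimistic selection, one common scheme scores each candidate policy by a lower confidence bound on its off-policy value estimate and returns the maximizer. The z-score form and the `std_errors` input are assumptions, not the paper's algorithm.

```python
import numpy as np

def pessimistic_select(value_estimates, std_errors, z=1.645):
    """Pick the candidate whose value lower confidence bound is largest.

    value_estimates / std_errors: per-candidate off-policy value estimates
    (e.g., from fitted-Q evaluation) and their standard errors; z sets the
    confidence level of the one-sided bound.
    """
    lcb = np.asarray(value_estimates) - z * np.asarray(std_errors)
    return int(np.argmax(lcb))
```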
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
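Per the original MOReL paper, the framework's core construction is a pessimistic MDP that routes state-actions where a learned dynamics ensemble disagrees to an absorbing low-reward HALT state. The compressed sketch below uses deviation from the ensemble mean as the disagreement measure and arbitrary threshold/penalty values; the reward model is omitted.

```python
import numpy as np

def pmdp_step(models, state, action,
              disagreement_threshold=0.1, halt_penalty=-100.0):
    """One step of a MOReL-style pessimistic MDP (reward model omitted).

    models: ensemble of learned dynamics functions f(state, action) -> next
    state. Where the ensemble disagrees, the dynamics are treated as unknown
    and the transition is routed to an absorbing HALT state with a penalty.
    """
    preds = np.stack([f(state, action) for f in models])
    disagreement = np.linalg.norm(preds - preds.mean(axis=0), axis=-1).max()
    if disagreement > disagreement_threshold:
        return state, halt_penalty, True   # HALT: absorbing, low reward
    return preds.mean(axis=0), 0.0, False  # mean prediction as next state
```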
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.