Learning Robust State Abstractions for Hidden-Parameter Block MDPs
- URL: http://arxiv.org/abs/2007.07206v4
- Date: Fri, 12 Feb 2021 04:40:14 GMT
- Title: Learning Robust State Abstractions for Hidden-Parameter Block MDPs
- Authors: Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau
- Abstract summary: We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
- Score: 55.31018404591743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many control tasks exhibit similar dynamics that can be modeled as having
common latent structure. Hidden-Parameter Markov Decision Processes (HiP-MDPs)
explicitly model this structure to improve sample efficiency in multi-task
settings. However, this setting makes strong assumptions on the observability
of the state that limit its application in real-world scenarios with rich
observation spaces. In this work, we leverage ideas of common structure from
the HiP-MDP setting and extend it to enable robust state abstractions inspired
by Block MDPs. We derive instantiations of this new framework for both
multi-task reinforcement learning (MTRL) and meta-reinforcement learning
(Meta-RL) settings. Further, we provide transfer and generalization bounds
based on task and state similarity, along with sample complexity bounds that
depend on the aggregate number of samples across tasks, rather than the number
of tasks, a significant improvement over prior work that uses the same
environment assumptions. To further demonstrate the efficacy of the proposed
method, we empirically compare and show improvement over multi-task and
meta-reinforcement learning baselines.
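To make the setting concrete, here is a minimal toy sketch of a HiP-MDP whose latent states are seen only through rich observations, as in a Block MDP. All names and dynamics are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HiP-MDP: all tasks share one transition function, modulated by a
# per-task hidden parameter theta; the agent observes x, not the latent s.
def transition(s, a, theta):
    """Shared dynamics; theta is the task's hidden parameter."""
    return s + theta * a + 0.01 * rng.normal(size=s.shape)

def emit_observation(s):
    """Block MDP assumption: the observation determines s but adds
    task-irrelevant distractor dimensions."""
    return np.concatenate([s, np.sin(s), rng.normal(size=4)])

# Two tasks differ only in theta; a learned encoder phi(x) -> s plays the
# role of the robust state abstraction the paper learns.
theta_a, theta_b = 0.5, 1.5
s, a = np.zeros(2), np.ones(2)
x_a = emit_observation(transition(s, a, theta_a))
x_b = emit_observation(transition(s, a, theta_b))
```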
Related papers
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism [7.479892725446205]
Multi-task learning (MTL) is a paradigm that simultaneously learns multiple tasks by sharing information at different levels.
We introduce a posteriori information into the model, considering that different tasks may produce correlated outputs with mutual influences.
We achieve this by incorporating a feedback mechanism into MTL models, where the output of one task serves as a hidden feature for another task.
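A minimal numpy sketch of that feedback idea, under our assumption (not the paper's exact architecture) that task 1's scalar output is appended to the shared features consumed by task 2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-task network with a feedback connection: the output of
# task 1 becomes an extra feature for task 2, modeling correlated outputs.
W_shared = rng.normal(size=(8, 16))   # shared representation
W_task1 = rng.normal(size=(16, 1))    # head for task 1
W_task2 = rng.normal(size=(17, 1))    # head for task 2 sees 16 features + y1

def forward(x):
    h = np.tanh(x @ W_shared)              # shared hidden features
    y1 = h @ W_task1                       # task-1 prediction
    h2 = np.concatenate([h, y1], axis=-1)  # feedback: y1 fed to task 2
    y2 = h2 @ W_task2
    return y1, y2

x = rng.normal(size=(4, 8))
y1, y2 = forward(x)
```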
arXiv Detail & Related papers (2024-04-01T03:27:34Z)
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant sample-efficiency benefits over single-task RL.
We investigate whether such benefits extend to more general sequential decision-making problems, such as partially observable MDPs (POMDPs) and the more general predictive state representations (PSRs).
We propose a provably efficient algorithm, UMT-PSR, for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSRs...
arXiv Detail & Related papers (2023-10-20T14:50:28Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which admits both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
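As a rough illustration of optimism-based exploration, here is a linear-features UCB bonus; the feature map stands in for the paper's kernel embeddings of latent variable models, and all names are hypothetical:

```python
import numpy as np

# Illustrative UCB-style bonus from a feature covariance (a stand-in for
# kernel embeddings of latent variable models; not the paper's algorithm).
def ucb_bonus(phi, cov_inv, beta=1.0):
    """Optimism bonus beta * sqrt(phi^T Sigma^{-1} phi) for feature phi."""
    return beta * np.sqrt(phi @ cov_inv @ phi)

d = 4
cov = np.eye(d)                  # running feature covariance Sigma
phi = np.ones(d) / np.sqrt(d)    # features of a candidate state-action pair
bonus = ucb_bonus(phi, np.linalg.inv(cov))
# Planning then maximizes value-estimate + bonus (optimism), or subtracts
# the bonus (pessimism) in the offline case.
```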
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common practice is to share the bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
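A minimal sketch contrasting the common shared-bottom design with a separate-bottom variant; this is our illustrative rendering of the idea, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Common practice: one shared bottom, per-task top layers.
W_bottom = rng.normal(size=(8, 16))
heads = [rng.normal(size=(16, 1)) for _ in range(2)]

def shared_bottom(x, task):
    return np.tanh(x @ W_bottom) @ heads[task]

# Variant the paper studies: separate bottom-layer parameters per task
# (hypothetical rendering, with sharing moved to the top layer).
bottoms = [rng.normal(size=(8, 16)) for _ in range(2)]
W_top = rng.normal(size=(16, 1))   # shared top

def separate_bottom(x, task):
    return np.tanh(x @ bottoms[task]) @ W_top

x = rng.normal(size=(4, 8))
y0, y1 = shared_bottom(x, 0), separate_bottom(x, 0)
```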
arXiv Detail & Related papers (2021-07-23T17:26:40Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
- Sparse Attention Guided Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [16.910911657616005]
Training deep reinforcement learning agents on environments with multiple levels/scenes from the same task has become essential for many applications.
We argue that the sample variance for a multi-scene environment is best minimized by treating each scene as a distinct MDP.
We also demonstrate that the true joint value function for a multi-scene environment follows a multi-modal distribution which is not captured by traditional CNN/LSTM-based critic networks.
arXiv Detail & Related papers (2021-02-14T23:30:13Z)
- Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [22.889059874754242]
Training deep reinforcement learning agents on environments with multiple levels/scenes/conditions from the same task has become essential for many applications.
We propose a dynamic value estimation (DVE) technique for these multiple-MDP environments, motivated by the clustering effect observed in the value function distribution across different scenes.
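One way to picture dynamic value estimation is a critic with several value heads, one per cluster of scenes, mixed by a learned gate; the sketch below is a hypothetical rendering of ours, not the paper's exact method:

```python
import numpy as np

# Multi-head critic: K value heads (one per value-function cluster) are
# combined per state by a softmax gate over cluster assignments.
def dynamic_value(h, value_heads, gate):
    """h: state features (d,); value_heads: (K, d); gate: (d, K)."""
    weights = np.exp(h @ gate)
    weights /= weights.sum()    # softmax over the K clusters
    values = value_heads @ h    # one value estimate per cluster
    return weights @ values     # convex combination of estimates

d, K = 8, 3
rng = np.random.default_rng(3)
v = dynamic_value(rng.normal(size=d),
                  rng.normal(size=(K, d)),
                  rng.normal(size=(d, K)))
```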
arXiv Detail & Related papers (2020-05-25T17:56:08Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
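An ICP-flavored toy check of the underlying idea (our illustration, not the MISA algorithm): a state feature is kept if its relationship to the prediction target is stable across environments, and dropped if the relationship varies.

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_slope(x, y):
    """Least-squares slope of y on a single zero-mean feature x."""
    return (x * y).sum() / (x * x).sum()

# Two environments: the causal feature's effect is fixed, while the
# spurious feature's correlation with y changes across environments.
envs = []
for shift in (0.0, 2.0):
    x_causal = rng.normal(size=500)
    x_spurious = rng.normal(size=500)
    y = 3.0 * x_causal + shift * x_spurious + 0.1 * rng.normal(size=500)
    envs.append((x_causal, x_spurious, y))

for name, idx in (("causal", 0), ("spurious", 1)):
    slopes = [fit_slope(e[idx], e[2]) for e in envs]
    stable = abs(slopes[0] - slopes[1]) < 0.5   # crude invariance test
    print(name, "slopes:", [round(s, 2) for s in slopes], "invariant:", stable)
```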
arXiv Detail & Related papers (2020-03-12T21:03:01Z)