Context-Specific Representation Abstraction for Deep Option Learning
- URL: http://arxiv.org/abs/2109.09876v1
- Date: Mon, 20 Sep 2021 22:50:01 GMT
- Title: Context-Specific Representation Abstraction for Deep Option Learning
- Authors: Marwa Abdulhai, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro,
Jonathan P. How
- Abstract summary: We introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL)
CRADOL is a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space.
Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space.
- Score: 43.68681795014662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning has focused on discovering temporally
extended actions, such as options, that can provide benefits in problems
requiring extensive exploration. One promising approach that learns these
options end-to-end is the option-critic (OC) framework. We examine and show in
this paper that OC does not decompose a problem into simpler sub-problems, but
instead increases the size of the search over policy space with each option
considering the entire state space during learning. This issue can result in
practical limitations of this method, including sample inefficient learning. To
address this problem, we introduce Context-Specific Representation Abstraction
for Deep Option Learning (CRADOL), a new framework that considers both temporal
abstraction and context-specific representation abstraction to effectively
reduce the size of the search over policy space. Specifically, our method
learns a factored belief state representation that enables each option to learn
a policy over only a subsection of the state space. We test our method against
hierarchical, non-hierarchical, and modular recurrent neural network baselines,
demonstrating significant sample efficiency improvements in challenging
partially observable environments.
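The core mechanism described in the abstract, each option acting over only a subsection of a factored belief state, can be pictured with a small sketch. This is not the authors' implementation: the fixed binary masks, dimensions, and linear sub-policies below are illustrative assumptions standing in for the learned recurrent components described in the paper.

```python
# Minimal sketch (assumptions, not the CRADOL implementation): each option's
# policy conditions on only a masked subset of a factored belief state.
import numpy as np

rng = np.random.default_rng(0)

N_FACTORS, FACTOR_DIM, N_OPTIONS, N_ACTIONS = 6, 8, 4, 5

# Each option gets a binary mask over belief factors, so its intra-option
# policy sees only a sub-section of the state representation.
option_masks = (rng.random((N_OPTIONS, N_FACTORS)) < 0.5).astype(float)

# Per-option linear policy over the masked, flattened factors (a stand-in
# for the deep recurrent sub-policies a full implementation would learn).
policy_weights = rng.normal(size=(N_OPTIONS, N_FACTORS * FACTOR_DIM, N_ACTIONS))

def option_action_probs(option, belief_factors):
    """belief_factors: array of shape (N_FACTORS, FACTOR_DIM), a factored belief state."""
    masked = belief_factors * option_masks[option][:, None]  # zero out factors this option ignores
    logits = masked.reshape(-1) @ policy_weights[option]
    exp = np.exp(logits - logits.max())                      # softmax over primitive actions
    return exp / exp.sum()

belief = rng.normal(size=(N_FACTORS, FACTOR_DIM))
print(option_action_probs(0, belief))  # action distribution under option 0
```

In the paper, the factored representation and the per-option context selection are learned end-to-end along with the option policies, rather than fixed at random as in this sketch.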
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z) - A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instanced at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z) - Reconciling Spatial and Temporal Abstractions for Goal Representation [0.4813333335683418]
Goal representation affects the performance of Hierarchical Reinforcement Learning (HRL) algorithms.
Recent studies show that representations that preserve temporally abstract environment dynamics are successful in solving difficult problems.
We propose a novel three-layer HRL algorithm that introduces, at different levels of the hierarchy, both a spatial and a temporal goal abstraction.
arXiv Detail & Related papers (2024-01-18T10:33:30Z) - An Option-Dependent Analysis of Regret Minimization Algorithms in
Finite-Horizon Semi-Markov Decision Processes [47.037877670620524]
We present an option-dependent upper bound to the regret suffered by regret minimization algorithms in finite-horizon problems.
We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure.
arXiv Detail & Related papers (2023-05-10T15:00:05Z) - Ideal Abstractions for Decision-Focused Learning [108.15241246054515]
We propose a method that configures the output space automatically in order to minimize the loss of decision-relevant information.
We demonstrate the method in two domains: data acquisition for deep neural network training and a closed-loop wildfire management task.
arXiv Detail & Related papers (2023-03-29T23:31:32Z) - Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm that avoids the overfitting of such importance-sampling estimators through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially in the short memory length $m$ rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments (a minimal sketch of this short-memory setup appears after this list).
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z) - Low-Dimensional State and Action Representation Learning with MDP
Homomorphism Metrics [1.5293427903448022]
Deep Reinforcement Learning has shown its ability to solve complicated problems directly from high-dimensional observations.
In end-to-end settings, however, Reinforcement Learning algorithms are not sample-efficient and require long training times and large quantities of data.
We propose a framework for sample-efficient Reinforcement Learning that takes advantage of state and action representations to transform a high-dimensional problem into a low-dimensional one.
arXiv Detail & Related papers (2021-07-04T16:26:04Z)