Context-Specific Representation Abstraction for Deep Option Learning
- URL: http://arxiv.org/abs/2109.09876v1
- Date: Mon, 20 Sep 2021 22:50:01 GMT
- Title: Context-Specific Representation Abstraction for Deep Option Learning
- Authors: Marwa Abdulhai, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro,
Jonathan P. How
- Abstract summary: We introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL)
CRADOL is a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space.
Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space.
- Score: 43.68681795014662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning has focused on discovering temporally
extended actions, such as options, that can provide benefits in problems
requiring extensive exploration. One promising approach that learns these
options end-to-end is the option-critic (OC) framework. We examine and show in
this paper that OC does not decompose a problem into simpler sub-problems, but
instead increases the size of the search over policy space with each option
considering the entire state space during learning. This issue can result in
practical limitations of this method, including sample inefficient learning. To
address this problem, we introduce Context-Specific Representation Abstraction
for Deep Option Learning (CRADOL), a new framework that considers both temporal
abstraction and context-specific representation abstraction to effectively
reduce the size of the search over policy space. Specifically, our method
learns a factored belief state representation that enables each option to learn
a policy over only a subsection of the state space. We test our method against
hierarchical, non-hierarchical, and modular recurrent neural network baselines,
demonstrating significant sample efficiency improvements in challenging
partially observable environments.
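The core mechanism described in the abstract, each option acting over only a subsection of a factored belief state, can be pictured with a small sketch. This is not the authors' implementation: the fixed binary masks, dimensions, and linear sub-policies below are illustrative assumptions standing in for the learned recurrent components described in the paper.

```python
# Minimal sketch (assumptions, not the CRADOL implementation): each option's
# policy conditions on only a masked subset of a factored belief state.
import numpy as np

rng = np.random.default_rng(0)

N_FACTORS, FACTOR_DIM, N_OPTIONS, N_ACTIONS = 6, 8, 4, 5

# Each option gets a binary mask over belief factors, so its intra-option
# policy sees only a sub-section of the state representation.
option_masks = (rng.random((N_OPTIONS, N_FACTORS)) < 0.5).astype(float)

# Per-option linear policy over the masked, flattened factors (a stand-in
# for the deep recurrent sub-policies a full implementation would learn).
policy_weights = rng.normal(size=(N_OPTIONS, N_FACTORS * FACTOR_DIM, N_ACTIONS))

def option_action_probs(option, belief_factors):
    """belief_factors: array of shape (N_FACTORS, FACTOR_DIM), a factored belief state."""
    masked = belief_factors * option_masks[option][:, None]  # zero out factors this option ignores
    logits = masked.reshape(-1) @ policy_weights[option]
    exp = np.exp(logits - logits.max())                      # softmax over primitive actions
    return exp / exp.sum()

belief = rng.normal(size=(N_FACTORS, FACTOR_DIM))
print(option_action_probs(0, belief))  # action distribution under option 0
```

In the paper, the factored representation and the per-option context selection are learned end-to-end along with the option policies, rather than fixed at random as in this sketch.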
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z) - A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instanced at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z) - Reconciling Spatial and Temporal Abstractions for Goal Representation [0.4813333335683418]
Goal representation affects the performance of Hierarchical Reinforcement Learning (HRL) algorithms.
Recent studies show that representations that preserve temporally abstract environment dynamics are successful in solving difficult problems.
We propose a novel three-layer HRL algorithm that introduces, at different levels of the hierarchy, both a spatial and a temporal goal abstraction.
arXiv Detail & Related papers (2024-01-18T10:33:30Z) - An Option-Dependent Analysis of Regret Minimization Algorithms in
Finite-Horizon Semi-Markov Decision Processes [47.037877670620524]
We present an option-dependent upper bound to the regret suffered by regret minimization algorithms in finite-horizon problems.
We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure.
arXiv Detail & Related papers (2023-05-10T15:00:05Z) - Ideal Abstractions for Decision-Focused Learning [108.15241246054515]
We propose a method that configures the output space automatically in order to minimize the loss of decision-relevant information.
We demonstrate the method in two domains: data acquisition for deep neural network training and a closed-loop wildfire management task.
arXiv Detail & Related papers (2023-03-29T23:31:32Z) - Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm that avoids the overfitting of such importance-sampling estimators through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially in the short memory length $m$ rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments (a minimal sketch of this short-memory setup appears after this list).
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z) - Low-Dimensional State and Action Representation Learning with MDP
Homomorphism Metrics [1.5293427903448022]
Deep Reinforcement Learning has shown its ability to solve complicated problems directly from high-dimensional observations.
In end-to-end settings, however, Reinforcement Learning algorithms are not sample-efficient and require long training times and large quantities of data.
We propose a framework for sample-efficient Reinforcement Learning that takes advantage of state and action representations to transform a high-dimensional problem into a low-dimensional one.
arXiv Detail & Related papers (2021-07-04T16:26:04Z)