Emergency action termination for immediate reaction in hierarchical reinforcement learning
- URL: http://arxiv.org/abs/2211.06351v1
- Date: Fri, 11 Nov 2022 16:56:02 GMT
- Title: Emergency action termination for immediate reaction in hierarchical reinforcement learning
- Authors: Michał Bortkiewicz, Jakub Łyskawa, Paweł Wawrzyński, Mateusz Ostaszewski, Artur Grudkowski and Tomasz Trzciński
- Abstract summary: We propose a method in which the validity of higher-level actions (thus lower-level goals) is constantly verified at the higher level.
If these actions, i.e., the lower-level goals, become inadequate, they are replaced by more appropriate ones.
In this way, we combine the advantage of hierarchical RL, fast training, with that of flat RL, immediate reactivity.
- Score: 8.637919344171255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical decomposition of control is unavoidable in large dynamical
systems. In reinforcement learning (RL), it is usually handled with subgoals
defined at higher policy levels and achieved at lower policy levels. Reaching
these goals can take a substantial amount of time, during which it is not
verified whether they are still worth pursuing. However, due to the randomness
of the environment, these goals may become obsolete. In this paper, we address
this gap in state-of-the-art approaches and propose a method in which the
validity of higher-level actions (thus lower-level goals) is constantly
verified at the higher level. If these actions, i.e., the lower-level goals, become
inadequate, they are replaced by more appropriate ones. In this way, we combine the
advantage of hierarchical RL, fast training, with that of flat RL, immediate
reactivity. We study our approach experimentally on seven benchmark
environments.
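
Below is a minimal sketch of the emergency-termination idea described in the abstract, assuming a simple goal-conditioned two-level setup: at every environment step, the high-level policy re-checks whether it would still issue the current subgoal from the present state, and replaces it immediately if not. All names here (PointEnv, HighLevelPolicy, LowLevelPolicy, still_valid) are hypothetical placeholders for illustration, not the authors' code.

```python
# Hypothetical sketch of per-step subgoal re-validation ("emergency action
# termination"); not the authors' implementation.
import numpy as np


class PointEnv:
    """Toy 2-D point environment, included only to make the sketch runnable."""

    def reset(self):
        self.pos = np.zeros(2)
        return self.pos.copy()

    def step(self, action):
        # Random drift stands in for the environment stochasticity that can
        # make a previously chosen subgoal obsolete.
        self.pos += action + np.random.normal(0.0, 0.02, size=2)
        done = bool(np.linalg.norm(self.pos - np.array([1.0, 1.0])) < 0.05)
        reward = 1.0 if done else 0.0
        return self.pos.copy(), reward, done


class HighLevelPolicy:
    """Placeholder for a learned high-level (subgoal-emitting) policy."""

    def select_subgoal(self, state):
        # In practice this would be a learned actor; here: a noisy target.
        return state + np.array([0.3, 0.3]) + np.random.normal(0.0, 0.05, size=2)

    def still_valid(self, state, subgoal, tol=0.5):
        # Validity check: would the high level still emit (roughly) this
        # subgoal from the current state?
        return np.linalg.norm(self.select_subgoal(state) - subgoal) < tol


class LowLevelPolicy:
    """Placeholder for a learned goal-conditioned low-level policy."""

    def act(self, state, subgoal):
        # Move greedily toward the current subgoal with bounded step size.
        return np.clip(subgoal - state, -0.1, 0.1)


def run_episode(env, high, low, max_steps=200):
    state = env.reset()
    subgoal = high.select_subgoal(state)
    for _ in range(max_steps):
        # Key difference from standard subgoal-based HRL: the subgoal is
        # re-validated at every step, not only when its horizon elapses or
        # when it is reached, so the agent can react immediately.
        if not high.still_valid(state, subgoal):
            subgoal = high.select_subgoal(state)
        state, reward, done = env.step(low.act(state, subgoal))
        if done:
            break
    return state


if __name__ == "__main__":
    print(run_episode(PointEnv(), HighLevelPolicy(), LowLevelPolicy()))
```

In the paper, the validity check and subgoal replacement are performed by the higher-level policy itself; the simple distance threshold above is only a stand-in for that learned decision.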
Related papers
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Results show a significant performance improvement of our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies [26.915223518488016]
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by decomposing them into subgoals.
We propose Bidirectional-reachable Hierarchical Policy Optimization (BrHPO), a simple yet effective algorithm that also enjoys computational efficiency.
Experiment results on a variety of long-horizon tasks show that BrHPO outperforms other state-of-the-art HRL baselines, with significantly higher exploration efficiency and robustness.
arXiv Detail & Related papers (2024-06-26T04:05:04Z)
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm alternating between regret minimization algorithms instanced at different (high and low) temporal abstractions.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP) with fixed low-level policies, while at the lower level, inner option policies are learned with a fixed high-level policy.
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of hierarchical RL.
We show that DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Adversarially Guided Subgoal Generation for Hierarchical Reinforcement Learning [5.514236598436977]
We propose a novel HRL approach for mitigating non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy.
Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.
arXiv Detail & Related papers (2022-01-24T12:30:38Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying not only what goal state is to be reached but also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Efficient Hierarchical Exploration with Stable Subgoal Representation Learning [26.537055962523162]
We propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas.
We develop an efficient hierarchical exploration strategy that actively seeks out new promising subgoals and states.
arXiv Detail & Related papers (2021-05-31T07:28:59Z)
- Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning [22.319208517053816]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
HRL often suffers from training inefficiency because the action space of the high-level policy, i.e., the goal space, is often large.
We show that this inefficiency can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
arXiv Detail & Related papers (2020-06-20T03:34:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.