Hierarchical Orchestra of Policies
- URL: http://arxiv.org/abs/2411.03008v1
- Date: Tue, 05 Nov 2024 11:13:09 GMT
- Title: Hierarchical Orchestra of Policies
- Authors: Thomas P Cannon, Özgür Simsek,
- Abstract summary: HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
- Score: 1.6574413179773757
- License:
- Abstract: Continual reinforcement learning poses a major challenge due to the tendency of agents to experience catastrophic forgetting when learning sequential tasks. In this paper, we introduce a modularity-based approach, called Hierarchical Orchestra of Policies (HOP), designed to mitigate catastrophic forgetting in lifelong reinforcement learning. HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks. Unlike other state-of-the-art methods, HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous. Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks and performs comparably to state-of-the-art transfer methods that require task labelling. Moreover, HOP achieves this without compromising performance when tasks remain constant, highlighting its versatility.
Related papers
- Temporal Abstraction in Reinforcement Learning with Offline Data [8.370420807869321]
We propose a framework by which an online hierarchical reinforcement learning algorithm can be trained on an offline dataset of transitions collected by an unknown behavior policy.
We validate our method on Gym MuJoCo environments and robotic gripper block-stacking tasks in the standard as well as transfer and goal-conditioned settings.
arXiv Detail & Related papers (2024-07-21T18:10:31Z) - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z) - MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback
and Dynamic Distance Constraint [40.3872201560003]
Hierarchical reinforcement learning (HRL) uses a hierarchical framework that divides tasks into subgoals and completes them sequentially.
Current methods struggle to find suitable subgoals for ensuring a stable learning process.
We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
arXiv Detail & Related papers (2024-02-22T03:11:09Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks
and Action-Tree Based Scheduled Sampling [16.77137239284608]
Task-oriented dialog systems have witnessed substantial progress due to conversational pre-training techniques.
We propose turn-level multi-task objectives for the encoder.
For the decoder, we introduce an action tree-based scheduled sampling technique.
arXiv Detail & Related papers (2024-01-28T11:02:23Z) - GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation [1.4703485217797363]
GO-DICE is an offline IL technique for goal-conditioned long-horizon sequential tasks.
Inspired by the expansive DICE-family of techniques, policy learning at both the levels transpires within the space of stationary distributions.
Experimental results substantiate that GO-DICE outperforms recent baselines.
arXiv Detail & Related papers (2023-12-17T19:47:49Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA)
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Building a Subspace of Policies for Scalable Continual Learning [21.03369477853538]
We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation)
arXiv Detail & Related papers (2022-11-18T14:59:42Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Adaptive Procedural Task Generation for Hard-Exploration Problems [78.20918366839399]
We introduce Adaptive Procedural Task Generation (APT-Gen) to facilitate reinforcement learning in hard-exploration problems.
At the heart of our approach is a task generator that learns to create tasks from a parameterized task space via a black-box procedural generation module.
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
arXiv Detail & Related papers (2020-07-01T09:38:51Z) - Tree-Structured Policy based Progressive Reinforcement Learning for
Temporally Language Grounding in Video [128.08590291947544]
Temporally language grounding in untrimmed videos is a newly-raised task in video understanding.
Inspired by human's coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning framework.
arXiv Detail & Related papers (2020-01-18T15:08:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.