Learning Hidden Subgoals under Temporal Ordering Constraints in Reinforcement Learning
- URL: http://arxiv.org/abs/2411.01425v1
- Date: Sun, 03 Nov 2024 03:22:39 GMT
- Title: Learning Hidden Subgoals under Temporal Ordering Constraints in Reinforcement Learning
- Authors: Duo Xu, Faramarz Fekri
- Abstract summary: We propose a novel RL algorithm for learning hidden subgoals under temporal ordering constraints (LSTOC).
We propose a new contrastive learning objective which can effectively learn hidden subgoals and their temporal orderings at the same time.
- Score: 14.46490764849977
- Abstract: In real-world applications, the success of a task is often determined by multiple key steps which are far apart in time and have to be achieved in a fixed order. For example, the key steps listed in a cooking recipe must be completed one by one in the right order. These key steps can be regarded as subgoals of the task, and their time orderings are described as temporal ordering constraints. However, in many real-world problems, subgoals or key states are often hidden in the state space and their temporal ordering constraints are also unknown, which makes it challenging for previous RL algorithms to solve such tasks. To address this issue, in this work we propose a novel RL algorithm for learning hidden subgoals under temporal ordering constraints (LSTOC). We propose a new contrastive learning objective which can effectively learn hidden subgoals (key states) and their temporal orderings at the same time, based on the first-occupancy representation and temporal geometric sampling. In addition, we propose a sample-efficient learning strategy that discovers subgoals one by one, following their temporal ordering constraints, by building a subgoal tree to represent discovered subgoals and their temporal ordering relationships. This tree can be used to improve the sample efficiency of trajectory collection, speed up task solving, and generalize to unseen tasks. The LSTOC framework is evaluated on several environments with image-based observations, showing significant improvement over baseline methods.
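To make the training signal concrete, below is a minimal sketch of a contrastive objective with temporal geometric sampling in the spirit of the abstract: positives are drawn from the same trajectory at a geometrically distributed future offset, and the rest of the batch serves as negatives. The encoder, trajectory format, and hyperparameters are illustrative assumptions, and the first-occupancy representation is omitted; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(encoder, trajectories, gamma=0.9, temperature=0.1):
    """Contrastive loss with temporal geometric sampling (illustrative).

    trajectories: list of tensors, each of shape (T, obs_dim) with T >= 2.
    """
    anchors, positives = [], []
    for traj in trajectories:
        t = torch.randint(len(traj) - 1, (1,)).item()
        # A geometric offset biases positive pairs toward temporally close
        # states while still occasionally reaching distant ones.
        offset = min(int(torch.distributions.Geometric(1 - gamma).sample()) + 1,
                     len(traj) - 1 - t)
        anchors.append(traj[t])
        positives.append(traj[t + offset])
    z_a = F.normalize(encoder(torch.stack(anchors)), dim=-1)
    z_p = F.normalize(encoder(torch.stack(positives)), dim=-1)
    # InfoNCE: each anchor's positive sits on the diagonal; the other
    # batch entries act as negatives.
    logits = z_a @ z_p.T / temperature
    labels = torch.arange(len(logits))
    return F.cross_entropy(logits, labels)
```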
Related papers
- SMAUG: A Sliding Multidimensional Task Window-Based MARL Framework for Adaptive Real-Time Subtask Recognition [11.236363226878975]
Subtask-based multi-agent reinforcement learning (MARL) methods enable agents to learn how to tackle different subtasks.
A Sliding Multidimensional tAsk window based mUlti-agent reinforcement learninG framework (SMAUG) is proposed for adaptive real-time subtask recognition.
Experiments on StarCraft II show that SMAUG not only outperforms all baselines but also exhibits a more prominent and swift rise in rewards.
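As a rough illustration of the sliding-window idea, the sketch below cuts segments of several lengths ending at the current step so that a recognizer can infer the active subtask; the window sizes and recognizer interface are assumptions, not SMAUG's actual design.

```python
import numpy as np

def sliding_task_windows(obs_history, window_sizes=(4, 8, 16)):
    """Yield (size, segment) pairs ending at the most recent step."""
    for size in window_sizes:
        if len(obs_history) >= size:
            yield size, np.asarray(obs_history[-size:])

# Usage: feed each multi-scale segment to a (hypothetical) subtask
# recognizer and fuse the predictions.
# for size, segment in sliding_task_windows(obs_history):
#     subtask_logits = recognizer(segment)
```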
arXiv Detail & Related papers (2024-03-04T08:04:41Z)
- Optimal Task Assignment and Path Planning using Conflict-Based Search with Precedence and Temporal Constraints [5.265273282482319]
This paper examines the Task Assignment and Path Finding with Precedence and Temporal Constraints (TAPF-PTC) problem.
We augment Conflict-Based Search (CBS) to simultaneously generate task assignments and collision-free paths that adhere to precedence and temporal constraints.
Experimentally, we demonstrate that our algorithm, CBS-TA-PTC, can solve highly challenging bomb-defusing tasks with precedence and temporal constraints efficiently.
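A hedged sketch of the kind of feasibility check such a CBS variant might apply to candidate schedules: completion times must respect precedence edges and optional temporal bounds. The schedule and constraint formats here are illustrative assumptions, not the paper's data structures.

```python
def satisfies_constraints(completion_times, precedence, temporal_gaps=None):
    """completion_times: {task: finish_time}; precedence: iterable of
    (a, b) pairs meaning task a must finish before task b does;
    temporal_gaps: optional {(a, b): max_gap} bounds between finishes."""
    for a, b in precedence:
        if completion_times[a] >= completion_times[b]:
            return False  # precedence violated
    for (a, b), gap in (temporal_gaps or {}).items():
        if completion_times[b] - completion_times[a] > gap:
            return False  # temporal bound violated
    return True
```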
arXiv Detail & Related papers (2024-02-13T20:07:58Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
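As a generic illustration of the prior-based idea (not BAdam's actual update rule), the sketch below anchors parameters to values learned on earlier tasks, weighted by a per-parameter importance estimate, in the style of EWC-like penalties.

```python
import torch

def prior_penalty(model, anchor_params, importance, strength=1.0):
    """Quadratic penalty anchoring parameters to earlier-task values,
    weighted per-parameter by an importance estimate (EWC-style)."""
    loss = torch.tensor(0.0)
    for name, p in model.named_parameters():
        loss = loss + (importance[name] * (p - anchor_params[name]) ** 2).sum()
    return strength * loss
```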
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
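The hierarchical decomposition can be pictured with a two-level sketch: a high-level policy proposes an intermediate (latent) subgoal, and a low-level policy acts toward it. The policy interfaces here are illustrative assumptions.

```python
def hierarchical_act(high_policy, low_policy, obs, final_goal):
    # The high level proposes a nearby intermediate state as a subgoal;
    # the low level only has to reach that subgoal, not the final goal.
    subgoal = high_policy(obs, final_goal)
    return low_policy(obs, subgoal)
```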
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- Reinforcement Learning with Simple Sequence Priors [9.869634509510016]
We propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible.
We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches.
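One way to picture a simple sequence prior is as a reward bonus for actions that are predictable under an autoregressive model of past actions; the prior's interface and the weighting below are assumptions, not the paper's exact objective.

```python
def augmented_reward(env_reward, action, action_history, sequence_prior, alpha=0.1):
    # High log-probability under the prior means the action sequence is
    # easy to compress, so predictable behavior earns a bonus.
    bonus = sequence_prior.log_prob(action, context=action_history)
    return env_reward + alpha * bonus
```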
arXiv Detail & Related papers (2023-05-26T17:18:14Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
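A minimal sketch of the prompting step, assuming a generic `llm` callable; the prompt wording and output format are illustrative, not SAMA's actual prompts.

```python
def decompose_task(llm, task_description, num_agents):
    prompt = (
        f"Task: {task_description}\n"
        f"Think step by step, then propose subgoals and assign one to "
        f"each of the {num_agents} agents, one per line as 'agent_i: subgoal'."
    )
    reply = llm(prompt)  # hypothetical text-in, text-out interface
    return [line.split(":", 1)[1].strip()
            for line in reply.splitlines() if ":" in line]
```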
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Continual Learning with Distributed Optimization: Does CoCoA Forget? [0.0]
We focus on the continual learning problem where the tasks arrive sequentially.
The aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks.
We consider the well-established distributed learning algorithm CoCoA.
arXiv Detail & Related papers (2022-11-30T13:49:43Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
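The EM loop can be sketched as follows, with the graph search and policy-update routines left as assumed helpers rather than the paper's implementation.

```python
def c_planning_iteration(replay_graph, policy, start, goal, train_step):
    # E-step: plan an optimal sequence of waypoints with graph search
    # over previously seen states (replay_graph is an assumed helper).
    waypoints = replay_graph.shortest_path(start, goal)
    # M-step: fit the goal-conditioned policy to reach each waypoint.
    for w in waypoints:
        train_step(policy, subgoal=w)
    return waypoints
```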
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
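Option usage can be pictured as a scheduler that samples a new option latent every K steps while a lower-level policy acts conditioned on it; the fixed interval, the classic Gym-style environment API, and the policy interfaces are all assumptions for illustration.

```python
def rollout_with_options(env, scheduler, option_policy, K=8, horizon=200):
    obs, total = env.reset(), 0.0
    option = None
    for t in range(horizon):
        if t % K == 0:
            option = scheduler(obs)  # sample a new option latent
        obs, reward, done, _ = env.step(option_policy(obs, option))
        total += reward
        if done:
            break
    return total
```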
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
- Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and the maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
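A minimal sketch of a cyclical annealing schedule, assuming a linear ramp followed by a hold phase (the paper's exact schedule may differ): the regularization weight repeatedly climbs from 0 to 1 over training.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5):
    period = total_steps / n_cycles
    phase = (step % period) / period        # position within the cycle
    return min(phase / ramp_fraction, 1.0)  # linear ramp, then hold at 1
```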
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.