Challenges to Solving Combinatorially Hard Long-Horizon Deep RL Tasks
- URL: http://arxiv.org/abs/2206.01812v1
- Date: Fri, 3 Jun 2022 20:38:27 GMT
- Title: Challenges to Solving Combinatorially Hard Long-Horizon Deep RL Tasks
- Authors: Andrew C. Li, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A.
McIlraith
- Abstract summary: We propose a set of tasks which admit many distinct solutions at the high-level, but require reasoning about states and rewards thousands of steps into the future for the best performance.
We find that standard RL methods often neglect long-term effects due to discounting, while general-purpose hierarchical RL approaches struggle unless additional abstract domain knowledge can be exploited.
- Score: 25.37125069796657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning has shown promise in discrete domains requiring
complex reasoning, including games such as Chess, Go, and Hanabi. However, this
type of reasoning is less often observed in long-horizon, continuous domains
with high-dimensional observations, where instead RL research has predominantly
focused on problems with simple high-level structure (e.g. opening a drawer or
moving a robot as fast as possible). Inspired by combinatorially hard
optimization problems, we propose a set of robotics tasks which admit many
distinct solutions at the high-level, but require reasoning about states and
rewards thousands of steps into the future for the best performance.
Critically, while RL has traditionally suffered on complex, long-horizon tasks
due to sparse rewards, our tasks are carefully designed to be solvable without
specialized exploration. Nevertheless, our investigation finds that standard RL
methods often neglect long-term effects due to discounting, while
general-purpose hierarchical RL approaches struggle unless additional abstract
domain knowledge can be exploited.
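For intuition about the discounting effect the abstract describes, a quick back-of-the-envelope check (ours, not the paper's) shows how fast the weight of a future reward decays: with a typical discount factor gamma = 0.99, rewards a thousand or more steps away contribute almost nothing to the discounted objective.

```python
# Effective weight of a reward arriving t steps in the future under
# discounting is gamma**t. With gamma = 0.99, rewards a thousand or more
# steps away are numerically negligible.
gamma = 0.99
for t in (10, 100, 1000, 5000):
    print(f"gamma^{t} = {gamma ** t:.2e}")
# gamma^10 = 9.04e-01
# gamma^100 = 3.66e-01
# gamma^1000 = 4.32e-05
# gamma^5000 = 1.50e-22
```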
Related papers
- Where to Intervene: Action Selection in Deep Reinforcement Learning [5.470195794278266]
We propose a general data-driven action selection approach with model-free and computationally friendly properties.
Our method not only selects minimal sufficient actions but also controls the false discovery rate via knockoff sampling.
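As rough intuition for knockoff-based selection with FDR control, here is a toy sketch (our simplification, not the paper's algorithm): each candidate action dimension is scored against a permuted "knockoff" copy, and only dimensions whose real-minus-knockoff margin clears a data-dependent threshold are kept.

```python
import numpy as np

def select_actions_fdr(X, y, q=0.1, seed=0):
    """Toy knockoff-style filter targeting FDR level q (illustrative only).
    X: (n, d) array of candidate action features; y: (n,) outcomes."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    X_ko = rng.permuted(X, axis=0)  # crude knockoffs: permute rows per column

    def score(Z):  # |marginal correlation| of each column with the outcome
        Zc, yc = Z - Z.mean(axis=0), y - y.mean()
        return np.abs(Zc.T @ yc) / (np.linalg.norm(Zc, axis=0) * np.linalg.norm(yc) + 1e-12)

    W = score(X) - score(X_ko)  # knockoff statistic: large positive => real signal
    for t in np.sort(np.abs(W[W != 0])):  # knockoff+-style threshold search
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)
```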
arXiv Detail & Related papers (2025-07-05T23:40:55Z)
- RRO: LLM Agent Optimization Through Rising Reward Trajectories [52.579992804584464]
Large language models (LLMs) have exhibited extraordinary performance in a variety of tasks.
In practice, agents are sensitive to the outcomes of certain key steps, which makes them likely to fail the task.
We propose Reward Rising Optimization (RRO) to mitigate this issue.
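A minimal sketch of the "reward rising" idea as we read it from the abstract (the interface and names below are our assumptions, not the paper's API): prefer the candidate next step whose process reward rises relative to the current prefix.

```python
# Hypothetical interface: `reward(traj)` is a process reward model scoring a
# partial trajectory. Keep candidates whose reward rises over the prefix;
# fall back to the best absolute score if none rises.
def pick_next_step(trajectory, candidates, reward):
    prev = reward(trajectory)
    scored = [(reward(trajectory + [c]) - prev, c) for c in candidates]
    rising = [sc for sc in scored if sc[0] > 0]
    return max(rising or scored, key=lambda sc: sc[0])[1]
```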
arXiv Detail & Related papers (2025-05-27T05:27:54Z)
- DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning [33.66640909392995]
We argue that solving complex and high-dimensional tasks requires solving simpler tasks that are relevant to the target task.
We propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task.
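One plausible reading of "goals in the direction of the target task" is a selection rule like the sketch below (our illustration, not DISCOVER's exact criterion): among frontier goals the agent can already reach, pick the one trading off progress toward the target against a novelty bonus.

```python
import numpy as np

# Pick the next training goal from the reachable frontier: closer to the
# target is better, with a novelty bonus to keep exploration moving.
def next_training_goal(frontier_goals, target_goal, novelty_bonus):
    G = np.asarray(frontier_goals, dtype=float)
    progress = -np.linalg.norm(G - np.asarray(target_goal, float), axis=1)
    return G[np.argmax(progress + np.asarray(novelty_bonus, float))]
```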
arXiv Detail & Related papers (2025-05-26T11:35:07Z)
- Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper provides a comprehensive review of work on improving the sample efficiency and generalization of RL algorithms through transfer and inverse reinforcement learning (T-IRL).
Our findings indicate that most recent work has addressed these challenges through human-in-the-loop and sim-to-real strategies.
Within the IRL setting, researchers have recently prioritized training schemes that require few experience transitions, along with extensions of such frameworks to multi-agent and multi-intention problems.
arXiv Detail & Related papers (2024-11-15T15:18:57Z)
- Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input [5.522800137785975]
We introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework.
The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy.
By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.
arXiv Detail & Related papers (2024-09-20T12:27:47Z)
- Granger Causal Interaction Skill Chains [35.143372688036685]
Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability.
We introduce the Chain of Interaction Skills (COInS) algorithm, which focuses on controllability in factored domains to identify a small number of task-agnostic skills that still permit a high degree of control.
We also demonstrate the transferability of skills learned by COInS, using variants of Breakout, a common RL benchmark, and show 2-3x improvement in both sample efficiency and final performance compared to standard RL baselines.
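For intuition about the Granger-causal test underpinning skill discovery here, a lag-1 toy version (our simplification, not the COInS implementation) asks whether one factor's history helps predict another factor's next value beyond that factor's own history:

```python
import numpy as np

def granger_gain(x, y):
    """Lag-1 Granger-style score (illustrative only): relative reduction in
    squared error when predicting y[t] from (y[t-1], x[t-1]) versus y[t-1]
    alone. A positive score suggests x helps predict/control y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    Yp, Y1, X1 = y[1:], y[:-1], x[:-1]

    def mse(F):  # residual of least-squares regression of Yp on columns of F
        beta, *_ = np.linalg.lstsq(F, Yp, rcond=None)
        return np.mean((Yp - F @ beta) ** 2)

    ones = np.ones_like(Y1)
    reduced = mse(np.column_stack([Y1, ones]))
    full = mse(np.column_stack([Y1, X1, ones]))
    return (reduced - full) / (reduced + 1e-12)
```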
arXiv Detail & Related papers (2023-06-15T21:06:54Z)
- PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning [25.84621883831624]
Hierarchical reinforcement learning has the potential to solve complex, long-horizon tasks using temporal abstraction and increased exploration.
We present primitive enabled adaptive relabeling (PEAR).
We first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision.
We then jointly optimize HRL agents using reinforcement learning (RL) and imitation learning (IL).
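A hedged sketch of what adaptive relabeling over demonstrations might look like (the `reachable` predicate and interface below are our assumptions, not the paper's): walk an expert demonstration and emit, as each subgoal, the furthest state the current low-level primitive can still reliably reach.

```python
# Segment a demonstration into subgoals matched to the primitive's current
# competence. `reachable(s, g)` is a hypothetical check that the low-level
# policy can reach g from s.
def relabel_subgoals(demo_states, reachable):
    subgoals, i = [], 0
    while i < len(demo_states) - 1:
        j = i + 1
        # extend the segment while the primitive can still reach the next state
        while j + 1 < len(demo_states) and reachable(demo_states[i], demo_states[j + 1]):
            j += 1
        subgoals.append(demo_states[j])
        i = j
    return subgoals
```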
arXiv Detail & Related papers (2023-06-10T09:41:30Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Understanding the Complexity Gains of Single-Task RL with a Curriculum [83.46923851724408]
Reinforcement learning (RL) problems can be challenging without well-shaped rewards.
We provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum.
We show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem.
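The core loop behind this argument can be written in a few lines (interface assumed, not the paper's code): solve an ordered sequence of tasks, warm-starting each policy from the previous solution, so no single stage faces the full exploration problem from scratch.

```python
# Minimal curriculum loop: `train(task, init=...)` is an assumed trainer that
# returns a policy, warm-started from the previous stage's solution.
def solve_with_curriculum(tasks, train, policy=None):
    for task in tasks:  # ordered easy -> hard, ending at the target task
        policy = train(task, init=policy)
    return policy
```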
arXiv Detail & Related papers (2022-12-24T19:46:47Z)
- Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation [14.901636098553848]
Solving tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning.
Existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical.
We extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations.
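A sketch of demonstration-guided hindsight relabelling as we read it from the abstract (names and weighting are our assumptions): weight achieved goals by proximity to demonstration states, so relabelled goals follow the task-specific distribution the demos imply.

```python
import numpy as np

# Sample a hindsight goal, preferring achieved goals close to states visited
# in successful demonstrations.
def select_hindsight_goal(achieved_goals, demo_states, seed=0):
    rng = np.random.default_rng(seed)
    A = np.asarray(achieved_goals, float)
    D = np.asarray(demo_states, float)
    dists = np.linalg.norm(A[:, None, :] - D[None, :, :], axis=-1).min(axis=1)
    w = np.exp(-dists)  # closer to the demos => more likely to be sampled
    return A[rng.choice(len(A), p=w / w.sum())]
```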
arXiv Detail & Related papers (2021-12-01T16:12:32Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
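A toy version of the surprise objective (assumed form, not the paper's exact game): estimate observation entropy over a window and hand it to the two players with opposite signs, so one policy seeks surprise while the other suppresses it.

```python
import numpy as np

def surprise_rewards(obs_window, bins=16):
    """Histogram-based entropy of a window of observations (shape (n, d)),
    returned with opposite signs for the two competing policies."""
    hist, _ = np.histogramdd(np.asarray(obs_window, float), bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum())
    return entropy, -entropy  # explorer maximizes surprise; controller minimizes it
```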
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Room Clearance with Feudal Hierarchical Reinforcement Learning [2.867517731896504]
We introduce a new simulation environment designed as a tool to build scenarios that can drive RL research in a direction useful for military analysis.
We focus on an abstracted and simplified room clearance scenario, where a team of blue agents have to make their way through a building and ensure that all rooms are cleared of enemy red agents.
We implement a multi-agent version of feudal hierarchical RL that introduces a command hierarchy where a commander at the higher level sends orders to multiple agents at the lower level who simply have to learn to follow these orders.
We find that breaking the task down in this way allows us to [...]
arXiv Detail & Related papers (2021-05-24T15:05:58Z)
- How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned [111.06812202454364]
We present a number of case studies involving robotic deep RL.
We discuss commonly perceived challenges in deep RL and how they have been addressed in these works.
We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting.
arXiv Detail & Related papers (2021-02-04T22:09:28Z)
- Weakly-Supervised Reinforcement Learning for Controllable Behavior [126.04932929741538]
Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
arXiv Detail & Related papers (2020-04-06T17:50:28Z)