From proprioception to long-horizon planning in novel environments: A
hierarchical RL model
- URL: http://arxiv.org/abs/2006.06620v1
- Date: Thu, 11 Jun 2020 17:19:12 GMT
- Title: From proprioception to long-horizon planning in novel environments: A
hierarchical RL model
- Authors: Nishad Gothoskar, Miguel Lázaro-Gredilla, Dileep George
- Abstract summary: In this work, we introduce a simple, three-level hierarchical architecture that reflects different types of reasoning.
We apply our method to a series of navigation tasks in the Mujoco Ant environment.
- Score: 4.44317046648898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For an intelligent agent to flexibly and efficiently operate in complex
environments, they must be able to reason at multiple levels of temporal,
spatial, and conceptual abstraction. At the lower levels, the agent must
interpret their proprioceptive inputs and control their muscles, and at the
higher levels, the agent must select goals and plan how they will achieve those
goals. It is clear that each of these types of reasoning is amenable to
different types of representations, algorithms, and inputs. In this work, we
introduce a simple, three-level hierarchical architecture that reflects these
distinctions. The low-level controller operates on the continuous
proprioceptive inputs, using model-free learning to acquire useful behaviors.
These in turn induce a set of mid-level dynamics, which are learned by the
mid-level controller and used for model-predictive control, to select a
behavior to activate at each timestep. The high-level controller leverages a
discrete, graph representation for goal selection and path planning to specify
targets for the mid-level controller. We apply our method to a series of
navigation tasks in the Mujoco Ant environment, consistently demonstrating
significant improvements in sample-efficiency compared to prior model-free,
model-based, and hierarchical RL methods. Finally, as an illustrative example
of the advantages of our architecture, we apply our method to a complex maze
environment that requires efficient exploration and long-horizon planning.
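The division of labor described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the behavior names, the hard-coded displacement table standing in for the learned mid-level dynamics, the one-step planning horizon, and the helper names (`shortest_path`, `select_behavior`, `navigate`) are all assumptions made for illustration. The low-level controller, which the paper trains with model-free RL on proprioceptive inputs, is reduced here to a fixed set of behaviors.

```python
from collections import deque

# ASSUMPTION: stand-in for the learned mid-level dynamics. Each low-level
# behavior is modeled as a fixed (dx, dy) displacement of the agent.
BEHAVIOR_DISPLACEMENT = {
    "forward": (1.0, 0.0),
    "backward": (-1.0, 0.0),
    "left": (0.0, 1.0),
    "right": (0.0, -1.0),
}


def shortest_path(graph, start, goal):
    """High level: breadth-first search over a discrete graph of places."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal is unreachable


def select_behavior(position, target):
    """Mid level: one-step model-predictive control over the behavior set."""
    def predicted_sq_distance(behavior):
        dx, dy = BEHAVIOR_DISPLACEMENT[behavior]
        nx, ny = position[0] + dx, position[1] + dy
        return (nx - target[0]) ** 2 + (ny - target[1]) ** 2

    # Pick the behavior whose predicted next position is closest to the target.
    return min(BEHAVIOR_DISPLACEMENT, key=predicted_sq_distance)


def navigate(graph, coords, start, goal, max_steps=100):
    """Run the hierarchy: plan waypoints, then pick a behavior per timestep."""
    path = shortest_path(graph, start, goal)
    if path is None:
        return None, []
    position = coords[start]
    trace = []
    for waypoint in path[1:]:
        target = coords[waypoint]
        while position != target and len(trace) < max_steps:
            behavior = select_behavior(position, target)
            dx, dy = BEHAVIOR_DISPLACEMENT[behavior]
            # The toy "environment" applies the same displacement the model predicts.
            position = (position[0] + dx, position[1] + dy)
            trace.append(behavior)
    return path, trace


if __name__ == "__main__":
    graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    coords = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.0, 1.0)}
    print(navigate(graph, coords, "A", "C"))
```

In this sketch the high level only ever hands the mid level the next waypoint, and the mid level only ever chooses among existing behaviors, which mirrors the separation of representations the abstract argues for. A longer MPC horizon or a learned dynamics model would replace the lookup table without changing that interface.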
Related papers
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Our method shows significant performance improvement over several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction [19.59151245929067]
We study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allows for more efficient learning.
We find this problem is best solved hierarchically by modelling items at a higher level of state abstraction than pixels.
We make use of this to propose a fully model-based algorithm that learns a discriminative world model.
arXiv Detail & Related papers (2024-08-21T17:59:31Z)
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules, including an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly.
We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction.
Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels [42.275164872809746]
We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals.
Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level.
It then uses this world model to choose optimal high-level goals through a tree-search planning procedure.
arXiv Detail & Related papers (2023-10-16T01:13:26Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
- Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning [36.050432925402845]
We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long horizon control tasks.
We experimentally show that our method generalizes across unseen test environments and can scale to 3x horizon length compared to both learning and non-learning based methods.
arXiv Detail & Related papers (2020-02-14T10:19:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.