Related papers: From proprioception to long-horizon planning in novel environments: A hierarchical RL model

From proprioception to long-horizon planning in novel environments: A hierarchical RL model

URL: http://arxiv.org/abs/2006.06620v1
Date: Thu, 11 Jun 2020 17:19:12 GMT
Title: From proprioception to long-horizon planning in novel environments: A hierarchical RL model
Authors: Nishad Gothoskar, Miguel L\'azaro-Gredilla, Dileep George
Abstract summary: In this work, we introduce a simple, three-level hierarchical architecture that reflects different types of reasoning. We apply our method to a series of navigation tasks in the Mujoco Ant environment.
Score: 4.44317046648898
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For an intelligent agent to flexibly and efficiently operate in complex environments, they must be able to reason at multiple levels of temporal, spatial, and conceptual abstraction. At the lower levels, the agent must interpret their proprioceptive inputs and control their muscles, and at the higher levels, the agent must select goals and plan how they will achieve those goals. It is clear that each of these types of reasoning is amenable to different types of representations, algorithms, and inputs. In this work, we introduce a simple, three-level hierarchical architecture that reflects these distinctions. The low-level controller operates on the continuous proprioceptive inputs, using model-free learning to acquire useful behaviors. These in turn induce a set of mid-level dynamics, which are learned by the mid-level controller and used for model-predictive control, to select a behavior to activate at each timestep. The high-level controller leverages a discrete, graph representation for goal selection and path planning to specify targets for the mid-level controller. We apply our method to a series of navigation tasks in the Mujoco Ant environment, consistently demonstrating significant improvements in sample-efficiency compared to prior model-free, model-based, and hierarchical RL methods. Finally, as an illustrative example of the advantages of our architecture, we apply our method to a complex maze environment that requires efficient exploration and long-horizon planning.

Related papers

Object-centric proto-symbolic behavioural reasoning from pixels [0.0]
We present a brain-inspired, deep-learning architecture that learns from pixels to interpret, control, and reason about its environment. Results show that the agent can learn emergent conditional behavioural reasoning. The proposed architecture shows how to manipulate grounded object representations, as a key inductive bias for unsupervised learning.
arXiv Detail & Related papers (2024-11-26T13:54:24Z)
Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy. Results observe significant performance improvement by our method, compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies. HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
Exploring the limits of Hierarchical World Models in Reinforcement Learning [0.7499722271664147]
We describe a novel HMBRL framework and evaluate it thoroughly. We construct hierarchical world models that simulate environment dynamics at various levels of temporal abstraction. Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low dimensional abstract actions.
arXiv Detail & Related papers (2024-06-01T16:29:03Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels [42.275164872809746]
We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals. Forecaster learns an abstract model of its environment by modelling the transitions dynamics at an abstract level. It then uses this world model to choose optimal high-level goals through a tree-search planning procedure.
arXiv Detail & Related papers (2023-10-16T01:13:26Z)
AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training. We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories. We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems outperforming state-of-the-art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency. We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem. Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks. In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning [36.050432925402845]
We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long horizon control tasks. We experimentally show that our method generalizes across unseen test environments and can scale to 3x horizon length compared to both learning and non-learning based methods.
arXiv Detail & Related papers (2020-02-14T10:19:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.