Room Clearance with Feudal Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2105.11328v1
- Date: Mon, 24 May 2021 15:05:58 GMT
- Title: Room Clearance with Feudal Hierarchical Reinforcement Learning
- Authors: Henry Charlesworth, Adrian Millea, Eddie Pottrill, Rich Riley
- Abstract summary: We introduce a new simulation environment, "Gambit", designed as a tool to build scenarios that can drive RL research in a direction useful for military analysis.
We focus on an abstracted and simplified room clearance scenario, where a team of blue agents have to make their way through a building and ensure that all rooms are cleared of enemy red agents.
We implement a multi-agent version of feudal hierarchical RL that introduces a command hierarchy where a commander at the higher level sends orders to multiple agents at the lower level who simply have to learn to follow these orders.
We find that breaking the task down in this way allows us to solve a number of non-trivial floorplans that require multi-agent coordination much more efficiently than standard baseline RL algorithms.
- Score: 2.867517731896504
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning (RL) is a general framework that allows systems to
learn autonomously through trial-and-error interaction with their environment.
In recent years combining RL with expressive, high-capacity neural network
models has led to impressive performance in a diverse range of domains.
However, dealing with the large state and action spaces often required for
problems in the real world still remains a significant challenge. In this paper
we introduce a new simulation environment, "Gambit", designed as a tool to
build scenarios that can drive RL research in a direction useful for military
analysis. Using this environment we focus on an abstracted and simplified room
clearance scenario, where a team of blue agents have to make their way through
a building and ensure that all rooms are cleared of (and remain clear of) enemy
red agents. We implement a multi-agent version of feudal hierarchical RL that
introduces a command hierarchy where a commander at the higher level sends
orders to multiple agents at the lower level who simply have to learn to follow
these orders. We find that breaking the task down in this way allows us to
solve a number of non-trivial floorplans that require the coordination of
multiple agents much more efficiently than the standard baseline RL algorithms
we compare with. We then go on to explore how qualitatively different behaviour
can emerge depending on what we prioritise in the agent's reward function (e.g.
clearing the building quickly vs. prioritising rescuing civilians).
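To make the feudal command hierarchy from the abstract concrete, here is a minimal, hypothetical Python sketch (ours, not the paper's Gambit implementation): a commander issues room-level orders, and each low-level agent is rewarded purely for complying with its order. `CommanderPolicy`, `order_reward`, and the toy floorplan are illustrative names.
```python
import random

ROOMS = ["hall", "room_a", "room_b", "room_c"]  # toy floorplan

class CommanderPolicy:
    """High-level policy: assigns each blue agent a room to clear.
    (Illustrative stand-in for the learned commander.)"""
    def orders(self, uncleared):
        # One order per agent; here a random assignment over uncleared rooms.
        return {agent: random.choice(uncleared) for agent in ("blue_0", "blue_1")}

def order_reward(agent_room, ordered_room):
    """Low-level reward: a worker is paid only for following its order,
    decoupling it from the global room-clearance objective."""
    return 1.0 if agent_room == ordered_room else 0.0

# Toy rollout: the commander issues orders until every room is cleared.
commander = CommanderPolicy()
uncleared = ROOMS.copy()
positions = {"blue_0": "hall", "blue_1": "hall"}
while uncleared:
    for agent, target in commander.orders(uncleared).items():
        positions[agent] = target  # assume the trained worker complies
        print(agent, "->", target, "reward:", order_reward(positions[agent], target))
        if target in uncleared:
            uncleared.remove(target)
```
Because worker rewards depend only on compliance, only the commander has to reason about the global objective, which is what makes the decomposition tractable.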
Related papers
- Mastering the Digital Art of War: Developing Intelligent Combat Simulation Agents for Wargaming Using Hierarchical Reinforcement Learning [0.0]
This dissertation proposes a comprehensive approach, including targeted observation abstractions, multi-model integration, a hybrid AI framework, and an overarching hierarchical reinforcement learning framework.
Our localized observation abstraction using piecewise linear spatial decay (sketched below) simplifies the RL problem, enhancing computational efficiency and demonstrating superior efficacy over traditional global observation methods.
Our hybrid AI framework synergizes RL with scripted agents, leveraging RL for high-level decisions and scripted agents for lower-level tasks, enhancing adaptability, reliability, and performance.
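As a rough illustration of the localized observation abstraction above, the sketch below weights each observed entity by a piecewise linear decay in its distance from the observing agent; the breakpoints `near` and `far` are hypothetical values of ours, not parameters from the dissertation.
```python
def piecewise_linear_decay(distance, near=5.0, far=20.0):
    """Full weight inside `near`, linear falloff between `near` and `far`,
    and zero weight beyond `far`."""
    if distance <= near:
        return 1.0
    if distance >= far:
        return 0.0
    return (far - distance) / (far - near)

# Localized abstraction: scale each entity's contribution by its distance
# before it enters the agent's observation vector.
entities = [("enemy", 3.0), ("enemy", 12.0), ("ally", 25.0)]
print([(kind, piecewise_linear_decay(d)) for kind, d in entities])
```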
arXiv Detail & Related papers (2024-08-23T18:50:57Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
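The parallel two-level pattern can be sketched roughly as follows; this is a hedged Python simplification of the general idea, not ArCHer's actual code, and all names are ours: an off-policy critic does TD updates at the granularity of whole turns, and the per-turn advantage it yields would weight a low-level token policy's gradient.
```python
# Hedged sketch of the two-level pattern (our names and simplifications).
GAMMA = 0.99
utterance_value = {}  # high-level critic: V over dialogue states, one step per turn

def high_level_td(state, next_state, turn_reward, lr=0.1):
    """Off-policy TD update at the granularity of whole turns/utterances."""
    v = utterance_value.get(state, 0.0)
    v_next = utterance_value.get(next_state, 0.0)
    utterance_value[state] = v + lr * (turn_reward + GAMMA * v_next - v)

def turn_advantage(state, next_state, turn_reward):
    """The high-level critic supplies a per-turn advantage against which a
    low-level token policy would be trained by policy gradient."""
    v = utterance_value.get(state, 0.0)
    v_next = utterance_value.get(next_state, 0.0)
    return turn_reward + GAMMA * v_next - v

# Toy two-turn episode: both levels consume the same transitions in parallel.
for s, s2, r in [("turn0", "turn1", 0.0), ("turn1", "done", 1.0)]:
    adv = turn_advantage(s, s2, r)  # would weight token-level log-probs
    high_level_td(s, s2, r)
    print(s, "advantage:", round(adv, 3), "V:", round(utterance_value[s], 3))
```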
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - MASP: Scalable GNN-based Planning for Multi-Agent Navigation [17.788592987873905]
We propose a goal-conditioned hierarchical planner for navigation tasks with a substantial number of agents.
We also leverage graph neural networks (GNN) to model the interaction between agents and goals, improving goal achievement.
The results demonstrate that MASP outperforms classical planning-based competitors and RL baselines.
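One way to picture the agent-goal interaction modeling is a single round of attention-style message passing over an agent-goal graph; the sketch below is our simplification (MASP's GNN and planner are richer than this).
```python
# Minimal sketch of modeling agent-goal interaction with one round of
# graph message passing (our illustration, not MASP's architecture).
import math

agents = [(0.0, 0.0), (4.0, 0.0)]   # agent positions
goals  = [(1.0, 0.0), (5.0, 1.0)]   # goal positions

def score(a, g):
    """Edge feature: negative distance, so nearer goals score higher."""
    return -math.dist(a, g)

# Message passing: each agent aggregates softmax-weighted goal messages,
# yielding a soft agent-to-goal assignment the planner can condition on.
for i, a in enumerate(agents):
    logits = [score(a, g) for g in goals]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    assignment = [w / z for w in weights]
    print(f"agent {i} goal weights:", [round(w, 3) for w in assignment])
```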
arXiv Detail & Related papers (2023-12-05T06:05:04Z) - AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
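A toy stand-in for planning in a learned abstract search space might look like the following; the latent codes, `learned_transition`, and `learned_value` are all hypothetical stand-ins of ours, not PiZero components.
```python
# Hedged sketch: search happens entirely in an abstract (learned) space,
# never touching the real environment.
import itertools

def learned_transition(latent, action):
    """Stand-in for a learned latent dynamics model."""
    return (latent + action) % 7

def learned_value(latent):
    """Stand-in for a learned value head: prefers latent code 3."""
    return -abs(latent - 3)

def plan(start, depth=3, actions=(1, 2)):
    """Depth-limited search in the abstract space: roll the learned model
    forward and score imagined endpoints with the learned value."""
    best_plan, best_v = None, float("-inf")
    for seq in itertools.product(actions, repeat=depth):
        latent = start
        for a in seq:
            latent = learned_transition(latent, a)
        if learned_value(latent) > best_v:
            best_v, best_plan = learned_value(latent), seq
    return best_plan, best_v

print(plan(0))  # a 3-step action sequence whose imagined endpoint scores best
```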
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that addresses this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
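A hedged toy version of that pre-training recipe: fit a dynamics model from reward-free interaction first, then adapt to the downstream task using only that model. The tabular world, states, and rules below are our illustration.
```python
import random

model = {}  # learned dynamics: (state, action) -> estimated next state

def pretrain(steps=1000):
    """Unsupervised phase: explore without rewards, fit the world model."""
    state = 0
    for _ in range(steps):
        action = random.choice((-1, 1))
        next_state = max(0, min(9, state + action))  # hidden true dynamics
        model[(state, action)] = next_state          # tabular "model fit"
        state = next_state

def adapt(goal=7):
    """Adaptation phase: plan greedily with the pre-trained model only."""
    state, path = 0, []
    while state != goal and len(path) < 20:
        action = min((-1, 1), key=lambda a: abs(model.get((state, a), state) - goal))
        state = model.get((state, action), state)
        path.append(action)
    return path

pretrain()
print(adapt())  # reaches the goal using only the model learned reward-free
```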
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Hierarchical Reinforcement Learning with Opponent Modeling for
Distributed Multi-agent Cooperation [13.670618752160594]
Deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of agents with their environments.
Traditional DRL solutions suffer from the high dimensionality of multiple agents with continuous action spaces during policy search.
We propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search.
arXiv Detail & Related papers (2022-06-25T19:09:29Z) - Environment Generation for Zero-Shot Compositional Reinforcement
Learning [105.35258025210862]
Compositional Design of Environments (CoDE) trains a Generator agent to automatically build a series of compositional tasks tailored to the agent's current skill level.
We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments.
CoDE yields a 4x higher success rate than the strongest baseline, and demonstrates strong performance on real websites after learning on 3500 primitive tasks.
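The generator-driven curriculum can be illustrated with a small sketch, loosely in the spirit of CoDE; the composition and difficulty rule below are our own illustrative choices.
```python
import random

def build_environment(n_components):
    """Compose an environment from n primitive components (pages/rooms)."""
    return [f"component_{i}" for i in range(n_components)]

def agent_succeeds(env):
    """Stand-in for rolling out the learner: harder tasks succeed less often."""
    return random.random() < max(0.0, 1.0 - 0.15 * len(env))

difficulty = 1
for episode in range(20):
    env = build_environment(difficulty)
    if agent_succeeds(env):
        difficulty += 1                       # learner is coping: compose harder tasks
    else:
        difficulty = max(1, difficulty - 1)   # too hard: back off
print("final curriculum difficulty:", difficulty)
```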
arXiv Detail & Related papers (2022-01-21T21:35:01Z) - On the Use and Misuse of Absorbing States in Multi-agent Reinforcement
Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, however, an agent may terminate before its teammates do.
We present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states.
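The contrast with absorbing states can be seen in a toy attention pool, a hypothetical sketch of ours rather than the paper's architecture: attention aggregates over however many teammates remain active, so no padded absorbing entries are needed when an agent terminates early.
```python
# A fully connected layer needs fixed-size input, hence padded "absorbing"
# entries for dead agents; attention handles a variable-size set directly.
import math

def attention_pool(query, keys, values):
    """Dot-product attention over a variable-length set of teammates."""
    logits = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    dim = len(values[0])
    return [sum(w[i] / z * values[i][d] for i in range(len(values))) for d in range(dim)]

alive = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # teammate features
print(attention_pool([1.0, 0.0], alive, alive))
print(attention_pool([1.0, 0.0], alive[:2], alive[:2]))  # one agent terminated: no padding
```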
arXiv Detail & Related papers (2021-11-10T23:45:08Z) - Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes.
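A hedged sketch of the independence structure, not the paper's exact stage-based algorithm: each agent runs its own V-learning-style update from its own reward stream, with no access to teammates' values or actions.
```python
class IndependentVLearner:
    def __init__(self, lr=0.1, gamma=0.9):
        self.V, self.lr, self.gamma = {}, lr, gamma

    def update(self, state, reward, next_state):
        """Local TD update: this agent never sees teammates' actions or values."""
        v = self.V.get(state, 0.0)
        v_next = self.V.get(next_state, 0.0)
        self.V[state] = v + self.lr * (reward + self.gamma * v_next - v)

# Two agents consume the same team transitions but learn independently.
agents = [IndependentVLearner() for _ in range(2)]
for _ in range(200):
    for s, r, s2 in [("s0", 0.0, "s1"), ("s1", 1.0, "done")]:
        for a in agents:
            a.update(s, r, s2)
print([round(a.V["s0"], 2) for a in agents])  # both converge to the same values
```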
arXiv Detail & Related papers (2021-10-12T02:45:12Z) - Continuous Coordination As a Realistic Scenario for Lifelong Learning [6.044372319762058]
We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms under limited memory and computation.
We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without the additional assumptions made by previous works.
arXiv Detail & Related papers (2021-03-04T18:44:03Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
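One plausible reading of the goal-oriented, "forgetful" replay buffer, sketched with illustrative names and an illustrative forgetting rule of our own:
```python
from collections import deque
import random

class ForgetfulReplay:
    def __init__(self, capacity=1000, forget_fraction=0.5):
        self.buffer = deque(maxlen=capacity)  # old entries fall off automatically
        self.forget_fraction = forget_fraction

    def add_demo(self, transitions, subgoal):
        """Store demonstration transitions tagged with the sub-goal they reach,
        so hierarchical sub-tasks can be highlighted and replayed separately."""
        for t in transitions:
            self.buffer.append((t, subgoal))

    def forget(self):
        """Drop a random fraction of stored demo data so the agent's own
        (higher-quality) experience gradually dominates."""
        keep = int(len(self.buffer) * (1 - self.forget_fraction))
        self.buffer = deque(random.sample(list(self.buffer), keep),
                            maxlen=self.buffer.maxlen)

    def sample(self, k):
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

buf = ForgetfulReplay()
buf.add_demo([("s0", "a0", "s1"), ("s1", "a1", "s2")], subgoal="get_wood")
buf.forget()
print(buf.sample(2))
```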
arXiv Detail & Related papers (2020-06-17T15:38:40Z)