Hierarchical Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection
- URL: http://arxiv.org/abs/2008.03444v3
- Date: Tue, 29 Sep 2020 01:15:05 GMT
- Title: Hierarchical Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection
- Authors: Xinyi Xu and Tiancheng Huang and Pengfei Wei and Akshay Narayan and Tze-Yun Leong
- Abstract summary: We propose a new method to integrate HRL, experience replay and effective subgoal selection through an implicit curriculum design based on human expertise.
Our method can achieve better sample efficiency than flat and end-to-end RL methods, and provides an effective method for explaining the agent's performance.
- Score: 13.136763521789307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work is inspired by recent advances in hierarchical reinforcement
learning (HRL) (Barto and Mahadevan 2003; Hengst 2010), and improvements in
learning efficiency from heuristic-based subgoal selection, experience replay
(Lin 1993; Andrychowicz et al. 2017), and task-based curriculum learning
(Bengio et al. 2009; Zaremba and Sutskever 2014). We propose a new method to
integrate HRL, experience replay and effective subgoal selection through an
implicit curriculum design based on human expertise to support sample-efficient
learning and enhance interpretability of the agent's behavior. Human expertise
remains indispensable in many areas such as medicine (Buch, Ahmed, and
Maruthappu 2018) and law (Cath 2018), where interpretability, explainability
and transparency are crucial in the decision making process, for ethical and
legal reasons. Our method simplifies the complex task sets for achieving the
overall objectives by decomposing them into subgoals at different levels of
abstraction. Incorporating relevant subjective knowledge also significantly
reduces the computational resources spent in exploration for RL, especially in
high-speed, changing, and complex environments where the transition dynamics
cannot be effectively learned and modelled in a short time. Experimental
results in two StarCraft II (SC2) (Vinyals et al. 2017) minigames demonstrate
that our method can achieve better sample efficiency than flat and end-to-end
RL methods, and provides an effective method for explaining the agent's
performance.
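As a concrete illustration, the following is a minimal sketch of the kind of loop the abstract describes, assuming a human expert supplies an ordered subgoal list with per-subgoal success predicates. The two-level control, goal-conditioned low-level policy, hindsight-style relabeling (Andrychowicz et al. 2017), and all environment/policy interfaces are hypothetical placeholders, not the authors' SC2 implementation.

```python
# Illustrative sketch only: human expertise fixes an ordered subgoal list
# (the implicit curriculum), a goal-conditioned low-level policy pursues one
# subgoal at a time, and achieved subgoals are relabeled in hindsight
# (Andrychowicz et al. 2017). Environment, policy, and predicates are
# hypothetical placeholders, not the authors' SC2 implementation.
import random
from collections import deque

# Human-specified subgoal order for a hypothetical SC2-like minigame.
SUBGOALS = ["train_workers", "build_supply", "build_army", "attack"]

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.data), min(batch_size, len(self.data)))

def subgoal_achieved(state, subgoal):
    """Expert-supplied predicate, e.g. 'enough supply depots exist'."""
    return bool(state.get(subgoal, False))

def rollout(env, policy, buffer, max_steps=1_000):
    state, level = env.reset(), 0              # level indexes the active subgoal
    episode = []
    for _ in range(max_steps):
        goal = SUBGOALS[level]
        action = policy.act(state, goal)       # goal-conditioned low-level policy
        next_state, done = env.step(action)
        reward = 1.0 if subgoal_achieved(next_state, goal) else 0.0
        episode.append((state, goal, action, reward, next_state))
        if reward > 0 and level < len(SUBGOALS) - 1:
            level += 1                         # curriculum advances one subgoal
        state = next_state
        if done:
            break
    # Hindsight relabeling: store each transition again under any subgoal it
    # happened to achieve, so sparse rewards still produce learning signal.
    for s, g, a, r, s2 in episode:
        buffer.add((s, g, a, r, s2))
        for g2 in SUBGOALS:
            if g2 != g and subgoal_achieved(s2, g2):
                buffer.add((s, g2, a, 1.0, s2))
```

Because progress is always expressed as which human-named subgoal the agent is pursuing, its behavior can be read off at the subgoal level, which is the interpretability benefit the abstract emphasizes.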
Related papers
- ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI [44.77897322913095]
We present ReLIC, a new approach for in-context reinforcement learning for embodied agents.
With ReLIC, agents are capable of adapting to new environments using 64,000 steps of in-context experience.
We find that ReLIC is capable of few-shot imitation learning despite never being trained with expert demonstrations.
arXiv Detail & Related papers (2024-10-03T17:58:11Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, we aim to train agents efficiently by decoupling exploration and utilization, so that the agent can escape the conundrum of suboptimal solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
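As a rough illustration of this decoupling (an assumed simplification under hypothetical interfaces, not the OPARL implementation), an optimistic bound over a Q-ensemble can drive exploration while a pessimistic bound drives utilization:

```python
# Sketch of optimism/pessimism decoupling over a Q-ensemble (an assumed
# simplification, not the OPARL implementation): exploration follows an
# optimistic value estimate, the exploited policy a pessimistic one.
import numpy as np

def q_ensemble(state, action, rng, n=5):
    """Stand-in for n independently trained Q-networks."""
    return rng.normal(loc=0.0, scale=1.0, size=n)

def select_action(state, actions, rng, explore):
    scores = []
    for a in actions:
        qs = q_ensemble(state, a, rng)
        # Optimistic upper bound rewards uncertain actions (exploration);
        # pessimistic lower bound favors reliably good ones (utilization).
        scores.append(qs.max() if explore else qs.min())
    return actions[int(np.argmax(scores))]

rng = np.random.default_rng(0)
exploration_action = select_action(state=None, actions=[0, 1, 2], rng=rng, explore=True)
greedy_action = select_action(state=None, actions=[0, 1, 2], rng=rng, explore=False)
```

Keeping the two estimates separate means exploratory optimism never contaminates the policy that is actually exploited, which is one way an agent can avoid locking onto suboptimal solutions.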
arXiv Detail & Related papers (2023-12-26T09:03:23Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
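The reward construction this summary describes can be sketched in a few lines; the environment loop, agent/expert interfaces, and the `intervenes` hook below are hypothetical placeholders, not the RLIF authors' code:

```python
# Minimal sketch of interventions-as-rewards (illustrative of the summary
# above, not the authors' code): a human takeover yields reward -1 and hands
# control to the expert; every other step gets reward 0. Any off-policy RL
# method can then be trained on the collected transitions.
def rlif_rollout(env, agent, expert, intervenes, max_steps=500):
    state = env.reset()
    transitions = []
    for _ in range(max_steps):
        if intervenes(state):                  # human decides to take over
            action, reward = expert.act(state), -1.0
        else:
            action, reward = agent.act(state), 0.0
        next_state, done = env.step(action)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions
```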
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is essentially different from supervised learning, and in practice learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning [13.57305458734617]
We propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.
Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy over options and the low-level workers learn to solve each sub-task.
To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures underlying relations between actions and representations, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering.
arXiv Detail & Related papers (2021-12-07T09:24:49Z)
- Maximum Entropy Model-based Reinforcement Learning [0.0]
This work connects exploration techniques and model-based reinforcement learning.
We have designed a novel exploration method that takes into account features of the model-based approach.
We also demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.
arXiv Detail & Related papers (2021-12-02T13:07:29Z)
- REIN-2: Giving Birth to Prepared Reinforcement Learning Agents Using Reinforcement Learning Agents [0.0]
In this paper, we introduce a meta-learning scheme that shifts the objective of learning to solve a task into the objective of learning to learn to solve a task (or a set of tasks).
Our model, named REIN-2, is a meta-learning scheme formulated within the RL framework, the goal of which is to develop a meta-RL agent that learns how to produce other RL agents.
Experimental results show remarkable performance of our model in popular OpenAI Gym environments compared to traditional state-of-the-art Deep RL algorithms.
arXiv Detail & Related papers (2021-10-11T10:13:49Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)