Open-World Multi-Task Control Through Goal-Aware Representation Learning
and Adaptive Horizon Prediction
- URL: http://arxiv.org/abs/2301.10034v3
- Date: Thu, 12 Oct 2023 12:59:56 GMT
- Title: Open-World Multi-Task Control Through Goal-Aware Representation Learning
and Adaptive Horizon Prediction
- Authors: Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
- Abstract summary: We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment.
We first identify two main challenges of learning such policies: 1) the indistinguishability of tasks from the state distribution, due to the vast scene diversity, and 2) the non-stationary nature of environment dynamics caused by partial observability.
We propose Goal-Sensitive Backbone (GSB) for the policy to encourage the emergence of goal-relevant visual state representations.
- Score: 29.32859058651654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning goal-conditioned policies in Minecraft, a
popular, widely accessible yet challenging open-ended environment for
developing human-level multi-task agents. We first identify two main challenges
of learning such policies: 1) the indistinguishability of tasks from the state
distribution, due to the vast scene diversity, and 2) the non-stationary nature
of environment dynamics caused by partial observability. To tackle the first
challenge, we propose Goal-Sensitive Backbone (GSB) for the policy to encourage
the emergence of goal-relevant visual state representations. To tackle the
second challenge, the policy is further fueled by an adaptive horizon
prediction module that helps alleviate the learning uncertainty brought by the
non-stationary dynamics. Experiments on 20 Minecraft tasks show that our method
significantly outperforms the best baseline so far; in many of them, we double
the performance. Our ablation and exploratory studies then explain how our
approach beats its counterparts and also unveil a surprising bonus of
zero-shot generalization to new scenes (biomes). We hope our agent can help
shed light on learning goal-conditioned, multi-task agents in challenging,
open-ended environments like Minecraft.
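To make the two components concrete, below is a minimal PyTorch sketch (illustrative, not the authors' implementation): goal embeddings modulate the convolutional features channel-wise, in the spirit of the Goal-Sensitive Backbone, and a separate head predicts a discretized horizon (remaining steps to the goal). All module names, sizes, and the exact fusion scheme are assumptions.

```python
import torch
import torch.nn as nn

class GoalSensitiveBlock(nn.Module):
    """Conv block whose features are modulated channel-wise by the goal embedding."""
    def __init__(self, channels: int, goal_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_scale = nn.Linear(goal_dim, channels)
        self.to_shift = nn.Linear(goal_dim, channels)

    def forward(self, x: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv(x))
        scale = self.to_scale(goal)[:, :, None, None]  # (B, C, 1, 1)
        shift = self.to_shift(goal)[:, :, None, None]
        return h * (1 + scale) + shift                 # goal-aware features

class GoalConditionedPolicy(nn.Module):
    def __init__(self, channels=64, goal_dim=128, n_actions=36, n_horizon_bins=16):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=8, stride=4)
        self.blocks = nn.ModuleList(
            GoalSensitiveBlock(channels, goal_dim) for _ in range(3))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.action_head = nn.Linear(channels, n_actions)
        # Distribution over discretized remaining-steps-to-goal; an estimate
        # the policy can exploit when dynamics are non-stationary.
        self.horizon_head = nn.Linear(channels, n_horizon_bins)

    def forward(self, obs: torch.Tensor, goal: torch.Tensor):
        h = torch.relu(self.stem(obs))
        for block in self.blocks:
            h = block(h, goal)
        z = self.pool(h).flatten(1)
        return self.action_head(z), self.horizon_head(z)
```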
Related papers
- Learning the Generalizable Manipulation Skills on Soft-body Tasks via Guided Self-attention Behavior Cloning Policy [9.345203561496552]
The GP2E behavior cloning policy guides the agent to learn generalizable manipulation skills from soft-body tasks.
Our findings highlight the potential of our method to improve the generalization abilities of Embodied AI models.
arXiv Detail & Related papers (2024-10-08T07:31:10Z)
- MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint [40.3872201560003]
Hierarchical reinforcement learning (HRL) divides tasks into subgoals and completes them sequentially.
Current methods struggle to find suitable subgoals for ensuring a stable learning process.
We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
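A guess at what a dynamic distance constraint can look like in code (purely illustrative; the summary does not specify the mechanism): only subgoals whose estimated distance from the current state falls inside a bound are proposed, and the bound widens as training progresses.

```python
# Hypothetical sketch of a dynamic distance constraint on subgoal proposals.
def feasible_subgoals(candidates, state, dist_fn, progress: float,
                      d_min: float = 1.0, d_max: float = 10.0):
    """Keep subgoals within a distance bound that grows with training progress.

    progress: fraction of training completed, in [0, 1].
    dist_fn:  learned or heuristic distance estimate between state and subgoal.
    """
    bound = d_min + progress * (d_max - d_min)
    return [g for g in candidates if dist_fn(state, g) <= bound]
```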
arXiv Detail & Related papers (2024-02-22T03:11:09Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
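As a rough sketch of the control flow this implies (the planner, policy, and success check below are hypothetical stand-ins, not PTP's actual API): a planner proposes a chain of subgoals toward the final goal, and a goal-conditioned policy chases them one at a time.

```python
import numpy as np

def reached(obs, goal, tol=0.5):
    """Hypothetical success check: proximity in an assumed vector state space."""
    return np.linalg.norm(np.asarray(obs) - np.asarray(goal)) < tol

def run_episode(env, policy, subgoal_planner, final_goal, max_steps=500):
    obs = env.reset()
    subgoals = subgoal_planner(obs, final_goal)  # e.g., decoded from a latent plan
    budget = max_steps // (len(subgoals) + 1)    # crude per-subgoal step budget
    for goal in [*subgoals, final_goal]:
        for _ in range(budget):
            obs, _, done, _ = env.step(policy(obs, goal))
            if done or reached(obs, goal):
                break
    return obs
```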
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Unsupervised Reinforcement Learning in Multiple Environments [37.5349071806395]
We address the problem of unsupervised reinforcement learning in a class of multiple environments.
We present a policy gradient algorithm, $\alpha$MEPOL, to optimize the introduced objective through mediated interactions with the class.
We show that reinforcement learning greatly benefits from the pre-trained exploration strategy.
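The introduced objective is a percentile-style exploration criterion over the class of environments; a sketch of that outer criterion follows (the per-environment entropy estimates would come from a separate estimator, and the CVaR reading is our interpretation):

```python
import numpy as np

def percentile_exploration_objective(entropies: np.ndarray, alpha: float = 0.2) -> float:
    """Mean state-visitation entropy over the worst alpha-fraction of environments."""
    k = max(1, int(alpha * len(entropies)))
    worst = np.sort(entropies)[:k]  # hardest environments for exploration
    return float(worst.mean())
```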
arXiv Detail & Related papers (2021-12-16T09:54:37Z)
- Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching [98.25207998996066]
We build on the mutual information framework for skill discovery and introduce UPSIDE to address the coverage-directedness trade-off.
We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines.
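The mutual-information framework mentioned here is commonly implemented with a skill discriminator q(z|s): a skill is rewarded for visiting states from which the discriminator can recover it. A DIAYN-style sketch of that reward (illustrative; UPSIDE layers its directed/diffusing skill structure on top of such an objective):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_skills, state_dim = 8, 32
discriminator = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_skills))

def intrinsic_reward(state: torch.Tensor, skill: torch.Tensor) -> torch.Tensor:
    """r(s, z) = log q(z|s) - log p(z), with p(z) uniform over n_skills."""
    log_q = F.log_softmax(discriminator(state), dim=-1)          # (B, n_skills)
    log_q_z = log_q.gather(-1, skill.unsqueeze(-1)).squeeze(-1)  # (B,)
    return log_q_z + torch.log(torch.tensor(float(n_skills)))
```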
arXiv Detail & Related papers (2021-10-27T14:22:19Z)
- Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient-based attacks into several subspaces.
In the first stage, we train a deceptive policy by hacking the environment and discover a set of trajectories that lead to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
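One way to realize the second stage, steering the victim toward the low-reward trajectories, is an FGSM-style perturbation within an L-infinity budget (the models and loss target below are assumptions, not the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

def attack_observation(victim, deceptive, obs: torch.Tensor, eps: float = 0.01):
    """Perturb obs so the victim's action distribution imitates the deceptive policy."""
    obs_adv = obs.clone().requires_grad_(True)
    target = deceptive(obs).argmax(dim=-1)           # action the attacker wants
    loss = F.cross_entropy(victim(obs_adv), target)  # low loss = victim complies
    loss.backward()
    with torch.no_grad():
        obs_adv = obs_adv - eps * obs_adv.grad.sign()  # one signed-gradient step
    return obs_adv.detach()
```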
arXiv Detail & Related papers (2021-06-30T07:41:51Z)
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
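The adversarial incentive in a nutshell: the teacher is rewarded for goals the student can reach, but only when reaching them takes the student some effort, and the difficulty threshold grows over training, which yields the curriculum. A sketch of the teacher's reward (the constants and the handling of failures are assumptions):

```python
def teacher_reward(reached_goal: bool, steps_taken: int, t_star: int,
                   alpha: float = 1.0, beta: float = 0.3) -> float:
    """Pay the teacher for goals that are achievable but not trivial."""
    if reached_goal and steps_taken >= t_star:  # t_star rises during training
        return alpha   # challenging yet solvable: good goal
    return -beta       # too easy, or not solvable at all
```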
arXiv Detail & Related papers (2020-06-22T10:22:08Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
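Plan2Explore's exploration signal is often summarized as seeking states where an ensemble of learned one-step models disagree. A sketch of that intrinsic reward (the ensemble interface is hypothetical):

```python
import torch

def disagreement_reward(ensemble, latent: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Intrinsic bonus = variance across ensemble predictions of the next latent."""
    preds = torch.stack([m(latent, action) for m in ensemble])  # (K, B, D)
    return preds.var(dim=0).mean(dim=-1)                        # (B,)
```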
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations [52.696205074092006]
Generalization Through Imitation (GTI) is a two-stage offline imitation learning algorithm.
GTI exploits a structure where demonstrated trajectories for different tasks intersect at common regions of the state space.
In the first stage of GTI, we train a policy that leverages intersections to have the capacity to compose behaviors from different demonstration trajectories together.
In the second stage of GTI, we train a goal-directed agent to generalize to novel start and goal configurations.
arXiv Detail & Related papers (2020-03-13T02:25:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.